[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign
On 10.02.22 11:22, Jan Beulich wrote: > On 10.02.2022 09:21, Oleksandr Andrushchenko wrote: >> >> On 08.02.22 13:25, Oleksandr Andrushchenko wrote: >>> On 08.02.22 13:00, Jan Beulich wrote: >>>> On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: >>>>> This smells like we first need to fix the existing code, so >>>>> pdev->domain is not assigned by specific IOMMU implementations, >>>>> but instead controlled by the code which relies on that, assign_device. >>>> Feel free to come up with proposals how to cleanly do so. Moving the >>>> assignment to pdev->domain may even be possible now, but if you go >>>> back you may find that the code was quite different earlier on. >>> I do understand that as the code evolves new use cases bring >>> new issues. >>>>> I can have something like: >>>>> >>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >>>>> index 88836aab6baf..cc7790709a50 100644 >>>>> --- a/xen/drivers/passthrough/pci.c >>>>> +++ b/xen/drivers/passthrough/pci.c >>>>> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 >>>>> devfn) >>>>> static int assign_device(struct domain *d, u16 seg, u8 bus, u8 >>>>> devfn, u32 flag) >>>>> { >>>>> const struct domain_iommu *hd = dom_iommu(d); >>>>> + struct domain *old_owner; >>>>> struct pci_dev *pdev; >>>>> int rc = 0; >>>>> >>>>> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, >>>>> u8 bus, u8 devfn, u32 flag) >>>>> ASSERT(pdev && (pdev->domain == hardware_domain || >>>>> pdev->domain == dom_io)); >>>>> >>>>> + /* We need to restore the old owner in case of an error. */ >>>>> + old_owner = pdev->domain; >>>>> + >>>>> vpci_deassign_device(pdev->domain, pdev); >>>>> >>>>> rc = pdev_msix_assign(d, pdev); >>>>> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 >>>>> seg, u8 bus, u8 devfn, u32 flag) >>>>> >>>>> done: >>>>> if ( rc ) >>>>> + { >>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>> d, &PCI_SBDF3(seg, bus, devfn), rc); >>>>> + /* We failed to assign, so restore the previous owner. */ >>>>> + pdev->domain = old_owner; >>>>> + } >>>>> /* The device is assigned to dom_io so mark it as quarantined */ >>>>> else if ( d == dom_io ) >>>>> pdev->quarantine = true; >>>>> >>>>> But I do not think this belongs to this patch >>>> Indeed. Plus I'm sure you understand that it's not that simple. Assigning >>>> to pdev->domain is only the last step of assignment. Restoring the original >>>> owner would entail putting in place the original IOMMU table entries as >>>> well, which in turn can fail. Hence why you'll find a number of uses of >>>> domain_crash() in places where rolling back is far from easy. >>> So, why don't we just rely on the toolstack to do the roll back then? >>> This way we won't add new domain_crash() calls. >>> I do understand though that we may live Xen in a wrong state though. >>> So, do you think it is possible if we just call deassign_device from >>> assign_device on the error path? This is just like I do in >>> vpci_assign_device: >>> I call vpci_deassign_device if the former fails. >> With the following addition: >> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >> index c4ae22aeefcd..d6c00449193c 100644 >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -1511,6 +1511,12 @@ static int assign_device(struct domain *d, u16 seg, >> u8 bus, u8 devfn, u32 flag) >> } >> >> rc = vpci_assign_device(pdev); >> + if ( rc ) >> + /* >> + * Ignore the return code as we want to preserve the one from the >> + * failed assign operation. >> + */ >> + deassign_device(d, seg, bus, devfn); This needs devfn to be preserved as it can be modified by the loop above: for ( ; pdev->phantom_stride; rc = 0 ) { devfn += pdev->phantom_stride; >> >> done: >> if ( rc ) >> >> I see the following logs (PV Dom0): >> >> (XEN) assign_device seg 0 bus 3 devfn 0 >> (XEN) [VT-D]d[IO]:PCIe: unmap 0000:03:00.0 >> (XEN) [VT-D]d4:PCIe: map 0000:03:00.0 >> (XEN) assign_device vpci_assign rc -22 from d[IO] to d4 >> (XEN) deassign_device current d4 to d[IO] >> (XEN) [VT-D]d4:PCIe: unmap 0000:03:00.0 >> (XEN) [VT-D]d[IO]:PCIe: map 0000:03:00.0 >> (XEN) deassign_device ret 0 >> (XEN) d4: assign (0000:03:00.0) failed (-22) >> libxl: error: libxl_pci.c:1498:pci_add_dm_done: Domain 4:xc_assign_device >> failed: Invalid argument >> libxl: error: libxl_pci.c:1781:device_pci_add_done: Domain >> 4:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3) >> libxl: error: libxl_create.c:1895:domcreate_attach_devices: Domain 4:unable >> to add pci devices >> libxl: error: libxl_domain.c:1183:libxl__destroy_domid: Domain >> 4:Non-existant domain >> libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 4:Unable >> to destroy guest >> libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 4:Destruction of >> domain failed >> >> So, it seems to properly solve the issue with pdev->domain left >> set to the domain we couldn't create. >> >> @Jan, will this address your concern? > Partly: For one I'd have to think through what further implications there > are from going this route. And then completely ignoring the return value > is unlikely to be correct: You certainly want to retain the original > error code for returning to the caller, but you can't leave the error > unhandled. That's likely another case where the "best" choice is to crash > the guest. Ok, then I'll crash the domain... > > Jan > > Thank you, Oleksandr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |