|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: vpci: Need for vpci_cancel_pending
On 28.10.21 13:17, Roger Pau Monné wrote:
> On Thu, Oct 28, 2021 at 10:04:20AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, all!
>>
>> While working on PCI passthrough on Arm I stepped onto a crash
>> with the following call chain:
>>
>> pci_physdev_op
>> pci_add_device
>> init_bars -> modify_bars -> defer_map ->
>> raise_softirq(SCHEDULE_SOFTIRQ)
>> iommu_add_device <- FAILS
>> vpci_remove_device -> xfree(pdev->vpci)
>>
>> Then:
>> leave_hypervisor_to_guest
>> vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>
>> Which results in the crash below:
>>
>> (XEN) Data Abort Trap. Syndrome=0x6
>> (XEN) Walking Hypervisor VA 0x10 on CPU0 via TTBR 0x00000000481dd000
>> (XEN) 0TH[0x0] = 0x00000000481dcf7f
>> (XEN) 1ST[0x0] = 0x00000000481d9f7f
>> (XEN) 2ND[0x0] = 0x0000000000000000
>> (XEN) CPU0: Unexpected Trap: Data Abort
>> ...
>> (XEN) Xen call trace:
>> (XEN) [<00000000002246d8>] _spin_lock+0x40/0xa4 (PC)
>> (XEN) [<00000000002246c0>] _spin_lock+0x28/0xa4 (LR)
>> (XEN) [<000000000024f6d0>] vpci_process_pending+0x78/0x128
>> (XEN) [<000000000027f7e8>] leave_hypervisor_to_guest+0x50/0xcc
>> (XEN) [<0000000000269c5c>] entry.o#guest_sync_slowpath+0xa8/0xd4
>>
>> So, it seems that if pci_add_device fails and calls vpci_remove_device
>> the later needs to cancel any pending work.
> Indeed, you will need to check that v->vpci.pdev == pdev before
> canceling the pending work though, or else you could be canceling
> pending work from a different device.
How about:
void vpci_cancel_pending(struct pci_dev *pdev)
{
struct vcpu *v = current;
if ( v->vpci.mem && v->vpci.pdev == pdev)
{
rangeset_destroy(v->vpci.mem);
v->vpci.mem = NULL;
}
}
This will effectively prevent the pending work from running
>
>> If this is a map operation it seems to be straightforward: destroy
>> the range set and do not map anything.
>>
>> If vpci_remove_device is called and unmap operation was scheduled
>> then it can be that:
>> - guest is being destroyed for any reason and skipping unmap is ok
>> as all the mappings for the whole domain will be destroyed anyways
>> - guest is still going to stay alive and then unmapping must be done
>>
>> I would like to hear your thought what would be the right approach
>> to take in order to solve the issue.
> For the hardware domain it's likely better to do nothing, and just try
> to continue execution. The worse that could happen is that MMIO mappings
> are left in place when the device has been deassigned.
>
> For unprivileged domains that get a failure in the middle of a vPCI
> {un}map operation we need to destroy them, as we don't know in which
> state the p2m is. This can only happen in vpci_process_pending for
> domUs I think, as they won't be allowed to call pci_add_device. Please
> see the FIXME in vpci_process_pending related to this topic.
Agree
>
> Regards, Roger.
>
Thank you,
Oleksandr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |