[BUG] XHCI_NO_64BIT_SUPPORT on ASM1042A USB controller breaks PCIE passthrough
Hi all. Many thanks for Xen.

I'm attempting to perform PCI passthrough of my RocketU 1144D USB controller from an XCP-ng host (XCP-ng 8.3.0, kernel 4.19.0+1) to a Linux guest. This card uses a PLX PCIe switch IC and four ASM1042A USB controller ICs, of which I forward a single ASM1042A.

The ASM1042A is detected in the guest VM and initially appears to work OK, but after I dd some gigabytes to an attached USB disk device, the controller appears to go away:

[ 81.076381] xhci_hcd 0000:00:09.0: xHCI host not responding to stop endpoint command
[ 81.079319] xhci_hcd 0000:00:09.0: xHCI host controller not responding, assume dead
[ 81.081503] xhci_hcd 0000:00:09.0: HC died; cleaning up
[ 81.083388] usb 5-1: USB disconnect, device number 2

At this point, the controller is unusable until I reset it (via /sys/bus/pci/devices/../remove and /sys/bus/pci/rescan). I am able to trigger this behavior reliably, although sometimes some 30GB must be transferred before symptoms appear. The guest is running a 6.12.50 kernel I built from vanilla sources.

After much head-scratching, I discovered that some older guest kernels function correctly, and do not exhibit the bug, allowing sustained use of the controller. I then proceeded to bisect my way to the following Linux kernel patch (see https://lists-ec2.96boards.org/archives/list/linux-stable-mirror@xxxxxxxxxxxxxxxx/thread/WEVQDDJC72LMLPQY37JOZZNKMJ7OHHFL/):

> I've confirmed that both the ASMedia ASM1042A and ASM3242 have the same
> problem as the ASM1142 and ASM2142/ASM3142, where they lose some of the
> upper bits of 64-bit DMA addresses. As with the other chips, this can
> cause problems on systems where the upper bits matter, and adding the
> XHCI_NO_64BIT_SUPPORT quirk completely fixes the issue.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Forest Crossman cyrozap@xxxxxxxxx
> Signed-off-by: Mathias Nyman mathias.nyman@xxxxxxxxxxxxxxx
> ---
>  drivers/usb/host/xhci-pci.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 1f989a49c8c6..5bbccc9a0179 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -66,6 +66,7 @@
>  #define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI	0x1142
>  #define PCI_DEVICE_ID_ASMEDIA_1142_XHCI	0x1242
>  #define PCI_DEVICE_ID_ASMEDIA_2142_XHCI	0x2142
> +#define PCI_DEVICE_ID_ASMEDIA_3242_XHCI	0x3242
>
>  static const char hcd_name[] = "xhci_hcd";
>
> @@ -276,11 +277,14 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>  		pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
>  		xhci->quirks |= XHCI_BROKEN_STREAMS;
>  	if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
> -		pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
> +		pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI) {
>  		xhci->quirks |= XHCI_TRUST_TX_LENGTH;
> +		xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
> +	}
>  	if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
>  	    (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
> -	     pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
> +	     pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI ||
> +	     pdev->device == PCI_DEVICE_ID_ASMEDIA_3242_XHCI))
>  		xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
>
>  	if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&

Reverting this patch fixes my immediate issue - the USB controller now functions as expected. However, I am way out of my depth here and strongly suspect that doing so will break things in subtle ways, and so this is where I hand off to the experts for proper analysis.

In particular, I'd be interested to learn under which circumstances reverting this patch is dangerous - does 'systems where the upper bits matter' apply only to something relatively exotic?
I ask in order to determine if it is safe to revert this patch in my homelab-grade setup.

In case it is useful, here are further details of my set-up:

* Dell R710 with BIOS 6.0.0
* 2x E5630 CPU and 64GB RAM
* XCP-ng 8.3.0 on the host
* Guest OS is Linux 6.12.0, built from vanilla kernel.org sources
* Guest runs in PVHVM mode
* PCI controller is the RocketU 1144D, which uses a PLX PEX8609 PCIe switch IC connected to four ASM1042A controllers (allowing me to forward each controller to a separate VM)
* The firmware on the ASM1042A is up-to-date AFAICT
* The forwarded PCI device is connected to a JMS578-based disk array containing three mechanical disks
* The problem exhibits in the guest VM after I run 'dd if=/dev/urandom of=/dev/<disk> bs=1M count=10240 conv=sync', although it sometimes needs up to three invocations
* After reverting the patch, I can run the above command ten times without problems
* The same hardware works OK in ESXi.

I'm happy to provide further details, and please accept my apologies in advance for any breach of etiquette - I don't report this kind of bug very often.
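
For anyone trying to reproduce this: the reset step mentioned above (writing to the device's remove node and then to /sys/bus/pci/rescan) can be scripted. What follows is only a minimal sketch, not something taken from my actual setup. The function names and the SYSFS_ROOT variable are made up for illustration, and the PCI address is passed as an argument rather than hard-coded. SYSFS_ROOT exists purely so the helper can be dry-run against a scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT, xhci_died and reset_pci_function are
# illustrative names, not part of any existing tool.
# SYSFS_ROOT defaults to /sys; point it elsewhere for a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Succeeds if the kernel log text on stdin reports a dead xHCI host,
# matching the messages quoted in the report above.
xhci_died() {
    grep -q -e 'HC died; cleaning up' \
            -e 'host controller not responding, assume dead'
}

# Remove the PCI function given as $1 (domain:bus:dev.fn), then ask the
# kernel to rescan the bus so the device is probed again.
# Needs root when run against the real /sys.
reset_pci_function() {
    bdf="$1"
    dev="$SYSFS_ROOT/bus/pci/devices/$bdf"
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi
    echo 1 > "$dev/remove" || return 1
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

With the functions sourced in the guest, something like 'dmesg | xhci_died && reset_pci_function 0000:00:09.0' would reset the controller only after the failure has shown up in the log. The address 0000:00:09.0 is the one from the dmesg output above and will differ on other systems.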