
[BUG] XHCI_NO_64BIT_SUPPORT on ASM1042A USB controller breaks PCIE passthrough



Hi all. Many thanks for Xen.

I'm attempting to perform PCI passthrough of my RocketU 1144D USB
controller from an XCP-ng host (XCP-ng 8.3.0, kernel 4.19.0+1) to a
Linux guest. This card uses a PLX PCIe switch IC and four ASM1042A USB
controller ICs, of which I forward a single ASM1042A.

The ASM1042A is detected in the guest VM and initially appears to work
OK, but after I dd some gigabytes to an attached USB disk device, the
controller appears to go away:

[   81.076381] xhci_hcd 0000:00:09.0: xHCI host not responding to stop endpoint command
[   81.079319] xhci_hcd 0000:00:09.0: xHCI host controller not responding, assume dead
[   81.081503] xhci_hcd 0000:00:09.0: HC died; cleaning up
[   81.083388] usb 5-1: USB disconnect, device number 2

At this point, the controller is unusable until I reset it (via
/sys/bus/pci/devices/../remove and /sys/bus/pci/rescan). I am able to
trigger this behavior reliably, although it sometimes takes around 30GB
of transfers before symptoms appear.
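
The reset described above can be sketched as follows. This is only a
minimal sketch: the function name reset_pci_device and the sysfs_root
parameter are illustrative choices, not part of any existing tool, and
0000:00:09.0 is the guest-side address taken from the log above. Against
the real /sys it has to run as root inside the guest.

```python
from pathlib import Path


def reset_pci_device(bdf: str, sysfs_root: str = "/sys") -> None:
    """Remove a PCI device, then rescan the bus so it is probed again.

    bdf is the PCI address, e.g. "0000:00:09.0". sysfs_root is a
    hypothetical parameter that exists only so the function can be
    pointed at a fake directory tree instead of the real /sys.
    """
    root = Path(sysfs_root)
    # Writing "1" to the device's "remove" attribute detaches the device.
    (root / "bus/pci/devices" / bdf / "remove").write_text("1")
    # Writing "1" to the bus-level "rescan" attribute rediscovers it.
    (root / "bus/pci/rescan").write_text("1")
```

Calling reset_pci_device("0000:00:09.0") performs the same two sysfs
writes as the manual procedure.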

The guest is running a 6.12.50 kernel I built from vanilla sources.

After much head-scratching, I discovered that some older guest kernels
function correctly, and do not exhibit the bug, allowing sustained use
of the controller.

I then proceeded to bisect my way to the following Linux kernel patch
(see 
https://lists-ec2.96boards.org/archives/list/linux-stable-mirror@xxxxxxxxxxxxxxxx/thread/WEVQDDJC72LMLPQY37JOZZNKMJ7OHHFL/):

> I've confirmed that both the ASMedia ASM1042A and ASM3242 have the same
> problem as the ASM1142 and ASM2142/ASM3142, where they lose some of the
> upper bits of 64-bit DMA addresses. As with the other chips, this can
> cause problems on systems where the upper bits matter, and adding the
> XHCI_NO_64BIT_SUPPORT quirk completely fixes the issue.
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Forest Crossman cyrozap@xxxxxxxxx
> Signed-off-by: Mathias Nyman mathias.nyman@xxxxxxxxxxxxxxx
> ---
>  drivers/usb/host/xhci-pci.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 1f989a49c8c6..5bbccc9a0179 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -66,6 +66,7 @@
> #define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI 0x1142
>  #define PCI_DEVICE_ID_ASMEDIA_1142_XHCI 0x1242
>  #define PCI_DEVICE_ID_ASMEDIA_2142_XHCI 0x2142
> +#define PCI_DEVICE_ID_ASMEDIA_3242_XHCI 0x3242
>
>
> static const char hcd_name[] = "xhci_hcd";
>
>
> @@ -276,11 +277,14 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>      pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
>      xhci->quirks |= XHCI_BROKEN_STREAMS;
>     if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
> - pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
> + pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI) {
>      xhci->quirks |= XHCI_TRUST_TX_LENGTH;
> + xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
> + }
>     if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
>         (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
> -      pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
> +      pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI ||
> +      pdev->device == PCI_DEVICE_ID_ASMEDIA_3242_XHCI))
>      xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
>
>
> if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&

Reverting this patch fixes my immediate issue - the USB controller now
functions as expected. However, I am way out of my depth here and
strongly suspect that doing so will break things in subtle ways, and
so this is where I hand off to the experts for proper analysis. In
particular, I'd be interested to learn under which circumstances
reverting this patch is dangerous - does 'systems where the upper bits
matter' apply only to something relatively exotic? I ask in order to
determine if it is safe to revert this patch in my homelab-grade
setup.
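
To make the question concrete, here is a small illustration of what
"losing upper bits" would mean for a DMA address. It assumes, purely for
illustration, that only the low 32 bits survive; the patch description
says only that "some of the upper bits" are lost, so the kept_bits
default and both function names are hypothetical.

```python
def truncate_dma_address(addr: int, kept_bits: int = 32) -> int:
    """Return the address as seen if only the low kept_bits bits survive."""
    return addr & ((1 << kept_bits) - 1)


def is_misdirected(addr: int, kept_bits: int = 32) -> bool:
    """True if truncation changes the address, i.e. the DMA would land
    somewhere other than the intended buffer."""
    return truncate_dma_address(addr, kept_bits) != addr


# A buffer below 4 GiB is unchanged by the truncation.
low_buffer = 0x7FFF_0000
# A buffer above 4 GiB loses its upper bits.
high_buffer = 0x1_2345_6000

print(hex(truncate_dma_address(low_buffer)), is_misdirected(low_buffer))
print(hex(truncate_dma_address(high_buffer)), is_misdirected(high_buffer))
```

Under that assumption, only buffers placed above 4 GiB would be
affected. Whether any buffer ends up there from the controller's point
of view depends on how guest memory is mapped, which this sketch does
not attempt to model.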

In case it is useful, here are further details of my set-up:

* Dell R710 with BIOS 6.0.0
* 2x E5630 CPU and 64GB RAM
* XCP-ng 8.3.0 on the host
* Guest OS is Linux 6.12.0, built from vanilla kernel.org sources
* Guest runs in PVHVM mode
* PCI controller is the RocketU 1144D, which uses a PLX PEX8609 PCIe
switch IC connected to four ASM1042A controllers (allowing me to
forward each controller to a separate VM)
* The firmware on the ASM1042A is up-to-date AFAICT
* The forwarded PCI device is connected to a JMS578-based disk array
containing three mechanical disks
* The problem appears in the guest VM after I run 'dd if=/dev/urandom
of=/dev/<disk> bs=1M count=10240 conv=sync', although it sometimes
needs up to three invocations
* After reverting the patch, I can run the above command ten times
without problems
* The same hardware works OK in ESXi.

I'm happy to provide further details, and please accept my apologies
in advance for any breach of etiquette - I don't report this kind of
bug very often.



 

