Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
On 11/02/2020 15:00, Roger Pau Monné wrote:
> On Mon, Feb 10, 2020 at 09:49:30PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 14:21, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>>>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>>>> Hi Roger,
>>>>>>>>
>>>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB
>>>>>>>> controller.
>>>>>>>> In the guest I get:
>>>>>>>> [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to
>>>>>>>> stop endpoint command.
>>>>>>>> [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not
>>>>>>>> responding, assume dead
>>>>>>>> [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>>>> [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>>>
>>>>>>>> Bisection turned up as the culprit:
>>>>>>>> commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>>>
>>>>>>> Sorry to hear that, let see if we can figure out what's wrong.
>>>>>>
>>>>>> No problem, that is why I test stuff :)
>>>>>>
>>>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>>>
>>>>>>> Does the same controller work fine when used in dom0?
>>>>>>
>>>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>>>> I assume this controller would also work fine in dom0 (as it has also
>>>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>>>> fine when reverting the referenced commit).
>>>>>
>>>>> Is this the only device that fails to work when doing pci-passthrough,
>>>>> or other devices also don't work with the mentioned change applied?
>>>>>
>>>>> Have you tested on other boxes?
>>>>>
>>>>>> I don't know if your change can somehow have a side effect
>>>>>> on latency around the processing of pci-passthrough ?
>>>>>
>>>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>>>> see how it could slow down other interrupts. Also I would think the
>>>>> domain is not receiving interrupts from the device, rather than
>>>>> interrupts being slow.
>>>>>
>>>>> Can you also paste the output of lspci -v for that xHCI device from
>>>>> dom0?
>>>>>
>>>>> Thanks, Roger.
>>>>
>>>> Will do this evening including the testing in dom0 etc.
>>>> Will also see if there is any pattern when observing /proc/interrupts in
>>>> the guest.
>>>
>>> Thanks! I also have some trivial patch that I would like you to try,
>>> just to discard send_IPI_mask clearing the scratch_cpumask under
>>> another function feet.
>>>
>>> Roger.
>>
>> Hi Roger,
>>
>> Took a while, but I was able to run some tests now.
>>
>> I also forgot a detail in the first report (probably still a bit tired from
>> FOSDEM),
>> namely: the device passedthrough works OK for a while before I get the
>> kernel message.
>>
>> I tested the patch and it looks like it makes the issue go away,
>> I tested for a day, while without the patch (or revert of the commit) the
>> device
>> will give problems within a few hours.
>
> Thanks, I have another patch for you to try, which will likely make
> your system crash. Could you give it a try and paste the log output?
>
> Thanks, Roger.

Applied the patch, rebuilt, rebooted and braced for impact ...

However, the device bugged again, but no Xen panic occurred, so there is nothing special in the logs.
I only had time to try it once, so I could retry this evening.

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel