Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
On Tue, Jan 07, 2014 at 12:42:17PM +0000, Gordan Bobic wrote:
> On 2014-01-07 12:15, Jan Beulich wrote:
> >>>>On 07.01.14 at 12:35, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> >>On 2014-01-07 11:26, Wu, Feng wrote:
> >>>>-----Original Message-----
> >>>>From: xen-devel-bounces@xxxxxxxxxxxxx
> >>>>[mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Gordan Bobic
> >>>>Sent: Tuesday, January 07, 2014 6:44 PM
> >>>>To: Andrew Cooper
> >>>>Cc: xen-devel@xxxxxxxxxxxxx
> >>>>Subject: Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re:
> >>>>iommuu/vt-d
> >>>>issues with LSI MegaSAS (PERC5i))
> >>>>
> >>>>On 2014-01-07 10:38, Andrew Cooper wrote:
> >>>>> On 07/01/14 10:35, Gordan Bobic wrote:
> >>>>>> On 2014-01-07 03:17, Zhang, Yang Z wrote:
> >>>>>>> Konrad Rzeszutek Wilk wrote on 2014-01-07:
> >>>>>>>>> Which would look like this:
> >>>>>>>>>
> >>>>>>>>> C220 ---> Tundra Bridge -----> (HB6 PCI bridge -> Brooktree BDFs)
> >>>>>>>>>                                 on the card
> >>>>>>>>>                \--------------> IEEE-1394a
> >>>>>>>>>
> >>>>>>>>> I am actually wondering if this 07:00.0 device is the one that
> >>>>>>>>> reports itself as 08:00.0 (which I think is what you alluding to
> >>>>>>>>> Jan)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> And to double check that theory I decided to pass in the IEEE-1394a
> >>>>>>>> to a guest:
> >>>>>>>>
> >>>>>>>> +-1c.5-[07-08]----00.0-[08]----03.0 Texas Instruments
> >>>>>>>> TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow (XEN)
> >>>>>>>> [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault (XEN)
> >>>>>>>> [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0]
> >>>>>>>> fault
> >>>>>>>> addr 370f1000, iommu reg = ffff82c3ffd53000 (XEN) DMAR:[fault reason
> >>>>>>>> 02h] Present bit in context entry is clear (XEN) print_vtd_entries:
> >>>>>>>> iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 370f1 (XEN)
> >>>>>>>> root_entry
> >>>>>>>> = ffff83083d47f000 (XEN) root_entry[8] = 72569b001 (XEN)
> >>>>>>>> context
> >>>>>>>> = ffff83072569b000 (XEN) context[0] = 0_0 (XEN)
> >>>>>>>> ctxt_entry[0]
> >>>>>>>> not present
> >>>>>>>>
> >>>>>>>> So, capture card OK - Likely the Tundra bridge has an issue:
> >>>>>>>>
> >>>>>>>> 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01)
> >>>>>>>> (prog-if 01 [Subtractive decode])
> >>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
> >>>>VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+
> >>>>>>>> 66MHz-
> >>>>>>>> UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> >>>><MAbort+
> >>>>>>>> >SERR- <PERR- INTx- Latency: 0 Bus: primary=07,
> >>>>>>>> secondary=08,
> >>>>>>>> subordinate=08, sec-latency=32 Memory behind bridge:
> >>>>>>>> f0600000-f06fffff Secondary status: 66MHz+ FastB2B+ ParErr-
> >>>>>>>> DEVSEL=medium TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> >>>>>>>> BridgeCtl:
> >>>>>>>> Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> >>>>>>>> PriDiscTmr- SecDiscTmr- DiscTmrStat-
> >>>>DiscTmrSERREn-
> >>>>>>>> Capabilities: [60] Subsystem: Super Micro Computer Inc
> >>>>>>>> Device 0805
> >>>>>>>> Capabilities: [a0] Power Management version 3
> >>>>>>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0
> >>>>>>>> NoSoftRst+
> >>>>>>>> PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>
> >>>>>>>> or there is some unknown bridge in the motherboard.
> >>>>>>>
> >>>>>>> According your description above, the upstream Linux should also have
> >>>>>>> the same problem. Did you see it with upstream Linux?

I did not even think to test. I sadly won't be able to do much of
reboot/shutdown as this is a production machine.

> >>>>>>
> >>>>>> The problem I was seeing with LSI cards (phantom device doing DMA)
> >>>>>> does, indeed, also occur in upstream Linux. If I enable intel-iommu on
> >>>>>> bare metal Linux, the same problem occurs as with Xen.
> >>>>>>
> >>>>>>> There may be some buggy device that generate DMA request with
> >>>>>>> internal
> >>>>>>> BDF but it didn't expose it(not like Phantom device). For those
> >>>>>>> devices, I think we need to setup the VT-d page table manually.
> >>>>>>
> >>>>>> I think what is needed is a pci-phantom style override that tells the
> >>>>>> hypervisor to tell the IOMMU to allow DMA traffic from a specific
> >>>>>> invisible device ID.
> >>>>>>
> >>>>>> Gordan
> >>>>>
> >>>>> There is. See "pci-phantom" in
> >>>>> http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
> >>>>
> >>>>I thought this was only applicable to phantom _functions_ (number
> >>>>after
> >>>>the
> >>>>dot) rather than whole phantom _devices_. Is that not the case?
> >>>
> >>>I think that's right. I go through the related code for the pci
> >>>phantom device just now, I find that
> >>>the information of command line 'pci-phantom' is stored in variable '
> >>>phantom_devs[8] '
> >>>with type of struct phantom_dev{}. This variable is used in function
> >>>alloc_pdev() as follow:
> >>>
> >>>
> >>>    for ( i = 0; i < nr_phantom_devs; ++i )
> >>>        if ( phantom_devs[i].seg == pseg->nr &&
> >>>             phantom_devs[i].bus == bus &&
> >>>             phantom_devs[i].slot == PCI_SLOT(devfn) &&
> >>>             phantom_devs[i].stride > PCI_FUNC(devfn) )
> >>>        {
> >>>            pdev->phantom_stride =
> >>>phantom_devs[i].stride;
> >>>            break;
> >>>        }
> >>>
> >>>So from the code, we can see this command line only works for phantom
> >>>_function_, not for whole phantom _devices_.
> >>
> >>What would it take to make it work for a whole phantom device?
> >
> >First and foremost a definition of what a phantom device is and
> >how one would behave. Once again - phantom functions are part
> >of the PCIe specification, so those don't require a definition.
>
> Konrad's patch from a while back seemed to do the required thing to
> allow an otherwise invisible/undetected device to do DMA transfers
> without freaking out the IOMMU that doesn't know about it.

Except it didn't work :-) That was the first thing I tried with this
motherboard. It looks like there are extra things I would need to
modify in the hypervisor for it to work (like making the hypervisor
create a fake PCI device with BARs and such).

That is actually what I was going to try out: see if I can make the
hypervisor add a PCI device for a non-existent PCI device (one that
does not show up in the PCI configuration scan). That requires knowing
the MMIO BARs the 'fake' device has, and .. well, whatever else the
Intel VT-d code requires.

For reference, here is the code that Gordan was mentioning:

#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/stat.h>
#include <linux/err.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/limits.h>
#include <linux/device.h>
#include <linux/pci.h>

#include <xen/interface/xen.h>
#include <xen/interface/physdev.h>

#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#define LSI_HACK "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>");
MODULE_DESCRIPTION("lsi hack");
MODULE_LICENSE("GPL");
MODULE_VERSION(LSI_HACK);

/* On load: ask Xen to add the otherwise invisible device 08:00.0. */
static int __init lsi_hack_init(void)
{
	int r = 0;
	struct physdev_manage_pci manage_pci = {
		.bus = 0x8,
		.devfn = PCI_DEVFN(0,0),
	};

	r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, &manage_pci);

	return r;
}

/* On unload: ask Xen to remove 08:00.0 again and log any failure. */
static void __exit lsi_hack_exit(void)
{
	int r = 0;
	struct physdev_manage_pci manage_pci;

	manage_pci.bus = 0x8;
	manage_pci.devfn = PCI_DEVFN(0,0);

	r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, &manage_pci);
	if (r)
		printk(KERN_ERR "%s: %d\n", __func__, r);
}

module_init(lsi_hack_init);
module_exit(lsi_hack_exit);
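
To make the 'pci-phantom' limitation discussed above concrete, here is a minimal user-space sketch of the matching condition quoted from alloc_pdev(). It is only an illustration under stated assumptions: struct phantom_entry, entry_matches() and covers_requester() are hypothetical names invented for this example and are not Xen code, and the stride walk in covers_requester() reflects my understanding that phantom functions stay on the same bus and slot and step the function number by the stride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard devfn encoding: 5 bits of slot, 3 bits of function. */
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical stand-in for one 'pci-phantom' entry (not Xen's struct). */
struct phantom_entry {
    uint16_t seg;
    uint8_t bus;
    uint8_t slot;
    uint8_t stride;
};

/*
 * Same condition as the loop quoted from alloc_pdev(): the entry applies to
 * a real device only if segment, bus and slot are equal and the device's
 * function number is below the stride.
 */
bool entry_matches(const struct phantom_entry *e,
                   uint16_t seg, uint8_t bus, uint8_t devfn)
{
    assert(e != NULL);
    return e->seg == seg && e->bus == bus &&
           e->slot == PCI_SLOT(devfn) && e->stride > PCI_FUNC(devfn);
}

/*
 * Assumption: a device at dev_bus:dev_devfn with the given stride may issue
 * DMA as func, func + stride, func + 2 * stride, ... on the SAME bus and
 * slot. Returns true if the requester ID falls in that set.
 */
bool covers_requester(uint8_t dev_bus, uint8_t dev_devfn, uint8_t stride,
                      uint8_t req_bus, uint8_t req_devfn)
{
    if (req_bus != dev_bus || PCI_SLOT(req_devfn) != PCI_SLOT(dev_devfn))
        return false;
    if (PCI_FUNC(req_devfn) < PCI_FUNC(dev_devfn))
        return false;
    if (stride == 0) /* no phantom functions: only the device itself */
        return req_devfn == dev_devfn;
    return (PCI_FUNC(req_devfn) - PCI_FUNC(dev_devfn)) % stride == 0;
}
```

With a hypothetical entry for the bridge at 07:00.0, covers_requester(0x07, PCI_DEVFN(0, 0), 1, 0x08, PCI_DEVFN(0, 0)) is false because the bus number differs. Under the assumption above, that is why no stride value can cover the 0000:08:00.0 requester seen in the fault log.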
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel