
Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))



On Wed, 11 Sep 2013 13:31:06 +0100, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
On 11.09.13 at 14:14, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
On Wed, 11 Sep 2013 12:53:09 +0100, "Jan Beulich" <JBeulich@xxxxxxxx>
 wrote:
On 11.09.13 at 13:05, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
I found this:

http://lists.xen.org/archives/html/xen-devel/2010-06/msg00093.html

 while looking for a solution to a similar problem. I am
 facing a similar issue with LSI (8408E, 3081E-R) and
 Adaptec (31605) SAS cards. Was there ever a proper, more general
 fix or workaround for this issue?

 These SAS cards experience these problems in dom0. When running
 a vanilla kernel on bare metal, they work OK without intel_iommu
 set. As soon as I set intel_iommu, the same thing happens (on
 bare metal, not dom0).

 Clearly something is badly broken with multiple layers
 of bridges when it comes to the IOMMU in my setup (Intel 5520 PCIe
 root hub -> NF200 bridge -> Intel 80333 bridge -> SAS controller).

The link above has some (hackish) workarounds - did you try
them?

 Not yet. The thing that bothers me is that the workaround
 involves hard-coding the PCI device ID, which is _nasty_
 and unstable.

I said "hackish", didn't I? Of course such a change would not
have the slightest chance of going into any repo. But knowing
whether it helps may allow thinking of an acceptable
workaround.

Indeed.

In any event, seeing a hypervisor log with "iommu=debug" might
shed further light on this: For one, we might be able to see which
exact devices are present in the ACPI tables. And we would see
which device(s) eventual faults originate from.
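
To make the second point concrete, here is a minimal sketch of tallying which PCI coordinates appear on fault lines in a saved hypervisor log. It assumes only that such lines contain the word "fault" and name the device as bus:dev.fn in hex (with an optional segment prefix). The exact Xen log format is not shown in this thread, so the pattern is deliberately loose and may also pick up unrelated tokens of the same shape; the function name count_fault_bdfs is hypothetical.

```python
import re
from collections import Counter

# Loose pattern for PCI coordinates: optional 4-hex-digit segment,
# then bus:dev.fn. This is an assumption about how devices are printed,
# not a reproduction of any specific Xen log format.
BDF_RE = re.compile(
    r"\b(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\b"
)


def count_fault_bdfs(log_text):
    """Count BDF-like tokens on lines that mention 'fault'."""
    counts = Counter()
    for line in log_text.splitlines():
        if "fault" not in line.lower():
            continue
        for seg, bus, dev, fn in BDF_RE.findall(line):
            # Normalise to ssss:bb:dd.f so counts can be compared with lspci -D.
            key = "%s:%s:%s.%s" % (
                (seg or "0000").lower(), bus.lower(), dev.lower(), fn
            )
            counts[key] += 1
    return counts
```

Feeding it the text of the saved log (for example the attached xl dmesg output) and sorting the result by count would show which coordinates to look up in the lspci listing first.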

 The thing that bothers me is that this happens in dom0 even
 with iommu=dom0-passthrough set.
 iommu=dom0-passthrough,workaround_bios_bug doesn't help,
 either.

They're not meant to deal with this sort of an impossible (in
theory) situation.

It turns out that theoretically impossible things happen a lot
more often than expected. :)

 And lo and behold, I do have phantom PCI devices after all!
 lspci shows no device with ID 0000:0f:01.0

Not exactly: Phantom functions can't be at function 0. Irrespective
of that - do the device coordinates somehow correlate with the
problematic controller (IOW: lspci output and a full log would help)?

Not necessarily a function of the same device - as I said, there is
a PCIe bridge on the card, so there could be multiple devices hiding
there.
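
As a rough illustration of the two checks discussed here, the sketch below splits a coordinate such as 0000:0f:01.0 into segment, bus, device and function, tests the function number (a non-zero function is necessary for a phantom function, not sufficient), and checks whether the coordinate is absent from an lspci listing. The helper names are hypothetical, and the sketch assumes lspci device lines start with the coordinate in column 0 while detail lines are indented.

```python
def parse_bdf(text):
    """Split 'ssss:bb:dd.f' (segment optional) into four integers."""
    parts = text.strip().split(":")
    if len(parts) == 2:
        # No segment given: assume segment 0000.
        parts.insert(0, "0000")
    if len(parts) != 3:
        raise ValueError("not a PCI coordinate: %r" % text)
    seg, bus, devfn = parts
    dev, fn = devfn.split(".")  # raises ValueError if there is no '.'
    return int(seg, 16), int(bus, 16), int(dev, 16), int(fn, 16)


def could_be_phantom_function(bdf):
    """Phantom functions cannot sit at function 0."""
    return parse_bdf(bdf)[3] != 0


def missing_from_lspci(bdf, lspci_lines):
    """True if no device line in the listing starts with this coordinate."""
    target = parse_bdf(bdf)
    seen = set()
    for line in lspci_lines:
        if not line or line[0].isspace():
            continue  # blank line or indented detail line
        try:
            seen.add(parse_bdf(line.split()[0]))
        except ValueError:
            continue  # first token is not a coordinate
    return target not in seen
```

For 0000:0f:01.0 the function number is 0, so could_be_phantom_function returns False, which matches the objection above; the remaining question is which bridge owns bus 0f.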

dmesg, xl dmesg, lspci -vvvnn and lspci -tvnn output is attached.

I'll try adding one of my LSI cards and see the comparative
behaviour. Right now I don't even know if the phantom device
is on the SAS card or the motherboard.

Gordan

Attachment: dmesg.log
Description: Text Data

Attachment: xl-dmesg.log
Description: Text Data

Attachment: lspci.log
Description: Text document

Attachment: lspci-t.log
Description: Text document


 

