
Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0



On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 16.01.17 at 10:25, <firemeteor@users.sourceforge.net> wrote:
> Here are some relevant logs, please help comment what's going on here and
> what's the next step of diagnose.
> It appears that the fault address 0xcfxxxxxx falls within the host RMRR
> region.

Might be a problem in the RMRR setup itself, when the guest gets
the device assigned. But I'm not sure, as you've provided only
fragments of the log, instead of the full one (allowing to see in
which order the messages got logged). In any event the addresses
are, as you say, properly within the device's RMRR range.
Thanks for your quick reply, Jan.
I meant to provide the full log through a third-party service like pastebin, but the network at my workplace blocks it.
Note that the log here covers the period before the faults show up.
As I already mentioned, there are two domUs in the log, and the one that suffers is dom2.

The fault log itself floods the console. With a small 4MB ring buffer, I wasn't able to capture how it begins.
From what I can tell, someone is scanning through the region at a fixed pace (in general, with occasional ping-pong).
The output of print_vtd_entries is fairly stable. This is what I get from 'sort|uniq -c' post-processing, after removing the lines with the fault address:
   7219 (XEN)     context[10] = 1_2215f6001
   7219 (XEN)     context = ffff830251bcb000
   5259 (XEN)     l2[7d] = 0
   5259 (XEN)     l2[7d] not present
   1961 (XEN)     l2[7e] = 0
   1961 (XEN)     l2[7e] not present
   7219 (XEN)     l2 = ffff830221476000
   5258 (XEN)     l2_index = 7d
   1961 (XEN)     l2_index = 7e
   7219 (XEN)     l3[3] = 221476003
   7219 (XEN)     l3 = ffff8302215f6000
   7219 (XEN)     l3_index = 3
   7219 (XEN)     root_entry[0] = 251bcb001
   7219 (XEN)     root_entry = ffff8304152e9000
   7219 (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
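
For anyone who wants to reproduce this kind of summary without a shell pipeline, here is a minimal Python sketch of the same post-processing. The function name summarize and the "fault addr" marker used to drop the per-fault lines are assumptions made for illustration; adjust the marker to whatever your fault lines actually contain.

```python
from collections import Counter

def summarize(lines, skip_marker="fault addr"):
    """Count identical log lines, skipping those that carry a fault address.

    skip_marker is an assumed substring identifying the per-fault lines;
    it is not taken from the log shown above.
    """
    counts = Counter(
        line.strip()
        for line in lines
        if line.strip() and skip_marker not in line
    )
    # Sort by line text, which mirrors the ordering of `sort | uniq -c`.
    return sorted(counts.items())

# Small illustrative input, not real log data.
sample = [
    "(XEN)     l3_index = 3",
    "(XEN)     l2_index = 7d",
    "(XEN)     l3_index = 3",
    "(XEN) example line with fault addr in it",
]
for text, n in summarize(sample):
    print(f"{n:7d} {text}")
```

Any iterable of lines works as input, so an open file object can be passed directly.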

The fault address pattern could be found here: http://pastebin.com/rWWH3QUG
(Note that I dropped redundant columns to fit the size limitation...)
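
As a cross-check that the table indices in the print_vtd_entries output match the 0xcfxxxxxx fault addresses, here is a small Python sketch. It assumes the usual VT-d second-level layout of 4 KiB pages with 9 address bits per table level; the helper names level_index and l2_entry_range are made up for this example.

```python
LEVEL_STRIDE = 9                      # assumed: address bits resolved per level
PAGE_SHIFT = 12                       # assumed: 4 KiB pages
LEVEL_MASK = (1 << LEVEL_STRIDE) - 1

def level_index(addr, level):
    """Table index used at the given level for a DMA address (level 1 = leaf)."""
    return (addr >> (PAGE_SHIFT + LEVEL_STRIDE * (level - 1))) & LEVEL_MASK

def l2_entry_range(l3_index, l2_index):
    """Inclusive address range covered by one l2 entry (2 MiB) under an l3 entry."""
    base = (l3_index << 30) | (l2_index << 21)
    return base, base + (1 << 21) - 1

# Indices taken from the output above: l3_index = 3, l2_index = 7d and 7e.
for idx in (0x7d, 0x7e):
    lo, hi = l2_entry_range(3, idx)
    print(f"l3[3] l2[{idx:x}] covers {lo:#x}-{hi:#x}")
```

Under these assumptions the two missing l2 entries cover 0xcfa00000-0xcfbfffff and 0xcfc00000-0xcfdfffff, so every fault in those two 2 MiB windows stops at the same level of the walk.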

And here is a list of my host PCI devices:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

That RMRR setup has changed dramatically (from being basically
non-existent in the older versions), especially for USB devices (I
don't think I can conclude what type of device 0000:02:00.0 is).
There are messages logged with various failures in that process,
but some would be issued by debug hypervisors only. A good
first step (before possibly doing actual code instrumentation)
would therefore be to retry with a debug hypervisor, and post
the full log (huge amounts of trailing IOMMU fault messages may
of course be stripped as long as they're sufficiently similar, to
keep the overall log size manageable).
I can give it a try when I get some spare time.
Could you show me how to build a debug hypervisor, and which debug knobs are most relevant for avoiding log flooding?
 

> However, the hvmloader is setting up memory region starting from address
> 0xe0000000.
> Is the hvmloader memory map relevant here?

No, it shouldn't be.

> Unfortunately the iommu.c does not provide detailed log on the mapping
> except a simple 'd2:PCI: map 0000:00:02.0'

If we made it so, it would become unreasonably verbose.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
