Re: [Xen-devel] [PATCH] amd iommu: Dump flags of IO page faults
Monday, September 24, 2012, 2:24:16 PM, you wrote:

> On 09/24/2012 10:38 AM, Sander Eikelenboom wrote:
>>
>> Friday, September 7, 2012, 10:54:40 AM, you wrote:
>>
>>> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:
>>>>
>>>> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
>>>>
>>>>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>>>>>
>>>>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>>>>>
>>>>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>>>>>
>>>>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>>>>>
>>>>>>>>> Hi Jan,
>>>>>>>>> Attached patch dumps io page fault flags. The flags show the reason of
>>>>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA
>>>>>>>>> fault.
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Wei
>>>>>>>>
>>>>>>>>> signed-off-by: Wei Wang <wei.wang2@xxxxxxx>
>>>>>>>>
>>>>>>>>
>>>>>>>> I have applied the patch and the flags seem to differ between the
>>>>>>>> faults:
>>>>>>>>
>>>>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address =
>>>>>>>> 0xc2c2c2c0, flags = 0x000
>>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device
>>>>>>>> id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>>> id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>>> id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>>>>>
>>>>>>> OK, so they are not interrupt requests. I guess further information from
>>>>>>> your system would be helpful to debug this issue:
>>>>>>> 1) xl info
>>>>>>> 2) xl list
>>>>>>> 3) lspci -vvv (NOTE: not in dom0 but in your guest)
>>>>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
>>>>>>
>>>>>> dom14 is not a HVM guest, it's a PV guest.
>>>>
>>>>> Ah, I see. A PV guest is quite different from hvm, it does not use p2m
>>>>> tables as io page tables. So the no-sharept option does not work in this
>>>>> case. PV guests always use separate io page tables. There might be some
>>>>> incorrect mappings in the page tables. I will check this on my side.
>>>>
>>>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept
>>>> everything else the same.
>>>> I haven't seen any IO PAGE FAULTS after that.
>>>>
>>>> I did spot some differences in the output from lspci between xen 4.1 and
>>>> 4.2, related to MSI enabled or not for the IOMMU device.
>>>> Have attached the xl/xm dmesg and lspci from booting with both versions.
>>>>
>>>> lspci:
>>>>
>>>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O
>>>> Memory Management Unit (IOMMU) [1002:5a23]
>>>> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit
>>>> (IOMMU) [1002:5a23]
>>>> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr-
>>>> DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>> Latency: 0
>>>> Interrupt: pin A routed to IRQ 10
>>>> Capabilities: [40] Secure device <?>
>>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>>
>>> Eh... That is interesting. So which dom0 are you using? There is a c/s
>>> in 4.2 to prevent recent dom0 from disabling the iommu interrupt (changeset
>>> 25492:61844569a432). Otherwise, iommu cannot send any events including IO
>>> PAGE faults. You could try to revert dom0 to an old version like 2.6
>>> pv_ops to see if you really have no io page faults on 4.1
>>
>> Ok, I finally got the time to do some more testing, tested 4.2 around that
>> changeset, and made a copy of the guest using HVM instead of PV.
>>
>> The results:
>> - On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the video device
>> passed through works fine, both in a HVM and in a PV guest, I don't see IO
>> page faults getting reported.
>> - On xen-4.2 changeset < 25492 and a 3.6-rc6 kernel (dom0 and domU): the
>> video device passed through works fine, both in a HVM and in a PV guest, I
>> don't see IO page faults getting reported.
>> - On xen-4.2 changeset > 25492 and a 3.6-rc6 kernel (dom0 and domU): the
>> video device passed through works fine for a short while (around 5 to 10
>> minutes) in a PV guest, after that IO page faults get reported and the video
>> freezes, I don't see any errors in the guest though.
>> - On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU):
>>     PV: the video device passed through works fine for a short while
>>     (around 5 to 10 minutes), after that IO page faults get reported and
>>     the video freezes, I don't see any errors in the guest though.
>>     HVM: the video device passed through doesn't work from the start:
>>       - The device is there according to lspci
>>       - The video application starts fine, but delivers a green image, so
>>         the device is not working properly. I don't see IO page faults
>>         though.
>>
>> Attached are (all with xen-unstable tip and the guest as HVM (domain 15)):
>> - xl dmesg
>> - Patch which adds some more info, but all values reported seem to be zero
>> (see xl dmesg)
>> - lspci dom0
>> - lspci HVM guest

> Hi,

> Thanks for the information, very very helpful for debugging. I hope I
> could start to look at this right after sending my next iommu patch
> queue upstream... Another question is: did you see this issue on a single
> pv/hvm guest system, or did you only see it on a system with about 16
> running VMs?

I have an update on this one...
The green screen when using a HVM guest was due to the driver not being able to communicate with the device via I2C.
This problem disappeared after updating to the latest xen-unstable and a 3.6-rc7 kernel, with the linux-next branch from Konrad's tree additionally pulled in.
At the moment the HVM guest works: it shows video and it doesn't give IO page faults.
Will try and see if it's also miraculously fixed for PV as well.

> Thanks,
> Wei
>>
>>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>> Address: 00000000fee0100c Data: 4128
>>>> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>>>
>>>> Although it seems enabled, shouldn't the IRQ number used be much higher
>>>> than 10 for MSI interrupts?
>>
>>> The IRQ number is fine. The MSI vector is stored at Data: 4128
>>
>>>>
>>>> There is another difference in the bridge device that's in front of the
>>>> 0a:00.6 device that faults before the kernel is even booted.
>>>>
>>>> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge
>>>> (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
>>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR+ FastB2B- DisINTx+
>>>> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr-
>>>> DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr-
>>>> DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
>>>> Latency: 0, Cache Line Size: 64 bytes
>>>> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
>>>> I/O behind bridge: 0000f000-00000fff
>>>> Memory behind bridge: f9f00000-f9ffffff
>>>> Prefetchable memory behind bridge:
>>>> 00000000fff00000-00000000000fffff
>>>> 4.1: Secondary status: 66MHz- FastB2B- ParErr-
>>>> DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>>>> 4.2: Secondary status: 66MHz- FastB2B- ParErr-
>>>> DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR-
>>>> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
>>>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>>> Capabilities: [50] Power Management version 3
>>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>>> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency
>>>> L0s <64ns, L1 <1us
>>>> ExtTag+ RBE+ FLReset-
>>>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
>>>> Unsupported-
>>>> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>> MaxPayload 128 bytes, MaxReadReq 128 bytes
>>>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
>>>> TransPend-
>>>> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1,
>>>> Latency L0 <1us, L1 <8us
>>>> ClockPM- Surprise- LLActRep+ BwNot+
>>>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
>>>> CommClk-
>>>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
>>>> DLActive+ BWMgmt+ ABWMgmt-
>>>> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug-
>>>> Surprise-
>>>> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
>>>> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
>>>> HPIrq- LinkChg-
>>>> Control: AttnInd Unknown, PwrInd Unknown, Power-
>>>> Interlock-
>>>> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
>>>> PresDet+ Interlock-
>>>> Changed: MRL- PresDet+ LinkState+
>>
>>> That is probably because of the IO_PAGE_FAULT.
>>
>>> Thanks,
>>> Wei
>>
>>>> serveerstertje:~# lspci -t
>>>> -[0000:00]-+-00.0
>>>>            +-00.2
>>>>            +-02.0-[0b]----00.0
>>>>            +-03.0-[0a]--+-00.0
>>>>            |            +-00.1
>>>>            |            +-00.2
>>>>            |            +-00.3
>>>>            |            +-00.4
>>>>            |            +-00.5
>>>>            |            +-00.6
>>>>            |            \-00.7
>>>>            +-05.0-[09]----00.0
>>>>            +-06.0-[08]----00.0
>>>>            +-0a.0-[07]----00.0
>>>>            +-0b.0-[06]--+-00.0
>>>>            |            \-00.1
>>>>            +-0c.0-[05]----00.0
>>>>            +-0d.0-[04]--+-00.0
>>>>            |            +-00.1
>>>>            |            +-00.2
>>>>            |            +-00.3
>>>>            |            +-00.4
>>>>            |            +-00.5
>>>>            |            +-00.6
>>>>            |            \-00.7
>>>>            +-11.0
>>>>            +-12.0
>>>>            +-12.2
>>>>            +-13.0
>>>>            +-13.2
>>>>            +-14.0
>>>>            +-14.3
>>>>            +-14.4-[03]----06.0
>>>>            +-14.5
>>>>            +-15.0-[02]--
>>>>            +-16.0
>>>>            +-16.2
>>>>            +-18.0
>>>>            +-18.1
>>>>            +-18.2
>>>>            +-18.3
>>>>            \-18.4
>>>>
>>>>> Thanks,
>>>>> Wei
>>>>
>>>>>> I will try to make a complete package, and try with one pv domain only
>>>>>> where the devices are being passed through just to simplify the setup.
>>>>>>
>>>>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>>>>> happened. Did it stop working?
>>>>>>
>>>>>> Yes it stops working, the video capture just freezes, but the driver
>>>>>> doesn't bail out.
>>>>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open
>>>>>> in the guest.
>>>>>>
>>>>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>>>>> my RD890 system
>>>>>>
>>>>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>>>>
>>>>>>> * so, what OEM board do you have?)
>>>>>>
>>>>>> MSI 890FXA-GD70
>>>>>>
>>>>>>> Also from your log, these lines look very strange:
>>>>>>
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>>>>> = 0x0a06, fault address = 0xc2c2c2c0
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8300
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8340
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8380
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f83c0
>>>>>>
>>>>>>> * they are just followed by the IO PAGE fault. Do you know where they
>>>>>>> are from? Your video card driver maybe?
>>>>>>
>>>>>> From a HVM domain with an old (3.0.3) kernel, but the faults also
>>>>>> occur without this domain being started.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Wei
>>>>>>
>>>>>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>>>>>
>>>>>>>> Thx
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sander
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
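
For anyone matching the IO_PAGE_FAULT lines quoted in this thread against lspci output, here is a minimal sketch in Python. It relies on the device id being the PCI bus/device/function packed into 16 bits (bus in the top 8 bits, device in the next 5, function in the low 3), which is consistent with 0x0a06 being the 0a:00.6 device discussed above. The names FAULT_RE, bdf and parse_fault are made up for this illustration and are not part of Xen or any tool. It assumes unwrapped log lines as printed by xl dmesg, without the mail quote markers, and it deliberately does not try to interpret the individual flag bits.

```python
import re

# Matches fault lines in the form shown in the thread. The trailing "flags"
# field is optional, because the log lines from before the patch lack it.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT:\s*domain\s*=\s*(\d+),\s*"
    r"device\s+id\s*=\s*0x([0-9a-fA-F]+),\s*"
    r"fault\s+address\s*=\s*0x([0-9a-fA-F]+)"
    r"(?:,\s*flags\s*=\s*0x([0-9a-fA-F]+))?"
)


def bdf(device_id):
    """Format a 16-bit PCI device id as bus:device.function (lspci style)."""
    bus = (device_id >> 8) & 0xFF
    dev = (device_id >> 3) & 0x1F
    fn = device_id & 0x07
    return f"{bus:02x}:{dev:02x}.{fn:x}"


def parse_fault(line):
    """Return a dict for one IO_PAGE_FAULT line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    domain, dev_id, addr, flags = m.groups()
    return {
        "domain": int(domain),
        "device": bdf(int(dev_id, 16)),
        "address": int(addr, 16),
        # None means the line had no flags field at all (pre-patch output).
        "flags": None if flags is None else int(flags, 16),
    }


if __name__ == "__main__":
    sample = ("(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, "
              "device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020")
    print(parse_fault(sample))
```

Feeding it the output of xl dmesg line by line and grouping the results by "device" gives a quick per-device overview of which passed-through devices fault and at which addresses.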