Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
On Jun 6, 2013, at 11:13 AM, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

>
>
>> -----Original Message-----
>> From: Ben Guthro [mailto:ben.guthro@xxxxxxxxx]
>> Sent: Thursday, June 06, 2013 11:08 PM
>> To: Zhang, Xiantao
>> Cc: Jan Beulich; Ben Guthro; Andrew Cooper; xen-devel
>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>
>> On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>>>> Sent: Thursday, June 06, 2013 2:59 PM
>>>> To: Ben Guthro
>>>> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
>>>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>>>
>>>>>>> On 06.06.13 at 01:53, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>>>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>>>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>>>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>>>> Depending on whether ATS is in use, more than one invalidation
>>>>>>>>> can be done in the processing here - could you therefore check
>>>>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>>>>>>> make you see respective messages), and if so see whether
>>>>>>>>> disabling it ("ats=off") makes a difference?
>>>>>>>>
>>>>>>>> ATS does not appear to be running:
>>>>>>>>
>>>>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412: dmaru->address = fed90000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>>>>>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412: dmaru->address = fed91000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>>>>>>> (XEN) [VT-D]dmar.c:354: IOAPIC: 0000:f0:1f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.1
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.2
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.3
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.4
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.5
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.6
>>>>>>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.7
>>>>>>>> (XEN) [VT-D]dmar.c:426: flags: INCLUDE_ALL
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:1d.0
>>>>>>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:1a.0
>>>>>>>> (XEN) [VT-D]dmar.c:625: RMRR region: base_addr ba8d5000 end_address ba8ebfff
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:625: RMRR region: base_addr bb800000 end_address bf9fffff
>>>>>>>>
>>>>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>>>>>>> was found.
>>>>>>>
>>>>>>> Right. So one less variable.
>>>>>>
>>>>>> Some more info.
>>>>>> Ross Philipson provided me with a handy utility to dump a bunch more
>>>>>> info about the DMAR tables, and with some more trace, this appears to
>>>>>> be tied to the IGD.
>>>>>>
>>>>>> Early in the boot process, I see queue_invalidate_wait() called for
>>>>>> DRHD unit 0, and 1
>>>>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>>>>>
>>>>>> Up until i915 does the following, I see that unit being flushed with
>>>>>> queue_invalidate_wait() :
>>>>>>
>>>>>> [ 0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>>>>> [ 0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> [ 1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
>>>>>> [ 2.253551] fbcon: inteldrmfb (fb0) is primary device
>>>>>> [ 3.111838] Console: switching to colour frame buffer device 170x48
>>>>>> [ 3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>>>>> [ 3.171634] i915 0000:00:02.0: registered panic notifier
>>>>>> [ 3.173339] acpi device:00: registered as cooling_device1
>>>>>> [ 3.173401] ACPI: Video Device [VID] (multi-head: yes rom: no post: no)
>>>>>> [ 3.173962] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>>>> [ 3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>>> [ 3.174258] ahci 0000:00:1f.2: version 3.0
>>>>>> [ 3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>>>>> [ 3.174274] Already setup the GSI :19
>>>>>>
>>>>>>
>>>>>> After that - the unit never seems to be flushed.
>>>>>>
>>>>>> ...until we enter into the S3 hypercall, which loops over all DRHD
>>>>>> units, and explicitly flushes all of them via iommu_flush_all()
>>>>>>
>>>>>> It is at that point that it hangs up when talking to the device that
>>>>>> the IGD is plumbed up to.
>>>>>>
>>>>>>
>>>>>> Does this point to something in the i915 driver doing something that
>>>>>> is incompatible with Xen?
>>>>>
>>>>> I actually separated it from the S3 hypercall, adding a new debug key
>>>>> 'F' - to just call iommu_flush_all()
>>>>> I can crash it on demand with this.
>>>>>
>>>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>>>>> it does not occur.
>>>>> So, that pretty much narrows it down to the IGD, in my mind.
>>>>
>>>> Indeed, I agree. Yet I can't in any way comment on what or why.
>>>> Xiantao (perhaps some graphics person would be good to Cc
>>>> here too)?
>>> Hi, Jan/Ben
>>> Thanks for your analysis! Could you try enabling "snb_igd_quirk"? Thanks!
>>> Xiantao
>>
>>
>> Thanks for your reply. I tried this param yesterday, but it did not
>> change the behavior.
> Okay, I recall that a bug was recently found in the IGD i915 driver which
> may cause errors in VT-d; it should be fixed in the latest kernel. Could you
> try the latest kernel, 3.9.4 or 3.10-rcX?
> Xiantao

It may have been dropped off the top of this thread, but I sent out the list of kernels I have tested with, and this was one of them. Testing 3.10 did not change this behavior.

Did you have a particular changeset in mind?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
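
For anyone repeating the two log checks discussed in this thread (whether an ATSR entry shows up under "iommu=verbose", and where the last queue_invalidate_wait() trace for each DRHD unit appears), here is a minimal sketch of how they could be automated. It assumes the console output has been saved as plain text and that the trace lines follow the ad-hoc "XXX queue_invalidate_wait:... DRHDn ret=r" format quoted above. The function names and the sample log are illustrative only and are not part of Xen.

```python
import re
from collections import Counter

# "found ACPI_DMAR_<TYPE>:" lines, as printed in the verbose VT-d boot log.
DMAR_RE = re.compile(r"found ACPI_DMAR_([A-Z]+)")
# Ad-hoc debug trace from this thread (hypothetical format outside of it).
QI_RE = re.compile(r"queue_invalidate_wait:\d+ CPU(\d+) DRHD(\d+) ret=(-?\d+)")


def dmar_structures(log_text):
    """Count each ACPI DMAR structure type reported in the log."""
    return Counter(DMAR_RE.findall(log_text))


def ats_reported(log_text):
    """True if at least one ATSR structure was reported."""
    return dmar_structures(log_text)["ATSR"] > 0


def last_flush_line(log_text):
    """Map DRHD unit number to the index of its last wait-trace line."""
    last = {}
    for idx, line in enumerate(log_text.splitlines()):
        match = QI_RE.search(line)
        if match:
            last[int(match.group(2))] = idx
    return last


# Sample assembled from lines quoted in this thread (shortened).
sample = """\
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:412: dmaru->address = fed90000
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
[ 3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
"""

if __name__ == "__main__":
    print(dict(dmar_structures(sample)))
    print("ATS reported:", ats_reported(sample))
    print("last flush per DRHD:", last_flush_line(sample))
```

Comparing the line index returned by last_flush_line() against the position of the i915 initialisation messages gives a quick way to confirm that a unit stops being flushed after the driver loads.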