
Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled



>>> On 04.06.13 at 14:25, Ben Guthro <ben@xxxxxxxxxx> wrote:
> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 03/06/13 19:29, Ben Guthro wrote:
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>>
>>> This was likely broken by XSA-36
>>>
>>> My fix for the crash path is:
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>>
>>> You want to inspect the use of iommu_enabled and iommu_intremap.
>>
>> According to the comment in vtd_suspend(),
>> iommu_disable_x2apic_IR() is supposed to run after
>> iommu_suspend() (and indeed lapic_suspend() gets called
>> immediately after iommu_suspend() by device_power_down()),
>> and hence that shouldn't be the reason here. But, Ben, to be
>> sure, dumping the state of the various IOMMU related enabling
>> variables would be a good idea.
> 
> I assume you are referring to the variables below, defined at the top of
> iommu.c. At the time of the crash, they look like this:
> 
> (XEN) iommu_enabled = 1
> (XEN) force_iommu; = 0
> (XEN) iommu_verbose; = 0
> (XEN) iommu_workaround_bios_bug; = 0
> (XEN) iommu_passthrough; = 0
> (XEN) iommu_snoop = 0
> (XEN) iommu_qinval = 1
> (XEN) iommu_intremap = 1
> (XEN) iommu_hap_pt_share = 0
> (XEN) iommu_debug; = 0
> (XEN) amd_iommu_perdev_intremap = 1
> 
> If that gives any additional insight, please let me know.
> I'm not sure I gleaned anything particularly significant from it though.
> 
> Or - perhaps you are referring to other enabling variables?

These were exactly the ones (or rather, you picked a superset of
what I wanted to know the state of). With iommu_qinval and
iommu_intremap both still set, to me this pretty clearly means
that Andrew's original thought is not applicable here, as at this
point we can't possibly have shut down qinval yet.
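
For anyone wanting to reproduce such a dump, a minimal standalone sketch
follows. It is not the Xen code: the three variables are plain stand-ins
for the hypervisor's flags of the same names, printing goes through
stdio rather than Xen's printk(), and format_flag(), DUMP_FLAG() and
dump_iommu_flags() are hypothetical helpers made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the hypervisor flags discussed above (assumed values). */
static bool iommu_enabled = true;
static bool iommu_qinval = true;
static bool iommu_intremap = true;

/* Hypothetical helper: format one flag as "(XEN) <name> = <0|1>". */
static int format_flag(char *buf, size_t len, const char *name, bool value)
{
    return snprintf(buf, len, "(XEN) %s = %d\n", name, value);
}

/* Stringify the variable so the printed label always matches its name. */
#define DUMP_FLAG(buf, len, var) format_flag((buf), (len), #var, (var))

/* Hypothetical helper: print the flags relevant to the qinval question. */
static void dump_iommu_flags(void)
{
    char line[64];

    DUMP_FLAG(line, sizeof(line), iommu_enabled);
    fputs(line, stdout);
    DUMP_FLAG(line, sizeof(line), iommu_qinval);
    fputs(line, stdout);
    DUMP_FLAG(line, sizeof(line), iommu_intremap);
    fputs(line, stdout);
}
```

Calling dump_iommu_flags() right before the flush in the suspend path
would be the analogous spot in the real code. Deriving the label from
the variable via the stringifying macro avoids typos in hand-written
format strings.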

>> Is this perhaps having some similarity with
>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html? 
>> We're clearly running single-CPU only here and there...
> 
> We certainly should be, as we have gone through the
> disable_nonboot_cpus() by this point - and I can verify that from the
> logs.

I lean much more towards there being a connection here, noting that
Andrew's original thread didn't really lead anywhere (i.e. we still
don't know what actually caused the panic he saw).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

