
Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x



On 26/03/2013 16:45, Marek Marczykowski wrote:
> On 26.03.2013 17:03, Jan Beulich wrote:
>>>>> On 26.03.13 at 14:50, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> Finally got serial console :)
>>>>> The debug=y problem is (actually at resume):
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) CPU:    0
>>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>>> (XEN)    0000000000000000
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN)
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 0:
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ****************************************
>>>> To make sense of this, we need to know the register (and maybe
>>>> stack) allocation at this point, to know which vector it was that
>>>> triggered the assertion. You can either do this analysis for us, or
>>>> point us at the xen-syms binary matching the xen.gz you used.
>>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>> And that system isn't using a strange mixed mode IO-APIC/legacy
>> PIC model, where particularly IRQ 9 (usually ACPI SCI) gets
>> channeled through the legacy PIC?
> I don't know...
>
>> Could you attach the complete log, ideally with 'i' output logged
>> right before suspending?
> Sure, attached.
>
>> Is this reproducible with 4.2.x or 4.3-unstable? If not, but if readily
>> reproducible with 4.1.5-rc1, could you try changing the containing
>> loop's upper bound from "< NR_VECTORS" to
>> "<= LAST_DYNAMIC_VECTOR"?
> I tried 4.2.x some time ago and the bug also exists there (but I had no
> console, so I'm not sure it is exactly the same). 4.3 seems not to be affected.
>

Can you replace the ASSERT() with code similar to that in

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668

which should call dump_irqs() before dying because of the ASSERT?
You might also need to take the latest version of dump_irqs() from
unstable, as I seem to remember there was another assertion failure
caused by xfree()'ing in IRQ context.
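
To illustrate the general shape of that change, here is a rough standalone
sketch in plain C. It is not the code from the staging tree: every demo_*
name is hypothetical, the struct is a simplified stand-in rather than Xen's
real per-IRQ configuration, and demo_dump_cfg() only plays the role that
dump_irqs() would play in the hypervisor. The idea is simply to test the
bit explicitly, print the offending vector plus the recorded state, and
only then let the caller die.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_VECTORS    256  /* x86 has 256 interrupt vectors */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the per-IRQ config; not Xen's real struct. */
struct demo_irq_cfg {
    int irq;
    unsigned long used_vectors[BITMAP_WORDS];
};

static bool demo_test_bit(unsigned int nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

static void demo_set_bit(unsigned int nr, unsigned long *map)
{
    assert(nr < NR_VECTORS);
    map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Hypothetical stand-in for dump_irqs(): list every vector marked used. */
static void demo_dump_cfg(const struct demo_irq_cfg *cfg)
{
    printf("irq %d used vectors:", cfg->irq);
    for (unsigned int v = 0; v < NR_VECTORS; v++)
        if (demo_test_bit(v, cfg->used_vectors))
            printf(" %#x", v);
    printf("\n");
}

/*
 * Replacement for a bare ASSERT(test_bit(vector, cfg->used_vectors)):
 * returns true when the vector is recorded as used; otherwise prints the
 * vector and the recorded state, then returns false so that the caller
 * can decide to panic with the diagnostics already on the console.
 */
static bool demo_check_vector(unsigned int vector,
                              const struct demo_irq_cfg *cfg)
{
    if (vector < NR_VECTORS && demo_test_bit(vector, cfg->used_vectors))
        return true;

    printf("*** vector %#x not in used_vectors of irq %d\n",
           vector, cfg->irq);
    demo_dump_cfg(cfg);
    return false;
}
```

With a cfg that has only some illustrative vector set via demo_set_bit(),
demo_check_vector(0xe9, &cfg) would print the mismatch and the used-vector
list and return false, at which point the real code would go on to panic.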

~Andrew
