
Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x



On 26/03/2013 15:47, Andrew Cooper wrote:
> On 26/03/2013 13:50, Marek Marczykowski wrote:
>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>> Finally got serial console :)
>>>> The debug=y problem is (actually at resume):
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>> (XEN)    0000000000000000
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ****************************************
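As a rough illustration of what the failing check guards, here is a simplified, hypothetical model (not Xen's io_apic.c). It assumes `cfg->used_vectors` can be treated as a bitmap with one bit per x86 vector, and that the cleanup path asserts a vector's bit is set before clearing it. The `UsedVectors` class and its method names are invented for this sketch.

```python
NR_VECTORS = 256  # x86 interrupt vectors 0x00-0xff


class UsedVectors:
    """Simplified, hypothetical model of a per-IRQ used-vector bitmap."""

    def __init__(self) -> None:
        self.bits = 0  # bit n set => vector n recorded as in use

    @staticmethod
    def _check(vector: int) -> None:
        if not 0 <= vector < NR_VECTORS:
            raise ValueError(f"not an x86 vector: {vector:#x}")

    def set_bit(self, vector: int) -> None:
        self._check(vector)
        self.bits |= 1 << vector

    def test_bit(self, vector: int) -> bool:
        self._check(vector)
        return bool((self.bits >> vector) & 1)

    def cleanup(self, vector: int) -> bool:
        """Release `vector`. Returns False in the case where a debug=y
        build would instead hit ASSERT(test_bit(vector, ...))."""
        if not self.test_bit(vector):
            return False
        self.bits &= ~(1 << vector)
        return True


used = UsedVectors()
used.set_bit(0x2A)
print(used.cleanup(0x2A))  # vector was recorded, so cleanup succeeds
print(used.cleanup(0xE9))  # never recorded: the assertion's failure case
```

In this model the panic corresponds to cleanup being asked to release a vector whose bit was never set, which is why knowing *which* vector was involved matters.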
>>> To make sense of this, we need to know the register (and maybe
>>> stack) allocation at this point, to know which vector it was that
>>> triggered the assertion. You can either do this analysis for us, or
>>> point us at the xen-syms binary matching the xen.gz you used.
>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>>
>>> From the register values, the most likely candidates are vector 0xe9
>>> and 0x2a. The former having two registers set to this value seems
>>> more likely from that angle, but vectors in the 0xe? range should
>>> never end up in smp_irq_move_cleanup_interrupt().
>>>
>>> And if it's the 0x2a one, then we'd need to know what IRQ it was
>>> last used for. That can't be reconstructed from the data above, so
>>> would require you being able to reproduce this and adding some
>>> instrumentation to the code.
>>>
>>> Jan
>>>
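The reasoning about the two candidates can be checked mechanically by classifying them against the x86 vector map. This sketch hard-codes ranges we believe match the Xen 4.1-era `irq_vectors.h` (dynamic 0x20-0xdf, legacy PIC 0xe0-0xef, fixed/high-priority above that); treat those constants as assumptions and verify them against the tree being debugged. `classify_vector` is a hypothetical helper, and it ignores any fixed vectors reserved inside the dynamic range.

```python
# Vector ranges ASSUMED from Xen 4.1-era asm-x86 irq_vectors.h;
# verify against the tree you are debugging.
FIRST_DYNAMIC_VECTOR = 0x20
LAST_DYNAMIC_VECTOR = 0xDF
FIRST_LEGACY_VECTOR = 0xE0   # legacy PIC range (assumed)
LAST_LEGACY_VECTOR = 0xEF
IRQ_MOVE_CLEANUP_VECTOR = FIRST_DYNAMIC_VECTOR  # 0x20, per this thread


def classify_vector(vector: int) -> str:
    """Return a coarse label for an x86 interrupt vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError(f"not an x86 vector: {vector:#x}")
    if vector < FIRST_DYNAMIC_VECTOR:
        return "exception"            # 0x00-0x1f: CPU exceptions
    if vector == IRQ_MOVE_CLEANUP_VECTOR:
        return "cleanup-ipi"
    if vector <= LAST_DYNAMIC_VECTOR:
        return "dynamic"
    if FIRST_LEGACY_VECTOR <= vector <= LAST_LEGACY_VECTOR:
        return "legacy-pic"
    return "high-priority/fixed"


# Candidates read from the register dump: rbx/rdx = 0xe9, rsi = 0x2a.
for v in (0xE9, 0x2A):
    print(f"{v:#04x}: {classify_vector(v)}")
```

Under these assumed ranges 0xe9 is a legacy PIC vector, which is why it should not reach the dynamic-vector cleanup path, while 0x2a is an ordinary dynamic vector.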
> Could it be something to do with switching virtual wire mode, and having
> PIC compatibility stuff left in the IO-APIC after leaving the BIOS but
> before starting back up again?
>
> Looking at the stack dump, there is an extra exception frame under what
> is printed by the assertion failure.
>
> 0000002000000000 TRAP_syscall

Apologies - this is a vector 0x20 interrupt, not TRAP_syscall. That makes
sense, as 0x20 is FIRST_DYNAMIC_IRQ, which is also the cleanup IPI
vector.

The other comments still stand, especially as we appear to be
interrupting dom0, which is already running.

~Andrew

> ffffffff81a01db8 guest kernel addr
> 0000000000000246 FLAGS
> 000000000000e033 FLAT_RING3_CS64
> ffffffff8105dd5a guest kernel addr
> 000000000000e02b FLAT_RING3_SS{64,32}
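The frame decoding above can be reproduced by splitting the dumped words according to the layout we assume for Xen's x86_64 `struct cpu_user_regs`: a 32-bit `error_code` and a 32-bit `entry_vector` sharing one word, followed by `rip`, `cs`, `rflags`, `rsp` and `ss`. Under that assumption, 0x0000002000000000 decodes to error code 0 and vector 0x20, and a CS with RPL 3 is consistent with an interrupted 64-bit PV guest kernel. `InterruptFrame` and `decode_frame` are hypothetical names for this sketch.

```python
from dataclasses import dataclass

# Six consecutive 64-bit words from the "Xen stack trace" dump,
# lowest address first (layout ASSUMED, see lead-in).
FRAME_WORDS = [
    0x0000002000000000,  # error_code (low 32) | entry_vector (high 32)
    0xFFFFFFFF8105DD5A,  # rip
    0x000000000000E033,  # cs
    0x0000000000000246,  # rflags
    0xFFFFFFFF81A01DB8,  # rsp
    0x000000000000E02B,  # ss
]

X86_EFLAGS_IF = 1 << 9  # interrupt-enable flag


@dataclass
class InterruptFrame:
    error_code: int
    entry_vector: int
    rip: int
    cs: int
    rflags: int
    rsp: int
    ss: int

    @property
    def ring(self) -> int:
        return self.cs & 3  # RPL of the interrupted code segment

    @property
    def interrupts_enabled(self) -> bool:
        return bool(self.rflags & X86_EFLAGS_IF)


def decode_frame(words: list[int]) -> InterruptFrame:
    """Split the raw words into named fields (expects exactly six)."""
    w0, rip, cs, rflags, rsp, ss = words
    return InterruptFrame(
        error_code=w0 & 0xFFFFFFFF,
        entry_vector=w0 >> 32,
        rip=rip,
        cs=cs & 0xFFFF,
        rflags=rflags,
        rsp=rsp,
        ss=ss & 0xFFFF,
    )


frame = decode_frame(FRAME_WORDS)
print(f"vector={frame.entry_vector:#x} cs={frame.cs:#x} ring={frame.ring} "
      f"rip={frame.rip:#x} IF={frame.interrupts_enabled}")
```

The upper-half RIP/RSP values and RFLAGS 0x246 (IF set) fit the picture of a guest kernel running with interrupts enabled when the cleanup IPI arrived.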
>
> So it appears that we are already executing a guest (presumably dom0) by the
> time this assertion occurs. From the serial log, is there any indication that
> dom0 has started up again?
>
> I would have expected the IO-APIC to have been fully restored before we ever
> got back around to executing dom0.
>
> ~Andrew
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

