
Re: [Xen-devel] [RFC Patch] x86/hpet: Disable interrupts while running hpet interrupt handler.



On 06/08/13 09:01, Jan Beulich wrote:
>>>> On 05.08.13 at 22:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> Automated testing on Xen-4.3 testing tip found an interesting issue
>>
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) ----[ Xen-4.3.0  x86_64  debug=y  Not tainted ]----
> The call trace is suspicious in ways beyond what Keir already
> pointed out - with debug=y, there shouldn't be bogus entries listed,
> yet ...

show_stack_overflow() doesn't have a debug case which follows frame
pointers.  I shall submit a patch for this presently, and put it into
XenServer in the hope of getting a better stack trace in the future.
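
For illustration, the debug case would walk the saved frame-pointer
chain instead of printing every stack word which happens to look like a
text address.  A minimal sketch of the walk (not the actual patch;
bounds handling and symbol printing are simplified):

    /* Illustrative only: follow the saved-%rbp chain, stopping when
     * the next frame leaves the stack or fails to move strictly
     * upwards (stacks grow down, so older frames are higher). */
    static void show_trace_fp(unsigned long bp, unsigned long stk_bottom,
                              unsigned long stk_top)
    {
        unsigned long *frame = (unsigned long *)bp;

        while ( (unsigned long)frame >= stk_bottom &&
                (unsigned long)frame <= stk_top - 2 * sizeof(unsigned long) )
        {
            unsigned long next = frame[0]; /* caller's saved %rbp */
            unsigned long ret  = frame[1]; /* return address      */

            printk("   [<%p>]\n", _p(ret));

            if ( next <= (unsigned long)frame )
                break;
            frame = (unsigned long *)next;
        }
    }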

<snip>
> And this one looks bogus too. Question therefore is whether the
> problem you describe isn't a consequence of an earlier issue.

There is nothing apparently interesting preceding the crash; just some
spew from an HVM domain using the 0x39 debug port.

>
>> (XEN)    ffff83043f2c7b48: [<ffff82c4c0128bb3>] vcpu_unblock+0x4b/0x4d
>> (XEN)    ffff83043f2c7c48: [<ffff82c4c01e9400>] __get_gfn_type_access+0x94/0x20e
>> (XEN)    ffff83043f2c7c98: [<ffff82c4c01bccf3>] hvm_hap_nested_page_fault+0x25d/0x456
>> (XEN)    ffff83043f2c7d18: [<ffff82c4c01e1257>] vmx_vmexit_handler+0x140a/0x17ba
>> (XEN)    ffff83043f2c7d30: [<ffff82c4c01be519>] hvm_do_resume+0x1a/0x1b7
>> (XEN)    ffff83043f2c7d60: [<ffff82c4c01dae73>] vmx_do_resume+0x13b/0x15a
>> (XEN)    ffff83043f2c7da8: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
>> (XEN)    ffff83043f2c7e20: [<ffff82c4c0128091>] schedule+0x82a/0x839
>> (XEN)    ffff83043f2c7e50: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
>> (XEN)    ffff83043f2c7e68: [<ffff82c4c01cb132>] vlapic_has_pending_irq+0x3f/0x85
>> (XEN)    ffff83043f2c7e88: [<ffff82c4c01c50a7>] hvm_vcpu_has_pending_irq+0x9b/0xcd
>> (XEN)    ffff83043f2c7ec8: [<ffff82c4c01deca9>] vmx_vmenter_helper+0x60/0x139
>> (XEN)    ffff83043f2c7f18: [<ffff82c4c01e7439>] vmx_asm_do_vmentry+0/0xe7
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) DOUBLE FAULT -- system shutdown
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>> The hpet interrupt handler runs with interrupts enabled, due to the
>> spin_unlock_irq() in:
>>
>>     while ( desc->status & IRQ_PENDING )
>>     {
>>         desc->status &= ~IRQ_PENDING;
>>         spin_unlock_irq(&desc->lock);
>>         tsc_in = tb_init_done ? get_cycles() : 0;
>>         action->handler(irq, action->dev_id, regs);
>>         TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
>>         spin_lock_irq(&desc->lock);
>>     }
>>
>> in do_IRQ().
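
To spell out what the RFC does: keep interrupts masked across the
handler call for this IRQ, so a back-to-back HPET interrupt cannot nest
inside handle_hpet_broadcast() and grow the stack.  A sketch of the
idea (the per-action flag is hypothetical, purely for illustration; it
is not an existing Xen interface):

    while ( desc->status & IRQ_PENDING )
    {
        desc->status &= ~IRQ_PENDING;
        /* Hypothetical flag: handlers which must not be re-entered
         * drop only the lock; interrupts stay masked. */
        if ( action->flags & IRQF_KEEP_MASKED )
            spin_unlock(&desc->lock);
        else
            spin_unlock_irq(&desc->lock);
        tsc_in = tb_init_done ? get_cycles() : 0;
        action->handler(irq, action->dev_id, regs);
        TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
        if ( action->flags & IRQF_KEEP_MASKED )
            spin_lock(&desc->lock);
        else
            spin_lock_irq(&desc->lock);
    }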
>>
>> Clearly there are cases where the HPET interrupt period is shorter
>> than the time it takes to run handle_hpet_broadcast(), presumably in
>> part because of the large amount of cpumask manipulation.
> How many CPUs (and how many usable HPET channels) does the
> system have that this crash was observed on?
>
> Jan

The machine we found this crash on is a Dell R310.  4 CPUs, 16G RAM.

The full boot xl dmesg is attached, but it appears that there are 8
HPET channels usable for broadcast.  This is further backed up by the
'i' debug key output (also attached).

Keir: (merging your thread back here)
  I see your point regarding IRQ_INPROGRESS, but even with 8 HPET
interrupts there are rather more than 8 occurrences of
handle_hpet_broadcast() in the stack.  If the occurrences were just
stale function pointers on the stack, I would expect to see several
entries of the form handle_hpet_broadcast+0x0/0x268, i.e. pointing at
the start of the function rather than a return address part-way
through it.
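
In other words, with interrupts enabled inside the handler the calls
can nest, something like (schematic only):

    do_IRQ()
      handle_hpet_broadcast()        <- IRQs enabled, HPET fires again
        do_IRQ()
          handle_hpet_broadcast()    <- and again
            ...                      <- until the stack overflows, #DF

with each level leaving a real frame (a return address part-way into
the function) rather than a stray function pointer.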

~Andrew

Attachment: xl-dmesg-boot
Description: Text document

Attachment: xl-debugkeys-i
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

