Re: [Xen-devel] [RFC Patch] x86/hpet: Disable interrupts while running hpet interrupt handler.
>>> On 05.08.13 at 22:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Automated testing on Xen-4.3 testing tip found an interesting issue
>
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.3.0 x86_64 debug=y Not tainted ]----
The call trace is suspicious in ways beyond what Keir already
pointed out - with debug=y, there shouldn't be bogus entries listed,
yet ...
> (XEN) CPU: 3
> (XEN) RIP: e008:[<ffff82c4c01003d0>] __bitmap_and+0/0x3f
> (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000020 rcx: 0000000000000100
> (XEN) rdx: ffff82c4c032dfc0 rsi: ffff83043f2c6068 rdi: ffff83043f2c6008
> (XEN) rbp: ffff83043f2c6048 rsp: ffff83043f2c6000 r8: 0000000000000001
> (XEN) r9: 0000000000000000 r10: ffff83043f2c76f0 r11: 0000000000000000
> (XEN) r12: ffff83043f2c6008 r13: 7fffffffffffffff r14: ffff83043f2c6068
> (XEN) r15: 000003343036797b cr0: 0000000080050033 cr4: 00000000000026f0
> (XEN) cr3: 0000000403c40000 cr2: ffff83043f2c5ff8
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Valid stack range: ffff83043f2c6000-ffff83043f2c8000,
> sp=ffff83043f2c6000, tss.esp0=ffff83043f2c7fc0
> (XEN) Xen stack overflow (dumping trace ffff83043f2c6000-ffff83043f2c8000):
[... removed redundant stuff]
> (XEN) ffff83043f2c6b28: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN) ffff83043f2c6bf8: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c6c80: [<ffff82c4c012a535>]
> _spin_unlock_irqrestore+0x40/0x42
> (XEN) ffff83043f2c6cb8: [<ffff82c4c01a78d4>]
> handle_hpet_broadcast+0x5b/0x268
> (XEN) ffff83043f2c6d28: [<ffff82c4c01a7b41>]
> hpet_interrupt_handler+0x3e/0x40
> (XEN) ffff83043f2c6d38: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN) ffff83043f2c6e08: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c6e90: [<ffff82c4c012a577>] _spin_unlock_irq+0x40/0x41
> (XEN) ffff83043f2c6eb8: [<ffff82c4c01704d6>] do_IRQ+0x970/0xa4f
> (XEN) ffff83043f2c6ed8: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN) ffff83043f2c6f28: [<ffff82c4c01a7a29>]
> handle_hpet_broadcast+0x1b0/0x268
> (XEN) ffff83043f2c6f88: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c7010: [<ffff82c4c0164f94>] unmap_domain_page+0x6/0x32d
> (XEN) ffff83043f2c7048: [<ffff82c4c01ef69d>] ept_next_level+0x9c/0xde
> (XEN) ffff83043f2c7078: [<ffff82c4c01f0ab3>] ept_get_entry+0xb3/0x239
> (XEN) ffff83043f2c7108: [<ffff82c4c01e9497>]
> __get_gfn_type_access+0x12b/0x20e
> (XEN) ffff83043f2c7158: [<ffff82c4c01e9cc2>]
> get_page_from_gfn_p2m+0xc8/0x25d
> (XEN) ffff83043f2c71c8: [<ffff82c4c01f4660>]
> map_domain_gfn_3_levels+0x43/0x13a
> (XEN) ffff83043f2c7208: [<ffff82c4c01f4b6b>]
> guest_walk_tables_3_levels+0x414/0x489
> (XEN) ffff83043f2c7288: [<ffff82c4c0223988>]
> hap_p2m_ga_to_gfn_3_levels+0x178/0x306
> (XEN) ffff83043f2c7338: [<ffff82c4c0223b35>]
> hap_gva_to_gfn_3_levels+0x1f/0x2a
> (XEN) ffff83043f2c7348: [<ffff82c4c01ebc1e>] paging_gva_to_gfn+0xb6/0xcc
> (XEN) ffff83043f2c7398: [<ffff82c4c01bedf2>] __hvm_copy+0x57/0x36d
> (XEN) ffff83043f2c73c8: [<ffff82c4c01b6d34>]
> hvmemul_virtual_to_linear+0x102/0x153
> (XEN) ffff83043f2c7408: [<ffff82c4c01c1538>]
> hvm_copy_from_guest_virt+0x15/0x17
> (XEN) ffff83043f2c7418: [<ffff82c4c01b7cd3>] __hvmemul_read+0x12d/0x1c8
> (XEN) ffff83043f2c7498: [<ffff82c4c01b7dc1>] hvmemul_read+0x12/0x14
> (XEN) ffff83043f2c74a8: [<ffff82c4c01937e9>] read_ulong+0xe/0x10
> (XEN) ffff83043f2c74b8: [<ffff82c4c0195924>] x86_emulate+0x169d/0x11309
... how would this end up getting called from do_IRQ()?
> (XEN) ffff83043f2c7558: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN) ffff83043f2c75c0: [<ffff82c4c012a100>]
> _spin_trylock_recursive+0x63/0x93
> (XEN) ffff83043f2c75d8: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN) ffff83043f2c7618: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN) ffff83043f2c7668: [<ffff82c4c01a7a29>]
> handle_hpet_broadcast+0x1b0/0x268
> (XEN) ffff83043f2c76c8: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c7788: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c77b8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN) ffff83043f2c7848: [<ffff82c4c01775ef>] get_page+0x27/0xf2
> (XEN) ffff83043f2c7898: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c78c8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN) ffff83043f2c7a98: [<ffff82c4c01b7f60>] hvm_emulate_one+0x127/0x1bf
> (XEN) ffff83043f2c7aa8: [<ffff82c4c01b6c1b>] hvmemul_get_seg_reg+0x49/0x60
> (XEN) ffff83043f2c7ae8: [<ffff82c4c01c38c5>] handle_mmio+0x55/0x1f0
> (XEN) ffff83043f2c7b38: [<ffff82c4c0108208>] do_event_channel_op+0/0x10cb
And this one looks bogus too. Question therefore is whether the
problem you describe isn't a consequence of an earlier issue.
> (XEN) ffff83043f2c7b48: [<ffff82c4c0128bb3>] vcpu_unblock+0x4b/0x4d
> (XEN) ffff83043f2c7c48: [<ffff82c4c01e9400>]
> __get_gfn_type_access+0x94/0x20e
> (XEN) ffff83043f2c7c98: [<ffff82c4c01bccf3>]
> hvm_hap_nested_page_fault+0x25d/0x456
> (XEN) ffff83043f2c7d18: [<ffff82c4c01e1257>]
> vmx_vmexit_handler+0x140a/0x17ba
> (XEN) ffff83043f2c7d30: [<ffff82c4c01be519>] hvm_do_resume+0x1a/0x1b7
> (XEN) ffff83043f2c7d60: [<ffff82c4c01dae73>] vmx_do_resume+0x13b/0x15a
> (XEN) ffff83043f2c7da8: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN) ffff83043f2c7e20: [<ffff82c4c0128091>] schedule+0x82a/0x839
> (XEN) ffff83043f2c7e50: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN) ffff83043f2c7e68: [<ffff82c4c01cb132>]
> vlapic_has_pending_irq+0x3f/0x85
> (XEN) ffff83043f2c7e88: [<ffff82c4c01c50a7>]
> hvm_vcpu_has_pending_irq+0x9b/0xcd
> (XEN) ffff83043f2c7ec8: [<ffff82c4c01deca9>] vmx_vmenter_helper+0x60/0x139
> (XEN) ffff83043f2c7f18: [<ffff82c4c01e7439>] vmx_asm_do_vmentry+0/0xe7
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> The hpet interrupt handler runs with interrupts enabled, due to the
> spin_unlock_irq() in:
>
> while ( desc->status & IRQ_PENDING )
> {
>     desc->status &= ~IRQ_PENDING;
>     spin_unlock_irq(&desc->lock);
>     tsc_in = tb_init_done ? get_cycles() : 0;
>     action->handler(irq, action->dev_id, regs);
>     TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
>     spin_lock_irq(&desc->lock);
> }
>
> in do_IRQ().
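For reference, spin_unlock_irq() drops the lock and then unconditionally
re-enables interrupts, so each pass through this loop opens a window in which
a still-pending HPET interrupt can nest into do_IRQ() on the same stack before
desc->lock is re-taken. A minimal sketch of the helper, assuming the usual Xen
spinlock wrappers:

    void _spin_unlock_irq(spinlock_t *lock)
    {
        _spin_unlock(lock);    /* release the IRQ descriptor lock */
        local_irq_enable();    /* interrupts back on: a still-pending HPET
                                  interrupt can now nest into do_IRQ() on
                                  the same stack */
    }

Each nested invocation adds another do_IRQ()/handle_hpet_broadcast() frame,
which matches the repeated entries in the trace above and eventually exhausts
the 8KiB stack range reported in the crash, hence the double fault.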
>
> Clearly there are cases where the HPET interrupt fires again before
> handle_hpet_broadcast() has finished running, I presume in part because
> of the large amount of cpumask manipulation it performs.
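The patch subject suggests keeping interrupts masked for the duration of the
HPET handler. A minimal sketch of that idea (not the actual RFC patch; the
handler signature and the local_irq_save()/local_irq_restore() wrappers follow
the usual Xen conventions, and the event_handler hook is assumed to point at
handle_hpet_broadcast()):

    static void hpet_interrupt_handler(int irq, void *data,
                                       struct cpu_user_regs *regs)
    {
        struct hpet_event_channel *ch = data;
        unsigned long flags;

        /* Keep interrupts masked so a re-raised HPET interrupt cannot
         * nest while do_IRQ() has dropped desc->lock around us. */
        local_irq_save(flags);
        ch->event_handler(ch);      /* e.g. handle_hpet_broadcast() */
        local_irq_restore(flags);
    }

This trades a somewhat longer interrupt-disabled section on the CPU handling
the broadcast for the guarantee that handle_hpet_broadcast() is never
re-entered on the same stack.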
How many CPUs (and how many usable HPET channels) does the
system have that this crash was observed on?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel