Re: [Xen-devel] [RFC Patch] x86/hpet: Disable interrupts while running hpet interrupt handler.
>>> On 05.08.13 at 22:38, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Automated testing on Xen-4.3 testing tip found an interesting issue
>
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.3.0 x86_64 debug=y Not tainted ]----
The call trace is suspicious in ways beyond what Keir already
pointed out - with debug=y, there shouldn't be bogus entries listed,
yet ...
> (XEN) CPU: 3
> (XEN) RIP: e008:[<ffff82c4c01003d0>] __bitmap_and+0/0x3f
> (XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000020 rcx: 0000000000000100
> (XEN) rdx: ffff82c4c032dfc0 rsi: ffff83043f2c6068 rdi: ffff83043f2c6008
> (XEN) rbp: ffff83043f2c6048 rsp: ffff83043f2c6000 r8: 0000000000000001
> (XEN) r9: 0000000000000000 r10: ffff83043f2c76f0 r11: 0000000000000000
> (XEN) r12: ffff83043f2c6008 r13: 7fffffffffffffff r14: ffff83043f2c6068
> (XEN) r15: 000003343036797b cr0: 0000000080050033 cr4: 00000000000026f0
> (XEN) cr3: 0000000403c40000 cr2: ffff83043f2c5ff8
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Valid stack range: ffff83043f2c6000-ffff83043f2c8000,
> sp=ffff83043f2c6000, tss.esp0=ffff83043f2c7fc0
> (XEN) Xen stack overflow (dumping trace ffff83043f2c6000-ffff83043f2c8000):
[... removed redundant stuff]
> (XEN) ffff83043f2c6b28: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN) ffff83043f2c6bf8: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c6c80: [<ffff82c4c012a535>]
> _spin_unlock_irqrestore+0x40/0x42
> (XEN) ffff83043f2c6cb8: [<ffff82c4c01a78d4>]
> handle_hpet_broadcast+0x5b/0x268
> (XEN) ffff83043f2c6d28: [<ffff82c4c01a7b41>]
> hpet_interrupt_handler+0x3e/0x40
> (XEN) ffff83043f2c6d38: [<ffff82c4c0170500>] do_IRQ+0x99a/0xa4f
> (XEN) ffff83043f2c6e08: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c6e90: [<ffff82c4c012a577>] _spin_unlock_irq+0x40/0x41
> (XEN) ffff83043f2c6eb8: [<ffff82c4c01704d6>] do_IRQ+0x970/0xa4f
> (XEN) ffff83043f2c6ed8: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN) ffff83043f2c6f28: [<ffff82c4c01a7a29>]
> handle_hpet_broadcast+0x1b0/0x268
> (XEN) ffff83043f2c6f88: [<ffff82c4c016806f>] common_interrupt+0x5f/0x70
> (XEN) ffff83043f2c7010: [<ffff82c4c0164f94>] unmap_domain_page+0x6/0x32d
> (XEN) ffff83043f2c7048: [<ffff82c4c01ef69d>] ept_next_level+0x9c/0xde
> (XEN) ffff83043f2c7078: [<ffff82c4c01f0ab3>] ept_get_entry+0xb3/0x239
> (XEN) ffff83043f2c7108: [<ffff82c4c01e9497>]
> __get_gfn_type_access+0x12b/0x20e
> (XEN) ffff83043f2c7158: [<ffff82c4c01e9cc2>]
> get_page_from_gfn_p2m+0xc8/0x25d
> (XEN) ffff83043f2c71c8: [<ffff82c4c01f4660>]
> map_domain_gfn_3_levels+0x43/0x13a
> (XEN) ffff83043f2c7208: [<ffff82c4c01f4b6b>]
> guest_walk_tables_3_levels+0x414/0x489
> (XEN) ffff83043f2c7288: [<ffff82c4c0223988>]
> hap_p2m_ga_to_gfn_3_levels+0x178/0x306
> (XEN) ffff83043f2c7338: [<ffff82c4c0223b35>]
> hap_gva_to_gfn_3_levels+0x1f/0x2a
> (XEN) ffff83043f2c7348: [<ffff82c4c01ebc1e>] paging_gva_to_gfn+0xb6/0xcc
> (XEN) ffff83043f2c7398: [<ffff82c4c01bedf2>] __hvm_copy+0x57/0x36d
> (XEN) ffff83043f2c73c8: [<ffff82c4c01b6d34>]
> hvmemul_virtual_to_linear+0x102/0x153
> (XEN) ffff83043f2c7408: [<ffff82c4c01c1538>]
> hvm_copy_from_guest_virt+0x15/0x17
> (XEN) ffff83043f2c7418: [<ffff82c4c01b7cd3>] __hvmemul_read+0x12d/0x1c8
> (XEN) ffff83043f2c7498: [<ffff82c4c01b7dc1>] hvmemul_read+0x12/0x14
> (XEN) ffff83043f2c74a8: [<ffff82c4c01937e9>] read_ulong+0xe/0x10
> (XEN) ffff83043f2c74b8: [<ffff82c4c0195924>] x86_emulate+0x169d/0x11309
... how would this end up getting called from do_IRQ()?
> (XEN) ffff83043f2c7558: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN) ffff83043f2c75c0: [<ffff82c4c012a100>]
> _spin_trylock_recursive+0x63/0x93
> (XEN) ffff83043f2c75d8: [<ffff82c4c0170564>] do_IRQ+0x9fe/0xa4f
> (XEN) ffff83043f2c7618: [<ffff82c4c01aa7bd>] cpuidle_wakeup_mwait+0xad/0xba
> (XEN) ffff83043f2c7668: [<ffff82c4c01a7a29>]
> handle_hpet_broadcast+0x1b0/0x268
> (XEN) ffff83043f2c76c8: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c7788: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c77b8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN) ffff83043f2c7848: [<ffff82c4c01775ef>] get_page+0x27/0xf2
> (XEN) ffff83043f2c7898: [<ffff82c4c01ef6a5>] ept_next_level+0xa4/0xde
> (XEN) ffff83043f2c78c8: [<ffff82c4c01f0c27>] ept_get_entry+0x227/0x239
> (XEN) ffff83043f2c7a98: [<ffff82c4c01b7f60>] hvm_emulate_one+0x127/0x1bf
> (XEN) ffff83043f2c7aa8: [<ffff82c4c01b6c1b>] hvmemul_get_seg_reg+0x49/0x60
> (XEN) ffff83043f2c7ae8: [<ffff82c4c01c38c5>] handle_mmio+0x55/0x1f0
> (XEN) ffff83043f2c7b38: [<ffff82c4c0108208>] do_event_channel_op+0/0x10cb
And this one looks bogus too. Question therefore is whether the
problem you describe isn't a consequence of an earlier issue.
> (XEN) ffff83043f2c7b48: [<ffff82c4c0128bb3>] vcpu_unblock+0x4b/0x4d
> (XEN) ffff83043f2c7c48: [<ffff82c4c01e9400>]
> __get_gfn_type_access+0x94/0x20e
> (XEN) ffff83043f2c7c98: [<ffff82c4c01bccf3>]
> hvm_hap_nested_page_fault+0x25d/0x456
> (XEN) ffff83043f2c7d18: [<ffff82c4c01e1257>]
> vmx_vmexit_handler+0x140a/0x17ba
> (XEN) ffff83043f2c7d30: [<ffff82c4c01be519>] hvm_do_resume+0x1a/0x1b7
> (XEN) ffff83043f2c7d60: [<ffff82c4c01dae73>] vmx_do_resume+0x13b/0x15a
> (XEN) ffff83043f2c7da8: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN) ffff83043f2c7e20: [<ffff82c4c0128091>] schedule+0x82a/0x839
> (XEN) ffff83043f2c7e50: [<ffff82c4c012a1e1>] _spin_lock+0x11/0x48
> (XEN) ffff83043f2c7e68: [<ffff82c4c01cb132>]
> vlapic_has_pending_irq+0x3f/0x85
> (XEN) ffff83043f2c7e88: [<ffff82c4c01c50a7>]
> hvm_vcpu_has_pending_irq+0x9b/0xcd
> (XEN) ffff83043f2c7ec8: [<ffff82c4c01deca9>] vmx_vmenter_helper+0x60/0x139
> (XEN) ffff83043f2c7f18: [<ffff82c4c01e7439>] vmx_asm_do_vmentry+0/0xe7
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) DOUBLE FAULT -- system shutdown
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> The hpet interrupt handler runs with interrupts enabled, due to the
> spin_unlock_irq() in:
>
> while ( desc->status & IRQ_PENDING )
> {
>     desc->status &= ~IRQ_PENDING;
>     spin_unlock_irq(&desc->lock);
>     tsc_in = tb_init_done ? get_cycles() : 0;
>     action->handler(irq, action->dev_id, regs);
>     TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
>     spin_lock_irq(&desc->lock);
> }
>
> in do_IRQ().
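For reference, spin_unlock_irq() drops the lock and then unconditionally
re-enables interrupts, so each pass through this loop opens a window in which
a still-pending HPET interrupt can nest into do_IRQ() on the same stack before
desc->lock is re-taken. A minimal sketch of the helper, assuming the usual Xen
spinlock wrappers:

    void _spin_unlock_irq(spinlock_t *lock)
    {
        _spin_unlock(lock);    /* release the IRQ descriptor lock */
        local_irq_enable();    /* interrupts back on: a still-pending HPET
                                  interrupt can now nest into do_IRQ() on
                                  the same stack */
    }

Each nested invocation adds another do_IRQ()/handle_hpet_broadcast() frame,
which matches the repeated entries in the trace above and eventually exhausts
the 8KiB stack range reported in the crash, hence the double fault.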
>
> Clearly there are cases where the HPET interrupt fires again before
> handle_hpet_broadcast() has finished running, I presume in part because
> of the large amount of cpumask manipulation it performs.
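The patch subject suggests keeping interrupts masked for the duration of the
HPET handler. A minimal sketch of that idea (not the actual RFC patch; the
handler signature and the local_irq_save()/local_irq_restore() wrappers follow
the usual Xen conventions, and the event_handler hook is assumed to point at
handle_hpet_broadcast()):

    static void hpet_interrupt_handler(int irq, void *data,
                                       struct cpu_user_regs *regs)
    {
        struct hpet_event_channel *ch = data;
        unsigned long flags;

        /* Keep interrupts masked so a re-raised HPET interrupt cannot
         * nest while do_IRQ() has dropped desc->lock around us. */
        local_irq_save(flags);
        ch->event_handler(ch);      /* e.g. handle_hpet_broadcast() */
        local_irq_restore(flags);
    }

This trades a somewhat longer interrupt-disabled section on the CPU handling
the broadcast for the guarantee that handle_hpet_broadcast() is never
re-entered on the same stack.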
How many CPUs (and how many usable HPET channels) does the
system have that this crash was observed on?
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel