[Xen-devel] HVM bug: system crashes after offlining and re-onlining a vcpu



Hi Konrad

I encountered a bug when taking a cpu offline and then bringing it back
online in an HVM guest. As I'm not very familiar with the HVM code, I
cannot come up with a quick fix.

The HVM DomU is configured with 4 vcpus. After booting to a command
prompt, I run the following commands:

# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 1 > /sys/devices/system/cpu/cpu3/online
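
For repeated testing, the two commands above could be wrapped in a small
helper. This is only a sketch: cycle_cpu and the SYSFS_CPU_ROOT override
are hypothetical names introduced here for illustration, not part of any
existing tool. It must run as root inside the guest.

```shell
#!/bin/sh
# Hypothetical helper: take one cpu offline, then bring it back online,
# using the same sysfs control file as the manual steps.
# SYSFS_CPU_ROOT is an assumption added so the function can be pointed
# at a scratch directory; it defaults to the real sysfs path.
cycle_cpu() {
    cpu="$1"
    root="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
    ctl="$root/cpu$cpu/online"

    # The control file is absent or read-only when the cpu cannot be
    # hotplugged or the caller is not root.
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl" >&2
        return 1
    fi

    echo 0 > "$ctl" || return 1
    echo "cpu$cpu offline: $(cat "$ctl")"

    echo 1 > "$ctl" || return 1
    echo "cpu$cpu online: $(cat "$ctl")"
}
```

Calling "cycle_cpu 3" performs the same sequence as the two echo commands.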

With Debian's default 2.6.32-5-amd64 kernel, the last log line is:

    Booting processor 3 APIC 0x6 ip 0x6000

With my own kernel, version 3.5, I get more output:

[   44.047358] Booting Node 0 Processor 3 APIC 0x6
[   44.061201] ------------[ cut here ]------------
[   44.065186] kernel BUG at kernel/hrtimer.c:1259!
[   44.065186] invalid opcode: 0000 [#1] SMP
[   44.065186] CPU 3
[   44.065186] Modules linked in:
[   44.065186]
[   44.065186] Pid: 0, comm: swapper/3 Not tainted 3.5.0-xen-evtchn+ #50 Xen HVM domU
[   44.065186] RIP: 0010:[<ffffffff8105682e>]  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
[   44.065186] RSP: 0000:ffff88000f463de8  EFLAGS: 00010046
[   44.065186] RAX: ffffffff8105680a RBX: ffff88000f46e640 RCX: 00000000fffffffa
[   44.065186] RDX: 00000000fffffffa RSI: 0000000000000000 RDI: ffff88000f46bd80
[   44.065186] RBP: 0000000000000057 R08: ffff88000e000b40 R09: 0000000000000019
[   44.065186] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88000e6e8e00
[   44.065186] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[   44.065186] FS:  0000000000000000(0000) GS:ffff88000f460000(0000) knlGS:0000000000000000
[   44.065186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   44.065186] CR2: 0000000000000000 CR3: 000000000181b000 CR4: 00000000000007e0
[   44.065186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   44.065186] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   44.065186] Process swapper/3 (pid: 0, threadinfo ffff88000e62e000, task ffff88000e62aea0)
[   44.065186] Stack:
[   44.065186]  0000000000000001 ffff88000f46e680 ffffffff81013711 00000008cfba9b27
[   44.065186]  00000000fffffffa ffff88000e6e97c0 0000000000000057 ffff88000e6e8e00
[   44.065186]  0000000000000000 0000000000000001 0000000000000000 ffffffff81006954
[   44.065186] Call Trace:
[   44.065186]  <IRQ>
[   44.065186]  [<ffffffff81013711>] ? paravirt_sched_clock+0x5/0x8
[   44.065186]  [<ffffffff81006954>] ? xen_timer_interrupt+0x26/0x162
[   44.065186]  [<ffffffff8109a220>] ? check_for_new_grace_period.isra.32+0x90/0x9a
[   44.065186]  [<ffffffff810956df>] ? handle_irq_event_percpu+0x32/0x1b0
[   44.065186]  [<ffffffff8128f88b>] ? irq_get_handler_data+0x7/0x16
[   44.065186]  [<ffffffff81097e39>] ? handle_percpu_irq+0x3a/0x4f
[   44.065186]  [<ffffffff8128f9ec>] ? __xen_evtchn_do_upcall_l2+0x131/0x1c0
[   44.065186]  [<ffffffff812913d3>] ? xen_evtchn_do_upcall+0x27/0x37
[   44.065186]  [<ffffffff8140081a>] ? xen_hvm_callback_vector+0x6a/0x70
[   44.065186]  <EOI>
[   44.065186]  [<ffffffff81094b8f>] ? cpumask_next+0x17/0x19
[   44.065186]  [<ffffffff813eb75b>] ? start_secondary+0x184/0x1e2
[   44.065186]  [<ffffffff813eb757>] ? start_secondary+0x180/0x1e2
[   44.065186]  [<ffffffff813eb5d7>] ? set_cpu_sibling_map+0x40e/0x40e
[   44.065186] Code: 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 c7 c3 40 e6 00 00 48 83 ec 28 65 48 03 1c 25 e8 db 00 00 83 7b 18 00 75 02 <0f> 0b 48 ff 43 20 48 bd ff ff ff ff ff ff ff 7f 41 be 03 00 00
[   44.065186] RIP  [<ffffffff8105682e>] hrtimer_interrupt+0x24/0x1a5
[   44.065186]  RSP <ffff88000f463de8>
[   44.065186] ---[ end trace 9366352b116a03db ]---
[   44.065186] Kernel panic - not syncing: Fatal exception in interrupt

And if I offline and then online cpu 2 under 2.6.32-5-amd64:

[   27.933928] Booting processor 2 APIC 0x4 ip 0x6000
[   25.708098] Initializing CPU#2
[   25.708098] CPU: L1 I cache: 32K, L1 D cache: 32K
[   25.708098] CPU: L2 cache: 6144K
[   25.708098] CPU 2/0x4 -> Node 0
[   25.708098] CPU: Physical Processor ID: 0
[   25.708098] CPU: Processor Core ID: 4
[   28.028234] CPU2: Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz stepping 07
[   28.069320] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
[   25.708098] installing Xen timer for CPU 2
[   28.098101] CPU0 attaching NULL sched-domain.
[   28.098106] CPU1 attaching NULL sched-domain.
[   28.098110] CPU3 attaching NULL sched-domain.
[   28.098092] ------------[ cut here ]------------
[   28.098092] WARNING: at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_none/kernel/irq/chip.c:88 unbind_from_irq+0x147/0x159()
[   28.098092] Hardware name: HVM domU
[   28.144127] CPU0 attaching sched-domain:
[   28.144131]  domain 0: span 0-3 level CPU
[   28.144133]   groups: 0 1 2 3
[   28.144139] CPU1 attaching sched-domain:
[   28.144142]  domain 0: span 0-3 level CPU
[   28.144145]   groups: 1 2 3 0
[   28.144150] CPU2 attaching sched-domain:
[   28.144152]  domain 0: span 0-3 level CPU
[   28.144155]   groups: 2 3 0 1
[   28.144160] CPU3 attaching sched-domain:
[   28.144162]  domain 0: span 0-3 level CPU
[   28.144165]   groups: 3 0 1 2
[   28.209159] Destroying IRQ18 without calling free_irq
[   28.215985] Modules linked in: loop parport_pc parport psmouse evdev serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr i2c_piix4 i2c_core button processor ext3 jbd mbcache ata_generic ata_piix libata floppy thermal thermal_sys xen_blkfront scsi_mod [last unloaded: scsi_wait_scan]
[   28.224050] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
[   28.224050] Call Trace:
[   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
[   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
[   28.224050]  [<ffffffff8104dd7c>] ? warn_slowpath_common+0x77/0xa3
[   28.224050]  [<ffffffff8104de04>] ? warn_slowpath_fmt+0x51/0x59
[   28.224050]  [<ffffffff810e4493>] ? get_partial_node+0x15/0x85
[   28.224050]  [<ffffffff811966fd>] ? kvasprintf+0x41/0x68
[   28.224050]  [<ffffffff8109639e>] ? dynamic_irq_cleanup_x+0x4b/0xc2
[   28.224050]  [<ffffffff811ef131>] ? unbind_from_irq+0x147/0x159
[   28.224050]  [<ffffffff811ef5b7>] ? bind_virq_to_irqhandler+0x14c/0x15d
[   28.224050]  [<ffffffff8100df77>] ? xen_timer_interrupt+0x0/0x18d
[   28.224050]  [<ffffffff812f5121>] ? set_cpu_sibling_map+0x2f4/0x311
[   28.224050]  [<ffffffff8100df0d>] ? xen_setup_timer+0x55/0xa2
[   28.224050]  [<ffffffff8100df71>] ? xen_hvm_setup_cpu_clockevents+0x17/0x1d
[   28.224050]  [<ffffffff812f52fc>] ? start_secondary+0x17c/0x185
[   28.224050] ---[ end trace db1493923b5e103d ]---

The logs for cpu 2 under my 3.5 kernel are identical to those for cpu 3.


Wei.

