[Xen-devel] 33 VCPUs in HVM guests with live migration with Linux hangs
When live migrating, I found that a guest with more than 32 VCPUs gets stuck. It works fine when booting - all 33 VCPUs show up. I use this small config:

kernel = "hvmloader"
device_model_version = 'qemu-xen-traditional'
vcpus = 33
builder='hvm'
memory=1024
serial='file:/var/log/xen/console-bootstrap-x86_64-pvhvm'
name="m"
disk = [ 'file:/mnt/lab/bootstrap-x86_64/root_image.iso,hdc:cdrom,r','phy:/dev/guests/bootstrap-x86_64-pvhvm,xvda,w']
boot="dn"
vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
vnc=1
vnclisten="0.0.0.0"
usb=1
usbdevice="tablet"

And do a migration:

m                          33  1023    33     -b----      14.3
-bash-4.1# xl migrate m localhost
root@localhost's password:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/418)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/418)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
libxl: notice: libxl_numa.c:494:libxl__get_numa_candidate: NUMA placement failed, performance might be affected
xc: Reloading memory pages: 262144/1045504 25%
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
Migration successful.

The migration completes successfully, but xl vcpu-list tells me that 32 of the 33 VCPUs are blocked; the remaining one (the 33rd) seems to alternate between:

Call Trace:
  [<ffffffff81128a5a>] multi_cpu_stop+0x9a <--
  ffff8800365b5da0: [<ffffffff811289c0>] multi_cpu_stop
  ffff8800365b5dc0: [<ffffffff811290aa>] cpu_stopper_thread+0x4a
  ffff8800365b5de0: [<ffffffff816f7081>] __schedule+0x381
  ffff8800365b5e38: [<ffffffff810cbf90>] smpboot_thread_fn
  ffff8800365b5e80: [<ffffffff810cc0d8>] smpboot_thread_fn+0x148
  ffff8800365b5eb0: [<ffffffff810cbf90>] smpboot_thread_fn
  ffff8800365b5ec0: [<ffffffff810c498e>] kthread+0xce
  ffff8800365b5f28: [<ffffffff810c48c0>] kthread
  ffff8800365b5f50: [<ffffffff81703e0c>] ret_from_fork+0x7c
  ffff8800365b5f80: [<ffffffff810c48c0>] kthread

and

  [<ffffffff8108f3c8>] pvclock_clocksource_read+0x18 <--
  ffff880038603ef0: [<ffffffff81045698>] xen_clocksource_read+0x28
  ffff880038603f00: [<ffffffff81057909>] sched_clock+0x9
  ffff880038603f10: [<ffffffff810d7b85>] sched_clock_local+0x25
  ffff880038603f40: [<ffffffff810d7ca8>] sched_clock_cpu+0xb8
  ffff880038603f60: [<ffffffff810d840e>] irqtime_account_irq+0x4e
  ffff880038603f80: [<ffffffff810a5279>] irq_enter+0x39
  ffff880038603f90: [<ffffffff813f8480>] xen_evtchn_do_upcall+0x20
  ffff880038603fb0: [<ffffffff817058ed>] xen_hvm_callback_vector+0x6d

which implies that the CPU is receiving interrupts, but is stuck in a kernel thread doing something - probably waiting on a mutex.
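As background, my understanding is that pvclock_clocksource_read() works off a per-VCPU pvclock_vcpu_time_info record that the hypervisor has to keep updating for that VCPU. Below is a minimal, userspace-compilable sketch of that read protocol - the field layout follows the public pvclock ABI, but rdtsc_sketch(), scale_delta() and pvclock_read_sketch() are illustrative stand-ins, not the kernel's functions:

#include <stdint.h>

struct pvclock_vcpu_time_info {
        uint32_t version;        /* odd while the hypervisor is mid-update */
        uint32_t pad0;
        uint64_t tsc_timestamp;  /* TSC value when system_time was sampled */
        uint64_t system_time;    /* guest "system time" in nanoseconds */
        uint32_t tsc_to_system_mul;
        int8_t   tsc_shift;
        uint8_t  flags;
        uint8_t  pad[2];
};

/* Stand-in for the kernel's rdtsc helper. */
static inline uint64_t rdtsc_sketch(void)
{
        uint32_t lo, hi;
        __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

/* Convert a TSC delta to nanoseconds with the hypervisor-supplied scale. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
        if (shift < 0)
                delta >>= -shift;
        else
                delta <<= shift;
        return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Lockless read: retry if the record was being updated (odd or changed version). */
uint64_t pvclock_read_sketch(volatile struct pvclock_vcpu_time_info *src)
{
        uint32_t version;
        uint64_t ns;

        do {
                version = src->version;
                __asm__ volatile("" ::: "memory");   /* compiler barrier */
                ns = src->system_time +
                     scale_delta(rdtsc_sketch() - src->tsc_timestamp,
                                 src->tsc_to_system_mul, src->tsc_shift);
                __asm__ volatile("" ::: "memory");
        } while ((version & 1) || version != src->version);

        return ns;
}

The relevant point is that every VCPU needs its own valid, live time info area; for VCPUs above 31 that area cannot live in the shared_info page, which is what made me suspect the commit mentioned below.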
When the CPU (33) started (before migration), this was its stack:

Call Trace:
  [<ffffffff8108e846>] native_safe_halt+0x6 <--
  ffff8800377d1e90: [<ffffffff8105989a>] default_idle+0x1a
  ffff8800377d1eb0: [<ffffffff810591f6>] arch_cpu_idle+0x26
  ffff8800377d1ec0: [<ffffffff810f6d76>] cpu_startup_entry+0xa6
  ffff8800377d1ef0: [<ffffffff81109a55>] clockevents_register_device+0x105
  ffff8800377d1f30: [<ffffffff8108236e>] start_secondary+0x19e

The only culprit I could think of was commit d5b17dbff83d63fb6bf35daec21c8ebfb8d695b5 "xen/smp/pvhvm: Don't point per_cpu(xen_vcpu, 33 and larger) to shared_info", which I reverted - but that did not help. (A rough sketch of the registration logic that commit touches is at the end of this mail.)

So, questions:
1) Has anybody else actually booted an HVM guest with more than 32 VCPUs and tried to migrate it?
2) If yes, have you seen this before?
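For reference, here is a rough, schematic sketch of the vcpu_info registration that commit is about - this is my paraphrase, not a compilable program and not a verbatim copy of the arch/x86/xen code. The shared_info page only carries MAX_VIRT_CPUS (32) vcpu_info slots, so a VCPU with an id of 32 or higher has to have its vcpu_info registered explicitly via VCPUOP_register_vcpu_info, and presumably that registration needs to be redone on the destination host when the guest resumes:

/* Schematic only - simplified from my reading of the PVHVM setup path. */
static void vcpu_info_setup_sketch(int cpu)
{
        struct vcpu_register_vcpu_info info;
        struct vcpu_info *vcpup = &per_cpu(xen_vcpu_info, cpu);
        int err;

        /* Ask Xen to deliver this VCPU's vcpu_info into our per-CPU area. */
        info.mfn = arbitrary_virt_to_mfn(vcpup);
        info.offset = offset_in_page(vcpup);

        err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
        if (err == 0) {
                per_cpu(xen_vcpu, cpu) = vcpup;
        } else if (cpu < MAX_VIRT_CPUS) {
                /* The legacy fallback only covers the first 32 VCPUs. */
                per_cpu(xen_vcpu, cpu) =
                        &HYPERVISOR_shared_info->vcpu_info[cpu];
        } else {
                /*
                 * No fallback exists for VCPU >= 32: if the registration
                 * is not repeated on the destination host after migration,
                 * this VCPU ends up with stale event channel and time
                 * info, which would fit the hang above.
                 */
                BUG();
        }
}

If that re-registration is skipped (or silently fails) on resume for CPUs 32 and up, it would explain why exactly the 33rd VCPU misbehaves.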