
Re: [Xen-users] Xen 4.6 Live Migration and Hotplugging Issues


  • To: xen-users@xxxxxxxxxxxxx
  • From: Jan Marquardt <jm@xxxxxxxxxxx>
  • Date: Wed, 10 Jan 2018 12:07:20 +0100
  • Delivery-date: Wed, 10 Jan 2018 11:08:36 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>

Hi,

we are still experiencing this issue. In the meantime we have also tested
Xen 4.9, and HVM DomUs running Debian with several kernel versions, as well
as Ubuntu and CentOS with their current kernels. All of them show the same
problem.

Is this supposed to work? Does anyone do this successfully in the wild?

Best Regards

Jan

On 30.10.17 at 17:14, Tim Evers wrote:
> Hi,
> 
> I am trying to set up two Ubuntu 16.04 / Xen 4.6 machines to perform
> live migration and CPU / memory hotplug. So far I have encountered several
> catastrophic issues. They are so severe that I am starting to think I might
> be on the wrong track altogether.
> 
> Any input is highly appreciated!
> 
> The setup:
> 
> 2 Dell M630 with Ubuntu 16.04 and Xen 4.6, 64-bit Dom0 (node1 + node2)
> 
> 2 DomUs, Debian Jessie 64-bit PV and Debian Jessie 64-bit HVM
> 
> Now create a PV DomU on node1 with 1 CPU core and 2 GB RAM, and plenty of
> headroom for hot-add / hotplug:
> 
> Config excerpt:
> 
> kernel       = "/home/xen/shared/boot/tests/vmlinuz-3.16.0-4-amd64"
> ramdisk      = "/home/xen/shared/boot/tests/initrd.img-3.16.0-4-amd64"
> maxmem       = 16384
> memory       = 2048
> maxvcpus     = 8
> vcpus        = 1
> cpus         = "18"
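> 
> The DomU is then started from that config in the usual way, roughly like
> this (the config file path is just an example, not the real one):
> 
> xl create /etc/xen/pv-jessie.cfg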
> 
> xm list (columns: Name, ID, Mem(MiB), VCPUs, State, Time(s)):
> 
> root1823     97  2048     1     -b----      15.1
> 
> All is fine. Now migrate to node2. Immediately after the migration we see:
> 
> xm list:
> 
> root182      360 16384     1     -b----      10.5
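> 
> (The migration is an ordinary toolstack live migration, i.e. something
> along the lines of "xl migrate <domu-name> node2"; the domain name here is
> only a placeholder.)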
> 
> So the DomU immediately ballooned to its maxmem after the migration, and,
> even better, inside the DomU we see that all CPUs are suddenly hotplugged
> (but not online, due to missing udev rules - see the sketch of such a rule
> further below):
> 
> root@debian8:~# ls /sys/devices/system/cpu/ | grep cpu
> cpu0
> cpu1
> cpu2
> cpu3
> cpu4
> cpu5
> cpu6
> cpu7
> 
> So this is already not how it is supposed to be (DomU should look the
> same before and after migration).
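> 
> For reference, the kind of udev rule that would normally online hotplugged
> vcpus (and memory) automatically looks roughly like this; the file path is
> just an example:
> 
> # /etc/udev/rules.d/99-xen-hotplug.rules (example)
> # online a vcpu as soon as it is hot-added
> SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
> # online hot-added memory blocks
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
> 
> Such rules are missing in the test DomU, which is why the new cpus show up
> but stay offline.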
> 
> Now we bring cpu1 online by hand:
> 
> echo 1 > /sys/devices/system/cpu/cpu1/online
> 
> Result, as seen on the DomU console (hvc) from the Dom0:
> 
> [  373.360949] installing Xen timer for CPU 1
> [  400.032003] BUG: soft lockup - CPU#0 stuck for 22s! [bash:733]
> [  400.032003] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl
> nfs lockd fscache sunrpc evdev pcspkr x86_pkg_temp_thermal thermal_sys
> coretemp crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd autofs4 ext4 crc16 mbcache jbd2 crct10dif_pclmul
> crct10dif_common xen_netfront xen_blkfront crc32c_intel
> [  400.032003] CPU: 0 PID: 733 Comm: bash Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.43-2+deb8u3
> [  400.032003] task: ffff88000470e1d0 ti: ffff88006acec000 task.ti:
> ffff88006acec000
> [  400.032003] RIP: e030:[<ffffffff810013aa>]  [<ffffffff810013aa>]
> xen_hypercall_sched_op+0xa/0x20
> [  400.032003] RSP: e02b:ffff88006acefdd0  EFLAGS: 00000246
> [  400.032003] RAX: 0000000000000000 RBX: 0000000000000001 RCX:
> ffffffff810013aa
> [  400.032003] RDX: ffff88007d640000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [  400.032003] RBP: ffff88006bcf6000 R08: ffff88007d03d5c8 R09:
> 0000000000000122
> [  400.032003] R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000000001
> [  400.032003] R13: 000000000000cd60 R14: ffff88006d1dca20 R15:
> 000000000007d649
> [  400.032003] FS:  00007fe4b215e700(0000) GS:ffff88007d600000(0000)
> knlGS:0000000000000000
> [  400.032003] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  400.032003] CR2: 00000000016de6d0 CR3: 0000000004a67000 CR4:
> 0000000000042660
> [  400.032003] Stack:
> [  400.032003]  ffff88006acefb3e 0000000000000000 ffffffff81010dc1
> 0000000001323d35
> [  400.032003]  0000000000000000 0000000000000000 0000000000000001
> 0000000000000001
> [  400.032003]  ffff88006d1dca20 0000000000000000 ffffffff81068cac
> 000000306aceff3c
> [  400.032003] Call Trace:
> [  400.032003]  [<ffffffff81010dc1>] ? xen_cpu_up+0x211/0x500
> [  400.032003]  [<ffffffff81068cac>] ? _cpu_up+0x12c/0x160
> [  400.032003]  [<ffffffff81068d59>] ? cpu_up+0x79/0xa0
> [  400.032003]  [<ffffffff8150b615>] ? cpu_subsys_online+0x35/0x80
> [  400.032003]  [<ffffffff813a608d>] ? device_online+0x5d/0xa0
> [  400.032003]  [<ffffffff813a6145>] ? online_store+0x75/0x80
> [  400.032003]  [<ffffffff8121b56a>] ? kernfs_fop_write+0xda/0x150
> [  400.032003]  [<ffffffff811aaf32>] ? vfs_write+0xb2/0x1f0
> [  400.032003]  [<ffffffff811aba72>] ? SyS_write+0x42/0xa0
> [  400.032003]  [<ffffffff8151a48d>] ?
> system_call_fast_compare_end+0x10/0x15
> [  400.032003] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc
> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00
> 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
> 
> The same happens with the HVM DomU, but always only _after_ a live
> migration. Hotplugging works flawlessly when done on the Dom0 that the
> DomU was originally started on.
> 
> Any idea what might be happening here? Has anyone managed to migrate a
> DomU and then hotplug afterwards?
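> 
> To be explicit, the sequence we are trying to get working is roughly the
> following (the domain name and sizes are placeholders):
> 
> # on node1
> xl migrate <domu-name> node2
> # on node2, once the migration has finished: hot-add a vcpu and memory
> xl vcpu-set <domu-name> 2
> xl mem-set <domu-name> 4096m
> # inside the DomU: bring the new vcpu online
> echo 1 > /sys/devices/system/cpu/cpu1/online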
> 
> Thanks
> 
> Tim
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxx
> https://lists.xen.org/xen-users

-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: support@xxxxxxxxxxx | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users

 

