[Xen-users] Live migration OOPS
Hi all,
I am running Debian Squeeze with a backported kernel, 3.2.0-0.bpo.4-amd64. The cluster is managed by Ganeti.

Hardware:
Source: Dell R510, 64 GB RAM
Destination: Dell R720, 192 GB RAM

When live migrating VMs from the older machine to the newer one, I get a kernel oops and the VM does not seem to recover after being restored.

It looks similar to

If I start the VM on the smaller system and migrate it to the larger one, it fails. If I start the VM on the larger host and migrate it to the smaller one, it works. If I then migrate it back to the larger host, it also works.

If I remove memory from the newer machine so that it has 64 GB, I can live migrate VMs to it.

Can anyone advise a solution? I notice there are possible fixes for this, but from the threads I am unsure of the current status of the patches and which versions of Xen and the Linux kernel they should work with.
Many thanks,
Matt

Output on the console of the VM is as follows:

[4760249.894618] BUG: unable to handle kernel paging request at 00007f08a0b23414
[4760249.894618] IP: [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] PGD 0
[4760249.894618] Oops: 0002 [#1] SMP
[4760249.894618] CPU 0
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618]
[4760249.894618] Pid: 6, comm: migration/0 Not tainted 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] RIP: e030:[<ffffffff810077c9>] [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] RSP: e02b:ffff88001e9dbd70 EFLAGS: 00010002
[4760249.894618] RAX: 0000000001805694 RBX: ffffffffff57a000 RCX: ffffffff81815000
[4760249.894618] RDX: 000000000000000c RSI: ffff88001e9d4e60 RDI: 0000000001805694
[4760249.894618] RBP: 0000000000000000 R08: 0000000000000000 R09: 80000000ba109063
[4760249.894618] R10: 0000000000007ff0 R11: 000000000000036d R12: 0000000000000003
[4760249.894618] R13: ffff88001ea13da4 R14: ffff88001ea13d01 R15: ffff88001e9d4e60
[4760249.894618] FS: 00007fe3e858e700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[4760249.894618] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[4760249.894618] CR2: 00007f08a0b23414 CR3: 000000000ca9f000 CR4: 0000000000002660
[4760249.894618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4760249.894618] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4760249.894618] Process migration/0 (pid: 6, threadinfo ffff88001e9da000, task ffff88001e9d4e60)
[4760249.894618] Stack:
[4760249.894618] 0000000000000000 ffffffff81007292 ffff88001ea13e10 ffffffff812323b4
[4760249.894618] ffffffff81815000 ffffffff81232389 ffff880000000002 ffff88001e9da010
[4760249.894618] ffff88001e9d0201 ffff88001ea13d80 ffff88001fc002b8 ffffffff8108da5f
[4760249.894618] Call Trace:
[4760249.894618] [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618] [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618] [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618] [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618] [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618] [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618] [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618] [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618] [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618] [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618] [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618] [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] Code: 8b 1d e4 6f 60 00 48 81 fb 20 a1 73 81 75 04 0f 0b eb fe 48 8b 3d 08 56 73 00 e8 77 b8 02 00 48 c1 e8 0c 48 89 c7 e8 7e fe ff ff <48> 89 83 18 0c 00 00 48 8b 15 69 54 68 00 48 8b 05 aa 6f 60 00
[4760249.894618] RIP [<ffffffff810077c9>] xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] RSP <ffff88001e9dbd70>
[4760249.894618] CR2: 00007f08a0b23414
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efdd ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:298 ktime_get_ts+0x27/0x85()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 6, comm: migration/0 Tainted: G D 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618] [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618] [<ffffffff8106b2eb>] ? ktime_get_ts+0x27/0x85
[4760249.894618] [<ffffffff81081d56>] ? do_acct_process+0x89/0x3bc
[4760249.894618] [<ffffffff810820ed>] ? acct_process+0x64/0x7d
[4760249.894618] [<ffffffff8104cf53>] ? do_exit+0x265/0x799
[4760249.894618] [<ffffffff81366e89>] ? printk+0x40/0x47
[4760249.894618] [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618] [<ffffffff81049ea6>] ? kmsg_dump+0x53/0xef
[4760249.894618] [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618] [<ffffffff81369a09>] ? oops_end+0xb1/0xb6
[4760249.894618] [<ffffffff8102fdad>] ? no_context+0x1ff/0x20c
[4760249.894618] [<ffffffff8136bb6a>] ? do_page_fault+0x1ad/0x34c
[4760249.894618] [<ffffffff81007605>] ? get_phys_to_machine+0x16/0x58
[4760249.894618] [<ffffffff810046e2>] ? pte_pfn_to_mfn+0x23/0x74
[4760249.894618] [<ffffffff810047b0>] ? xen_make_pte+0x7d/0x7f
[4760249.894618] [<ffffffff810044f5>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
[4760249.894618] [<ffffffff810042c9>] ? xen_mc_flush+0x12b/0x158
[4760249.894618] [<ffffffff813690f5>] ? page_fault+0x25/0x30
[4760249.894618] [<ffffffff810077c9>] ? xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618] [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618] [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618] [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618] [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618] [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618] [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618] [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618] [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618] [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618] [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618] [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efde ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x88()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 6, comm: migration/0 Tainted: G D W 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618] [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618] [<ffffffff8106b367>] ? ktime_get+0x1e/0x88
[4760249.894618] [<ffffffffa003a8f2>] ? start_this_handle+0x16c/0x30d [jbd]
[4760249.894618] [<ffffffffa003abe6>] ? journal_start+0x94/0xc3 [jbd]
[4760249.894618] [<ffffffffa004dd1d>] ? ext3_dirty_inode+0x25/0x78 [ext3]
[4760249.894618] [<ffffffff81124fad>] ? __mark_inode_dirty+0x22/0x1a7
[4760249.894618] [<ffffffff81119558>] ? file_update_time+0xd4/0xff
[4760249.894618] [<ffffffff810bde74>] ? __generic_file_aio_write+0x15b/0x277
[4760249.894618] [<ffffffff81049b42>] ? __call_console_drivers+0x75/0x86
[4760249.894618] [<ffffffff81368cc8>] ? _raw_spin_lock_irqsave+0x11/0x2f
[4760249.894618] [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618] [<ffffffff810bdfef>] ? generic_file_aio_write+0x5f/0xb3
[4760249.894618] [<ffffffff811066e2>] ? do_sync_write+0xba/0xf3
[4760249.894618] [<ffffffff8100fcd2>] ? dump_trace+0x236/0x245
[4760249.894618] [<ffffffff8106b2eb>] ? ktime_get_ts+0x27/0x85
[4760249.894618] [<ffffffff8102d1b2>] ? pvclock_clocksource_read+0x46/0xb4
[4760249.894618] [<ffffffff8106a696>] ? timekeeping_get_ns+0xd/0x2a
[4760249.894618] [<ffffffff81082047>] ? do_acct_process+0x37a/0x3bc
[4760249.894618] [<ffffffff810820ed>] ? acct_process+0x64/0x7d
[4760249.894618] [<ffffffff8104cf53>] ? do_exit+0x265/0x799
[4760249.894618] [<ffffffff81366e89>] ? printk+0x40/0x47
[4760249.894618] [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618] [<ffffffff81049ea6>] ? kmsg_dump+0x53/0xef
[4760249.894618] [<ffffffff81368add>] ? _raw_spin_unlock_irqrestore+0x10/0x11
[4760249.894618] [<ffffffff81369a09>] ? oops_end+0xb1/0xb6
[4760249.894618] [<ffffffff8102fdad>] ? no_context+0x1ff/0x20c
[4760249.894618] [<ffffffff8136bb6a>] ? do_page_fault+0x1ad/0x34c
[4760249.894618] [<ffffffff81007605>] ? get_phys_to_machine+0x16/0x58
[4760249.894618] [<ffffffff810046e2>] ? pte_pfn_to_mfn+0x23/0x74
[4760249.894618] [<ffffffff810047b0>] ? xen_make_pte+0x7d/0x7f
[4760249.894618] [<ffffffff810044f5>] ? __raw_callee_save_xen_make_pte+0x11/0x1e
[4760249.894618] [<ffffffff810042c9>] ? xen_mc_flush+0x12b/0x158
[4760249.894618] [<ffffffff813690f5>] ? page_fault+0x25/0x30
[4760249.894618] [<ffffffff810077c9>] ? xen_setup_mfn_list_list+0x2d/0x4b
[4760249.894618] [<ffffffff81007292>] ? xen_arch_post_suspend+0xd/0x97
[4760249.894618] [<ffffffff812323b4>] ? xen_post_suspend+0x9/0x14
[4760249.894618] [<ffffffff81232389>] ? xen_suspend+0x68/0x8a
[4760249.894618] [<ffffffff8108da5f>] ? stop_machine_cpu_stop+0x84/0xc0
[4760249.894618] [<ffffffff8108d9db>] ? stop_one_cpu_nowait+0x39/0x39
[4760249.894618] [<ffffffff8108d7cf>] ? cpu_stopper_thread+0xef/0x191
[4760249.894618] [<ffffffff810463a2>] ? finish_task_switch+0x53/0xc7
[4760249.894618] [<ffffffff8136761c>] ? __schedule+0x5a0/0x5cd
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8108d6e0>] ? res_counter_charge+0xbf/0xbf
[4760249.894618] [<ffffffff8106371d>] ? kthread+0x7a/0x82
[4760249.894618] [<ffffffff81370274>] ? kernel_thread_helper+0x4/0x10
[4760249.894618] [<ffffffff8136e333>] ? int_ret_from_sys_call+0x7/0x1b
[4760249.894618] [<ffffffff81368e7c>] ? retint_restore_args+0x5/0x6
[4760249.894618] [<ffffffff81370270>] ? gs_change+0x13/0x13
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efdf ]---
[4760249.894618] ------------[ cut here ]------------
[4760249.894618] WARNING: at /build/buildd-linux_3.2.41-2+deb7u2~bpo60+1-amd64-mnypfK/linux-3.2.41/kernel/time/timekeeping.c:265 ktime_get+0x1e/0x88()
[4760249.894618] Modules linked in: autofs4 nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop coretemp crc32c_intel ghash_clmulni_intel evdev aesni_intel aes_x86_64 snd_pcm snd_page_alloc snd_timer snd aes_generic soundcore cryptd pcspkr ext3 jbd mbcache dm_mod xen_netfront xen_blkfront
[4760249.894618] Pid: 0, comm: swapper/0 Tainted: G D W 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2+deb7u2~bpo60+1
[4760249.894618] Call Trace:
[4760249.894618] [<ffffffff8104996b>] ? warn_slowpath_common+0x78/0x8c
[4760249.894618] [<ffffffff8106b367>] ? ktime_get+0x1e/0x88
[4760249.894618] [<ffffffff81070f4c>] ? tick_nohz_stop_sched_tick+0x66/0x332
[4760249.894618] [<ffffffff8100ddaf>] ? cpu_idle+0x7c/0xef
[4760249.894618] [<ffffffff816abc48>] ? start_kernel+0x3c7/0x3d2
[4760249.894618] [<ffffffff816ad746>] ? xen_start_kernel+0x415/0x41a
[4760249.894618] ---[ end trace 0fcf6cf0a1d1efe0 ]---
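
For anyone reading the trace above, the "Oops: 0002" value is the x86 page-fault error code, and it may help to decode it. The following is only a minimal sketch: the helper name decode_oops_code is made up for illustration and is not part of any kernel or Xen tool, and only the five lowest error-code bits are covered.

```python
# Hypothetical helper (not from any kernel or Xen tool): decode the
# x86 page-fault error code that the kernel prints after "Oops:".
# Each entry: (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page tables", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return a list of human-readable descriptions for the error code."""
    code = int(code_hex, 16)
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


if __name__ == "__main__":
    # "0002" is the value from the first oops in the console output.
    print(decode_oops_code("0002"))
    # prints ['page not present', 'write access', 'kernel mode']
```

Read this way, the first oops is a kernel-mode write to an address with no page mapped, raised in xen_setup_mfn_list_list while the guest was resuming after the suspend. That agrees with the "unable to handle kernel paging request" and "PGD 0" lines, but it does not by itself say why the address was bad.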

Matthew Baker :: Unix/Security Team Lead
Infrastructure, Systems and Operations @University of Bristol
Team email: it-sysops@xxxxxxxxxxxxx
Tel: +44(0)117 3317467
Add: Uni of Bristol, Computer Centre, Tyndal Ave, Bristol. BS8 1UD

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users