Re: [Xen-devel] [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
Shriram Rajagopalan writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> The code segfaults. Here are the system details and error traces from gdb.
> My setup:
> dom0 : ubuntu 64bit, 2.6.32-39 (pvops kernel),
> running latest xen-4.2-unstable (built from your repo)
> tools stack also built from your repo (which I hope has all the
> latest patches).
> domU: ubuntu 32bit PV, xenolinux kernel (22.214.171.124 - Novell SuSE version)
> with suspend event channel support
> As a sanity check, I tested xl remus with latest tip from xen-unstable
> mercurial repo, c/s: 25496:e08cf97e76f0
> Blackhole replication (to /dev/null) and localhost replication worked as
> expected, and the guest recovered properly without any issues.
Thanks for the test runes. That didn't work entirely properly for
me, even with the xen-unstable baseline.
I did this:
xl -vvvv remus -b -i 100 debian.guest.osstest dummy >remus.log 2>&1 &
The result was that the guest's networking broke. The guest shows up
in xl list as
debian.guest.osstest 7 512 1 ---ss- 5.2
and is still responsive on its pv console. After I killed the remus
process, the guest's networking was still broken.
At the start, the guest prints this on its console:
[ 36.017241] WARNING: g.e. still in use!
[ 36.021056] WARNING: g.e. still in use!
[ 36.024740] WARNING: g.e. still in use!
[ 36.024763] WARNING: g.e. still in use!
If I try the rune with "localhost" I would have expected, surely, to
see a domain with the incoming migration ? But I don't. I tried
killing the `xl remus' process and the guest became wedged.
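For concreteness, the localhost rune was the same as the blackhole rune
above with only the target swapped; a sketch, assuming nothing else
changed:
  xl -vvvv remus -b -i 100 debian.guest.osstest localhost >remus.log 2>&1 &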
However, when I apply my series, I can indeed produce an assertion failure:
xc: detail: All memory is saved
xc: error: Could not get domain info (3 = No such process): Internal error
libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for
domain 3077579968: No such process
xl: libxl_event.c:1426: libxl__ao_inprogress_gc: Assertion `ao->magic ==
So I have indeed made matters worse.
> Blackhole replication:
> xl error:
> xc: error: Could not get domain info (3 = No such process): Internal error
> libxl: error: libxl.c:388:libxl_domain_resume: xc_domain_resume failed for
> domain 4154075147: No such process
> libxl: error: libxl_dom.c:1184:libxl__domain_save_device_model: unable to
> open qemu save file ?8b: No such file or directory
I don't see that at all.
NB that PV guests may have a qemu for certain disk backends, or
consoles, depending on the configuration. Can you show me your domain
config ? Mine is below.
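If it helps narrow things down, a quick way to check whether a qemu has
been spawned for the PV guest is something like the following (a sketch;
treat the exact commands and the xenstore layout as assumptions, since
they differ between versions):
  ps ax | grep qemu
  xenstore-ls -f | grep -i qemu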
> I also ran xl in GDB to get a stack trace and hopefully some useful debug info.
> gdb traces: http://pastebin.com/7zFwFjW4
I get a different crash - see above.
> Localhost replication: Partial success, but xl still segfaults
> dmesg shows
> [ 1399.254849] xl: segfault at 0 ip 00007f979483a417 sp
> 00007fffe06043e0 error 6 in libxenlight.so.2.0.0[7f9794807000+4d000]
I see exactly the same thing with `localhost' instead of `dummy'. And
I see no incoming domain.
I will investigate the crash I see. In the meantime, can you try to
help me see why it doesn't work for me even with the baseline ?
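For the crash itself, the way I would try to get a symbolised backtrace
is roughly this (a sketch; adjust the rune for your setup and make sure
the tools are built with debug symbols):
  gdb --args xl -vvvv remus -b -i 100 debian.guest.osstest localhost
  (gdb) run
  [wait for the SIGSEGV, then]
  (gdb) bt full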
# Configuration file for the Xen instance debian.guest.osstest, created
# by xen-tools 4.2 on Thu Apr 5 16:43:43 2012.
# Kernel + memory size
#kernel = '/boot/vmlinuz-126.96.36.199'
#ramdisk = '/boot/initrd.img-188.8.131.52'
#bootloader = 'pygrub'
bootloader = '/root/strace-pygrub'
memory = '512'
# Disk device(s).
root = '/dev/xvda2 ro'
disk = [
# Physical volumes
name = 'debian.guest.osstest'
#dhcp = 'dhcp'
vif = [ 'mac=5a:36:0e:26:00:01' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
vcpus = 1