
Re: [Xen-devel] [PATCH v5 00/21] libxl: domain save/restore: run in a separate process



On Wed, Jun 27, 2012 at 12:09 PM, Shriram Rajagopalan <rshriram@xxxxxxxxx> wrote:
On Wed, Jun 27, 2012 at 11:59 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Ian Jackson writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process"):
> However, when I apply my series, I can indeed produce an assertion
> failure:
...
> So I have indeed made matters worse.

I found two bugs:

1. The void* passed to the callback was being treated as a
libxl__domain_suspend_state* by the remus callbacks; this is a
holdover from a much earlier version of the series.  It should be
converted to a libxl__save_helper_state and then the dss extracted
with CONTAINER_OF.

2. The way remus works means that the toolstack save callback is
invoked more than once, which the helper's implementation was not
prepared to deal with.  Fix this by moving the rewind of the fd into
the helper.

Fixes for these are below.  With this, on top of my series, I seem to
get the same behaviour as with the baseline.  Would you like to try it?


Sure, I'll give it a shot.
Btw, my earlier mail was in response to remus not
working on the baseline setup in your dev environment.


The fix works for 2 out of 3 cases:
 blackhole replication (xl remus -b)
 localhost replication with failover, i.e. destroy primary (xl remus domU localhost)

However, it crashes the guest for localhost replication when I destroy the backup,
i.e. xl destroy domU--incoming. The primary guest would normally resume, but in this
case it is in the --sc- state.
NB: This seems to happen in baseline xen-unstable too!

xc: error: unexpected PFN mapping failure pfn 180e map_mfn 43b808 p2m_mfn 43b808: Internal error
libxl: error: libxl_create.c:760:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:844:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1220:libxl_domain_destroy: non-existant domain 17
libxl: error: libxl_create.c:995:domcreate_complete: unable to destroy domain 17 following failed creation
migration target: Domain creation failed (code -3).
..
Total Data Sent= 12.597 MB
libxl: debug: libxl_dom.c:801:libxl__domain_suspend_common_callback: issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:805:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:852:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:856:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:870:libxl__domain_suspend_common_callback: guest has suspended
pagetables=2,cache_misses=0,emptypages=45
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 16 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 16 save/restore helper [3148] died due to fatal signal Broken pipe
libxl: debug: libxl_event.c:1434:libxl__ao_complete: ao 0x1b08c80: complete, rc=-3
libxl: debug: libxl_event.c:1406:libxl__ao__destroy: ao 0x1b08c80: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
xc: debug: hypercall buffer: total allocations:2116 total releases:2116
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:1729 misses:2 toobig:385


 
 
Thanks,
Ian.

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index abc5932..069aca1 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -984,7 +984,8 @@ static int libxl__remus_domain_suspend_callback(void *data)

 static int libxl__remus_domain_resume_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
    STATE_AO_GC(dss->ao);

    /* Resumes the domain and the device model */
@@ -1002,7 +1003,8 @@ static void remus_checkpoint_dm_saved(libxl__egc *egc,

 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
-    libxl__domain_suspend_state *dss = data;
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
    libxl__egc *egc = dss->shs.egc;
    STATE_AO_GC(dss->ao);

diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 6332beb..078b7ee 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -105,13 +105,6 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_suspend_state *dss,
                                toolstack_data_buf, toolstack_data_len,
                                "toolstack data tmpfile", 0);
        if (r) { rc = ERROR_FAIL; goto out; }
-
-        r = lseek(toolstack_data_fd, 0, SEEK_SET);
-        if (r) {
-            LOGE(ERROR, "rewind toolstack data tmpfile");
-            rc = ERROR_FAIL;
-            goto out;
-        }
    }

    const unsigned long argnums[] = {
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 3bdfa28..772251a 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -171,12 +171,14 @@ static int toolstack_save_cb(uint32_t domid, uint8_t **buf,
 {
    assert(toolstack_save_fd > 0);

+    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
+    if (r) fail(errno,"rewind toolstack data tmpfile");
+
    *buf = xmalloc(toolstack_save_len);
-    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
+    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
    if (r<0) fail(errno,"read toolstack data");
    if (r==0) fail(0,"read toolstack data eof");

-    toolstack_save_fd = -1;
    *len = toolstack_save_len;
    return 0;
 }




 

