[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/2] libxc: introduce XC_SAVE_ID_TOOLSTACK



On 2012-01-31, at 7:08 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:

> On Tue, 31 Jan 2012, Stefano Stabellini wrote:
>> On Mon, 30 Jan 2012, Shriram Rajagopalan wrote:
>>> On Mon, Jan 30, 2012 at 8:54 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>>>     Introduce a new save_id to save/restore toolstack specific extra
>>>     information.
>>>
>>>     Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
>>>     ---
>>>  tools/libxc/xc_domain_restore.c |   32 +++++++++++++++++++++++++++++++-
>>>  tools/libxc/xc_domain_save.c   Ã‚ |   17 +++++++++++++++++
>>>  tools/libxc/xenguest.h   Ã‚  Ã‚  Ã‚  Ã‚ |   23 ++++++++++++++++++++++-
>>>  tools/libxc/xg_save_restore.h   |   Ã‚ 1 +
>>>  tools/libxl/libxl_dom.c   Ã‚  Ã‚  Ã‚  |   Ã‚ 2 +-
>>>  tools/xcutils/xc_restore.c   Ã‚  Ã‚ |   Ã‚ 2 +-
>>>  6 files changed, 73 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>>> index 3fda6f8..ead3df4 100644
>>> --- a/tools/libxc/xc_domain_restore.c
>>> +++ b/tools/libxc/xc_domain_restore.c
>>> @@ -45,6 +45,8 @@ struct restore_ctx {
>>>   Ã‚  int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
>>>   Ã‚  int compressing; /* Set when sender signals that pages would be sent compressed (for Remus) */
>>>   Ã‚  struct domain_info_context dinfo;
>>> +   Ã‚ uint8_t *toolstack_data;
>>> +   Ã‚ uint32_t toolstack_data_len;
>>>  };
>>>
>>>
>>> This is still leaking speculative state. You have only moved the restore code but
>>> you are still reading the (potentially discardable) state into a global ctx structure.
>>>
>>> Take a look at the tailbuf code right below the pagebuf_get() call in xc_domain_restore().
>>> It reads the tailbuf onto a temporary buffer and then copies it onto the main one.
>>> This way, if an error occurs while receiving data, one can completely discard the current
>>> checkpoint (pagebuf & tmptailbuf) and restore from the previous one (tailbuf).
>>>
>>> You could have a similar *consistent buffer* in the xc_domain_restore function itself, and copy the toolstack
>>> stuff into it before starting to buffer a new checkpoint. Perhaps something like the snippet below?
>>>
>>> + toolstack_data = realloc(pagebuf.toolstack_data_len)
>>> + memcpy(toolstack_data, pagebuf.toolstack_data, pagebuf.toolstack_data_len);
>>> if ( ctx->last_checkpoint )
>>>
>>> Though, this would incur two reallocs (once in pagebuf_get and once more in the main loop).
>>>
>>> If there is a maximum size for this buffer, I would suggest mallocing it once and for all, and just
>>> memcpy it.
>>>
>>
>> Even though the current toolstack data is tiny, I don't want to realloc
>> a potentially big buffer twice or memcpy'ing more than necessary.
>> I don't want to add a maximum size either.
>> What if I add two new arguments to pagebuf_get_one: a pointer to the
>> buffer to be malloc'ed and the length? Like it was done with the
>> callbacks argument in the previous version of this patch?
>>
>>
>>
>>>     @@ -827,6 +829,20 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
>>>       Ã‚  Ã‚  Ã‚  }
>>>       Ã‚  Ã‚  Ã‚  return pagebuf_get_one(xch, ctx, buf, fd, dom);
>>>
>>> +   Ã‚ case XC_SAVE_ID_TOOLSTACK:
>>> +   Ã‚  Ã‚  Ã‚ {
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ RDEXACT(fd, &ctx->toolstack_data_len,
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ sizeof(ctx->toolstack_data_len));
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ ctx->toolstack_data = (uint8_t*) malloc(ctx->toolstack_data_len);
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ if ( ctx->toolstack_data == NULL )
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ {
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ PERROR("error memory allocation");
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ return -1;
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ }
>>> +   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ RDEXACT(fd, ctx->toolstack_data, ctx->toolstack_data_len);
>>>
>>>
>>> Basically this becomes,
>>>  RDEXACT(fd, &buf->toolstack_data_len,...)
>>> buf->toolstack_data = (uint8_t *)realloc(buf->toolstack_data_len, 0)
>>> RDEXACT(fd, buf->toolstack_data, buf->toolstack_data_len)
>>>
>>> And please do realloc. Since the pagebuf_get_one code is called repeatedly
>>> by the main loop, using malloc would result in loads of memory leaks.
>>>
>>
>> I presume that this buf pointer would be an argument passed to
>> pagebuf_get_one.
>> In this case, I don't suppose I need to memcpy anything anywhere, do I?
>> Later on, in finish_hvm, I'll just do:
>>
>>
>>
>>   Ã‚ if ( callbacks != NULL && callbacks->toolstack_restore != NULL )
>>   Ã‚ {
>>   Ã‚  Ã‚  Ã‚ if ( callbacks->toolstack_restore(dom,
>>   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ buf->toolstack_data, buf->toolstack_data_len,
>>   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ callbacks->data) < 0 )
>>   Ã‚  Ã‚  Ã‚ {
>>   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ PERROR("error calling toolstack_restore");
>>   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ free(buf->toolstack_data);
>>          buf->toolstack_data = NULL;
>>          buf->toolstack_data_len = 0;
>>   Ã‚  Ã‚  Ã‚  Ã‚  Ã‚ goto out;
>>   Ã‚  Ã‚  Ã‚ }
>>   Ã‚ }
>>   Ã‚ free(buf->toolstack_data);
>>  buf->toolstack_data = NULL;
>>  buf->toolstack_data_len = 0;
>
>
> Sorry about this, I configured my mailer not to send emails in UTF-8
> because they can be the cause of nasty python bugs with spaces and tabs.
> Let me write this code snippet again:
>
>   if ( callbacks != NULL && callbacks->toolstack_restore != NULL )
>   {
>       if ( callbacks->toolstack_restore(dom,
>                   buf->toolstack_data, buf->toolstack_data_len,
>                   callbacks->data) < 0 )
>       {
>           PERROR("error calling toolstack_restore");
>           free(buf->toolstack_data);
>           buf->toolstack_data = NULL;
>           buf->toolstack_data_len = 0;
>           goto out;
>       }
>   }
>   free(buf->toolstack_data);
>   buf->toolstack_data = NULL;
>   buf->toolstack_data_len = 0;

Let me rephrase the issue to make sure I am on the same page with you.
Please feel free to refute my assumptions or anything obvious that I might have
missed.

Your code as-is (first version of the patch, where restore_toolstack is called
in pagebuf_get_one) would work well for live migration only
i.e. num(checkpoints) =1
It wont work for remus (num(checkpoints) >1), since you are committing
speculative state (state from an incomplete checkpoint).

Now , I assume that the toolstack data is going to be sent in every checkpoint,
just like the qemu device model state.

If that is not the case, the restore side of the code is fine as-is (even with
previous version) but the save end needs to be modified, for it seems to
send the data with every checkpoint.  With that mod, things would work fine
for both live migration & Remus.


Assuming that the toolstack data is sent repeatedly with every checkpoint,
there are a few things to note:

* Do not commit state (like xenstore write) in pagebuf_get_one
because state received there (ie last half buffered checkpoint) could be discarded.
-- this was the bug in the first version (calling toolstack_restore in pagebuf)

* Since pagebuf_get_one gets called for each checkpoint, a malloc in pagebuf_get_one
needs a free before the next call. Alternative: use realloc (e.g. buf->pages)
and free in pagebuf_free().

-- this is the bug in your current version of the patch. I see a malloc in pagebuf_get_one function, but
   no corresponding free. You are freeing in finish_hvm (after the Nth checkpoint), but have
  created N-1 memory leaks.

* Do not touch anything from *current_version* of pagebuf after finish:, since it has been marked incomplete.
  Moving the toolstack_data pointer from a discardable pagebuf to restore_ctx, is not going to solve
  the issue as you are still (indirectly) accessing the data from current_version of pagebuf.

 State received in pagebuf_get is applied in xc_domain_restore, only in the following code
 region (simplified):

//We have a full checkpoint[i] and a corresponding tailbuf.
copypages:
  apply checkpoint[i]

if (pagebuf_get(checkpoint[i+1]) || tailbuf_get(tmptailbuf)) {
  printf("Error receiving last checkpoint");
  //So, "discard" it and resume domain, with memory checkpoint[i],
  // and also apply tailbuf
}

memcpy(tailbuf, tmptailbuf);

/* At this point, I have a full checkpoint[i+1] and a tailbuf that is consistent with
  checkpoint i+1.
*/
goto copypages;

finish:
  if (hvm) goto finish_hvm;
  ... pv stuff
  finish_hvm:
    Apply tailbuf (that corresponds to checkpoint[i])
   ...



But if you wish to receive dynamic state (e.g. device model, toolstack, etc) in every checkpoint
that should be applied *only once*  in the end, you could use the tailbuf (as done by the
qemu device model currently).
Or if you have to receive them in the pagebuf, then have two pieces of the same state,
 consistent_state, curr_state, such that you do

 pagebuf_get(curr_state)
 tailbuf_get(..)..
 if (no error so far), then
  copy curr_state to consistent_state.
  goto copypages;

finish:
 ..
  finish_hvm:
     apply consistent_state.

IMHO, it looks cleaner to push the toolstack state into buffer_tail (send/receive in tailbuf), so
that you can continue to do malloc in buffer_tail and a corresponding free in tailbuf_free.

cheers
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.