[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] libxc: introduce XC_SAVE_ID_TOOLSTACK



On 2012-01-30, at 6:16 AM, Stefano Stabellini 
<stefano.stabellini@xxxxxxxxxxxxx> wrote:

> On Fri, 27 Jan 2012, Shriram Rajagopalan wrote:
>> If there is an error in applying the toolstack state, then pagebuf_get 
>> returns -1 and
>> the rest of the code would still attempt to resume the domain, with possibly
>> inconsistent device model state.
> 
> This sounds like an unhandled error in the caller of pagebuf_get.

Nope. pagebuf_get is used only when Remus is on. 
It is called after applying the last memory checkpoint.
So when it returns an error, we simply discard whatever we buffered
and wrap up. For migration we
have a special xc_save_id_last_checkpoint
that terminates the code after 
buffering the first pagebuf.

> I think that if pagebuf_get returns an error, the error should be
> propagated and the migration should be aborted, right?
> After all I don't think we can successfully resume the domain if we fail
> to read XC_SAVE_ID_TSC_INFO, for example.
> I think we need something like this:
> 
> 
> @@ -1589,9 +1616,10 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
> uint32_t dom,
> 
>     // DPRINTF("Buffered checkpoint\n");
> 
> -    if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
> +    if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) < 0 ) {
>         PERROR("error when buffering batch, finishing");
> -        goto finish;
> +        goto out;
>     }
>     memset(&tmptail, 0, sizeof(tmptail));
>     tmptail.ishvm = hvm;
> 
> 
>> Also, lets say there was no error in applying the toolstack state. If a 
>> network
>> error occurs while receiving the next XC_SAVE_ID or so, then again, the code 
>> following
>> the above snippet would attempt to resume the domain, with a memory state 
>> inconsistent
>> with the device state.
> 
> This seems wrong to me, but I am not very familiar with this protocol.
> As I wrote above, why should we continue if pagebuf_get returns an
> error?
> 
> 
>> The right place to call the callbacks->toolstack_restore would be after the 
>> finish_hvm:
>> label, where currently the old QEMU device state is restored.
> 
> Are you suggesting that we should read the toolstack data from
> pagebuf_get but only call the callback after finish_hvm?

Yep. I hope the above explanation made some sense.

> I can do that but honestly it doesn't make too much sense to me.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.