
Re: [Xen-devel] [RFC V7 2/3] libxl domain snapshot API design




>>> On 10/21/2014 at 12:11 AM, in message 
>>> <1413821501.29506.13.camel@xxxxxxxxxx>,
Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote: 
> On Fri, 2014-10-10 at 16:48 +0800, Chunyan Liu wrote: 
>  
> > int libxl_domain_snapshot_create(libxl_ctx *ctx, int domid, 
> >                                  libxl_domain_snapshot_args *snapshot, 
> >                                  bool live) 
> >  
> >     Creates a new snapshot of a domain based on the snapshot config
> >     contained in @snapshot. Save domain and do disk snapshot.
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT):  domain id 
> >     snapshot (INPUT): configuration of domain snapshot 
> >     live (INPUT):   live snapshot or not 
> >     Returns: 0 on success, -1 on failure 
> >  
> >     ctx: 
> >        context. 
> >  
> >     domid: 
> >        If domain is active, this is the domid of the domain. 
> >        If domain is inactive, set domid=-1. Only disk-only snapshot can be 
> >        done. libxl_domain_snapshot_args:memory should be 'false'. 
>  
> I think we discussed last time that if the domain is inactive then libxl 
> doesn't know anything about it and cannot be expected to snapshot it. In 
> this case I think the toolstack's (e.g. libvirt's) storage management is 
> responsible for taking a disk snapshot, libxl is not involved. 

OK. To keep it simple, we won't support disk-only snapshots in libxl and xl.

An xl domain is always an active (started) domain, and a disk-only snapshot
of a running domain can't keep the data consistent, so we won't allow that.
Let libvirt call qemu-img to do disk-only snapshots.

>  
> >     live: 
> >        true or false. 
> >        when live is 'true', domain is not paused while creating the
> >        snapshot, like live migration. This increases size of the memory
> >        dump file, but reduces downtime of the guest.
>  
> >  Only support this flag during external checkpoints. 
>  
> Why? 
>  
> Even if valid for the planned implementation I don't think it belongs in 
> this sort of high level design. There should be an error value 
> indicating that a live checkpoint is not possible, which is the right 
> place to encode this behaviour. 
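
A minimal sketch of what such an error value could look like. The enum
names and the check function are hypothetical, invented for illustration;
they are not existing libxl ERROR_* codes, and the "external only" rule
simply mirrors the restriction quoted above.

```c
#include <stdbool.h>

/* Hypothetical error codes for illustration only; these are NOT
 * existing libxl ERROR_* values. */
enum snapshot_error {
    SNAPSHOT_OK = 0,
    SNAPSHOT_ERR_FAIL = -1,
    SNAPSHOT_ERR_LIVE_UNSUPPORTED = -2 /* live snapshot not possible */
};

/* Hypothetical check: assumes, as in the planned implementation, that
 * only external snapshots can be taken live. The restriction lives in
 * the return value instead of in the interface description. */
static int snapshot_check_live(bool live, bool external)
{
    if (live && !external)
        return SNAPSHOT_ERR_LIVE_UNSUPPORTED;
    return SNAPSHOT_OK;
}
```

With this shape, a caller can tell "live is not possible here" apart from a
generic failure and retry with live set to false.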
>  
> >     snapshot: 
> >        memory: 
> >            true or false. 
> >            'false' means disk-only, won't save memory state. 
> >            'true' means saving memory state. Memory would be saved in 
> >            'memory_path'. 

Since we decided not to support disk-only snapshots in libxl, this 'memory'
parameter is not needed. It's always 'true'.

> >        memory_path: 
> >            path to save memory file. NULL when 'memory' is false.
> >        num_disks: 
> >            number of disks that need to take disk snapshot. 
> >        disks: 
> >            array of disk snapshot configuration. Has num_disks members. 
> >            libxl_device_disk: 
> >                structure to represent which disk. 
> >            name: 
> >                snapshot name. 
>  
> How is this used? Does it get stored somewhere by libxl?

For an internal disk snapshot, the snapshot name will be stored on the disk.
Libxl won't store anything once the API call returns.

>  
> >            external: 
> >                true or false.
> >                'false' means internal disk snapshot. external_format and
> >                external_path will be ignored.
> >                'true' means external disk snapshot, then external_format
> >                and external_path should be provided.
> >           external_format:
> >               should be provided when 'external' is true. If not provided,
> >               will use default 'qcow2'.
>  
> I think this should say: will use a default appropriate to the disk 
> backend and format of the underlying disk image in use.

Yes, that's a better description for a high-level design. But in the
implementation, following the libvirt qemu driver code, it actually uses
'qcow2'. An external snapshot treats the original disk image file as a
backing file and creates a new qcow2 file on top of it. Of course we can
do it in different ways.
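
As a rough illustration of the backing-file approach, here is a sketch that
builds the argument vector for the offline equivalent, "qemu-img create -f
qcow2 -b <backing> [-F <backing_fmt>] <overlay>". The function name and the
paths are made up for this example. For a running domain the snapshot would
go through QMP instead; this only shows the relationship between the
original image and the new qcow2 file.

```c
#include <stddef.h>

/* Hypothetical helper: fills argv with a qemu-img invocation that creates
 * a new qcow2 overlay whose backing file is the original disk image.
 * backing_format may be NULL, in which case -F is omitted.
 * Returns the number of arguments (excluding the NULL terminator),
 * or -1 if an input is missing or argv is too small. */
static int build_external_snapshot_argv(const char *backing_path,
                                        const char *backing_format,
                                        const char *overlay_path,
                                        const char *argv[], size_t argv_len)
{
    size_t n = 0;
    /* 7 arguments without -F, 9 with it, plus the NULL terminator. */
    size_t needed = backing_format ? 10 : 8;

    if (!backing_path || !overlay_path || !argv || argv_len < needed)
        return -1;

    argv[n++] = "qemu-img";
    argv[n++] = "create";
    argv[n++] = "-f";
    argv[n++] = "qcow2";          /* format of the new overlay file */
    argv[n++] = "-b";
    argv[n++] = backing_path;     /* original image becomes the backing file */
    if (backing_format) {
        argv[n++] = "-F";
        argv[n++] = backing_format;
    }
    argv[n++] = overlay_path;
    argv[n] = NULL;
    return (int)n;
}
```

An argument vector is used instead of a single command string so that paths
containing spaces need no shell quoting.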
 
>  
> >               ignored when 'external' is false. 
> >           external_path: 
> >               must be provided when 'external' is true. 
> >               ignored when 'external' is false. 
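
The fields described above could be laid out roughly as follows. This is
only a sketch: every sketch_* name is hypothetical, sketch_disk stands in
for libxl_device_disk, and the validation rules are just the ones stated in
the text (memory_path needed when memory is saved, external_path needed for
external snapshots, 'qcow2' as the fallback format).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for libxl_device_disk; the real type is defined by libxl. */
typedef struct {
    const char *vdev;
} sketch_disk;

/* Per-disk snapshot configuration (hypothetical layout). */
typedef struct {
    sketch_disk disk;
    const char *name;            /* snapshot name */
    bool external;               /* false: internal, true: external */
    const char *external_format; /* NULL selects the default */
    const char *external_path;   /* required when external is true */
} sketch_disk_snapshot;

/* Domain snapshot arguments (hypothetical layout). */
typedef struct {
    bool memory;                 /* save memory state or not */
    const char *memory_path;     /* where memory is saved */
    int num_disks;
    sketch_disk_snapshot *disks; /* num_disks entries */
} sketch_snapshot_args;

/* Format used for an external snapshot; NULL for internal ones,
 * where external_format is ignored. */
static const char *sketch_effective_format(const sketch_disk_snapshot *d)
{
    if (!d->external)
        return NULL;
    return d->external_format ? d->external_format : "qcow2";
}

/* Returns 0 if the arguments are consistent, -1 otherwise. */
static int sketch_validate_args(const sketch_snapshot_args *a)
{
    if (a == NULL || a->num_disks < 0)
        return -1;
    if (a->memory && a->memory_path == NULL)
        return -1;
    if (a->num_disks > 0 && a->disks == NULL)
        return -1;
    for (int i = 0; i < a->num_disks; i++) {
        const sketch_disk_snapshot *d = &a->disks[i];
        if (d->external && d->external_path == NULL)
            return -1;
    }
    return 0;
}
```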
> >  
> >  
> > int libxl_domain_snapshot_delete(libxl_ctx *ctx, int domid, 
> >                                  libxl_domain_snapshot_args *snapshot); 
> >  
> >     Delete a snapshot. 
> >     This will delete the related domain and related disk snapshots. 
>  
> I think last time we agreed that this operation could not "delete the 
> related domain" because it mustn't be active, and therefore libxl 

Sorry, I left out some words here. I meant: delete the related domain
memory state and the related disk snapshots.

> doesn't know about it and that the management of the snapshot storage 
> was a matter for the toolstack's storage management layer, not libxl. 
>  
> I think we ended up proposing a scheme where there was an API which the 
> toolstack could use to tell libxl that a snapshot in an active domain's 
> snapshot chain was to be changed/has changed, so that it could rescan 
> and make any necessary adjustments. 
>  
> I think this is what we were discussing here: 
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg01541.html 
>  
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT): domain id 
> >     snapshot (INPUT): domain snapshot related info 
> >     Returns: 0 on success, -1 on error. 
> >  
> >     About each input, explanation is the same as
> >     libxl_domain_snapshot_create.
> >  
> > int libxl_domain_snapshot_revert(libxl_ctx *ctx, int domid, 
> >                                libxl_domain_snapshot_args *snapshot); 
> >  
> >     Revert the domain to a given snapshot. 
> >  
> >     Normally, the domain will revert to the same state the domain was in
> >     while the snapshot was taken (whether inactive, running, or paused).
>  
> I don't think inactive makes sense in this interface, there should be no 
> way to create a libxl snapshot of an inactive domain, therefore any 
> reversion to that state will not involve libxl. 
>  
> Is this operation any different to destroying the domain and using 
> libxl_domain_restore to start a new domain based on the snapshot? Is 
> this operation just a convenience layer over that operation? 
>  
> >  
> >     ctx (INPUT): context 
> >     domid (INPUT): domain id 
> >     snapshot (INPUT): snapshot 
> >     Returns: 0 on success, -1 on error. 
> >  
> >     About each input, explanation is the same as
> >     libxl_domain_snapshot_create.
> >  
> > 3. Function Implementation 
> >  
> >    libxl_domain_snapshot_create: 
> >        1). check args validation 
> >        2). if it is not disk-only, save domain memory through save-domain 
> >        3). take disk snapshot by qmp command (if domain is active) or
> >            qemu-img command (if domain is inactive).
> >  
> >    libxl_domain_snapshot_delete: 
> >        1). check args validation 
> >        2). remove memory state file if it's not disk-only. 
> >        3). delete disk snapshot. (for internal disk snapshot, through qmp 
> >            command or qemu-img command) 
> >  
> >    libxl_domain_snapshot_revert: 
> >        This may need to hack current libxl code. Could be (?): 
> >        1). pause domain 
> >        2). reload memory 
> >        3). apply disk snapshot. 
> >        4). restore domain config file 
> >        5). resume. 
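
The ordering of the five revert steps listed above can be sketched as
follows. Everything here is hypothetical: the step ids, the trace
structure, and the stub that stands in for the real work. No libxl calls
are made; the sketch only shows the sequencing and the stop-on-first-error
behaviour.

```c
#include <stddef.h>

/* Step identifiers, in the order listed in the proposal. */
enum revert_step_id {
    STEP_PAUSE,
    STEP_RELOAD_MEMORY,
    STEP_APPLY_DISK,
    STEP_RESTORE_CONFIG,
    STEP_RESUME,
    STEP_COUNT
};

/* Trace filled in by the stub so the ordering can be observed. */
typedef struct {
    int order[STEP_COUNT]; /* ids of the steps that ran, in order */
    int count;             /* number of steps that ran */
    int fail_on;           /* step id that should fail, or -1 for none */
} revert_trace;

/* Hypothetical stub: records the step and fails if asked to. A real
 * implementation would do the actual pause/reload/apply work here. */
static int run_stub_step(revert_trace *t, int id)
{
    t->order[t->count++] = id;
    return id == t->fail_on ? -1 : 0;
}

/* Runs the steps in order and stops at the first failure.
 * Returns 0 on success, -1 on error, matching the proposed API. */
static int revert_sketch(revert_trace *t)
{
    for (int id = STEP_PAUSE; id < STEP_COUNT; id++) {
        if (run_stub_step(t, id) != 0)
            return -1;
    }
    return 0;
}
```

One open point this makes visible: if a step after the pause fails, the
domain is left paused, so a real implementation would have to decide
whether to resume or destroy it.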
>  
>  
>  
>  




 

