Xen project Mailing List

RE: [Xen-API] How snapshots work on LVMoISCSI SR

To: Daniel Stodden <daniel.stodden@xxxxxxxxxx>

Date: Tue, 26 Jan 2010 14:32:13 -0800

Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxxx>, Julian Chesterfield <Julian.Chesterfield@xxxxxxxxxxxxx>, Dave.Scott@xxxxxxxxxxxxx, "xen-api@xxxxxxxxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 26 Jan 2010 14:32:08 -0800

List-id: Discussion of API issues surrounding Xen <xen-api.lists.xensource.com>

On Tue, 2010-01-26 at 14:18 -0800, Daniel Stodden wrote:
> On Tue, 2010-01-26 at 17:07 -0500, Anthony Xu wrote:
> > It is clear now, thanks.
> >
> > The other thing I'd like to know is how XCP handles disk caches inside
> > the VM when creating a snapshot. From XenCenter it seems the VM is
> > stopped temporarily while a snapshot is created.
> >
> > Does the VM flush its dirty disk cache when a snapshot is created?
>
> Depends what you mean by disk caches. All I/O performed by the backend
> is non-buffered, so there's presently no need to flush. As soon as a
> guest I/O request is processed, it essentially goes directly to the disk.
>
> The snapshot is created while the VBD is paused, i.e. guest accesses
> which haven't been issued to the disk are suspended. Next, requests
> which have been sent to the disk are waited for, up to completion. Then
> blktap closes the handle to the physical disk node.
>
> Before resuming guest access, we then reopen the newly created snapshot
> node, as the new leaf node.

That means if the guest Linux is executing "yum install kernel" while the
snapshot is being created, a VM created from this snapshot might not be
bootable.

- Anthony

>
> Daniel
>
> > How does XCP make sure this snapshot is usable, say, that the virtual
> > disk metadata is consistent?
> >
> > Thanks
> > - Anthony
> >
> > On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
> > > > I still have the questions below.
> > > >
> > > > 1. If a non-leaf node is coalescable, will it be coalesced later
> > > > on regardless of how big the physical size of this node is?
> > >
> > > Yes: it's always good to coalesce the chain to improve access
> > > performance.
> > >
> > > > 2. There is one leaf node for a snapshot, which may actually be
> > > > empty. Does it exist only because it prevents coalescing?
> > >
> > > Not quite sure what you're referring to here.
> > > The current code has a limitation whereby it is unable to coalesce a
> > > leaf into its parent, so after you've created one snapshot you'll
> > > always have a chain length of 2 even if you delete the snapshot (if
> > > you create a second snapshot it can be coalesced).
> > >
> > > Coalescing a leaf into its parent is on the todo list: it's a little
> > > bit different from the other cases because it requires synchronization
> > > if the leaf is in active use. It's not a big deal from a performance
> > > point of view to have the slightly longer chain length, but it will be
> > > good to get this fixed for cleanliness.
> > >
> > > > 3. A clone will introduce a writable snapshot, which will prevent
> > > > coalescing.
> > >
> > > A clone will produce a new writeable leaf linked to the parent. It will
> > > prevent the linked snapshot from being coalesced, but any other
> > > snapshots above or below on the chain can still be coalesced by the
> > > garbage collector if the snapshots are deleted.
> > >
> > > The XCP storage management stuff is pretty cool IMO...
> > >
> > > Ian
> > >
> > > >
> > > > - Anthony
> > > >
> > > > On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> > > > > Hi Anthony,
> > > > >
> > > > > Anthony Xu wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > Basically, snapshots on an LVMoISCSI SR work well: they provide
> > > > > > thin provisioning, so they are fast and disk-space efficient.
> > > > > >
> > > > > > But I still have the concern below.
> > > > > >
> > > > > > Each snapshot adds one more VHD to the chain; if I create 16
> > > > > > snapshots, the chain is 16 VHDs deep. That means when the VM
> > > > > > accesses a disk block, it may need to access 16 VHD LVs one by
> > > > > > one to get the right block, which makes VM disk access slow.
> > > > > > However, that is understandable; it is part of snapshotting, IMO.
> > > > >
> > > > > The depth and speed of access will depend on the write pattern to
> > > > > the disk.
> > > > > In XCP we add an optimisation called a BATmap which stores one bit
> > > > > per BAT entry. This is a fast lookup table that is cached in memory
> > > > > while the VHD is open, and tells the block device handler whether a
> > > > > block has been fully allocated. Once the block is fully allocated
> > > > > (all logical 2MB written) the block handler knows that it doesn't
> > > > > need to read or write the bitmap that corresponds to the data
> > > > > block; it can go directly to the disk offset. Scanning through the
> > > > > VHD chain can therefore be very quick, i.e. the block handler reads
> > > > > down the chain of BAT tables for each node until it finds a node in
> > > > > which the block is allocated, hopefully with the BATmap bit set.
> > > > > The worst case is a random disk write workload which causes the
> > > > > disk to be fragmented and partially allocated. Every read or write
> > > > > will therefore potentially incur a bitmap check at every level of
> > > > > the chain.
> > > > >
> > > > > > But after I delete all these 16 snapshots, the chain is still
> > > > > > 16 VHDs deep and disk access is still slow, which is neither
> > > > > > understandable nor reasonable, even though there may be only
> > > > > > several KB of difference between each snapshot.
> > > > >
> > > > > There is a mechanism in XCP called the GC coalesce thread which
> > > > > gets kicked asynchronously following a VDI deletion event. It
> > > > > queries the VHD tree, and determines whether there is any
> > > > > coalescable work to do. Coalescable work is defined as:
> > > > >
> > > > > 'a hidden child node that has no siblings'
> > > > >
> > > > > Hidden nodes are non-leaf nodes that reside within a chain. When the
> > > > > snapshot leaf node is deleted, therefore, it will leave redundant
> > > > > links in the chain that can be safely coalesced.
> > > > > You can kick off a coalesce by issuing an SR scan, although it
> > > > > should kick off automatically within 30 seconds of deleting the
> > > > > snapshot node, handled by XAPI. If you look in the /var/log/SMlog
> > > > > file you'll see a lot of debug information including tree
> > > > > dependencies which will tell you a) whether the GC thread is
> > > > > running, and b) whether there is coalescable work to do. Note that
> > > > > deleting snapshot nodes does not always mean that there is
> > > > > coalescable work to do since there may be other siblings, e.g. VDI
> > > > > clones.
> > > > >
> > > > > > Is there any way we can reduce the depth of the VHD chain after
> > > > > > deleting snapshots, to get the VM back to normal disk performance?
> > > > >
> > > > > The coalesce thread handles this, see above.
> > > > >
> > > > > > Also, I notice there are useless VHD volumes left after deleting
> > > > > > snapshots. Can we delete them automatically?
> > > > >
> > > > > No. I do not recommend deleting VHDs manually since they are almost
> > > > > certainly referenced by something else in the chain. If you delete
> > > > > them manually you will break the chain, it will become unreadable,
> > > > > and you potentially lose critical data. VHD chains must be
> > > > > correctly coalesced in order to maintain data integrity.
> > > > >
> > > > > Thanks,
> > > > > Julian
> > > > >
> > > > > > - Anthony

_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api
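The thread describes the VHD chain rules in prose: reads walk down the chain from the leaf, the BATmap lets the handler skip the per-block bitmap once a 2MB block is fully written, and the GC treats "a hidden child node that has no siblings" as coalescable work. Below is a minimal, hypothetical Python model of those rules, not XCP's implementation. The names VhdNode, locate, coalescable, delete_leaf and demo are invented for illustration; 512-byte sectors are assumed; "coalescing" is assumed to mean merging a hidden node into its parent; the clone in the demo is simply modelled as an extra child of A. It does not read real VHD files or talk to XAPI.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Assumption: 2 MB data blocks (as in the thread) and 512-byte sectors.
BLOCK_SIZE = 2 * 1024 * 1024
SECTOR_SIZE = 512
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 4096 sectors per block


@dataclass(eq=False)  # identity semantics: nodes reference each other
class VhdNode:
    """Hypothetical stand-in for one VHD in a chain (not an XCP API)."""
    name: str
    parent: Optional[VhdNode] = None
    hidden: bool = False  # hidden = read-only, non-leaf node inside a chain
    # block index -> sectors written; stands in for the BAT plus per-block bitmap
    blocks: dict[int, set[int]] = field(default_factory=dict)
    # "BATmap": blocks known to be fully allocated (one bit per BAT entry)
    batmap: set[int] = field(default_factory=set)
    children: list[VhdNode] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.parent is not None:
            self.parent.children.append(self)

    def write(self, sector: int) -> None:
        blk, off = divmod(sector, SECTORS_PER_BLOCK)
        written = self.blocks.setdefault(blk, set())
        written.add(off)
        if len(written) == SECTORS_PER_BLOCK:  # every sector of the block written
            self.batmap.add(blk)


def locate(leaf: VhdNode, sector: int) -> tuple[Optional[str], int]:
    """Walk leaf -> base; return (node holding the sector, bitmap checks made)."""
    blk, off = divmod(sector, SECTORS_PER_BLOCK)
    bitmap_checks = 0
    node: Optional[VhdNode] = leaf
    while node is not None:
        if blk in node.blocks:          # BAT entry present in this node
            if blk in node.batmap:      # fully allocated: skip the bitmap
                return node.name, bitmap_checks
            bitmap_checks += 1          # partially allocated: consult the bitmap
            if off in node.blocks[blk]:
                return node.name, bitmap_checks
        node = node.parent
    return None, bitmap_checks          # not allocated in any node of the chain


def coalescable(root: VhdNode) -> list[str]:
    """Nodes matching 'a hidden child node that has no siblings'."""
    found: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.hidden and node.parent is not None and len(node.parent.children) == 1:
            found.append(node.name)
        stack.extend(node.children)
    return found


def delete_leaf(node: VhdNode) -> None:
    """Delete a leaf VDI; refuse non-leaf nodes, which others still depend on."""
    if node.children:
        raise ValueError(f"{node.name} is not a leaf; deleting it breaks the chain")
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None


def demo() -> dict[str, object]:
    a = VhdNode("A", hidden=True)          # original VHD, hidden by snapshot 1
    for s in range(SECTORS_PER_BLOCK):     # fill block 0 so its BATmap bit is set
        a.write(s)
    a1 = VhdNode("A1", parent=a)           # writable leaf for the running VDI
    s1 = VhdNode("S1", parent=a)           # leaf for snapshot 1
    a1.write(SECTORS_PER_BLOCK + 7)        # one sector of block 1, after snapshot 1

    out: dict[str, object] = {
        "full_block": locate(a1, 5),                       # BATmap hit in A
        "partial_hit": locate(a1, SECTORS_PER_BLOCK + 7),  # bitmap check in A1
        "unallocated": locate(a1, SECTORS_PER_BLOCK + 8),  # not found in any node
    }
    delete_leaf(s1)
    out["after_delete"] = coalescable(a)   # A1 is a leaf: chain length stays 2

    a1.hidden = True                       # snapshot 2 hides A1 ...
    VhdNode("A2", parent=a1)               # ... adds a new writable leaf
    VhdNode("S2", parent=a1)               # ... and a leaf for snapshot 2
    out["after_second_snapshot"] = coalescable(a)

    VhdNode("C", parent=a)                 # hypothetical clone gives A1 a sibling
    out["with_clone"] = coalescable(a)
    return out


if __name__ == "__main__":
    for key, value in demo().items():
        print(f"{key}: {value}")
```

Run as a script, it prints one line per scenario. The counts are bitmap checks in the toy model, not I/O measurements: a read from the fully written block in A needs no bitmap check because of the BATmap, a partially written block costs one check per level that holds it, and A1 only becomes coalescable once a second snapshot hides it (mirroring Ian's note about the chain length of 2), until the clone gives it a sibling.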

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.