RE: [Xen-devel] Re: blktap: Sync with XCP, dropping zero-copy.
On Tue, 16 Nov 2010, Dave Scott wrote:
> Hi,
>
> Re: XCP's use of blktap2:
>
> > On Mon, 2010-11-15 at 13:27 -0500, Jeremy Fitzhardinge wrote:
> > > On 11/12/2010 07:55 PM, Daniel Stodden wrote:
> > > > The second issue I see is the XCP side of things. XenServer got a
> > > > lot of benefit out of blktap2, particularly because of the
> > > > tapdevs. It promotes a fairly rigorous split between a blkback
> > > > VBD, controlled by the agent, and tapdevs, controlled by XS's
> > > > storage manager.
> > > >
> > > > That doesn't prevent blkback from going into userspace, but it
> > > > had better not share a process with some libblktap, which in turn
> > > > had better not be controlled under the same xenstore path.
> > >
> > > Could you elaborate on this? What was the benefit?
> >
> > It's been mainly a matter of who controls what. Blktap1 was basically
> > a VBD, controlled by the agent. Blktap2 is a VDI represented as a
> > block device. Leaving management of that to XCP's storage manager,
> > which just hands that device node over to Xapi, simplified many
> > things. Before, the agent had to understand a lot about the type of
> > storage, then talk to the right backend accordingly. Worse, in order
> > to have storage management control a couple of datapath features,
> > you'd basically have to talk to Xapi, which would talk through
> > xenstore to blktap, which was a bit tedious. :)
>
> As Daniel says, XCP currently separates domain management (setting up,
> rebooting VMs) from storage management (attaching disks, snapshot,
> coalesce). In the current design the storage layer handles the storage
> control-path (instigating snapshots, clones, coalesce, dedup in future)
> through a storage API ("SMAPI") and provides a uniform interface to
> qemu and blkback for the data-path (currently in the form of a dom0
> block device). In a VM start, xapi will first ask the storage
> control-path to make a disk available, and then pass this information
> to blkback/qemu.
>
> One of the trickiest things XCP handles is vhd "coalesce": merging a
> vhd file into its "parent". This comes up because vhds are arranged in
> a tree structure where the leaves are separate, independent VM disks
> and the nodes represent shared common blocks, the result of (e.g.)
> cloning a single VM lots of times. When guest disks are deleted and the
> vhd leaves are removed, it sometimes becomes possible to save space by
> merging nodes together. The tricky bit is doing this while I/O is still
> being performed in parallel against logically separate (but related by
> parentage/history) disks on different hosts. The thing doing the
> coalescing needs to know where all the I/O is going on (e.g. to be able
> to find the host and pid where the related tapdisks (or qemus) live),
> and it needs to be able to signal to these processes when they have to
> re-read the vhd tree metadata.
>
> In the bad old blktap1 days, the storage control-path didn't know
> enough about the data-path to reliably signal the active tapdisks: IIRC
> the tapdisks were spawned by blktapctrl as a side-effect of the domain
> manager writing to xenstore. In the much better blktap2 days :) the
> storage control-path sets up (registers?) the data-path (currently via
> tap-ctl and a dom0 block device) and so it knows who to talk to in
> order to co-ordinate a coalesce.
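
To make the coalesce step above concrete, here is a toy sketch of the
merge plus the "signal everyone" step. It is purely illustrative: the
VhdNode layout, the single-child coalescability test and the SIGUSR1
"re-read your metadata" convention are all invented for the example;
the real code operates on on-disk vhd block allocation tables across
hosts, not on an in-memory dict.

    # Hypothetical sketch, not XCP's actual coalesce code.
    import os
    import signal
    from dataclasses import dataclass, field

    @dataclass
    class VhdNode:
        """One node in the vhd tree: leaves are independent VM disks,
        inner nodes hold blocks shared by their children."""
        path: str
        parent: "VhdNode | None" = None
        children: list = field(default_factory=list)
        blocks: dict = field(default_factory=dict)  # block no. -> data
        active_pids: list = field(default_factory=list)  # tapdisks/qemus

    def coalescable(node: VhdNode) -> bool:
        # A child can be folded into its parent only when no sibling
        # still depends on the parent's old contents.
        return node.parent is not None and len(node.parent.children) == 1

    def coalesce(node: VhdNode) -> None:
        parent = node.parent
        # The child's blocks are newer than the parent's, so they win.
        parent.blocks.update(node.blocks)
        # Re-parent the grandchildren onto the merged node.
        parent.children = node.children
        for child in node.children:
            child.parent = parent
        # Everything doing I/O below this point must re-read the vhd
        # tree metadata; pretend a signal is the agreed protocol.
        for pid in pids_below(parent):
            os.kill(pid, signal.SIGUSR1)  # invented refresh convention

    def pids_below(node: VhdNode):
        yield from node.active_pids
        for child in node.children:
            yield from pids_below(child)

The hard part Dave describes is exactly that last loop: knowing, across
hosts, which pids belong in active_pids for a given subtree.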
> So I think the critical thing is for the storage control-path to be
> able to "register" a data-path, enabling it later to find and signal
> any processes using that data-path. There are a bunch of different
> possibilities the storage control-path could use instead of using
> tap-ctl to create a block device, including:

Qemu could be spawned directly (even before the VM) and QMP could be
used to communicate with it. The qemu pid and/or the socket used to
issue QMP commands could be used as identifiers.

> I'm sure there are lots of possibilities :-)

Indeed.
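
To illustrate the QMP option above, a minimal client could look like
the sketch below. The socket path and the use of "query-block" are only
examples; real code would also have to cope with asynchronous QMP
events arriving interleaved with command replies.

    # Toy QMP client: newline-delimited JSON over a unix socket.
    import json
    import socket

    def qmp_command(sock_path, command):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(sock_path)
        f = s.makefile("rw")
        json.loads(f.readline())  # greeting banner, {"QMP": ...}

        def roundtrip(cmd):
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            return json.loads(f.readline())

        # Mandatory handshake before qemu accepts any other command.
        roundtrip({"execute": "qmp_capabilities"})
        reply = roundtrip({"execute": command})
        s.close()
        return reply

    # Assuming qemu was started with something like
    #   -qmp unix:/var/run/qemu-qmp.sock,server,nowait
    # print(qmp_command("/var/run/qemu-qmp.sock", "query-block"))

The pid and socket path would then serve as the "registration" handle
the storage control-path keeps for each data-path.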
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel