Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support. (RFC)
Wonderful!! Now we have dm-userspace and blktap, and these two seem to do similar things. So what are the pros/cons of blktap compared to dm-userspace? Perhaps blktap will have better performance? Do you have any benchmarks comparing dm-userspace and blktap?

Thanks.
H

On 6/20/06, Andrew Warfield <andrew.warfield@xxxxxxxxxxxx> wrote:

Attached to this email is a patch containing the (new and improved) blktap Linux driver and associated userspace tools for Xen. In addition to being more flavourful, containing half the fat, and removing stains twice as well as the old driver, this stuff adds a userspace block backend and lets you use raw (without loopback), qcow, and vmdk-based image files for your domUs. There's also a fun little driver that provides a shared-memory block device which, in combination with OCFS2, represents a cheap-and-cheerful fast shared filesystem between multiple domUs.

This code has been (somewhat lackadaisically) developed over the past few years at Cambridge and has recently enjoyed massive improvements thanks to the considerable efforts of Julian Chesterfield. The code "works for us" and has been tested on a grand total of about three machines. We would love to have feedback from a broader audience, in terms of both trying out the tools and inspecting the code. We plan to release new patches at about 1-week intervals based on comments.

Performance is quite good, and we intend to focus on this a bit more over the next few weeks, releasing updated patches as they are available.
Bonnie results this morning are as follows (64-bit results compare against linux blkback+loopback file, Julian can follow up with loopback results for 32-bit later if anyone's interested):

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
64-bit:
xen0     4096 40115 93.4 41067 12.7 22757  1.2 32532 56.7 53724  0.4 121.4  0.0
img-sp   4096 20291 86.0 38091 18.1 19939  8.2 30854 69.0 47779  4.2  95.3  0.4
loop-sp  4096 33421 77.6 33663 13.1 18546  5.1 28606 59.2 46659  6.0  85.2  0.1
32-bit:
xen0     1024 33857 94.0 45804  9.0 23269  0.0 25825 52.0 55628    0 185.0  0.0
img-sp   1448 32743 92.0 40703  8.0 23281  0.0 31139 75.0 56585    0 208.1  0.0

The patch is against cset 0426:840f33e54054 -- but is unlikely to conflict with anything recent. You'll need libaio and libaio-devel on your build machine for the tools to compile. (Blktap readme follows.)

Thanks!
a.

---

Blktap Userspace Tools + Library
================================

Andrew Warfield and Julian Chesterfield
16th June 2006

{firstname.lastname}@cl.cam.ac.uk

The blktap userspace toolkit provides a user-level disk I/O interface. The blktap mechanism involves a kernel driver that acts similarly to the existing Xen/Linux blkback driver, and a set of associated user-level libraries. Using these tools, blktap allows virtual block devices presented to VMs to be implemented in userspace and to be backed by raw partitions, files, network, etc.

The key benefit of blktap is that it makes it easy and fast to write arbitrary block backends, and that these user-level backends actually perform very well. Specifically:

- Metadata disk formats such as Copy-on-Write, encrypted disks, sparse formats and other compression features can be easily implemented.
  O_DIRECT and libaio allow high-performance implementation of even sparse image formats such as QCoW, while still preserving the safe ordering of metadata and data writes to ensure data integrity. (As opposed to, for instance, the loopback driver and LVM snaps, which both have very dangerous failure cases.)

- Accessing file-based images from userspace avoids problems related to flushing dirty pages which are present in the Linux loopback driver. (Specifically, doing a large number of writes to an NFS-backed image doesn't result in the OOM killer going berserk.)

- Per-disk handler processes enable easier userspace policing of block resources, and process-granularity QoS techniques (disk scheduling and related tools) may be trivially applied to block devices.

- It's very easy to take advantage of userspace facilities such as networking libraries, compression utilities, peer-to-peer file-sharing systems and so on to build more complex block backends.

- Crashes are contained -- incremental development/debugging is very fast.

- All block data is forwarded in a zero-copy fashion, allowing for low-overhead userspace implementations.

How it works (in one paragraph):

Working in conjunction with the kernel blktap driver, all disk I/O requests from VMs are passed to the userspace daemon (using a shared memory interface) through a character device. Each active disk is mapped to an individual device node, allowing per-disk processes to implement individual block devices where desired. The userspace drivers are implemented using asynchronous (Linux libaio), O_DIRECT-based calls to preserve the unbuffered, batched and asynchronous request dispatch achieved with the existing blkback code. We provide a simple, asynchronous virtual disk interface that makes it quite easy to add new disk implementations.
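To give a feel for the queue-then-dispatch shape of such a virtual disk interface, here is a minimal Python sketch of a raw-image backend with a batched request queue and completion callbacks. It is only an illustration under loud assumptions: the real drivers are written in C on top of libaio and O_DIRECT, whereas this sketch uses plain buffered pread/pwrite and completes requests synchronously inside submit(). The names RawImageDisk, queue_read, queue_write, submit and demo, and the 512-byte sector size, are hypothetical and are not taken from the blktap code.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size for this sketch


class RawImageDisk:
    """Hypothetical raw-image backend: requests are queued, then dispatched as one batch."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR)
        self.pending = []  # queued (op, sector, payload, callback) tuples

    def queue_read(self, sector, nr_sectors, callback):
        # payload is the number of sectors to read
        self.pending.append(("read", sector, nr_sectors, callback))

    def queue_write(self, sector, data, callback):
        # payload is the bytes to write; only whole sectors are accepted
        if len(data) % SECTOR_SIZE:
            raise ValueError("writes must be whole sectors")
        self.pending.append(("write", sector, data, callback))

    def submit(self):
        """Dispatch the queued batch in order, firing each completion callback.

        Returns the number of requests completed. A real backend would hand
        the batch to the kernel and collect completions later; this sketch
        completes everything before returning.
        """
        batch, self.pending = self.pending, []
        for op, sector, payload, callback in batch:
            offset = sector * SECTOR_SIZE
            if op == "read":
                callback(os.pread(self.fd, payload * SECTOR_SIZE, offset))
            else:
                callback(os.pwrite(self.fd, payload, offset))
        return len(batch)

    def close(self):
        os.close(self.fd)


def demo():
    """Write one sector to a scratch image, read it back, return (count, results)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(8 * SECTOR_SIZE)  # 8-sector blank image
        path = f.name
    results = []
    disk = RawImageDisk(path)
    try:
        disk.queue_write(2, b"x" * SECTOR_SIZE, results.append)
        disk.queue_read(2, 1, results.append)
        completed = disk.submit()
    finally:
        disk.close()
        os.unlink(path)
    return completed, results
```

Calling demo() queues a write and a read of sector 2 and dispatches both in a single submit(); the write callback receives the byte count and the read callback receives the sector contents. A new disk format would, in this sketch, only need to provide the same three methods.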
As of June 2006 the currently supported disk formats are:

- Raw Images (both on partitions and in image files)
- File-backed Qcow disks (sparse qcow overlay on a raw image/partition).
- Standalone sparse Qcow disks (sparse disks, not backed by a parent image).
- Fast shareable RAM disk between VMs (requires some form of cluster-based filesystem support e.g. OCFS2 in the guest kernel)
- Some VMDK images - your mileage may vary

Raw and QCow images have asynchronous backends and so should perform fairly well. VMDK is based directly on the qemu vmdk driver, which is synchronous (a.k.a. slow).

The qcow backends support existing qcow disks. There is also a set of tools to generate and convert qcow images. With these tools (and driver support), we maintain the qcow file format but adjust parameters for higher performance with Xen -- using a larger segment size (4096 instead of 512) and more coarsely allocating metadata regions. We are continuing to improve this work and expect qcow performance to improve a great deal over the next few weeks.

Build and Installation Instructions
===================================

You will need libaio >= 0.3.104 on your target system to build the tools (if you are installing RPMs, this means libaio and libaio-devel).

Make sure to configure the blktap backend driver in your dom0 kernel. It will cooperate fine with the existing backend driver, so you can experiment with tap disks without breaking existing VM configs.

To build the tools separately, "make && make install" in tools/blktap_user.

Using the Tools
===============

Prepare the image for booting. For qcow files use the qcow utilities installed earlier. e.g. qcow-create generates a blank standalone image or a file-backed CoW image. img2qcow takes an existing image or partition and creates a sparse, standalone qcow-based file.

Start the userspace disk agent either on system boot (e.g.
via an init script) or manually => 'blktapctrl'

Customise the VM config file to use the 'tap' handler, followed by the driver type.

e.g. for a raw image such as a file or partition:

    disk = ['tap:aio:<FILENAME>,sda1,w']

e.g. for a qcow image:

    disk = ['tap:qcow:<FILENAME>,sda1,w']

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel