Re: [Xen-devel] PATCH: Enable QEMU booting of blktap disks
> > In the other thread that's currently going on this topic, it sounds
> > like others are quite successfully using the phantom code. Why is it
> > broken for you?
>
> I really can't see how it works for anybody in 3.1.0 since the code
> which sets up phantom devices simply doesn't work

Well, let's fix it then. ;)

> > As I've said before, I dislike the idea of having separate
> > implementations of disks -- one in qemu and one in tapdisk. We'd
> > quite like to encourage people to be able to extend virtual block
> > devices in the future, and it seems like your approach is going to
> > force them to do two independent implementations of things. It also
> > leads to complications if you want to add things like caching, shared
> > ramdisks, etc. If phantom is broken, why don't we just fix that?
>
> AFAICT with or without my change you need to have two separate impls
> of every disk format, since the phantom device stuff is only ever used
> by blktap - non blktap disks still get processed directly by QEMU.

My concern is that it's possible to run the VM with it only having to
depend on a single implementation of a virtual disk. If you don't use
PV drivers, the qemu block drivers do this nicely. If you do, the
phantom code lets you do this by ensuring that emulated block requests
are redirected to tapdisk until the pv drivers come up (admittedly in an
inefficient way, but that doesn't really matter for the length of time
that it happens).

> IMHO the entire design & impl of blktap userspace was broken from the
> start because it is duplicating functionality already in the QEMU
> codebase.

Blktap was written before there were device emulated guests and before
qemu was capable of processing more than a single outstanding block
request at a time. So the only functionality that it duplicated was to
use e.g. the vmdk and qcow code as a basis for some of the image file
implementations.
Vmdk is largely unchanged and I don't know of anyone who actively uses
it; qcow evolved considerably in order to do asynchronous access and
batched request processing.

> With the benefit of hindsight, I would suggest that it would be better
> to have QEMU able to speak the native blktap protocol straight to the
> blktap kernel driver. Keep HVM using QEMU for all file backed disks,
> since it already handles all the formats just fine, and have a new
> machine type in QEMU for paravirt VMs which provided the tap daemon
> replacement and also a PVFB daemon replacement. Then you could kill the
> entire blktap userspace codebase & most of the PVFB userspace codebase
> and the libvncserver requirement.

I think a patch that pulled a lot of the tapdisk processing into qemu
would be very interesting to compare, in terms of overheads, against the
current model.

> So there'd only be 1 single daemon in Dom0 per VM, it would be the same
> daemon for PV and HVM, and all the open source virt platforms (Xen,
> KVM, QEMU, VirtualBox) would all be reaping the benefit of each other's
> code improvements to QEMU driver model, in particular for disk format
> code & VNC server code, rather than forking & reimplementing private
> copies.
>
> Of course this isn't a quick job, but if the motivation is reducing
> code duplication & alternative I/O paths, then focusing on QEMU for
> everything seems like a much more viable idea than more Xen specific
> code.

Absolutely.

Dan, I completely agree that it would be very good to have a unified way
to implement virtual block devices -- image formats, interposition, and
otherwise. I think that the qemu and blktap disk interfaces both shared
this as an initial design goal. I agree it's a lot of work and I agree
that it would be a very nice thing -- in the same spirit as Rusty's
virtio efforts -- to be able to share these implementations across
hypervisors/emulators/etc.
I also know of some grad students who would be very happy to see virtual
block devices that they are building for blktap apply against everything
else.

The thing is that doing everything in qemu doesn't currently achieve
this -- because PV drivers can't talk directly to qemu and going through
the emulated path results in suckful performance. So rather than taking
a patch that means PV-based HVM domains have to depend on multiple
implementations of disks, I'd much prefer to see us go in the direction
of what you propose.

a.

On 7/19/07, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> On Thu, Jul 19, 2007 at 10:34:12AM -0700, Andrew Warfield wrote:
> > So two comments on this:
> >
> > In the other thread that's currently going on this topic, it sounds
> > like others are quite successfully using the phantom code. Why is it
> > broken for you?
>
> I really can't see how it works for anybody in 3.1.0 since the code
> which sets up phantom devices simply doesn't work
>
>     try:
>         imagetype = self.vm.info['image']['type']
>     except:
>         imagetype = ""
>
>     if imagetype == 'hvm':
>
> The body of that try: statement is trying to read hash keys which
> don't exist, since 'vm.info' isn't a hash. So imagetype is always ""
> and so none of the phantom setup code ever gets run. Even after fixing
> that I never get any devices appearing and the VM just immediately
> shuts down. It seems to be looking for the /dev/xvd* device nodes in
> Dom0 rather than DomU which seems rather wrong.
>
> > As I've said before, I dislike the idea of having separate
> > implementations of disks -- one in qemu and one in tapdisk. We'd
> > quite like to encourage people to be able to extend virtual block
> > devices in the future, and it seems like your approach is going to
> > force them to do two independent implementations of things. It also
> > leads to complications if you want to add things like caching, shared
> > ramdisks, etc. If phantom is broken, why don't we just fix that?
>
> AFAICT with or without my change you need to have two separate impls
> of every disk format, since the phantom device stuff is only ever used
> by blktap - non blktap disks still get processed directly by QEMU.
>
> Now if we intend to remove all support for file: entirely, and make
> blktap compulsory for file backed VMs then I can see the benefit in
> having everything go via one codepath. Though now having 2 userspace
> daemons in Dom0 per HVM guest seems like it's going in the wrong
> direction to me.
>
> IMHO the entire design & impl of blktap userspace was broken from the
> start because it is duplicating functionality already in the QEMU
> codebase.
>
> With the benefit of hindsight, I would suggest that it would be better
> to have QEMU able to speak the native blktap protocol straight to the
> blktap kernel driver. Keep HVM using QEMU for all file backed disks,
> since it already handles all the formats just fine, and have a new
> machine type in QEMU for paravirt VMs which provided the tap daemon
> replacement and also a PVFB daemon replacement. Then you could kill the
> entire blktap userspace codebase & most of the PVFB userspace codebase
> and the libvncserver requirement.
>
> So there'd only be 1 single daemon in Dom0 per VM, it would be the same
> daemon for PV and HVM, and all the open source virt platforms (Xen,
> KVM, QEMU, VirtualBox) would all be reaping the benefit of each other's
> code improvements to QEMU driver model, in particular for disk format
> code & VNC server code, rather than forking & reimplementing private
> copies.
>
> Of course this isn't a quick job, but if the motivation is reducing
> code duplication & alternative I/O paths, then focusing on QEMU for
> everything seems like a much more viable idea than more Xen specific
> code.
>
> Dan.
> --
> |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
> |=- Perl modules: http://search.cpan.org/~danberr/ -=|
> |=- Projects: http://freshmeat.net/~danielpb/ -=|
> |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
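
The try/except failure described in the quoted message (a lookup that fails and is silently turned into an empty string, so the 'hvm' branch never runs) can be illustrated with a minimal sketch. FakeInfo, image_type_masked and image_type_explicit are hypothetical names invented for this illustration and are not part of the Xen code; the real vm.info object is not reproduced here, only the assumption that it cannot be indexed like a dict.

```python
class FakeInfo:
    """Hypothetical stand-in for an object that cannot be indexed like a dict."""

    def __init__(self, image_type):
        self.image_type = image_type


def image_type_masked(info):
    # Mirrors the quoted pattern: a bare except turns *any* failure into "".
    try:
        return info['image']['type']
    except:  # noqa: E722 (deliberately bare, as in the quoted snippet)
        return ""


def image_type_explicit(info):
    # Only a missing key is treated as "no image type"; other errors propagate.
    try:
        return info['image']['type']
    except KeyError:
        return ""


if __name__ == "__main__":
    info = FakeInfo('hvm')

    # The masked version hides the problem, so an 'hvm' check would never match.
    print(repr(image_type_masked(info)))

    # The explicit version surfaces the real error instead of hiding it.
    try:
        image_type_explicit(info)
    except TypeError as exc:
        print("lookup failed:", exc)
```

Catching only the expected exception is one way to make this kind of breakage visible at the point of failure instead of showing up later as setup code that silently never runs.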