Re: [Xen-devel] Full virtualization and I/O
Hi Mats,

Thanks for explaining this in such detail.

As you mentioned in your post, could you elaborate on using an unmodified driver in an HVM domain (i.e. using a front-end driver in a fully virtualized domain)? Do you think a para-virtualized domain will have exactly the same behavior as a fully virtualized domain when both of them are using this unmodified driver to access virtual block devices?

Best regards,
Liang

----- Original Message -----
From: "Petersson, Mats" <Mats.Petersson@xxxxxxx>
To: "Thomas Heinz" <thomasheinz@xxxxxxx>; <xen-devel@xxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, November 22, 2006 9:24 AM
Subject: RE: [Xen-devel] Full virtualization and I/O

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Thomas Heinz
Sent: 20 November 2006 23:39
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: [Xen-devel] Full virtualization and I/O

Hi

Full virtualization is about providing multiple virtual ISA level environments and mapping them to a single physical one. One particular aspect of this mapping is I/O instructions (explicit or mmapped I/O). In general, there are two strategies to partition the devices, either in time or in space. Partitioning a device in space means that the device (or a part of it) is exclusively available to a single VM. Partitioning a device in time (or time multiplexing) means that it can be used by multiple VMs but only one VM may use it at any point in time.

The Xen approach is to not allow any sharing of devices: a device is owned by one domain, and no other domain can directly access the device. There is a protocol of so-called frontend/backend drivers. The frontend is basically a dummy device that forwards a request to another domain (normally domain 0), and the other half of the driver pair picks up this data and forwards it to some processing task, which then sends the packet on to the real hardware.
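As a rough illustration of the frontend/backend split just described, here is a minimal Python sketch. It is not Xen's actual implementation: the class names (SharedRing, Frontend, Backend), the tuple request format and the dict standing in for the disk are all assumptions made for the example, and the direct notify() call stands in for whatever cross-domain signal is really used.

```python
from collections import deque


class SharedRing:
    """Hypothetical stand-in for the queue shared by the two driver halves."""

    def __init__(self):
        self.requests = deque()
        self.responses = deque()


class Backend:
    """Runs in the domain that owns the real device (normally domain 0)."""

    def __init__(self, ring, disk):
        self.ring = ring
        self.disk = disk  # dict of sector -> bytes, standing in for hardware

    def notify(self):
        # Drain every queued request and perform it on the "real" device.
        while self.ring.requests:
            op, sector, data = self.ring.requests.popleft()
            if op == "write":
                self.disk[sector] = data
                self.ring.responses.append(("ok", sector, None))
            else:
                stored = self.disk.get(sector, b"")
                self.ring.responses.append(("ok", sector, stored))


class Frontend:
    """Dummy device in the guest: it never touches hardware, only forwards."""

    def __init__(self, ring, backend):
        self.ring = ring
        self.backend = backend

    def submit(self, op, sector, data=None):
        self.ring.requests.append((op, sector, data))
        self.backend.notify()  # stands in for the cross-domain signal
        return self.ring.responses.popleft()
```

The guest only ever sees the Frontend object; the one Backend is the sole code path that touches the device, which is how exclusive ownership by a single domain is kept.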
For fully virtualized mode (hardware-supported virtual machine, such as AMD-V or Intel VT, aka HVM), there is a different model, where a "device model" is involved to perform the hardware modelling. In Xen, this uses a modified version of qemu (called qemu-dm), which has a fairly complete set of "hardware" in its model. It has, for example, an IDE controller, several types of network devices, graphics and mouse/keyboard models. The things you'd usually find in a PC, that is.

The way it works is that the hypervisor intercepts IOIO and memory-mapped IO accesses to regions that match the devices involved (such as the A0000-BFFFF region for VGA frame buffer memory or the 0x1F0-0x1F7 IO ports for the IDE controller), and forwards a request to qemu-dm, where the operation changes the current state of the device model. When necessary, the state change results in, for example, a read request to the "hard disk" (which may be a real disk, a file on a local disk, or a file on a network storage device, to give some examples).

There is also the option of using the frontend drivers as described above in the fully virtualized model.

Finally, while I'm on the subject of fully virtualized mode: It is currently not possible to give a DMA-based device to a fully virtualized domain. The reason for this is that the guest OS will have been told that memory is from 0..256MB (say), while its actual machine physical address is at 256MB..512MB. The OS is completely unaware of this "mismatch". So the OS will perform some operation to take a virtual address of some buffer (say a network packet) and make it into a "physical address", which will be an address in the range of 0..256MB. This will of course (at least) lead to the wrong data being transmitted, as the address of the actual data is somewhere in the range 256MB..512MB.
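To make the address mismatch concrete, here is a small Python sketch using the same numbers as the example above. Everything in it is an assumption for illustration: the single fixed offset, the sparse dict standing in for machine memory, and the function names are hypothetical, and a real guest's memory need not be one contiguous block.

```python
MB = 1024 * 1024

GUEST_RAM_SIZE = 256 * MB  # the guest believes its RAM is 0..256MB
MACHINE_BASE = 256 * MB    # it really lives at machine 256MB..512MB


def guest_to_machine(guest_phys):
    """The translation the guest OS is unaware of (assumed fixed offset)."""
    if not 0 <= guest_phys < GUEST_RAM_SIZE:
        raise ValueError("address outside guest RAM")
    return MACHINE_BASE + guest_phys


machine_memory = {}  # sparse machine memory: machine address -> byte value


def guest_cpu_write(guest_phys, value):
    # CPU accesses are translated for the guest, so they land correctly.
    machine_memory[guest_to_machine(guest_phys)] = value


def device_dma_read(bus_addr):
    # The device treats the address it was given as a machine address.
    return machine_memory.get(bus_addr, 0)


buffer_addr = 16 * MB  # "physical" address as the guest OS sees it
guest_cpu_write(buffer_addr, 0xAB)

# The guest driver hands the device its own idea of the physical address.
wrong = device_dma_read(buffer_addr)
# What the device would see if the address were translated first.
right = device_dma_read(guest_to_machine(buffer_addr))
```

The device ends up reading machine address 16MB rather than 272MB, so it picks up whatever happens to be there instead of the guest's buffer, which is the mismatch described above.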
The only solution to this is to have an IOMMU, which can translate the guest's understanding of a physical address (0..256MB) to a machine physical address (256MB..512MB).

I am trying to understand how I/O virtualization on the ISA level works if a device is shared between multiple VM instances. On a very high level, it should be as follows. First of all, the VMM has to intercept the VM's I/O commands (I/O instructions or load/store to dedicated memory addresses - let's ignore interrupts for the moment). This could be done by traps or by replacing the respective instructions with VMM calls to I/O primitives. The VMM keeps multiple device model instances (one for each VM using the device) in memory. The models somehow reflect the low level I/O API of the device. Depending on which I/O command is issued by the VM, either the memory model is changed or a number of I/O instructions are executed to make the physical device state reflect the one represented in the memory model.

Do you by ISA mean "Instruction Set Architecture" or something else (I presume it does NOT mean ISA-bus...)?

Intercepting IOIO instructions or MMIO instructions is not that hard - in HVM the two processor architectures have specific intercepts and bitmaps to indicate which IO instructions should be intercepted. MMIO requires the page tables to be set up such that the memory-mapped region is mapped "not present", so that any operation on this region gives a page fault, and then the page fault is analyzed to see if it's for an MMIO address or a "real page fault".

For para-virtualization, the model is similar, but the exact way of intercepting the IOIO or MMIO instruction is slightly different - in essence it's the same principle. Let me know if you really need to know how Xen goes about doing this, as it's quite complicated (more so than the HVM version, for sure).

This approach brings up a number of questions.
It would be great if some of the virtualization experts here could shed some light on them (even though they are not immediately related to Xen, I know):

- What do these device memory models look like? Is there a common (automata) theory behind them or are they done ad hoc?

Not sure what you're asking for here. The devices are either modeled after a REAL device (qemu-dm), and as such resemble as closely as possible the REAL hardware device being emulated, or, in the frontend/backend driver case, they follow an "idealized model": the request contains just the basic data that the OS normally provides to the driver, and it's placed in a queue with a message-signaling system to tell the other side that it's got something in the queue.

- What kind of strategies/algorithms are used in the merge phase, i.e. the phase where the virtual memory model and the physical one are synchronized? What kind of problems can occur in this phase?

The Xen approach is to avoid this by only giving one device to each machine.

- Are specific usage patterns used in real world implementations (e.g. VMWare) to simplify the virtualization (model or merge phase)?

This is probably the wrong list to ask detailed questions about how VMWare works... ;-)

- Do you have any interesting pointers to literature dealing with full I/O virtualization? In particular, how does VMWare's full virtualization work with respect to I/O?

Again, wrong list for VMWare questions.

- Is every device time partitionable? If not, which requirements does it have to meet to be time partitionable?

Certainly not - I would say that almost all devices are NOT time partitionable, as the state in the device is dependent on the current usage.
The more complex the device is, the more likely it is to have difficulties, but even such a simple device as a serial port would struggle to work in a time-shared fashion. Serial ports are generally used for multiple transactions that together make up a whole "bigger picture transaction". For example, a web server connected via a serial modem would send a packet of several hundred bytes to the serial port driver, which is then portioned out as and when the serial port is ready to send another few bytes. If you switch from one guest to another during this process, and the second guest also has something to send on the serial port, you'd end up with a very scrambled message from the first guest and quite likely the second guest's message completely lost!

There are some devices that are specifically built to manage multiple hosts, but other than that, any sharing of a device requires some software to gather up "a full transaction" and then send that to the actual hardware, often also waiting for the transaction to complete (for example the interrupt signal to say that the hard disk write is complete).

-> I don't think every device is. What about a device which supports different modes of operation? If two VMs drive the virtual device in different modes, it may not be possible to constantly switch between them. Ok, this is pretty artificial.

A particular problem is devices where you can't necessarily read back the last mode setting, which may well be the case in many different devices. You can't, for example, read back all the registers on an IDE device, because a read of a particular address may give the status rather than the current command sent, or some such.

--
Mats

Thanks a lot for your help!
Best wishes
Thomas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel