Re: [Xen-devel] Questioning the Xen Design of the VMM
Petersson, Mats wrote:
> > > Al Boldi wrote:
> > > > I may be missing something, but why should the Xen-design
> > > > require the guest to be patched?
>
> The main reason to use a para-virtual kernel is that it performs better
> than the fully virtualized version.
>
> > So HVM solves the problem, but why can't this layer be implemented in
> > software?
>
> It CAN, and has been done.

You mean full virtualization using binary translation in software?

My understanding was that HVM implies full virtualization without the need
for binary translation in software.

> It is, however, a little bit difficult to
> cover some of the "strange" corner cases, as the x86 processor wasn't
> really designed to handle virtualization natively [until these
> extensions were added].

You mean the AMD-V/Intel VT extensions?

If so, then these extensions don't actively participate in the act of
virtualization, but rather fix some x86-arch shortcomings, making it
easier for software (i.e. Xen) to virtualize and thus removing the need to
do binary translation.

Is this a correct reading?

> This is why you end up with binary translation
> in VMware for example. For example, let's say that we use the method of
> "ring compression" (which is when the guest-OS is moved from Ring 0
> [full privileges] to Ring 1 [less than full privileges]), and the
> hypervisor wants to have full control of interrupt flags:
>
> some_function:
>       ...
>       pushf           // Save interrupt flag.
>       cli             // Disable interrupts
>       ...
>       ...
>       ...
>       popf            // Restore interrupt flag.
>       ...
>
> In Ring 0, all this works just fine - but of course, we don't know that
> the guest-OS tried to disable interrupts, so we have to change
> something. In Ring 1, the guest can't disable interrupts, so the CLI
> instruction can be intercepted. Great. But pushf/popf are valid
> instructions in all four rings - popf just doesn't change the interrupt
> enable flag in the flags register if you're not allowed to use the
> CLI/STI instructions!
> So, that means that interrupts are disabled
> forever after [until an STI instruction gets found by chance, at least].
>
> And if the next bit of code is:
>
>       mov     someaddress, eax        // someaddress is updated by an interrupt!
> $1:
>       cmp     someaddress, eax        // Check it...
>       jz      $1
>
> Then we'd very likely never get out of there, since the actual interrupt
> causing someaddress to change is believed by the VMM to be disabled.
>
> There is no real way to make popf trap [other than supplying it with
> invalid arguments in virtual 8086 mode, which isn't really a practical
> thing to do here!]
>
> Another problem is "hidden bits" in registers.
>
> Let's say this:
>
>       mov     cr0, eax                // Read CR0
>       mov     eax, ecx                // Keep a copy of the original value
>       or      $1, eax                 // Set bit 0 (PE)
>       mov     eax, cr0                // Write CR0 with PE set
>       mov     $0x10, eax
>       mov     eax, fs                 // Load FS with selector 0x10
>       mov     ecx, cr0                // Restore the original CR0 value
>
>       mov     $0xF000000, eax
>       mov     $10000, ecx
> $1:
>       mov     $0, fs:eax              // Store through FS above 64KB
>       add     $4, eax
>       dec     ecx
>       jnz     $1
>
> Let's now say that we have an interrupt that the hypervisor would handle
> in the loop in the above code. The hypervisor itself uses FS for some
> special purpose, and thus needs to save/restore the FS register. When it
> returns, the system will crash (GP fault) because the FS register limit
> is 0xFFFF (64KB) and eax is greater than the limit - but the limit of FS
> was set to 0xFFFFFFFF before we took the interrupt... Incorrect
> behaviour like this is terribly difficult to deal with, and there really
> isn't any good way to solve these issues [other than not allowing the
> code to run when it does "funny" things like this - or to perform the
> necessary code in "translation mode" - i.e. emulate each instruction ->
> slow(ish)].

Or introduce the AMD-V/Intel VT extensions?
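
The pushf/cli/popf problem quoted above can be sketched with a toy model. This is only an illustration, not Xen or VMware code: the names `ToyCpu`, `GeneralProtectionFault` and `run_guest` are hypothetical, IOPL is assumed to be 0, and the "VMM" is reduced to a single virtual interrupt flag that it updates when `cli` traps. The model relies on two architectural facts: `cli` faults when CPL > IOPL, while `popf` at CPL > IOPL leaves IF unchanged without faulting.

```python
IF_FLAG = 1 << 9  # interrupt-enable flag, bit 9 of EFLAGS


class GeneralProtectionFault(Exception):
    """Raised by the toy CPU where real hardware would raise #GP."""


class ToyCpu:
    """Toy model of the EFLAGS.IF rules only; everything else is omitted."""

    def __init__(self, cpl, iopl=0):
        self.cpl = cpl            # current privilege level (ring)
        self.iopl = iopl          # I/O privilege level, assumed 0 here
        self.eflags = IF_FLAG     # interrupts start enabled
        self.stack = []

    def pushf(self):
        # Valid in every ring: pushes the real flags.
        self.stack.append(self.eflags)

    def cli(self):
        # Privileged with respect to IOPL: faults, so a VMM can intercept it.
        if self.cpl > self.iopl:
            raise GeneralProtectionFault("cli")
        self.eflags &= ~IF_FLAG

    def popf(self):
        # Valid in every ring, but at CPL > IOPL the IF bit is silently
        # preserved: no fault, so the VMM never sees the restore.
        value = self.stack.pop()
        if self.cpl > self.iopl:
            value = (value & ~IF_FLAG) | (self.eflags & IF_FLAG)
        self.eflags = value


def run_guest(cpl):
    """Run pushf; cli; popf and report whether interrupts end up enabled
    from the guest's point of view."""
    cpu = ToyCpu(cpl)
    virtual_if = True             # the VMM's view of the guest's IF
    cpu.pushf()
    try:
        cpu.cli()
    except GeneralProtectionFault:
        virtual_if = False        # VMM emulates cli for the guest
    cpu.popf()                    # never traps, so virtual_if is not restored
    if cpl == 0:
        return bool(cpu.eflags & IF_FLAG)
    return virtual_if
```

Under these assumptions, `run_guest(0)` ends with interrupts enabled again, while `run_guest(1)` leaves the guest's virtual interrupt flag cleared, which is the "disabled forever" situation described in the quoted text.
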
> > I'm sure there can't be a performance issue, as this virtualization
> > doesn't occur on the physical resource level, but is (should be)
> > rather implemented as some sort of a multiplexed routing algorithm,
> > I think :)
>
> I'm not entirely sure what this statement is trying to say, but as I
> understand the situation, performance is entirely the reason why the Xen
> paravirtual model was implemented - all other VMMs are slower [although
> it's often hard to prove that, since for example VMware has the rule
> that they have to give permission before publishing benchmarks of their
> product, and of course that permission would only be given in cases
> where there is some benefit to them].
>
> One of the obvious reasons for para-virtual being better than full
> virtualization is that it can be used in a "batched" mode. Let's say we
> have some code that does this:
>
>       ...
>       p = malloc(2000 * 4096);
>       ...
>
> Let's then say that the guts of malloc ends up in something like this:
>
> map_pages_to_user(...)
> {
>       for(v = random_virtual_address, p = start_page; p < end_page;
>           p++, v+=4096)
>               map_one_page_to_user(p, v);
> }
>
> In full virtualization, we have no way to understand that someone is
> mapping 2000 pages to the same user-process in one guest, we'd just see
> writes to the page-table one page at a time.
>
> In the para-virtual case, we could do something like:
>
> map_pages_to_user(...)
> {
>       hypervisor_map_pages_to_user(current_process, start_page,
>                                    end_page,
>                                    random_virtual_address);
> }
>
> Now, the hypervisor knows "the full story" and can map all those pages
> in one go - much quicker, I would say. There's still more work than in
> the native case, but it's much closer to the native case.

Sure, but wouldn't this come at the price of losing guest-OS transparency?

Thanks!

--
Al

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel