Re: [Xen-devel] Essay on an important Xen decision (long)
Hi Dan,

Thanks for the thorough explanation of physical memory virtualization. It's a topic on which there isn't a lot of good reference material.

You seem to conclude that the only possible solutions are making dom0 either P==M or P2M. Is it not possible to make dom0 VP?

If the only issue for making dom0 VP is DMA, wouldn't it be easier to modify the Linux DMA subsystem[1] to make a special hypercall to essentially pin a VP to a particular MFN that could be used for the DMA? One could imagine the hypervisor reserving low memory specifically for DMA such that bounce buffers could be avoided too.

VP makes a lot of interesting memory optimizations considerably easier (memory compacting, swapping, etc.).

[1] I know very little about the Linux DMA subsystem, so I don't know whether this is outside the realm of possibility.

Regards,

Anthony Liguori

Magenheimer, Dan (HP Labs Fort Collins) wrote:

A fundamental architectural decision has to be made for Xen regarding handling of physical/machine memory; at a high level, the question is:

Should Xen drivers be made more flexible to accommodate different approaches to managing physical memory, or should other architectures be required to conform to the Xen/x86 model?

A more detailed description of the specific decision is below. The Xen/ia64 community would like to make this decision soon -- possibly at the Xen summit -- as next steps of Xen/ia64 functionality are significantly affected. Since either choice has an impact on common code and on future Xen architecture, this decision must involve core Xen developers and the broader Xen community rather than just Xen/ia64 developers.

While this may seem to be a trivial matter, such fundamental choices often have a way of pre-selecting future design and implementation directions that can have major negative or positive impacts -- possibly unexpected -- on different parties.
For example, a decision might make a Xen developer's life easier but create headaches for a distro or a Linux maintainer. If nothing else, discussing fundamental decision points often helps to bring out and codify/document hidden assumptions about the future.

This is a lengthy document but I hope to touch on most of the various issues and tradeoffs. Understanding -- or, at a minimum, reading -- this document should probably be a prerequisite for involvement in discussions to resolve this. I would encourage all readers to give the issues and tradeoffs some thought as the "obvious x86" answer may not be the best answer for the future of Xen.

First a little terminology and background:

In a virtualized environment, the resources of the physical machine must be subdivided and/or shared between multiple virtual machines. Just as an OS manages memory for its applications, one of the primary roles of a hypervisor is to provide the illusion to each guest OS that it owns some amount of "RAM" in the system. Thus there are two kinds of physical memory addresses: the addresses that a guest believes to be physical addresses and the addresses that actually refer to RAM (e.g. bus addresses). The literature (and Xen) confusingly labels these as "physical" addresses and "machine" addresses. In a virtualized environment, there must be some way of maintaining the relationship -- or "mapping" -- between physical addresses and machine addresses.

In Xen (across all architectures), there are currently three different approaches for mapping physical addresses to machine addresses:

1) P==M: The guest is given a subset of machine memory that it can access "directly". Accesses to machine memory addresses outside of this range must somehow be restricted (but not necessarily disallowed) by Xen.

2) guest-aware p!=m (P2M): The guest is given max_pages of contiguous physical memory starting at zero and the knowledge that physical addresses are different than machine addresses.
The guest must understand the difference between a physical address and a machine address and utilize the correct one in different situations.

3) virtual physical (VP): The guest is given max_pages of contiguous physical memory starting at zero. Xen provides the illusion to the guest that this is machine memory; any physical-to-machine translation required for functional correctness is handled invisibly by Xen. VP cannot be used by guests that directly program DMA-based I/O devices because a DMA device requires a machine address and, by definition, the guest knows only about physical addresses.

Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow mode") for an unprivileged guest when a migration is underway. Xen/ia64 currently uses P==M for domain0 and VP for unprivileged guests. Xen/ppc intends to use VP only.

There is an architectural proposal to change Xen/ia64 so that domain0 uses P2M instead of P==M. We will call this choice P2M and the choice to stay on the current path P==M. Here's what I think are the key issues/tradeoffs:

XEN CODE IMPACT

Some Xen drivers, such as the blkif driver, have been "converted" to accommodate P==M. Others have not. For example, the balloon driver currently assumes domain0 is P2M and thus does not currently work on Xen/ia64 or Xen/ppc. The word "converted" is quoted because nobody is particularly satisfied with the current state of the converted drivers. Many apparently significant function calls are define'd out of existence by macros. Other code does radically different things depending on the architecture or on whether it is being executed by dom0 or an unprivileged domain. And a few ifdef's are sprinkled about. In short, what's done works but is an ugly hack.

Some believe that the best way to solve this mess is for other architectures to do things more like Xen/x86. Others believe there is an advantage to defining clear abstractions and making the drivers truly more architecture-independent.
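
As a rough illustration of what a "clear abstraction" over the three models listed earlier (P==M, P2M, VP) could look like, here is a toy Python model. It is only a sketch: real drivers are C, and every class and function name below is made up for illustration and is not a Xen or Linux interface. The point is that a driver asks one question ("which machine frame backs this physical frame?") and the answer depends on the model in use.

```python
"""Toy model of the three physical-to-machine models (P==M, P2M, VP).

Every name here is hypothetical; none of this is a real Xen or Linux API.
"""


class IdentityMapped:
    """P==M: the guest's physical frame numbers are machine frame numbers."""

    def __init__(self, first_mfn, num_pages):
        self.frames = range(first_mfn, first_mfn + num_pages)

    def machine_frame(self, pfn):
        if pfn not in self.frames:
            raise ValueError("frame outside the range given to this guest")
        return pfn


class GuestTranslated:
    """P2M: the guest holds its own pfn -> mfn table and consults it."""

    def __init__(self, mfns):
        self.p2m = list(mfns)  # index is the pfn, value is the mfn

    def machine_frame(self, pfn):
        if not 0 <= pfn < len(self.p2m):
            raise ValueError("pfn outside this guest's physical memory")
        return self.p2m[pfn]


class VirtualPhysical:
    """VP: a translation exists, but only the hypervisor can see it."""

    def machine_frame(self, pfn):
        raise NotImplementedError("guest has no machine addresses under VP")


def dma_frame(guest, pfn):
    """What a driver would call before programming a DMA device."""
    return guest.machine_frame(pfn)
```

In this model a P==M guest gets its own frame number back, a P2M guest gets a table lookup, and a VP guest cannot answer at all, which mirrors the DMA restriction on VP described above.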
P2M will require some rewriting of existing Xen/ia64 core code and the addition of significant changes to Xenlinux/ia64 code but will allow much easier porting of Xen's balloon/networking/migration drivers and also enable some simplifying changes in the Xen block driver. It is fair to guess that it will take at least several weeks/months to rewrite and debug the core and Xenlinux code to get Xen/ia64 back to where it is today, but future driver work will be much faster. Fewer differences from Xen/x86 means less maintenance work for Xen core and Xen/ia64 developers. I'd imagine also that more code will be shared between Xen/VT-i and Xen/VT-x.

P==M will require Xen's balloon/networking/migration drivers to evolve to incorporate non-P2M models. This can be done, but is most likely to end up (at least in the short term) as a collection of unpalatable hacks, as with the Xen block driver. However, making Xen drivers more tolerant of different approaches may be a good thing in the long run for Xen.

XENLINUX IMPACT

Today's operating systems are not implemented with an understanding that a physical address and a machine address might be different. Building this awareness into an OS requires non-trivial source code change. For example, Xenlinux/x86 maintains a "p2m" mapping table for quick translation and provides a "m2p" hypercall to keep Xen in sync. OS code that manipulates physical addresses must be modified to access/manage this table and make hypercalls when appropriate. Macros can hide much of the complexity but much OS/driver code exists that does not use standard macros.

There is some disagreement on how extensive the required source code changes are, and how difficult it will be to maintain these changes across future versions of guest OS's. One illustrative example however: In paravirtualizing Xenlinux/ia64, seven header files are changed; it is closer to 40 for Xenlinux/x86.
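
The bookkeeping burden described above (a guest-side "p2m" table plus a call that keeps Xen's view in sync) can be sketched with a toy Python model. This is an assumption-laden illustration, not Xen code: the class names, `update_m2p`, and `remap` are all hypothetical, and the real interfaces differ.

```python
"""Toy model of keeping a guest p2m table and a hypervisor m2p table in sync.

All names are hypothetical and do not correspond to real Xen hypercalls.
"""

INVALID = None  # marks a pfn with no machine frame behind it


class Hypervisor:
    def __init__(self):
        self.m2p = {}  # mfn -> pfn, owned by the hypervisor

    def update_m2p(self, mfn, pfn):
        """Stand-in for the "m2p" hypercall mentioned in the text."""
        if pfn is INVALID:
            self.m2p.pop(mfn, None)
        else:
            self.m2p[mfn] = pfn


class Guest:
    def __init__(self, hypervisor, mfns):
        self.hv = hypervisor
        self.p2m = [INVALID] * len(mfns)  # index is pfn, value is mfn
        for pfn, mfn in enumerate(mfns):
            self.remap(pfn, mfn)

    def remap(self, pfn, new_mfn):
        """Point pfn at new_mfn, updating both tables together."""
        old_mfn = self.p2m[pfn]
        if old_mfn is not INVALID:
            self.hv.update_m2p(old_mfn, INVALID)
        self.p2m[pfn] = new_mfn
        if new_mfn is not INVALID:
            self.hv.update_m2p(new_mfn, pfn)
```

Any guest code path that changes which machine frame backs a physical frame has to go through something like `remap`; code that bypasses it leaves the two tables inconsistent, which is why OS code that does not use the standard macros is a concern.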
Related, some would assert that pushing a small number of changes into Linux (or any OS, open source or not) is far easier than pushing a large number of changes into Linux. Until all the Xen/x86 changes are in, it remains to be seen whether this is true or not. There is a reasonable concern that the broad review required for such an extensive set of changes will involve a large number of people with a large number of agendas and force a number of Xen design issues to be revisited -- at least clearly justified if not changed. This is especially true if Xen's foes have any influence in the process.

Transparent paravirtualization (also called "shared binary") is the ability for the same binary to be used both as a Xen guest and natively on real hardware. Xenlinux/ia64 currently supports this; indeed, ignoring a couple of existing bugs, the same Xenlinux/ia64 binary can be used natively, as domain0, and as an unprivileged domain. There have been proposals to do the same for Xenlinux/x86, but the degree of code change is much, much higher. There is debate about the cost/benefit of transparent paravirtualization, but the primary beneficiaries -- distros and end customers -- are not very well represented here.

With P2M, it is unlikely that Xenlinux/ia64 will ever again be transparently paravirtualizable. As with Xenlinux/x86, the changes will probably be pushed into a subarch (mach-xen). Since Linux/ia64 has a more diverse set of subarch's, there may be additional work to ensure that Xen is orthogonal to (and thus works with) all the subarch's.

P==M would continue to allow transparent paravirtualization. This plus the reduced number of changes should make it easier to get Xen/ia64 support into Linux/ia64 (assuming Xen/x86 support gets included in Linux/x86).

DRIVER DOMAINS

Driver domains are "coming soon" and support of driver domains is a "must"; however, support for hybrid driver domains (i.e. domains that utilize both backend and frontend drivers) is open to debate.
It can be assumed, however, that all driver domains will require DMA access.

P2M should make driver domains easier to implement (once the initial Xenlinux/ia64 work is completed) and able to support a broader range of functionality.

P==M may disallow hybrid driver domains and create other restrictions, though some creative person may be able to solve these.

FUTURE XEN FEATURE SUPPORT

None of the approaches have been "design-tested" significantly for support or compatibility with future Xen functionality such as oversubscription or machine-memory hot-plug, nor for exotic machine memory topologies such as NUMA or discontig (sparsely populated). Such functionalities and topologies are much more likely to be encountered in high-end server architectures than in widely-available PCs and low-end servers. There is some debate as to whether the existing Xen memory architecture will easily evolve to accommodate these future changes or if more fundamental changes will be required. Architectural decisions and restrictions should be made with these uncertainties in mind.

Some believe that discovery and policy for machine memory will eventually need to move out of Xen into domain0, leaving only the enforcement mechanism in Xen. For example, oversubscription, NUMA or hot-plug memory support are likely to be fairly complicated and a commonly stated goal is to move unnecessary complexity out of Xen. And the plethora of recent changes in Linux/ia64 involving machine memory models indicates there are still many unknowns. P==M more easily supports a model where domain0 owns ALL of machine memory *except* a small amount reserved for and protected by Xen itself. If this is all true, Xen/x86 may eventually need to move to a dom0 P==M model, in which case it would be silly for Xen/ia64 to move to P2M and then back to P==M.

Others think these features will be easy to implement in Xen and, with minor changes, entirely compatible with P2M, and that P2M is the once and future model for domain0.
SUMMARY

I'm sure there are more issues and tradeoffs that will come up in discussion, but let me summarize these:

Move domain0 to P2M:
+ Fewer differences in Xen drivers between Xen/x86 and Xen/ia64
+ Fewer differences in Xen drivers between Xen/VT-x and Xen/VT-i
+ Easier to implement remaining Xen drivers for Xen/ia64
- Major changes may require months for Xen/ia64 to regain stability
- Many more changes to Xenlinux/ia64; more difficulty pushing upstream
- No attempt to make Xen more resilient for future architectures

Leave domain0 as P==M:
+ Fewer changes in Xenlinux; easier to push upstream
+ Making Xen more flexible is a good thing
? May provide better foundation for future features (oversubscr, NUMA)
- More restrictions on driver domains
- More hacks required for some Xen drivers, or
- More work to better abstract and define a portable driver architecture

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel