[Xen-devel] Re: [RFC, PATCH 1/24] i386 Vmi documentation
Chris Wright wrote:
> Yup. Just noting that API without clear users is the type of thing
> that is regularly rejected from Linux.

Yes. It is becoming clear from feedback from you and Andi that there are things in the API that are unnecessary for Linux. But keep in mind, they may be necessary for other operating systems. I think we should probably drop the Linux changes that issue things like RDTSC via VMI call wrappers; it does simplify the Linux interface. But I still think they should be part of the spec - an optional part of the spec, which need not be implemented by Linux or even by the hypervisor. If some vendor or kernel combination finds that they are a performance concern, as they readily could become, they can drop in the functionality when and if they need it. No reason to complicate things on either end, but also no reason to purposely add asymmetry to the spec just because the current set of calls is sufficient for the currently known fast paths.

Yes, register pressure forces the PAE API to be slightly different from the long mode API. But long mode has different register calling conventions anyway, so it is not a big deal. The important thing is that, once the MMU mess is sorted out, the same interface can be used from C code for both platforms, and the details of which lock primitives are used can be hidden. The cost of those lock primitives differs between 32-bit and 64-bit platforms, across vendors, and with the style of the hypervisor implementation (direct, writable, or shadowed page tables).

> Many of these will look the same on x86-64, but the API is not 64-bit
> clean so has to be duplicated.
>
> My mistake, it makes perfect sense from ABI point of view.

Yes. This interface needs to be documented in a much better fashion. But the idea is that VMI calls are mapped into Xen multicalls by allowing deferred completion of certain classes of operations. That same mode of deferred operation is used to batch PTE updates in our implementation (although Xen uses writable page tables now, this used to provide the same support facility in Xen as well). To complement this, there is an explicit flush - and it turns out this maps very nicely, getting rid of a lot of the XenoLinux changes around mmu_context.h.

> Is this the batching, multi-call analog? Are these valid differences?
> Or did I misunderstand the batching mechanism?
>
> 1) can't use stack based args, so have to allocate each data structure,
> which could conceivably fail unless it's some fixed buffer.

We use a fixed buffer that is private to our VMI layer. It's a per-CPU packing struct for hypercalls. Dynamically allocating memory from the kernel inside the interface layer is a really great way to get into a whole lot of trouble.

> 2) complicates the rom implementation slightly where implementation of
> each deferrable part of the API needs to have switch (am I deferred or
> not) to then build the batch, or make direct hypercall.

This is an overhead that is easily absorbed by the gain. The hypercalls are mostly either always direct or always queued. The page table updates already have conditional logic to do the right thing, and Xen doesn't require the queueing of these anymore anyway. And the flush happens at an explicit point.

The best approach can still be fine-tuned. You could have separate VMI calls for queued vs. non-queued operation, but that greatly bloats the interface and doesn't make sense for everything. I believe the best solution is to annotate this in the VMI call itself.
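Before getting to that annotation, here is a minimal sketch of the fixed-buffer batching with an explicit flush described above. It is only an illustration under assumed names - raw_hypercall(), raw_multicall(), vmi_queue_call() and the buffer layout are invented here, not the actual VMI layer, and a real kernel would keep one buffer per CPU rather than a single static one.

#include <stdint.h>

#define VMI_BATCH_MAX 32

struct vmi_call {
    uint32_t num;               /* call number */
    uint64_t arg0, arg1;        /* call arguments */
};

/* A single fixed buffer keeps the sketch short; a real kernel would use
 * one per CPU.  Because the buffer is fixed and private to the interface
 * layer, queuing a call never has to allocate memory and cannot fail. */
static struct vmi_call batch[VMI_BATCH_MAX];
static unsigned int batch_len;
static int deferring;           /* nonzero inside a deferred region */

/* Stand-ins for the real traps into the hypervisor. */
static void raw_hypercall(uint32_t num, uint64_t a0, uint64_t a1)
{
    (void)num; (void)a0; (void)a1;      /* would issue one call here */
}

static void raw_multicall(const struct vmi_call *calls, unsigned int n)
{
    (void)calls; (void)n;               /* would hand over the whole batch */
}

/* Explicit flush: pass the queued calls to the hypervisor in one batch
 * (the multicall analog), then reset the buffer. */
static void vmi_flush(void)
{
    if (batch_len)
        raw_multicall(batch, batch_len);
    batch_len = 0;
}

static void vmi_defer_begin(void)
{
    deferring = 1;
}

static void vmi_defer_end(void)
{
    deferring = 0;
    vmi_flush();                /* nothing stays queued past the region */
}

/* Deferrable operations (e.g. PTE updates) go through here; operations
 * that must take effect immediately call raw_hypercall() directly. */
static void vmi_queue_call(uint32_t num, uint64_t a0, uint64_t a1)
{
    if (!deferring) {
        raw_hypercall(num, a0, a1);
        return;
    }
    if (batch_len == VMI_BATCH_MAX)
        vmi_flush();            /* buffer full: drain before queuing more */
    batch[batch_len].num = num;
    batch[batch_len].arg0 = a0;
    batch[batch_len].arg1 = a1;
    batch_len++;
}

A PTE update path would then simply call vmi_queue_call() inside a deferred region and rely on the explicit flush at the end, which is the deferred-completion / explicit-flush model sketched above.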
Consider the VMI call number, not as an integer, but as an identifier tuple. Perhaps I'm going overboard here. Perhaps not.

31--------24 23---------16 15--------8 7-----------0
|  family   | call number | reserved  | annotation |
-----------------------------------------------------

Now you have multiple families of calls:

  0x00  legacy
  0x01  CPU
  0x02  Segmentation
  0x03  MMU
  0xFF  reserved for experimentation

And each family has children:

  0x03 MMU:
    0x00  SetPTE
    0x01  SetLongPTE
    0x02  FlushTLB

Now, let's say I add a new feature, and I don't want to redefine part of the interface. Let's say that feature is queuing of hypercalls. I have this private annotation field as part of the identifier for each hypercall - in effect, really just the hypercall number. And I don't want to break binary compatibility of the interface. So what I do is define a new annotation that is specific to the affected calls:

  0x00 SetPTE:
    0x00 - no annotation
    0x01 - may be queued!

Now, the hypercall isn't any different. Hypervisors which are unaware of the annotation treat it no differently. But hypervisors that support PTE queuing recognize it as a hint and use it appropriately. Queuing is a common enough optimization that it might even make sense to have a bit set aside in the call ID for it. Having this type of static annotation allows you to get rid of the dynamic concerns you have.

The really nice thing about defining your interface this way is that you get a hierarchy of different classes of the interface, with the ability to add new classes, new calls within a class, and new annotations (upgrades, if you will) for those calls. And it provides a natural way to query for supported families of functionality - do you support a virtual event channel? Should I do some extra work to give you MMU hints or not? And you can add extra, optional functionality onto existing call sites. Something very useful if you, say, realize that you want to add a hint field to one of your calls without breaking the old interface or forcing another vendor into complicating their hypervisor.

Which is most of what paravirtualization is anyway: extra, optionally used hints about how things are being used that allow the hypervisor implementation to avoid making costly assumptions to ensure correctness under unknown constraints.

Is this worth threshing out more? I think so, since it provides a nice value proposition as well as overcoming the rather clumsy top-level versioning scheme.

Thanks again for your feedback,

Zach
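To make the family / call number / annotation encoding proposed above concrete, here is a minimal C sketch. The macro and helper names, and the exact flag value chosen for the queueable hint, are invented for illustration and are not part of the proposed spec text.

#include <stdint.h>

/* Call families (bits 31-24). */
#define VMI_FAMILY_LEGACY   0x00
#define VMI_FAMILY_CPU      0x01
#define VMI_FAMILY_SEG      0x02
#define VMI_FAMILY_MMU      0x03
#define VMI_FAMILY_EXP      0xFF    /* reserved for experimentation */

/* Calls within the MMU family (bits 23-16). */
#define VMI_MMU_SET_PTE       0x00
#define VMI_MMU_SET_LONG_PTE  0x01
#define VMI_MMU_FLUSH_TLB     0x02

/* Annotations (bits 7-0); bits 15-8 stay reserved. */
#define VMI_ANNOT_NONE       0x00
#define VMI_ANNOT_QUEUEABLE  0x01   /* hint: this call may be queued */

static inline uint32_t vmi_call_id(uint8_t family, uint8_t call, uint8_t annot)
{
    return ((uint32_t)family << 24) | ((uint32_t)call << 16) | annot;
}

static inline uint8_t vmi_call_family(uint32_t id) { return id >> 24; }
static inline uint8_t vmi_call_number(uint32_t id) { return (id >> 16) & 0xff; }
static inline uint8_t vmi_call_annot(uint32_t id)  { return id & 0xff; }

/* A hypervisor unaware of an annotation simply masks it off and
 * dispatches on family and call number as before; one that understands
 * VMI_ANNOT_QUEUEABLE may defer the call into a batch instead. */
static inline int vmi_call_may_queue(uint32_t id)
{
    return (vmi_call_annot(id) & VMI_ANNOT_QUEUEABLE) != 0;
}

Under this sketch, a queueable SetPTE would be issued as vmi_call_id(VMI_FAMILY_MMU, VMI_MMU_SET_PTE, VMI_ANNOT_QUEUEABLE), and an older hypervisor would still see family 0x03, call 0x00 and handle it synchronously, preserving binary compatibility.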