
Re: [Xen-devel] Request for input: Extended event channel support

On Wed, Mar 27, 2013 at 11:23:23AM +0000, George Dunlap wrote:
> * Executive summary
> The number of event channels available for dom0 is currently one of
> the biggest limitations on scaling up the number of VMs which can be
> created on a single system.  There are two alternative implementations
> we could choose, one of which is ready now, the other of which is
> potentially technically superior, but will not be ready for the 4.3
> release.
> The core question we need to ask the community: How important is
> lifting the event channel scalability limit to 4.3?  Will waiting
> until 4.4 cause a limit in the uptake of the Xen platform?
> * The issue
> The existing event channel mechanism for PV guests is implemented
> as a 2-level bit array.  This limits the total number of event channels
> to word_size ^ 2, which is 1024 for 32-bit guests and 4096 for 64-bit
> guests.
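To make the word_size ^ 2 limit concrete, here is a minimal C sketch of a 2-level pending bitmap, assuming a 64-bit guest. The names (`struct evtchn_2l`, `set_pending`, `is_pending`) are hypothetical and the layout is simplified; in Xen the selector word is kept per vCPU and the pending array in the shared info page. A selector word flags which leaf words have work, and each leaf word holds one bit per port, so the port space is exactly 64 * 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a 2-level event channel bitmap (hypothetical names,
 * not the real Xen structures). WORD_BITS is the guest word size. */
#define WORD_BITS 64
#define NR_PORTS_2L (WORD_BITS * WORD_BITS) /* word_size ^ 2 = 4096 */

typedef uint64_t word_t;

struct evtchn_2l {
    word_t selector;           /* level 1: bit i set => pending[i] may have work */
    word_t pending[WORD_BITS]; /* level 2: one bit per event channel port */
};

/* Mark a port pending: set its bit in the leaf word, then the selector bit. */
static void set_pending(struct evtchn_2l *ec, unsigned port)
{
    assert(port < NR_PORTS_2L);
    ec->pending[port / WORD_BITS] |= (word_t)1 << (port % WORD_BITS);
    ec->selector |= (word_t)1 << (port / WORD_BITS);
}

/* Test whether a port's bit is set in its leaf word. */
static bool is_pending(const struct evtchn_2l *ec, unsigned port)
{
    assert(port < NR_PORTS_2L);
    return (ec->pending[port / WORD_BITS] >> (port % WORD_BITS)) & 1;
}
```

Any port number of 4096 or above has no selector bit to map to, which is the hard ceiling described above.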
> This sounds like a lot, until you consider that in a typical system,
> each VM needs 4 or more event channels in domain 0.  This means that
> for a 32-bit dom0, there is a theoretical maximum of 256 guests -- and
> in practice it's more like 180 or so, because of event channels
> required for other things.  XenServer already has customers using VDI
> that require more VMs than this.
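The guest-count arithmetic above can be sketched as a tiny helper. `max_guests_2l` and its `reserved` parameter are hypothetical; `reserved` stands in for the channels dom0 needs for other things, which is why the practical figure falls below the theoretical one.

```c
#include <assert.h>

/* Back-of-the-envelope limit: word_bits^2 ports in total, minus a
 * hypothetical number reserved for other uses, divided by the number of
 * dom0 event channels each guest consumes. */
static unsigned max_guests_2l(unsigned word_bits, unsigned per_guest,
                              unsigned reserved)
{
    unsigned total = word_bits * word_bits;
    assert(per_guest > 0 && reserved <= total);
    return (total - reserved) / per_guest;
}
```

With `reserved = 0` and 4 channels per guest, a 32-bit dom0 gives 1024 / 4 = 256 guests and a 64-bit dom0 gives 4096 / 4 = 1024.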
> * The dilemma
> When we began the 4.3 release cycle, this was one of the items we
> identified as a key feature we needed to get for 4.3.  Wei Liu started
> work on an extension of the existing implementation, allowing 3 levels
> of event channels.  The draft of this is ready, and just needs the
> last bit of polishing and bug-chasing before it can be accepted.
> However, several months ago, David Vrabel came up with an alternate
> design which in theory was more scalable, based on queues of linked
> lists (which we have internally been calling "FIFO" for short).  David
> has been working on the implementation since, and has a draft
> prototype, but it is in no shape to be included in 4.3.
> There are some things that are attractive about the second solution,
> including the flexible assignment of interrupt priorities, ease of
> scalability, and potentially even the FIFO nature of the interrupt
> delivery.
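For readers unfamiliar with the idea, the following is a generic sketch of per-priority FIFO queues built from linked lists. It is not David's design or its ABI; the priority count, port count and all names are assumptions chosen for illustration. Each port carries a link to the next queued port, and the consumer always drains the highest-priority non-empty queue in arrival order.

```c
#include <assert.h>

#define NR_PRIOS 4   /* hypothetical number of priority levels (0 = highest) */
#define NR_EVENTS 16 /* hypothetical number of event ports */
#define NO_LINK (-1)

struct fifo_model {
    int link[NR_EVENTS]; /* next port in the same queue, or NO_LINK */
    int head[NR_PRIOS];  /* oldest port queued at each priority */
    int tail[NR_PRIOS];  /* newest port queued at each priority */
};

static void fifo_init(struct fifo_model *q)
{
    for (int i = 0; i < NR_EVENTS; i++)
        q->link[i] = NO_LINK;
    for (int p = 0; p < NR_PRIOS; p++)
        q->head[p] = q->tail[p] = NO_LINK;
}

/* Append a port to the tail of its priority's queue.
 * Assumes the port is not already queued. */
static void fifo_raise(struct fifo_model *q, int port, int prio)
{
    assert(port >= 0 && port < NR_EVENTS && prio >= 0 && prio < NR_PRIOS);
    q->link[port] = NO_LINK;
    if (q->tail[prio] == NO_LINK)
        q->head[prio] = port;
    else
        q->link[q->tail[prio]] = port;
    q->tail[prio] = port;
}

/* Pop the oldest port from the highest-priority non-empty queue. */
static int fifo_next(struct fifo_model *q)
{
    for (int p = 0; p < NR_PRIOS; p++) {
        int port = q->head[p];
        if (port == NO_LINK)
            continue;
        q->head[p] = q->link[port];
        if (q->head[p] == NO_LINK)
            q->tail[p] = NO_LINK;
        q->link[port] = NO_LINK;
        return port;
    }
    return NO_LINK;
}
```

Priority is chosen per port at raise time rather than implied by port number, and ports of equal priority are delivered in the order they were raised.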
> The question at hand, then, is whether to take what we have in the
> 3-level implementation for 4.3, or wait to see how the FIFO
> implementation turns out (taking either it or the 3-level
> implementation in 4.4).
> * The solution in hand: 3-level event channels
> The basic idea behind 3-level event channels is to extend the existing
> 2-level implementation to 3 levels.  Going to 3 levels would give us
> 32k event channels for 32-bit, and 256k for 64-bit.
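The capacity figures follow from multiplying by the word size once per level (32^3 = 32768, 64^3 = 262144). A minimal sketch, with hypothetical helper names, that computes the port count and splits a port number into its per-level bit indices:

```c
#include <assert.h>

/* Capacity of an N-level bitmap: each level multiplies by the word size. */
static unsigned long nr_ports(unsigned word_bits, unsigned levels)
{
    unsigned long n = 1;
    for (unsigned i = 0; i < levels; i++)
        n *= word_bits;
    return n;
}

/* Split a port number into the bit index used at each of 3 levels
 * (hypothetical helper; idx[0] is the top level, idx[2] the leaf). */
static void split_port_3l(unsigned long port, unsigned word_bits,
                          unsigned idx[3])
{
    assert(port < nr_ports(word_bits, 3));
    idx[2] = port % word_bits;                               /* bit in leaf word */
    idx[1] = (port / word_bits) % word_bits;                 /* bit in middle word */
    idx[0] = port / ((unsigned long)word_bits * word_bits);  /* top-level bit */
}
```

For a 64-bit guest, the highest port, 262143, maps to bit 63 at every level.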
> One of the advantages of this method is that since it is similar to
> the existing method, the general concepts and race conditions are
> fairly well understood and tested.
> One of the disadvantages that this method inherits from the 2-level
> event channels is the lack of priority.  In the initial implementation
> of event channels, priority was handled by event channel order: scans
> for events always started at 0 and went upwards.  However, this was
> not very scalable, as lower-numbered events could easily lock out
> higher-numbered events entirely; and frequently "lower-numbered"
> simply meant "created earlier".  Event channels were forced into a
> priority even if one was not wanted.
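The starvation problem with always scanning from 0 can be shown with a toy model: 64 ports in a single pending word, with a hypothetical function name, not the actual Xen scan loop. If port 0 is re-raised as fast as it is handled, a scan that restarts at 0 never reaches port 5.

```c
#include <assert.h>
#include <stdint.h>

/* Original policy (toy model): always scan upward from port 0 and return
 * the first pending port found, or -1 if nothing is pending. */
static int scan_from_zero(uint64_t pending)
{
    for (int port = 0; port < 64; port++)
        if ((pending >> port) & 1)
            return port;
    return -1;
}
```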
> So the implementation was tweaked so that scans no longer start at 0,
> but continue where the last event left off.  This stopped earlier
> events from being prioritized and removed the starvation issue, but at
> the cost of removing all event priorities.  Certain events, like the
> timer event, are special-cased to be always checked, but this is
> something of a hack and neither scalable nor flexible.
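A toy model of the tweaked policy, assuming a single 64-port pending word (names hypothetical, not the actual Xen code): the scan resumes just after the last port handled and wraps around, so a continuously re-raised low port no longer starves higher ones, but no port can be given precedence either.

```c
#include <assert.h>
#include <stdint.h>

/* Round-robin scan state: the last port handled. Initialise to 63 so the
 * first scan begins at port 0. */
struct rr_scanner {
    int last;
};

/* Resume scanning just after s->last, wrapping at 64; return the next
 * pending port and remember it, or -1 if nothing is pending. */
static int scan_resume(struct rr_scanner *s, uint64_t pending)
{
    for (int i = 1; i <= 64; i++) {
        int port = (s->last + i) % 64;
        if ((pending >> port) & 1) {
            s->last = port;
            return port;
        }
    }
    return -1;
}
```

With ports 0 and 5 both permanently pending, successive scans alternate 0, 5, 0, ... instead of returning 0 forever.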

Hm, I actually think that is not in the upstream kernel at all. That
would explain why, on a very busy guest, the "hrtimer: interrupt
took XXxXXXXxx ns" message is printed.

Is this patch available somewhere?

Xen-devel mailing list


