
[Xen-devel] [GIT PULL] (xen) stable/for-jens-3.10

Hey Jens,

Please in your spare time (if there is such a thing at a conference)
pull this branch:


for your v3.10 branch. Sorry for being so late with this.

It has the 'feature-max-indirect-segments' extension implemented in both backend
and frontend. The current problem with the backend and frontend is that a
request's segment count is limited to 11 pages. It means we can squeeze at most
44kB into a request. The ring can hold 32 requests (the next power of two below
the 36 entries that fit in a single page), meaning we can have at most ~1.4MB of
data in flight. Nowadays that is not enough.

The problem was addressed in two ways in the past - but neither one went upstream.
The first solution, proposed by Justin from Spectralogic, was to negotiate
the segment size. This means that the 'struct blkif_sring_entry' becomes
variable-sized: it can expand from 112 bytes (covering 11 pages of data - 44kB)
to 1580 bytes (256 pages of data - so 1MB). It is a simple extension - the array
in the request just grows from 11 entries to a negotiated variable size. But it
had limits: this still caps the number of segments per request at 255, as the
total number must be specified in the request, which only has an 8-bit field for
that purpose.

The other solution (from Intel - Ronghui) was to create one extra ring that
only holds 'struct blkif_request_segment' entries. The 'struct blkif_request'
would be changed to carry an index into said 'segment ring'. There is only one
segment ring, so the size of the initial ring stays the same. Each request
points at its segments and enumerates how many of the indexes it wants to use.
The limit is of course the size of the segment ring.
If one assumes a one-page segment this means we can cover ~4MB in one request.
Those patches were posted as an RFC and the author never followed up on the
ideas for making it a bit more flexible.

There is yet another mechanism that could be employed (which these patches
implement) - and it borrows from the VirtIO protocol: 'indirect descriptors'.
This is very similar to what Intel suggested, but with a twist. The twist is to
negotiate how many 'segment' pages (aka indirect descriptor pages) we want to
support (in reality we negotiate how many entries in the segment we want to
cover, and we cap the number if it is bigger than the segment size).

This means that with the 32 usable slots in the single-page ring we can cover:
32 slots * (what each blkif_request_indirect covers: 512 entries * 4096 bytes)
~= 64MB. Since there is ample space in the blkif_request_indirect to span more
than one indirect page, that number can also be multiplied by eight = 512MB.

Roger Pau Monne took the idea and implemented it in these patches. They work
great, and the corner cases (migration between backends with and without this
functionality) work nicely. The backend right now has a limit on how many
indirect entries it can handle: one indirect page, and at maximum 256 entries
(out of 512 - so 50% of the page is used). That comes out to 32 slots * 256
entries in an indirect page * 1 indirect page per request * 4096 = 32MB.

This is a conservative number that can change in the future. Right now it
strikes a good balance between giving excellent performance, keeping memory
usage in the backend reasonable, and balancing the needs of many guests.

The patchset also splits the blkback structure to be per-VBD. This means the
spinlock contention we had with many guests doing I/O - all the blkback
threads hitting the same lock - has been eliminated.

Anyhow, please pull and if possible include the nice overview I typed up in the
merge commit.

 Documentation/ABI/stable/sysfs-bus-xen-backend |  18 +
 drivers/block/xen-blkback/blkback.c            | 843 ++++++++++++++++---------
 drivers/block/xen-blkback/common.h             | 145 ++++-
 drivers/block/xen-blkback/xenbus.c             |  38 ++
 drivers/block/xen-blkfront.c                   | 490 +++++++++++---
 include/xen/interface/io/blkif.h               |  53 ++
 6 files changed, 1188 insertions(+), 399 deletions(-)

Roger Pau Monne (7):
      xen-blkback: print stats about persistent grants
      xen-blkback: use balloon pages for all mappings
      xen-blkback: implement LRU mechanism for persistent grants
      xen-blkback: move pending handles list from blkbk to pending_req
      xen-blkback: make the queue of free requests per backend
      xen-blkback: expand map/unmap functions
      xen-block: implement indirect descriptors
