
Re: [Xen-devel] [PATCH 0001/001] xen: multi page ring support for block devices

To: "Justin Gibbs" <justing@xxxxxxxxxxxxxxxx>
From: "Jan Beulich" <JBeulich@xxxxxxxx>
Date: Wed, 14 Mar 2012 08:35:24 +0000
Cc: "jeremy@xxxxxxxx" <jeremy@xxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>, "konrad.wilk@xxxxxxxxxx" <konrad.wilk@xxxxxxxxxx>, "waldi@xxxxxxxxxx" <waldi@xxxxxxxxxx>, "joe.jin@xxxxxxxxxx" <joe.jin@xxxxxxxxxx>, "rusty@xxxxxxxxxxxxxxx" <rusty@xxxxxxxxxxxxxxx>, "weiyi.huang@xxxxxxxxx" <weiyi.huang@xxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "jbarnes@xxxxxxxxxxxxxxxx" <jbarnes@xxxxxxxxxxxxxxxx>, "virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx" <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>, "paul.gortmaker@xxxxxxxxxxxxx" <paul.gortmaker@xxxxxxxxxxxxx>, Paul Durrant <Paul.Durrant@xxxxxxxxxx>, DavidVrabel <david.vrabel@xxxxxxxxxx>, Santosh Jodh <Santosh.Jodh@xxxxxxxxxx>, "linux-pci@xxxxxxxxxxxxxxx" <linux-pci@xxxxxxxxxxxxxxx>, "<konrad@xxxxxxxxxx>" <konrad@xxxxxxxxxx>, "akpm@xxxxxxxxxxxxxxxxxxxx" <akpm@xxxxxxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, "lersek@xxxxxxxxxx" <lersek@xxxxxxxxxx>, "dgdegra@xxxxxxxxxxxxx" <dgdegra@xxxxxxxxxxxxx>
Delivery-date: Fri, 16 Mar 2012 15:41:49 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 14.03.12 at 07:32, Justin Gibbs <justing@xxxxxxxxxxxxxxxx> wrote:
> There's another problem here that I brought up during the Xen
> Hack-a-thon.  The ring macros require that the ring element count
> be a power of two.  This doesn't mean that the ring will be a power
> of 2 pages in size.  To illustrate this point, I modified the FreeBSD
> blkback driver to provide negotiated ring stats via sysctl.
> 
> Here's a connection to a Windows VM running the Citrix PV drivers:
> 
>     dev.xbbd.2.max_requests: 128
>     dev.xbbd.2.max_request_segments: 11
>     dev.xbbd.2.max_request_size: 45056
>     dev.xbbd.2.ring_elem_size: 108  <= 32bit ABI
>     dev.xbbd.2.ring_pages: 4
>     dev.xbbd.2.ring_elements: 128
>     dev.xbbd.2.ring_waste: 2496
> 
> Over half a page is wasted when ring-page-order is 2.  I'm sure you
> can see where this is going.  :-)
> 
> Here are the limits published by our backend to the XenStore:
> 
>     max-ring-pages = "113"
>     max-ring-page-order = "7"
>     max-requests = "256"
>     max-request-segments = "129"
>     max-request-size = "524288"
> 
> Because we allow so many concurrent, large requests in our product,
> the ring wastage really adds up if the front end doesn't support
> the "ring-pages" variant of the extension.  However, you only need
> a ring-page-order of 3 with this protocol to start seeing pages of
> wasted ring space.
> 
> You don't really want to negotiate "ring-pages" either.  The backends
> often need to support multiple ABIs.  I can easily construct a set
> of limits for the FreeBSD blkback driver which will cause the ring
> limits to vary by a page between the 32bit and 64bit ABIs.
> 
> With all this in mind, the backend must do a dance of rounding up,
> taking the max of the ring sizes for the different ABIs, and then
> validating the front-end published limits taking its ABI into
> account.  The front-end does some of this too.  It's way too messy
> and error prone because we don't communicate the ring element limit
> directly.
> 
> "max-ring-element-order" anyone? :-)

Interesting observation - yes, I think deprecating both pre-existing
methods in favor of something along those lines would be desirable.
(But I'd favor not using the term "order" here as it is - at least in
Linux - usually implied to be used on pages. "max-ringent-log2"
perhaps?)

What you say also implies that all the Linux backend patches
currently floating around are flawed in how they calculate the number
of ring entries, as this number really depends on the protocol the
frontend advertises.
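
To make the dependency concrete, here is a rough sketch of how the
entry count and the wasted space follow from the page count and the
per-ABI element size. It assumes 4096-byte pages and a 64-byte shared
ring header (four 32-bit indices plus padding); both are assumptions
of this sketch, and ring_layout() is a made-up helper name, not
something taken from any of the drivers. With the 32bit ABI figures
quoted above (4 pages, 108-byte elements) it reproduces the reported
128 elements and 2496 bytes of waste.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages
SRING_HEADER = 64     # assumption: producer/event indices plus padding


def ring_layout(ring_pages, elem_size):
    """Return (entries, wasted_bytes) for a ring of ring_pages pages.

    The ring macros need a power-of-two entry count, so the number of
    elements that physically fit is rounded down to a power of two.
    """
    usable = ring_pages * PAGE_SIZE - SRING_HEADER
    fit = usable // elem_size
    if fit < 1:
        raise ValueError("ring too small for a single element")
    entries = 1 << (fit.bit_length() - 1)   # round down to a power of two
    waste = usable - entries * elem_size
    return entries, waste


# Figures from the quoted sysctl output: 4 pages, 108-byte elements.
entries, waste = ring_layout(4, 108)
print(entries, waste)
```

Passing a different elem_size for the other ABI shows how the same
page count can yield a different amount of slack, which is why the
element limit, rather than the page count, is the quantity worth
negotiating.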

Further, if you're concerned about wasting ring space (and
particularly in the context of your request number/size/segments
extension), shouldn't we bother to define pairs (or larger groups)
of struct blkif_request_segment (as currently a quarter of the space
is mere padding)? Or split grefs from {first,last}_sect altogether?
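
For illustration, the padding can be seen by modelling the segment
layout with ctypes. The first structure mirrors struct
blkif_request_segment as I understand it (a 32-bit gref followed by
two 8-bit sector fields); the paired variant is purely hypothetical,
just one way such a grouping might look, and the Python class names
are invented for this sketch.

```python
import ctypes


class Segment(ctypes.Structure):
    # Mirrors struct blkif_request_segment: 6 bytes of payload,
    # padded to 8 by the 4-byte alignment of gref.
    _fields_ = [
        ("gref", ctypes.c_uint32),
        ("first_sect", ctypes.c_uint8),
        ("last_sect", ctypes.c_uint8),
    ]


class SegmentPair(ctypes.Structure):
    # Hypothetical grouping of two segments: 12 bytes of payload,
    # already a multiple of 4, so no trailing padding is needed.
    _fields_ = [
        ("gref", ctypes.c_uint32 * 2),
        ("first_sect", ctypes.c_uint8 * 2),
        ("last_sect", ctypes.c_uint8 * 2),
    ]


single = ctypes.sizeof(Segment)       # size of one padded segment
paired = ctypes.sizeof(SegmentPair)   # size of two segments grouped
print(single, paired, 2 * single - paired)
```

Two segments stored separately take twice the padded size, whereas
the grouped form drops the padding entirely, which is where the
quarter of the space goes.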

Finally, while looking at all this again, I stumbled across the use
of blkif_vdev_t in the ring structures: At least Linux's blkback
completely ignores this field - {xen_,}vbd_translate() simply
overwrites what dispatch_rw_block_io() put there (and with this,
struct phys_req's dev and bdev members seem rather pointless too).
Does anyone recall what the original intention with this request field
was? Allowing I/O on multiple devices over a single ring?

Bottom line - shouldn't we define a blkif2 interface to cleanly
accommodate all the various extensions (and do away with the
protocol variations)?

Jan

