
Re: [Xen-devel] blkback global resources

>>> On 26.03.12 at 18:53, Daniel Stodden <daniel.stodden@xxxxxxxxxxxxxx> wrote:
> On Mon, 2012-03-26 at 17:06 +0100, Keir Fraser wrote:
>> Cc'ing Daniel for you on this one, Jan.
>>  K.
>> On 26/03/2012 16:56, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>> > All the resources allocated based on xen_blkif_reqs are global in
>> > blkback. While (without having measured anything) I think that this
>> > is bad from a QoS perspective (not least as implied by a warning
>> > issued by Citrix's multi-page-ring patches:
>> > 
>> > if (blkif_reqs < BLK_RING_SIZE(order))
>> >         printk(KERN_WARNING "WARNING: "
>> >                "I/O request space (%d reqs) < ring order %ld, "
>> >                "consider increasing %s.reqs to >= %ld.",
>> >                blkif_reqs, order, KBUILD_MODNAME,
>> >                roundup_pow_of_two(BLK_RING_SIZE(order)));
>> > 
>> > indicating that this _is_ a bottleneck), I'm on the other hand hesitant
>> > to convert this to per-instance allocations, as the amount of memory
>> > taken away from Dom0 for this may not be insignificant when there are
>> > many devices.
>> > 
>> > Does anyone have an opinion here, in particular regarding the
>> > original authors' decision to make this global vs. the observation
>> > apparently made by Daniel Stodden (the author of said patch, for
>> > whom I have no current email address to ask directly), but also
>> > in the context of multi-page rings, whose purpose is to allow
>> > larger amounts of in-flight I/O?
>> > 
>> > Thanks, Jan
> Re-CC'ing Andrei Lifchits; I think there has been some work going on at
> Citrix regarding that matter.
> Yes, just allocating a pfn pool per backend instance means far too much
> memory ballooned out. Otherwise this stuff would never have looked the
> way it does now.
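
To put rough numbers on the memory argument, here is a minimal sketch of the pool arithmetic. It assumes one ballooned-out page per request segment, BLKIF_MAX_SEGMENTS_PER_REQUEST = 11, 4 KiB pages, and a pool of 64 requests (assumed to be the default of blkback's reqs parameter at the time); the helper names are hypothetical, not blkback's.

```c
#include <assert.h>

/* Assumed constants; adjust to the kernel actually in use. */
#define SEGS_PER_REQ  11UL   /* BLKIF_MAX_SEGMENTS_PER_REQUEST */
#define PAGE_BYTES    4096UL /* 4 KiB pages */

/* Pages ballooned out for a pool able to hold `reqs` in-flight requests,
 * assuming one page per segment of every request. */
static unsigned long pool_pages(unsigned long reqs)
{
    return reqs * SEGS_PER_REQ;
}

/* Memory (KiB) taken away from Dom0 for `ndevs` devices: one shared global
 * pool (per_instance == 0) vs. a private pool of the same size per device. */
static unsigned long dom0_cost_kib(unsigned long reqs, unsigned long ndevs,
                                   int per_instance)
{
    unsigned long npools;

    assert(ndevs > 0);
    npools = per_instance ? ndevs : 1;
    return npools * pool_pages(reqs) * (PAGE_BYTES / 1024);
}
```

Under these assumptions the shared pool costs 704 pages (2816 KiB), while 100 devices with private pools would cost 281600 KiB, i.e. 275 MiB, which is the scaling Jan and Daniel are concerned about.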

Of course, this could be accounted for by having an initially non-empty
(large enough) balloon (not sure how easy it is to do this for pv-ops
these days, but it has always been trivial with the legacy code). That
wouldn't help a 32-bit kernel much (where the initial balloon is
generally all in highmem, yet the vacated pages need to be in lowmem),
but for 64-bit kernels it should be fine.

> Regarding the right balance, note that at the other extreme, even if PFN
> space were infinite, not much performance gain is to be expected from
> making virtual backends fully independent. Beyond the controller's queue
> depth, these requests are all just going to pile up, waiting.

Is there a way to look through the queue stack to find out how many
distinct queues the backend is running on top of, as well as (for a
particular I/O path) the one with the smallest depth? Or can one assume
that the topmost one (generally loop's or blktap2's) won't advertise a
queue deeper than what is going to be accepted downstream (probably
not, I'd guess)?
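
What is being asked for amounts to a walk down one I/O path that counts layers and records the shallowest one. The sketch below models that walk on a hypothetical `struct queue_layer` with a `lower` pointer; this is NOT a kernel API (the real request queue exposes no such link), and in practice one might have to approximate the stack from sysfs, e.g. a device's slaves/ directory and its queue/nr_requests attribute. It covers a single path only; counting distinct queues across a tree of paths would need de-duplication.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of a stacked I/O path (e.g. loop or blktap2 on top of
 * dm on top of an HBA). Not a kernel structure. */
struct queue_layer {
    const char *name;
    unsigned int depth;              /* advertised queue depth */
    const struct queue_layer *lower; /* next layer down, NULL at the bottom */
};

struct path_info {
    unsigned int nlayers;   /* layers traversed on this path */
    unsigned int min_depth; /* smallest advertised depth seen */
    const char *bottleneck; /* name of the layer with that depth */
};

/* Walk one path top-down, counting layers and tracking the minimum depth. */
static struct path_info walk_path(const struct queue_layer *top)
{
    struct path_info info = { 0, UINT_MAX, NULL };

    assert(top != NULL);
    for (const struct queue_layer *q = top; q != NULL; q = q->lower) {
        info.nlayers++;
        if (q->depth < info.min_depth) {
            info.min_depth = q->depth;
            info.bottleneck = q->name;
        }
    }
    return info;
}
```

If the topmost layer advertised a depth no larger than the bottleneck, the walk would be unnecessary; the sketch is only useful because that assumption probably does not hold.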

And, as far as I can tell, what you say would similarly apply to the
usefulness of multi-page rings.
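
The multi-page-ring point can be quantified: ring capacity roughly doubles with each order, so a fixed global pool is outgrown quickly, which is when the quoted warning fires. A minimal sketch, assuming the x86-64 blkif layout (a 64-byte shared-ring header and 112-byte request/response slots, 4 KiB pages) and the round-down to a power of two that the shared-ring macros apply; the helper names and numbers are illustrative, not taken from the patches.

```c
#include <assert.h>

/* Assumed blkif ring layout on x86-64 with 4 KiB pages. */
#define RING_PAGE_BYTES 4096UL
#define RING_HDR_BYTES  64UL
#define RING_SLOT_BYTES 112UL

/* Largest power of two <= x, for x > 0. */
static unsigned long rounddown_pow2(unsigned long x)
{
    unsigned long p = 1;

    assert(x > 0);
    while (p <= x / 2)
        p *= 2;
    return p;
}

/* Request slots in a ring spanning 2^order pages. */
static unsigned long ring_slots(unsigned int order)
{
    unsigned long bytes = RING_PAGE_BYTES << order;

    return rounddown_pow2((bytes - RING_HDR_BYTES) / RING_SLOT_BYTES);
}

/* Mirrors the condition of the quoted warning: pool smaller than the ring. */
static int pool_too_small(unsigned long reqs, unsigned int order)
{
    return reqs < ring_slots(order);
}
```

With these assumptions a single-page ring has 32 slots, order 1 has 64 and order 2 has 128, so a 64-request pool already triggers the warning at order 2.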

> XenServer has some support for decoupling in blktap.ko [1], which worked
> relatively well: frame 'pool' kobjects. A pool was a bunch of pages,
> mapped to a sysfs object. The name was arbitrary; the size was
> configurable, even at runtime. Using sysfs meant things were easy to set
> up from shell or Python code, or manually. To become operational, every
> backend must be bound to a pool (initially the global 'default' one, for
> tool compatibility). Backends can be relinked arbitrarily before entering
> Connected state.
> Then let the userland toolstack set things up according to the physical
> I/O topology and the properties probed. Basically every physical backend
> (say, a volume group or an HBA) would start out by allocating and
> dimensioning a dedicated pool (named after the backend), and every
> backend instance fired up gets bound to the pool it belongs to.

To me, having userland do all that seems like a fallback solution only; I
would hope that sufficient information is available directly to the drivers.

Thanks in any case for responding so quickly,

> There are a lot of additional optimizations one could consider, e.g.
> autogrowing the pool (log(nbackends) or so?) and the like. To improve
> locality, having backends look ahead in their request queue and
> allocate whole batches is probably a good idea too, etc.
> HTH,
> Daniel
> [1] http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git
>  mostly in drivers/block/blktap/sysfs.c (show/store_pool) and request.c.
>  Note that these are based on mempools, not the frame pools blkback
>  would need.
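
Daniel's autogrow idea (pool size growing with log(nbackends)) can be sketched as a simple sizing policy. The formula below, base size times (1 + floor(log2(nbackends))) with an upper cap, is one hypothetical reading of "log(nbackends) or so"; the function names and parameters are assumptions, not code from blktap or blkback.

```c
#include <assert.h>

/* floor(log2(x)) for x > 0. */
static unsigned int ilog2_u(unsigned int x)
{
    unsigned int r = 0;

    assert(x > 0);
    while (x >>= 1)
        r++;
    return r;
}

/* Hypothetical policy: a pool shared by `nbackends` backends targets
 * base_reqs * (1 + floor(log2(nbackends))) requests, capped at max_reqs.
 * An empty pool needs nothing. */
static unsigned int pool_target_reqs(unsigned int base_reqs,
                                     unsigned int nbackends,
                                     unsigned int max_reqs)
{
    unsigned int want;

    if (nbackends == 0)
        return 0;
    want = base_reqs * (1 + ilog2_u(nbackends));
    return want < max_reqs ? want : max_reqs;
}
```

With a base of 64 requests, one backend gets 64, eight get 256 and a hundred get 448, so memory grows far more slowly than with per-instance pools while busier pools still get headroom.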

Xen-devel mailing list


