Re: [Xen-devel] blkback global resources
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Tuesday, March 27, 2012 8:27 AM
> To: Daniel Stodden
> Cc: Andrei Lifchits; xen-devel
> Subject: Re: [Xen-devel] blkback global resources
>
> >>> On 26.03.12 at 18:53, Daniel Stodden <daniel.stodden@xxxxxxxxxxxxxx> wrote:
> > On Mon, 2012-03-26 at 17:06 +0100, Keir Fraser wrote:
> >> Cc'ing Daniel for you on this one, Jan.
> >>
> >> K.
> >>
> >> On 26/03/2012 16:56, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> >>
> >> > All the resources allocated based on xen_blkif_reqs are global in
> >> > blkback. While (without having measured anything) I think that this
> >> > is bad from a QoS perspective (not the least implied from a warning
> >> > issued by Citrix'es multi-page-ring patches:
> >> >
> >> > if (blkif_reqs < BLK_RING_SIZE(order))
> >> >         printk(KERN_WARNING "WARNING: "
> >> >                "I/O request space (%d reqs) < ring order %ld, "
> >> >                "consider increasing %s.reqs to >= %ld.",
> >> >                blkif_reqs, order, KBUILD_MODNAME,
> >> >                roundup_pow_of_two(BLK_RING_SIZE(order)));
> >> >
> >> > indicating that this _is_ a bottleneck), I'm otoh hesitant to
> >> > convert this to per-instance allocations, as the amount of memory
> >> > taken away from Dom0 for this may be not insignificant when there
> >> > are many devices.
> >> >
> >> > Does anyone have an opinion here, in particular regarding the
> >> > original authors' decision to make this global vs. the apparently
> >> > made observation (by Daniel Stodden, the author of said patch, who
> >> > I don't have any current email of to ask directly), but also in the
> >> > context of multi-page rings, the purpose of which is to allow for
> >> > larger amounts of in-flight I/O?
> >> >
> >> > Thanks, Jan
> >
> > Re-CC'ing Andrei Lifchits, I think there's been some work going on at
> > Citrix regarding that matter.
> >
> > Yes, just allocating a pfn pool per backend instance is way too much
> > memory ballooned out.
> > Otherwise this stuff would have never looked the way it does now.
>
> This of course could be accounted for by having an initially non-empty
> (large enough) balloon (not sure how easy it is these days to do this
> for pv-ops, but it has always been trivial with the legacy code). That
> wouldn't help a 32-bit kernel much (where generally the initial balloon
> is all in highmem, yet the vacated pages need to be in lowmem), but for
> 64-bit kernels it should be fine.
>
> > Regarding the right balance, note that on the other extreme end, if
> > PFN space were infinite, there's not much expected performance gain
> > from rendering virtual backends fully independent. Beyond controller
> > queue depth, these requests are all just going to pile up, waiting.
>
> Is there a way to look through the queue stack to find out how many
> distinct ones there are that the backend is running on top of as well
> as - for a particular I/O path - the one with the smallest depth? Or
> can one assume that the top most one (generally loop's or blktap2's)
> won't advertise a queue deeper than what is going to be accepted
> downstream (probably not, I'd guess)?

Hm, I don't remember seeing anything relating to that off the top of my
head in the blkback code, so I don't think so. (I'm not sure the benefit
would be that great, anyways).

> And - what you say would similarly apply to the usefulness of
> multi-page rings afaict.
>
> > XenServer has some support for decoupling in blktap.ko [1] which
> > worked relatively well: Use frame 'pool' kobjects. A bunch of pages,
> > mapped to sysfs object. Name was arbitrary. Size configurable, even
> > at runtime.

I have added a similar functionality to blkback (pools configurable
through xenstore, with userland tools creating one pool per SR), which
is now out in the form of a limited-availability hotfix and will be
there in the next XenServer release. Felipe (CC'd) measured the effects
on performance and found that it helps.

> > Sysfs meant stuff was easily set up by shell or python code, or
> > manually. To become operational, every backend must be bound to a pool
> > (initially, the global 'default' one, for tool compat). Backends can
> > be relinked arbitrarily before entering Connected state.
> >
> > Then let the userland toolstack set things up according to physical
> > I/O topology and properties probed. Basically every physical backend
> > (say, a volume group, or a HBA) would start out by allocating and
> > dimensioning a dedicated pool (named after the backend), and every
> > backend instance fired up gets bound to the pool it belongs to.
>
> Having userland do all that seems like a fallback solution only to me -
> I would hope that sufficient information is available directly to the
> drivers.

You're probably right.

> Thanks in any case for responding so quickly,
> Jan
>
> > There's a lot of additional optimizations one could consider, e.g.
> > autogrowing the pool (log(nbackends) or so?) and the like. To improve
> > locality, having backends which look ahead in their request queue and
> > allocate whole batches is probably a good idea too, etc, etc.
> >
> > HTH,
> > Daniel
> >
> > [1]
> > http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git
> > mostly in drivers/block/blktap/sysfs.c (show/store_pool) and request.c.
> > Note that these are based on mempools, not the frame pools blkback
> > would take.
>

Cheers,
Andrei

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
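
The pool scheme discussed in this thread (named pools, every backend
bound to exactly one pool, rebinding allowed only before the Connected
state) can be sketched in plain C roughly as follows. This is a
userspace model under stated assumptions, not code from blkback, blktap
or the XenServer hotfix: all type and function names are hypothetical,
and the sizing assumes 4 KiB pages and 11 segments per request (the
blkif per-request segment limit).

```c
#include <stddef.h>

/* Hypothetical model of the pool idea; not blkback or blktap code. */
#define PAGE_BYTES   4096UL  /* assumption: 4 KiB pages */
#define SEGS_PER_REQ 11UL    /* assumption: blkif segments per request */

enum backend_state { BACKEND_INITIALISING, BACKEND_CONNECTED };

struct pool {
    const char *name;     /* arbitrary, e.g. named after a volume group */
    unsigned long reqs;   /* request slots this pool can back */
};

struct backend {
    const char *name;
    enum backend_state state;
    struct pool *pool;    /* starts out on the global 'default' pool */
};

/* Pages the pool must hold to map every segment of every request. */
static unsigned long pool_pages(const struct pool *p)
{
    return p->reqs * SEGS_PER_REQ;
}

/* Memory taken away from Dom0 for this pool, in bytes. */
static unsigned long pool_bytes(const struct pool *p)
{
    return pool_pages(p) * PAGE_BYTES;
}

/*
 * Rebind a backend to another pool. Returns 0 on success, or -1 if
 * the backend is already Connected or no pool was given; in that
 * case the existing binding is left untouched.
 */
static int backend_bind(struct backend *b, struct pool *p)
{
    if (p == NULL || b->state == BACKEND_CONNECTED)
        return -1;
    b->pool = p;
    return 0;
}
```

Under these assumptions a 64-request pool holds 704 pages (2.75 MiB).
Giving each of 100 backends its own pool of that size would take about
275 MiB out of Dom0, whereas one pool per physical backend keeps the
total proportional to the number of volume groups or HBAs.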