
Re: [Xen-devel] [PATCH RFC] Persistent grant maps for xen blk drivers



On 19/10/12 12:46, James Harper wrote:
>>
>> On 19/10/12 03:34, James Harper wrote:
>>>>
>>>> This patch implements persistent grants for the xen-blk{front,back}
>>>> mechanism. The effect of this change is to reduce the number of unmap
>>>> operations performed, since they cause a (costly) TLB shootdown. This
>>>> allows the I/O performance to scale better when a large number of VMs
>>>> are performing I/O.
>>>>
>>>> Previously, the blkfront driver was supplied a bvec[] from the
>>>> request queue. This was granted to dom0; dom0 performed the I/O and
>>>> wrote directly into the grant-mapped memory and unmapped it; blkfront
>>>> then removed foreign access for that grant. The cost of unmapping
>>>> scales badly with the number of CPUs in Dom0. An experiment showed
>>>> that when Dom0 has 24 VCPUs, and guests are performing parallel I/O
>>>> to a ramdisk, the IPIs from performing unmaps are a bottleneck at 5
>>>> guests (at which point 650,000 IOPS are being performed in total). If
>>>> more than 5 guests are used, the performance declines. By 10 guests,
>>>> only 400,000 IOPS are being performed.
>>>>
>>>> This patch improves performance by only unmapping when the
>>>> connection between blkfront and back is broken.
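
(For reference, the per-segment cycle that blkfront used to go through for
every request looks roughly like the sketch below, assuming the Linux
grant-table interfaces of the time; the helper names are made up purely
for illustration, this is not the actual blkfront code.)

#include <xen/grant_table.h>

/* Grant the backend access to one bvec page for a single request. */
static int grant_one_segment(domid_t backend_domid, unsigned long frame,
                             int readonly)
{
        /* Returns a grant reference, or a negative error. */
        return gnttab_grant_foreign_access(backend_domid, frame, readonly);
}

/*
 * On completion, revoke the grant again. The backend must have unmapped
 * it by now; those per-request unmaps are what trigger the TLB shootdown
 * IPIs that limit scalability when Dom0 has many VCPUs.
 */
static void complete_one_segment(grant_ref_t gref, int readonly)
{
        gnttab_end_foreign_access(gref, readonly, 0UL);
}
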
>>>
>>> I assume network drivers would suffer from the same affliction... Would a
>>> more general persistent map solution be worth considering (or be possible)?
>>> So a common interface to this persistent mapping allowing the persistent
>>> pool to be shared between all drivers in the DomU?
>>
>> Yes, there are plans to implement the same for network drivers. I would
>> generally avoid having a shared pool of grants for all the devices of a DomU,
>> as said in the description of the patch:
>>
>> Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black
>> tree. As the grefs are not known a priori, and provide no guarantees on their
>> ordering, we have to perform a search through this tree to find the page, for
>> every gref we receive. This operation takes O(log n) time in the worst case.
>>
>> Having a shared pool with all grants would mean that n will become much
>> higher, and so the search time for a grant would increase.
> 
> I'm asking because I vaguely started a similar project a while back, but 
> didn't get much further than investigating data structures. I had something 
> like the following:
> 
> . redefined gref so that high bit indicates a persistent mapping (on the 
> basis that no DomU is ever going to have >2^31 grants). High bit set 
> indicates a persistent grant which is handled differently.

I don't understand why you need to change the way a gref is passed
around; this would break compatibility with non-persistent backends
unless you negotiate the use of persistent grants before actually
starting the data transfer. But if you do that, you already know you are
using persistent grants, so there's no need to set any bit in the gref.

> . New hypercall mem-op's to allocate/deallocate a persistent grant, returning 
> a handle from Dom0 (with high bit set). Dom0 maintains a table of mapped 
> grants with the handle being the index. Ref counting tracks usage so that an 
> unmap won't be allowed when ref>0. I was taking the approach that a chunk of 
> persistent grants would be allocated at boot time and so the actual map/unmap 
> is not done often so the requirement of a hypercall wasn't a big deal. I 
> hadn't figured out how to manage the size of this table yet.

The so-called persistent grants are no different from normal grants;
it's just that blk{front/back} agree to use the same set of grants for
all transactions. There's no need to introduce any new hypercalls,
since they are just "regular" grants.

I agree that we could allocate them when initializing blkfront, but I
prefer to allocate them on request, since we probably won't use the
maximum number (RING_SIZE * SEGMENTS_PER_REQUEST).

> . Mapping a gref with the high bit set in Dom0 becomes a lookup into the 
> persistent table and a ref++ rather than an actual mapping operation. 
> Unmapping becomes a ref--.
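
If I'm reading the scheme right, the Dom0 side would look roughly like
the following (names and layout made up purely for illustration; as I
argue below, I think we can avoid this machinery altogether):

#include <linux/atomic.h>
#include <linux/types.h>

#define PERSISTENT_BIT  (1U << 31)

struct pgnt_entry {
        struct page *page;      /* page backing this persistent grant */
        atomic_t ref;           /* in-flight users; real unmap only at 0 */
};

static struct pgnt_entry *pgnt_table;   /* indexed by handle, size TBD */

static struct page *pgnt_map(u32 gref)
{
        u32 handle = gref & ~PERSISTENT_BIT;

        atomic_inc(&pgnt_table[handle].ref);    /* "map" is just ref++ */
        return pgnt_table[handle].page;
}

static void pgnt_unmap(u32 gref)
{
        u32 handle = gref & ~PERSISTENT_BIT;

        atomic_dec(&pgnt_table[handle].ref);    /* "unmap" is just ref-- */
}
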
>
>> Also, if the pool is
>> shared some kind of concurrency control should be added, which will make it
>> even slower.
>>
> 
> Yes, but I think I only needed to worry about that for the actual 
> alloc/dealloc of the persistent map entry which would be an infrequent event. 
> As I said, I never got much further than the above concept so I hadn't fully 
> explored that - at the time I was chasing an imaginary problem with grant 
> tables which turned out to be freelist contention in DomU.

As far as I can see (correct me if I'm wrong), you are proposing a
solution that involves changes to both the guest and the hypervisor. I
think this introduces unnecessary complexity to a problem that can be
solved by merely changing the way blk{front/back} behaves, without
requiring the hypervisor to know whether we are using persistent grants
or not.

> James
> 




 

