
Re: [Xen-devel] [DOC v1] Xen transport for 9pfs



On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote:
> On Thu, 1 Dec 2016, Dario Faggioli wrote:
> > 
> > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > > 
> > >     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> > > 
> > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash before
> > the index). Not a big deal, I guess, but I thought it could be nice to
> > be a bit more uniform.
> 
> Sure, but in this case each ring-ref-%u is used to map a different ring.
>
Yeah, right. So it may even be a good thing to differentiate, indeed...

> That said, I can make the change.
> 
I don't know. FWIW, I thought it would be good, but now I'm not so sure
any longer. Your and the maintainers' call, I guess. :-)
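
Just to make the difference concrete, here is a minimal sketch of how a
frontend or backend could build the key name under the dash-separated
scheme from the draft. The helper name ring_ref_key() is made up for
illustration; switching to the blkif style would only change the format
string to "ring-ref%u".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of any Xen header: format the xenstore
 * key for ring number idx using the "ring-ref-<num>" naming of the draft.
 * Returns the key length (excluding the NUL), or -1 if buf is too small.
 */
static int ring_ref_key(char *buf, size_t len, unsigned int idx)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "ring-ref-%u", idx);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```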

> > If it is, what's the typical envisioned use of these multiple rings,
> > if I can ask?
> 
> They are used to handle multiple read/write requests in parallel. Let's
> assume that we configure the rings to be 8K each. Given that the data is
> transmitted over the ring, each ring can hold only one outstanding 4K
> write request (there is a header for each write request).
> 
Ok.

> With two 8K rings, we can have two outstanding 4K write requests, each
> of them processed in parallel on a different vcpu.
> 
> The system is completely configurable in terms of number and size of
> rings, so a user can configure it to only export one 4K ring for
> example or go as far as several 2MB rings.
> 
Right. So, it is indeed similar to blkif multiqueueing, with which it
also shares the idea/objective of exploiting parallelism at the (v)CPU
level, but without (quite obviously, in this case) any direct link to
hardware queues in disk controllers, and without the protocol itself
giving any direction or indication of how to actually use all this.

Got it. Nice.

FWIW, I think a few words --just a shorter version of what you just
said-- may be useful if present in this document.

> > >     /* not actually C compliant (ring_order changes from socket to socket) */
> > >     struct ring_data {
> > >         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > >         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > >     };
> > > 
> > Sorry, what does "ring_order changes from socket to socket" mean?
> 
> Woops, copy/paste error from PVCalls. I meant "ring_order changes from
> ring to ring".
> 
Ah, yes, now it makes sense. :-)
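
To double check my reading of the struct above, this is a small sketch of
the sizes it implies for a given ring_order. PAGE_SHIFT being 12 (4K
pages) is my assumption, and the two helper names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12 /* assumption: 4K pages */

/* Total size in bytes of a ring made of (1 << ring_order) pages. */
static size_t ring_bytes(unsigned int ring_order)
{
    assert(ring_order < 20); /* arbitrary sanity bound for this sketch */
    return ((size_t)1 << ring_order) << PAGE_SHIFT;
}

/* Size in bytes of each of the in[] and out[] arrays of struct ring_data. */
static size_t ring_half_bytes(unsigned int ring_order)
{
    return ring_bytes(ring_order) / 2;
}
```

So, with ring_order=1 (an 8K ring), in[] and out[] would be 4K each.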

BTW, what's the reason for putting ring_order inside xen_9pfs_intf,
instead of having a ring-page-order (well, actually, a
ring-%u-page-order) xenstore key?

> > > The binary layout of `struct xen_9pfs_intf` follows:
> > > 
> > >     0         4         8         12        16        20
> > >     +---------+---------+---------+---------+---------+
> > >     | in_cons | in_prod |out_cons |out_prod |ring_orde|
> > >     +---------+---------+---------+---------+---------+
> > > 
> > >     20        24        28      4092      4096
> > >     +---------+---------+----//---+---------+
> > >     |  ref[0] |  ref[1] |         |  ref[N] |
> > >     +---------+---------+----//---+---------+
> > > 
> > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given
> > > that N needs to be a power of two, actually max N is 512.
> > > 
> > It may again be me being still too naive, but I'd quickly add at least
> > another example, with the value of N computed for a multiple pages
> > ring. Something like: "For 4 pages (i.e., ring_order=2), N is..."
> 
> For 4 pages, N is 4. N is the number of pages that make up the ring.
> 
> Maybe there is a misunderstanding, let me try to explain it again: each
> page shared via xenstore contains information to handle one new ring,
> including grant references for the pages making up the multipage ring
> itself. I'll repeat: pages shared via xenstore are not used as a ring,
> they are used to set up the rings, each page has the info to set up a
> new ring.
> 
Right, I got this. And indeed I expressed myself very badly above.

So, the descriptor of one ring is one page. For single-page rings, that
page contains the reference to another page, which is the actual ring.
If the ring is multi-page, the descriptor page contains an array of page
references which, together, make up the actual ring.

Such array --of which N is, in the diagram above, the last index-- can
be, as you say, up to 1019 elements big (the available space in a ring
descriptor page). Therefore, the math I was asking about is really the
relationship between N and max-ring-page-order. That is, a ring can
have at most 2^max-ring-page-order pages, and N can be at most 1019
(well, I think it's 1018 if, as in diagram above, you count from 0, but
that does not matter much); so:

 2^max-ring-page-order <= N
 lb(2^max-ring-page-order) <= lb(N)  //lb(): base 2 logarithm
 max-ring-page-order <= lb(N)

and, considering that max-ring-page-order must be a natural number:

 max-ring-page-order <= floor(lb(N))
 max-ring-page-order <= floor(lb(1018))
 max-ring-page-order <= floor(9.9915)
 max-ring-page-order <= 9

so a ring can be at most 2^9 pages big, which indeed matches your own
calculations, and brings us to the fact that the maximum size of a ring
is 512*4KB=2MB.
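
The same reasoning, as a small sketch in C. All three helper names are
made up for illustration, and the 4K page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* floor(lb(n)) for n >= 1, computed by counting right shifts. */
static unsigned int floor_log2(unsigned int n)
{
    unsigned int r = 0;

    assert(n >= 1);
    while (n >>= 1)
        r++;
    return r;
}

/* Largest ring order whose 2^order grant refs fit in the given slots. */
static unsigned int max_ring_page_order(unsigned int ref_slots)
{
    return floor_log2(ref_slots);
}

/* Size in bytes of the largest ring that ref_slots can describe. */
static size_t max_ring_bytes(unsigned int ref_slots, size_t page_size)
{
    return ((size_t)1 << max_ring_page_order(ref_slots)) * page_size;
}
```

With 1019 (or 1018) available slots and 4K pages this gives order 9 and
2097152 bytes, i.e., 2MB.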

So, to recap (sorry for being so long!), I think that saying:

"**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
N needs to be a power of two, actually max N is 512."

is indeed correct, and probably makes it clear enough that the maximum
ring size is 2MB. It's not equally easy, IMO, to map this back to the
fact that this also means max-ring-page-order must be at most 9, and
that is not spelled out anywhere else, AFAICT.

Therefore, an example of how things look with a couple of different
values of ring_order, or some shorter and less boring version of this
reasoning and calculations may help with that. That's what I'm trying
to say. :-)

> The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
> page is completely separate and independent from the others. Given that
> one page is just 4096 bytes, it can contain max 512 grant refs (see
> calculation above). So the max size of one multipage ring is 512 pages =
> 2MB.
> 
> Does it make sense?
>
It does, and this is probably me being a mix of not too used to this
and too picky... If that's the case, sorry for the noise. :-D

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
