Re: [Xen-devel] RFC v1: Xen block protocol overhaul - problem statement (with pictures!)
> > > Which has a nice power of two ring to it. (Ah the puns!)
> > >
> > > I like the idea of putting the request on a diet - but too much
> > > could cause us to miss the opportunity to insert other flags on it.
> > > If I recall correctly, the DIF/DIX only need 8 bytes of data.
> > > If we make the assumption that:
> > > I/O request = one ring entry
> >
> > So we only need to reserve 8 bytes for each DIF/DIX, even if the request
> > contains a variable amount of data? (I mean, block requests can at a
> > minimum contain 4096 bytes, or much more)
>
> I need to double check with Martin (CC-ed here). But my recollection
> is that it is just attached to the 'bio'. So if the BIO is 4K or 1MB -
> it would only have one DIF/DIX data type.

And that is semi-correct. If the user did a horrible job (say using dd)
the pages are chained together - and we end up with a linked list of
bio's. The last bio would point to a page filled with checksums: each
sector's worth of data has a checksum. Each checksum occupies 8 bytes.
So if the total 'bio' length is say 1MB, this last page is filled with
256 checksums - so 2048 bytes of data.

>
> Hmm, but then we operate on the 'struct request' so that might not
> be the case..
>
> > > and the "one ring entry" can use the '4' grants if we just have a
> > > 16KB I/O request, but if it is more than that - we use the indirect page
> >
> > Well, for my purposes I've limited the number of segments of a "rw"
> > request to 2, so it's only 8K, anything bigger has to use indirect
> > descriptors, which can fit 4M of data (because I'm passing 4 grant
> > frames full of "blkif_request_indirect_entry" entries).
>
> <nods>
>
> > > and can stuff 1MB of data in there.
> > > The extra 32-bytes of space for such things as 'DIF/DIX'. This also
> > > means we could unify the 'struct request' with the 'discard' operation
> > > and it could utilize the 32-bytes of extra unused payload data.
> > >
> > >>>
> > >>>
> > >>> The "operation" would be BLKIF_OP_INDIRECT. The read/write/discard,
> > >>> etc operation would now be in indirect.op. The indirect.gref points to
> > >>> a page that is filled with:
> > >>>
> > >>>
> > >>> struct blkif_request_indirect_entry {
> > >>>         blkif_sector_t sector_number;
> > >>>         struct blkif_request_segment seg;
> > >>> } __attribute__((__packed__));
> > >>> //16 bytes, so we can fit in a page 256 of these structures.
> > >>>
> > >>>
> > >>> This means that with the existing 32 slots in the ring (single page)
> > >>> we can cover: 32 slots * each blkif_request_indirect covers: 256 * 4096
> > >>> ~= 32M. If we don't want to use indirect descriptor we can still use
> > >>> up to 4 pages of the request (as it has enough space to contain four
> > >>> segments and the structure will still be cache-aligned).
> > >>>

Martin asked me why we even do this via these entries. Meaning why
have this tuple of information for each page: <lba, first_sect,
last_sect, gref>. The lba on the next subsequent indirect entry is going
to be incremented by one. The first_sect and last_sect too... So why not
just do:

struct blkif_request_indirect {
        uint8_t        operation;
        blkif_vdev_t   handle;        /* only for read/write requests         */
#ifdef CONFIG_X86_64
        uint32_t       _pad1;         /* offsetof(blkif_request,u.rw.id) == 8 */
#endif
        uint64_t       id;            /* private guest value, echoed in resp  */
        blkif_sector_t sector_number; /* start sector idx on disk (r/w only)  */
        grant_ref_t    indirect_desc;
        uint16_t       nr_elems;
};

And the 'indirect_desc' would point to a page that looks quite close to
what the scatterlist looks like:

struct indirect_chain {
        uint16_t       op_flag;       /* Can be D_NEXT, D_START, D_END? */
        uint16_t       next;
        uint16_t       offset;
        uint16_t       length;
        uint32_t       gref;
        uint32_t       _pad;          /* Need this in case we ever want to
                                       * make gref + _pad be a physical addr. */
};

And the page itself would be:

struct indirect_chain[256];

The 'next' would just contain the index inside the indirect_chain page -
so from 0->255. The offset and length would reference where in the page
the data is contained.

This way the 'lba' information is part of the 'blkif_request_indirect'
and the payload info is all in the indirect descriptors.
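To make the indirect_chain layout concrete, here is a minimal sketch of how a frontend might fill such a page and how a backend might walk it. It is only an illustration under stated assumptions: the numeric values of D_START/D_NEXT/D_END, the 4096-byte page size constant, and the helpers chain_build() and chain_total_bytes() are hypothetical and not part of any existing blkif header.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/* The flags are only named in the proposal; these values are hypothetical. */
enum { D_START = 1u << 0, D_NEXT = 1u << 1, D_END = 1u << 2 };

/* One descriptor in the indirect page, fields as proposed above. */
struct indirect_chain {
    uint16_t op_flag;  /* D_START, D_NEXT, D_END */
    uint16_t next;     /* index of the next entry within the same page */
    uint16_t offset;   /* byte offset of the data inside the granted page */
    uint16_t length;   /* number of data bytes in the granted page */
    uint32_t gref;     /* grant reference of the data page */
    uint32_t _pad;     /* reserved; gref + _pad could become a physical addr */
};

static_assert(sizeof(struct indirect_chain) == 16, "entry must be 16 bytes");

/* 4096 / 16 = 256 entries fit in one indirect page. */
#define CHAIN_ENTRIES (PAGE_SIZE_BYTES / sizeof(struct indirect_chain))

/* Hypothetical frontend helper: link n full data pages as entries 0..n-1. */
static int chain_build(struct indirect_chain *page,
                       const uint32_t *grefs, uint16_t n)
{
    if (n == 0 || n > CHAIN_ENTRIES)
        return -1;
    for (uint16_t i = 0; i < n; i++) {
        unsigned flags = (i == 0) ? D_START : 0;
        flags |= (i == n - 1) ? D_END : D_NEXT;
        page[i].op_flag = (uint16_t)flags;
        page[i].next    = (i == n - 1) ? 0 : (uint16_t)(i + 1);
        page[i].offset  = 0;
        page[i].length  = (uint16_t)PAGE_SIZE_BYTES;
        page[i].gref    = grefs[i];
        page[i]._pad    = 0;
    }
    return 0;
}

/*
 * Hypothetical backend helper: follow 'next' from entry 0 and return the
 * total payload size in bytes, or -1 if the chain is malformed (bad index,
 * data running past the page, or no D_END within CHAIN_ENTRIES steps).
 */
static long chain_total_bytes(const struct indirect_chain *page)
{
    long total = 0;
    uint16_t idx = 0;

    for (size_t steps = 0; steps < CHAIN_ENTRIES; steps++) {
        const struct indirect_chain *e = &page[idx];

        if ((uint32_t)e->offset + e->length > PAGE_SIZE_BYTES)
            return -1;
        total += e->length;
        if (e->op_flag & D_END)
            return total;
        if (e->next >= CHAIN_ENTRIES)
            return -1;
        idx = e->next;
    }
    return -1;  /* looped without reaching D_END */
}
```

With all 256 entries pointing at full pages, one indirect page describes 256 * 4096 bytes, i.e. the 1MB per request mentioned earlier in the thread. Bounding the walk by CHAIN_ENTRIES steps is one way a backend could refuse a chain whose 'next' indexes loop.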