Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 04/21/16 01:04, Jan Beulich wrote:
> >>> On 21.04.16 at 07:09, <haozhong.zhang@xxxxxxxxx> wrote:
> > On 04/12/16 16:45, Haozhong Zhang wrote:
> >> On 04/08/16 09:52, Jan Beulich wrote:
> >> > >>> On 08.04.16 at 07:02, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > > On 03/29/16 04:49, Jan Beulich wrote:
> >> > >> >>> On 29.03.16 at 12:10, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > >> > On 03/29/16 03:11, Jan Beulich wrote:
> >> > >> >> >>> On 29.03.16 at 10:47, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > > [..]
> >> > >> >> > I still cannot find a neat approach to manage guest permissions for
> >> > >> >> > nvdimm pages. A possible one is to use a per-domain bitmap to track
> >> > >> >> > permissions, with each bit corresponding to an nvdimm page. The
> >> > >> >> > bitmap can save lots of space and can even be stored in normal RAM,
> >> > >> >> > but operating on it for a large nvdimm range, especially a
> >> > >> >> > contiguous one, is slower than a rangeset.
> >> > >> >> 
> >> > >> >> I don't follow: What would a single bit in that bitmap mean? Any
> >> > >> >> guest may access the page? That surely wouldn't be what we
> >> > >> >> need.
> >> > >> >>
> >> > >> > 
> >> > >> > For a host having N pages of nvdimm, each domain will have an N-bit
> >> > >> > bitmap. If the m'th bit of a domain's bitmap is set, then that domain
> >> > >> > has permission to access the m'th host nvdimm page.
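A minimal C sketch of the per-domain bitmap described above, with page m of the host NVDIMM mapped to bit m. The names (`struct nvdimm_perm_bitmap`, `perm_grant`, `perm_grant_range`, `perm_allowed`, ...) are hypothetical and this is not Xen code. `perm_grant_range` illustrates why a large contiguous grant is slower than with a rangeset: it has to touch every bit in the range.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not Xen code: one bit per host NVDIMM page. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct nvdimm_perm_bitmap {
    uint64_t nr_pages;   /* N: number of host NVDIMM pages */
    unsigned long *bits; /* N bits, rounded up to whole words */
};

static int perm_bitmap_init(struct nvdimm_perm_bitmap *bm, uint64_t nr_pages)
{
    size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    bm->nr_pages = nr_pages;
    /* calloc zeroes the words, so every page starts out denied */
    bm->bits = calloc(words ? words : 1, sizeof(unsigned long));
    return bm->bits ? 0 : -1;
}

static void perm_bitmap_free(struct nvdimm_perm_bitmap *bm)
{
    free(bm->bits);
    bm->bits = NULL;
}

/* Set bit m: the domain may access host NVDIMM page m. */
static void perm_grant(struct nvdimm_perm_bitmap *bm, uint64_t m)
{
    assert(m < bm->nr_pages);
    bm->bits[m / BITS_PER_WORD] |= 1UL << (m % BITS_PER_WORD);
}

/* Clear bit m: the domain may no longer access host NVDIMM page m. */
static void perm_revoke(struct nvdimm_perm_bitmap *bm, uint64_t m)
{
    assert(m < bm->nr_pages);
    bm->bits[m / BITS_PER_WORD] &= ~(1UL << (m % BITS_PER_WORD));
}

/* Granting a contiguous range touches every bit, i.e. O(range length),
 * whereas a rangeset records the same range as a single [s, e] entry. */
static void perm_grant_range(struct nvdimm_perm_bitmap *bm, uint64_t s, uint64_t e)
{
    for (uint64_t m = s; m <= e; m++)
        perm_grant(bm, m);
}

static bool perm_allowed(const struct nvdimm_perm_bitmap *bm, uint64_t m)
{
    return m < bm->nr_pages &&
           ((bm->bits[m / BITS_PER_WORD] >> (m % BITS_PER_WORD)) & 1UL);
}
```

Each domain carries its own N-bit copy, so the total cost grows with the number of domains, which is the overhead raised in the reply that follows.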
> >> > >> 
> >> > >> Which will be more overhead as soon as there are enough such
> >> > >> domains in a system.
> >> > >>
> >> > > 
> >> > > Sorry for the late reply.
> >> > > 
> >> > > I think we can make some optimization to reduce the space consumed by
> >> > > the bitmap.
> >> > > 
> >> > > A per-domain bitmap covering the entire host NVDIMM address range is
> >> > > wasteful, especially if the actually used ranges are congregated. We
> >> > > may take the following steps to reduce its size.
> >> > > 
> >> > > 1) Split the per-domain bitmap into multiple sub-bitmaps, each of which
> >> > >    covers a smaller, contiguous sub-range of the host NVDIMM address
> >> > >    range. Initially, no sub-bitmap is allocated for the domain. When
> >> > >    access permission to a host NVDIMM page in a sub-range is granted to
> >> > >    a domain, only the sub-bitmap for that sub-range is allocated for
> >> > >    the domain. If access permissions to all host NVDIMM pages in a
> >> > >    sub-range are removed from a domain, the corresponding sub-bitmap
> >> > >    can be freed.
> >> > > 
> >> > > 2) If a domain has access permissions to all host NVDIMM pages in a
> >> > >    sub range, the corresponding sub-bitmap will be replaced by a range
> >> > >    struct. If range structs are used to track adjacent ranges, they
> >> > >    will be merged into one range struct. If access permissions to some
> >> > >    pages in that sub range are removed from a domain, the range struct
> >> > >    should be converted back to bitmap segment(s).
> >> > > 
> >> > > 3) Because a domain may have many of the above bitmap segments and
> >> > >    range structs, we can organize them in a balanced interval tree to
> >> > >    quickly search/add/remove an individual structure.
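Steps 1) and 2) above can be sketched in C as follows. This is a hypothetical illustration, not Xen code: `PAGES_PER_CHUNK` (4096 pages per sub-range) is an assumed size, and a flat array of per-sub-range descriptors stands in for the balanced interval tree of step 3) to keep the sketch short. A sub-bitmap is allocated on the first grant in its sub-range, replaced by a "full" marker (playing the role of the range struct) once every page in it is granted, converted back to a bitmap on a partial revoke, and freed when its last page is revoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGES_PER_CHUNK 4096u /* hypothetical sub-range size */
#define CHUNK_BYTES (PAGES_PER_CHUNK / 8u)

enum chunk_state {
    CHUNK_NONE = 0, /* no page accessible: nothing allocated (step 1) */
    CHUNK_PARTIAL,  /* some pages accessible: sub-bitmap allocated */
    CHUNK_FULL,     /* all pages accessible: bitmap replaced by a range (step 2) */
};

struct chunk {
    enum chunk_state state;
    unsigned int count; /* set bits, meaningful only when PARTIAL */
    uint8_t *bits;      /* PAGES_PER_CHUNK bits, only when PARTIAL */
};

/* A flat array of chunk descriptors stands in for the interval tree (step 3). */
struct sparse_perm {
    size_t nr_chunks;
    struct chunk *chunks;
};

static int sp_init(struct sparse_perm *sp, uint64_t nr_pages)
{
    sp->nr_chunks = (nr_pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK;
    sp->chunks = calloc(sp->nr_chunks ? sp->nr_chunks : 1, sizeof(*sp->chunks));
    return sp->chunks ? 0 : -1; /* calloc leaves every chunk CHUNK_NONE */
}

static struct chunk *sp_chunk(struct sparse_perm *sp, uint64_t m, unsigned int *off)
{
    assert(m / PAGES_PER_CHUNK < sp->nr_chunks);
    *off = (unsigned int)(m % PAGES_PER_CHUNK);
    return &sp->chunks[m / PAGES_PER_CHUNK];
}

static int sp_grant(struct sparse_perm *sp, uint64_t m)
{
    unsigned int off;
    struct chunk *c = sp_chunk(sp, m, &off);

    if (c->state == CHUNK_FULL)
        return 0;
    if (c->state == CHUNK_NONE) { /* allocate the sub-bitmap lazily */
        if (!(c->bits = calloc(CHUNK_BYTES, 1)))
            return -1;
        c->state = CHUNK_PARTIAL;
        c->count = 0;
    }
    if (!(c->bits[off / 8] & (1u << (off % 8)))) {
        c->bits[off / 8] |= (uint8_t)(1u << (off % 8));
        if (++c->count == PAGES_PER_CHUNK) { /* whole sub-range granted */
            free(c->bits);
            c->bits = NULL;
            c->state = CHUNK_FULL;
        }
    }
    return 0;
}

static int sp_revoke(struct sparse_perm *sp, uint64_t m)
{
    unsigned int off;
    struct chunk *c = sp_chunk(sp, m, &off);

    if (c->state == CHUNK_NONE)
        return 0;
    if (c->state == CHUNK_FULL) { /* convert the range back to a bitmap */
        if (!(c->bits = malloc(CHUNK_BYTES)))
            return -1;
        memset(c->bits, 0xff, CHUNK_BYTES);
        c->count = PAGES_PER_CHUNK;
        c->state = CHUNK_PARTIAL;
    }
    if (c->bits[off / 8] & (1u << (off % 8))) {
        c->bits[off / 8] &= (uint8_t)~(1u << (off % 8));
        if (--c->count == 0) { /* nothing left: free the sub-bitmap */
            free(c->bits);
            c->bits = NULL;
            c->state = CHUNK_NONE;
        }
    }
    return 0;
}

static bool sp_allowed(struct sparse_perm *sp, uint64_t m)
{
    unsigned int off;
    struct chunk *c;

    if (m / PAGES_PER_CHUNK >= sp->nr_chunks)
        return false;
    c = sp_chunk(sp, m, &off);
    return c->state == CHUNK_FULL ||
           (c->state == CHUNK_PARTIAL && (c->bits[off / 8] & (1u << (off % 8))));
}
```

With a flat descriptor array, adjacent "full" sub-ranges need no explicit merging, but the array itself still scales with the host NVDIMM size; the interval tree proposed in step 3) avoids that at the cost of more code.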
> >> > > 
> >> > > In the worst case, where every sub-range has non-contiguous pages
> >> > > assigned to a domain, the above solution will allocate all sub-bitmaps
> >> > > and consume more space than a single bitmap because of the extra space
> >> > > for organization. I assume that the sysadmin is responsible for keeping
> >> > > the host nvdimm ranges assigned to each domain as contiguous and
> >> > > congregated as possible in order to avoid the worst case. However, if
> >> > > the worst case does happen, the Xen hypervisor should refuse to assign
> >> > > nvdimm to a guest once it runs out of memory.
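To put rough numbers on the trade-off, the following sketch computes per-domain bitmap sizes for the flat scheme and for the worst case of the sub-bitmap scheme. The inputs are assumptions for illustration only: 1 TiB of host NVDIMM, 4 KiB pages, 4096-page sub-ranges and a hypothetical 64 bytes of bookkeeping (e.g. an interval tree node) per sub-range. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for one domain's flat bitmap over nr_pages host NVDIMM pages. */
static uint64_t flat_bitmap_bytes(uint64_t nr_pages)
{
    return (nr_pages + 7) / 8;
}

/* Worst case of the sub-bitmap scheme: every sub-range holds a
 * non-contiguous assignment, so every sub-bitmap is allocated and each
 * one also carries per_chunk_overhead bytes of bookkeeping.
 * pages_per_chunk is assumed to be a multiple of 8. */
static uint64_t sparse_worst_case_bytes(uint64_t nr_pages,
                                        uint64_t pages_per_chunk,
                                        uint64_t per_chunk_overhead)
{
    uint64_t nr_chunks = (nr_pages + pages_per_chunk - 1) / pages_per_chunk;

    assert(pages_per_chunk % 8 == 0);
    return nr_chunks * (pages_per_chunk / 8 + per_chunk_overhead);
}

/* Host-wide cost: every domain carries its own structure. */
static uint64_t host_total_bytes(uint64_t per_domain_bytes, uint64_t nr_domains)
{
    return per_domain_bytes * nr_domains;
}
```

Under these assumptions, 1 TiB in 4 KiB pages is 2^28 pages, so the flat bitmap costs 32 MiB per domain (3200 MiB for 100 domains), while the fully fragmented sub-bitmap layout costs 2^16 sub-ranges × (512 + 64) bytes = 36 MiB per domain, i.e. more than the flat bitmap.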
> >> > 
> >> > To be honest, this all sounds pretty unconvincing wrt not using
> >> > existing code paths - a lot of special treatment, and hence a lot
> >> > of things that can go (slightly) wrong.
> >> > 
> >> 
> >> Well, using the existing range structs to manage guest access permissions
> >> to nvdimm could consume too much space, which could fit in neither
> >> memory nor nvdimm. If the above solution looks really error-prone,
> >> perhaps we can still go back to the existing one and restrict the
> >> number of range structs each domain can have for nvdimm
> >> (e.g. reserve one 4K page per domain for them) to make it work for
> >> nvdimm, though it may reject nvdimm mappings that are terribly
> >> fragmented.
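A minimal C sketch of the capped alternative: each domain's nvdimm ranges live in one 4 KiB page of inclusive `[s, e]` entries, and an assignment that would need one more entry than fits is rejected. `struct capped_rangeset`, `capped_add` and `capped_contains` are hypothetical names; this is not Xen's `rangeset` implementation, and the figure of 256 entries per page assumes a 16-byte entry (two 64-bit fields, no padding).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: one 4 KiB page of [s, e] ranges per domain. */
struct pfn_range {
    uint64_t s, e; /* inclusive host NVDIMM page range */
};

#define RANGE_PAGE_SIZE 4096u
#define MAX_RANGES (RANGE_PAGE_SIZE / sizeof(struct pfn_range)) /* 256 on common ABIs */

struct capped_rangeset {
    unsigned int nr;
    struct pfn_range r[MAX_RANGES]; /* sorted, non-overlapping, non-adjacent */
};

/* Add [s, e], merging with overlapping or adjacent ranges. Returns -1 when
 * a new entry is needed but the page is full, i.e. the mapping is too
 * fragmented and is rejected. */
static int capped_add(struct capped_rangeset *rs, uint64_t s, uint64_t e)
{
    unsigned int i = 0, j;

    assert(s <= e && e < UINT64_MAX);
    while (i < rs->nr && rs->r[i].e + 1 < s) /* ranges entirely before s - 1 */
        i++;
    for (j = i; j < rs->nr && rs->r[j].s <= e + 1; j++) { /* absorb neighbours */
        if (rs->r[j].s < s)
            s = rs->r[j].s;
        if (rs->r[j].e > e)
            e = rs->r[j].e;
    }
    if (j == i) { /* no merge: a new slot is needed */
        if (rs->nr == MAX_RANGES)
            return -1;
        memmove(&rs->r[i + 1], &rs->r[i], (rs->nr - i) * sizeof(rs->r[0]));
        rs->nr++;
    } else {      /* ranges i..j-1 collapse into slot i */
        memmove(&rs->r[i + 1], &rs->r[j], (rs->nr - j) * sizeof(rs->r[0]));
        rs->nr -= j - i - 1;
    }
    rs->r[i].s = s;
    rs->r[i].e = e;
    return 0;
}

static bool capped_contains(const struct capped_rangeset *rs, uint64_t pfn)
{
    for (unsigned int i = 0; i < rs->nr; i++)
        if (rs->r[i].s <= pfn && pfn <= rs->r[i].e)
            return true;
    return false;
}
```

Because merging happens before the capacity check, a grant that only extends or bridges existing ranges still succeeds when the page is full; only assignments that need a genuinely new entry are refused.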
> > 
> > Hi Jan,
> > 
> > Any comments for this?
> 
> Well, nothing new, i.e. my previous opinion on the old proposal didn't
> change. I'm really opposed to any artificial limitations here, as I am to
> any secondary (and hence error prone) code paths. IOW I continue
> to think that there's no reasonable alternative to re-using the existing
> memory management infrastructure for at least the PMEM case.

By re-using the existing memory management infrastructure, do you mean
re-using the existing model of MMIO for passthrough PCI devices to
handle the permission of pmem?

> The
> only open question remains to be where to place the control structures,
> and I think the thresholding proposal of yours was quite sensible.

I'm a little confused here. Is 'restrict the number of range structs' in
my previous reply the 'thresholding proposal' you mean? Or is it one of
the 'artificial limitations'?

Thanks,
Haozhong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
