
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 02/16/16 05:55, Jan Beulich wrote:
> >>> On 16.02.16 at 12:14, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Mon, 15 Feb 2016, Zhang, Haozhong wrote:
> >> On 02/04/16 20:24, Stefano Stabellini wrote:
> >> > On Thu, 4 Feb 2016, Haozhong Zhang wrote:
> >> > > On 02/03/16 15:22, Stefano Stabellini wrote:
> >> > > > On Wed, 3 Feb 2016, George Dunlap wrote:
> >> > > > > On 03/02/16 12:02, Stefano Stabellini wrote:
> >> > > > > > On Wed, 3 Feb 2016, Haozhong Zhang wrote:
> >> > > > > >> Or, we can make a file system on /dev/pmem0, create files on it,
> >> > > > > >> set the owner of those files to xen-qemuuser-domid$domid, and
> >> > > > > >> then pass those files to QEMU. In this way, non-root QEMU should
> >> > > > > >> be able to mmap those files.
> >> > > > > >
> >> > > > > > Maybe that would work. Worth adding it to the design, I would
> >> > > > > > like to read more details on it.
> >> > > > > >
> >> > > > > > Also note that QEMU initially runs as root but drops privileges
> >> > > > > > to xen-qemuuser-domid$domid before the guest is started.
> >> > > > > > Initially QEMU *could* mmap /dev/pmem0 while it is still running
> >> > > > > > as root, but then it wouldn't work for any devices that need to
> >> > > > > > be mmap'ed at run time (hotplug scenario).
> >> > > > >
> >> > > > > This is basically the same problem we have for a bunch of other
> >> > > > > things, right?  Having xl open a file and then pass it via qmp to
> >> > > > > qemu should work in theory, right?
> >> > > >
> >> > > > Is there one /dev/pmem? per assignable region?
> >> > > 
> >> > > Yes.
> >> > > 
> >> > > BTW, I'm wondering whether and how non-root qemu works with xl disk
> >> > > configuration that is going to access a host block device, e.g.
> >> > >      disk = [ '/dev/sdb,,hda' ]
> >> > > If that works with non-root qemu, I may take the similar solution for
> >> > > pmem.
> >> >  
> >> > Today the user is required to give the correct ownership and access mode
> >> > to the block device, so that non-root QEMU can open it. However in the
> >> > case of PCI passthrough, QEMU needs to mmap /dev/mem, as a consequence
> >> > the feature doesn't work at all with non-root QEMU
> >> > (http://marc.info/?l=xen-devel&m=145261763600528).
> >> > 
> >> > If there is one /dev/pmem device per assignable region, then it would be
> >> > conceivable to change its ownership so that non-root QEMU can open it.
> >> > Or, better, the file descriptor could be passed by the toolstack via
> >> > qmp.
> >> 
> >> Passing a file descriptor via qmp is not enough.
> >> 
> >> Let me clarify where the requirement for root/privileged permissions
> >> comes from. The primary workflow in my design for mapping a host pmem
> >> region, or files in a host pmem region, to a guest is as follows:
> >>  (1) QEMU in Dom0 mmaps the host pmem (the host /dev/pmem0 or files on
> >>      /dev/pmem0) into its virtual address space, i.e. the guest virtual
> >>      address space.
> >>  (2) QEMU asks the Xen hypervisor to map the host physical address,
> >>      i.e. the SPA occupied by the host pmem, to a DomU. This step
> >>      requires translating the guest virtual address (where the host
> >>      pmem is mmapped in (1)) to the host physical address. The
> >>      translation can be done by either
> >>     (a) QEMU, which parses its own /proc/self/pagemap,
> >>      or
> >>     (b) the Xen hypervisor, which does the translation by itself [1]
> >>         (though this choice is not quite doable according to Konrad's
> >>         comments [2]).
> >> 
> >> [1] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00434.html
> >> [2] http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00606.html
> >> 
> >> For 2-a, reading /proc/self/pagemap has required the CAP_SYS_ADMIN
> >> capability since Linux kernel 4.0. Furthermore, if we don't mlock the
> >> mapped host pmem (by adding the MAP_LOCKED flag to mmap or calling mlock
> >> after mmap), pagemap will not contain all mappings. However, mlock may
> >> require privileged permission to lock more memory than RLIMIT_MEMLOCK
> >> allows. Because mlock operates on memory, the permission to open(2) the
> >> host pmem files does not solve the problem, and therefore passing a file
> >> descriptor via qmp does not help.
> >> 
> >> For 2-b, according to Konrad's comments [2], mlock is also required, so
> >> privileged permission may be required as well.
> >> 
> >> Note that the mapping and the address translation are done before QEMU
> >> drops its privileged permissions, so non-root QEMU should be able to work
> >> with the above design until we start considering vNVDIMM hotplug (which
> >> is not yet supported by the current vNVDIMM implementation in QEMU). In
> >> the hotplug case, we may let Xen pass explicit flags to QEMU to keep it
> >> running with root permissions.
> > 
> > Are we all good with the fact that vNVDIMM hotplug won't work (unless
> > the user explicitly asks for it at domain creation time, which is
> > very unlikely, since otherwise she could use coldplug)?
> 
> No, at least there needs to be a road towards hotplug, even if
> initially this may not be supported/implemented.

Guangrong: any plan or design for vNVDIMM hotplug in QEMU?

Haozhong
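
For reference, the translation in option 2-a above (a process parsing its
own /proc/self/pagemap) can be sketched roughly as follows. This is only an
illustration in Python, not the code proposed for QEMU. The function names
decode_pagemap_entry and virt_to_phys are hypothetical. The bit layout
(bit 63 = page present, bits 0-54 = page frame number) follows the Linux
kernel pagemap documentation. On recent kernels a caller without
CAP_SYS_ADMIN sees the PFN as zero, which the sketch treats as "no answer".

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54: page frame number (if present)
PRESENT_BIT = 1 << 63         # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return (present, pfn) for one 64-bit pagemap entry."""
    present = bool(entry & PRESENT_BIT)
    # For non-present pages the low bits do not hold a PFN, so report 0.
    pfn = entry & PFN_MASK if present else 0
    return present, pfn


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Translate a virtual address of this process to a physical address.

    Returns None if the page is not present (e.g. it was never touched or
    locked) or if the kernel hid the PFN from an unprivileged caller.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_SIZE
    with open(pagemap_path, "rb") as f:
        f.seek(offset)
        raw = f.read(PAGEMAP_ENTRY_SIZE)
    if len(raw) != PAGEMAP_ENTRY_SIZE:
        return None
    (entry,) = struct.unpack("=Q", raw)   # native byte order
    present, pfn = decode_pagemap_entry(entry)
    if not present or pfn == 0:
        return None
    return pfn * page_size + (vaddr % page_size)
```

A page that has been mmapped but not locked or touched may simply show up
as not present here, which is why the mapping has to be mlocked before the
translation can be relied on.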
