
Re: [Xen-devel] [RFC Design Doc] Add vNVDIMM support for Xen



On 02/03/16 10:47, Konrad Rzeszutek Wilk wrote:
> > > > >  Open: It seems that no system call/ioctl is provided by the Linux
> > > > >        kernel to get the physical address from a virtual address.
> > > > >        /proc/<qemu_pid>/pagemap provides information on the mapping
> > > > >        from VA to PA. Is it an acceptable solution to let QEMU parse
> > > > >        this file to get the physical address?
> > > >
> > > > Does it work in a non-root scenario?
> > > >
> > >
> > > Seemingly no, according to Documentation/vm/pagemap.txt in the Linux kernel:
> > > | Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
> > > | In 4.0 and 4.1 opens by unprivileged fail with -EPERM.  Starting from
> > > | 4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
> > > | Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
>
> Ah right.
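
For reference, this is roughly the lookup QEMU would have to do via
pagemap - a minimal sketch only, with va_to_pa() being a hypothetical
helper. Each virtual page has a 64-bit entry at offset
(va / pagesize) * 8; bits 0-54 hold the PFN and bit 63 is the
"present" flag:

  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>

  static int va_to_pa(void *va, uint64_t *pa)
  {
      long psize = sysconf(_SC_PAGESIZE);
      uint64_t entry;
      off_t off = ((uintptr_t)va / psize) * sizeof(entry);
      int fd = open("/proc/self/pagemap", O_RDONLY);

      if (fd < 0)
          return -1;
      if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) {
          close(fd);
          return -1;
      }
      close(fd);

      if (!(entry & (1ULL << 63)))  /* page not present */
          return -1;
      /* bits 0-54: PFN (reads back as zero without CAP_SYS_ADMIN) */
      *pa = (entry & ((1ULL << 55) - 1)) * psize + (uintptr_t)va % psize;
      return 0;
  }
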
> > >
> > > A possible alternative is to add a new hypercall similar to
> > > XEN_DOMCTL_memory_mapping, but receiving a virtual address as the address
> > > parameter and translating it to a machine address in the hypervisor.
> >
> > That might work.
>
> That won't work.
>
> This is a userspace VMA - which means that once the ioctl is done we swap
> to kernel virtual addresses. Now we may know that the prior cr3 has the
> userspace virtual address and walk it down - but what if the domain
> that is doing this is PVH (or HVM)? The cr3 of userspace is tucked somewhere
> inside the kernel.
>
> Which means this hypercall would need to know the Linux kernel task structure
> to find this.
>

Thanks for pointing this out. Indeed, it's not a workable solution.

> May I propose another solution - a stacking driver (similar to loop). You
> set it up (ioctl /dev/pmem0/guest.img, get some /dev/mapper/guest.img
> created). Then mmap the /dev/mapper/guest.img - all of the operations are
> the same - except it may have an extra ioctl - get_pfns - which would
> provide the data in similar form to pagemap.txt.
>

I'll have a look at this, thanks!
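
If I understand the proposal correctly, the userspace side would look
roughly like the sketch below. This is only an illustration: the driver
does not exist yet, and the PMEM_GET_PFNS ioctl number, its argument
layout and get_guest_pfns() are all made up here:

  #include <stdint.h>
  #include <fcntl.h>
  #include <linux/ioctl.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>

  struct pmem_get_pfns {      /* hypothetical ioctl argument */
      uint64_t offset;        /* byte offset into the file */
      uint64_t npages;        /* number of entries requested */
      uint64_t *pfns;         /* out: one PFN per page */
  };
  #define PMEM_GET_PFNS _IOWR('P', 0, struct pmem_get_pfns)

  static int get_guest_pfns(size_t len, uint64_t npages, uint64_t *pfns)
  {
      struct pmem_get_pfns req = { .offset = 0, .npages = npages,
                                   .pfns = pfns };
      int fd = open("/dev/mapper/guest.img", O_RDWR);
      void *buf;

      if (fd < 0)
          return -1;
      /* same mmap as before; only the extra ioctl is new */
      buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (buf == MAP_FAILED)
          return -1;
      /* on success pfns[] holds what pagemap would have given us, but
       * the access control is now up to the driver, not CAP_SYS_ADMIN */
      return ioctl(fd, PMEM_GET_PFNS, &req);
  }
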

> But folks will then ask - why don't you just use pagemap? Could the pagemap
> have an extra security capability check? One that can be set for
> QEMU?
>

Basically, the reason not to use pagemap directly is the concern about
whether non-root QEMU could work, as raised in Stefano's comments.
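
For reference, the gate the kernel applies today is essentially the
helper below (paraphrased from the PFN gating in fs/proc/task_mmu.c for
Linux >= 4.2, not verbatim kernel code); an extra, QEMU-grantable
capability would presumably be OR'ed in at the same point -
CAP_PAGEMAP_PFN is purely made up:

  #include <linux/capability.h>
  #include <linux/fs.h>
  #include <linux/user_namespace.h>

  /* the PFN field of a pagemap entry is filled in only if this
   * returns true; otherwise it is zeroed */
  static bool pagemap_show_pfn(const struct file *file)
  {
      return file_ns_capable(file, &init_user_ns, CAP_SYS_ADMIN);
      /* || file_ns_capable(file, &init_user_ns, CAP_PAGEMAP_PFN) */
  }
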

> >
> >
> > > > >  Open: For a large pmem, it is very possible that mmap(2) does not
> > > > >        map all SPA occupied by pmem at the beginning, i.e. QEMU may
> > > > >        not be able to get all SPA of pmem from buf (in virtual
> > > > >        address space) when calling XEN_DOMCTL_memory_mapping.
> > > > >        Can the mmap flag MAP_LOCKED or mlock(2) be used to enforce
> > > > >        the entire pmem being mmapped?
> > > >
> > > > Ditto
> > > >
> > >
> > > No. If I take the above alternative for the first open, maybe the new
> > > hypercall above can inject page faults into dom0 for the unmapped
> > > virtual addresses, so as to force dom0 Linux to create the page
> > > mapping.
>
> Ugh. That sounds hacky. And you wouldn't necessarily be safe.
> Imagine that the system admin decides to defrag the /dev/pmem filesystem.
> Or move the files (disk images) around. If they do that - we may
> still have the guest mapped to system addresses which may contain filesystem
> metadata now, or a different guest image. We MUST mlock or lock the file
> for the duration of the guest.
>
>

So mlocking the mapping, locking the file, or some other way of
'pinning' the mmapped file on pmem is a necessity.
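
A minimal sketch of what that pinning could look like on the QEMU side,
assuming mlock(2) is sufficient - map_and_pin() is a hypothetical
helper, and whether locking the pages also prevents the file's blocks
from being moved on the pmem device is exactly the open question above:

  #include <stddef.h>
  #include <sys/mman.h>

  static void *map_and_pin(int fd, size_t len)
  {
      void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

      if (buf == MAP_FAILED)
          return NULL;
      /* fault in and lock every page, so the VA->PA translations QEMU
       * hands to Xen stay populated for the guest's lifetime;
       * MAP_POPULATE | MAP_LOCKED at mmap time is an alternative */
      if (mlock(buf, len) != 0) {
          munmap(buf, len);
          return NULL;
      }
      return buf;
  }
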

Thanks,
Haozhong



 

