
Re: [Xen-devel] [RFC XEN PATCH v2 00/15] Add vNVDIMM support to HVM domains



On Tue, Apr 04, 2017 at 10:59:01AM -0700, Dan Williams wrote:
> On Tue, Apr 4, 2017 at 10:34 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> > On Tue, Apr 04, 2017 at 10:16:41AM -0700, Dan Williams wrote:
> >> On Tue, Apr 4, 2017 at 10:00 AM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@xxxxxxxxxx> wrote:
> >> > On Sat, Apr 01, 2017 at 08:45:45AM -0700, Dan Williams wrote:
> >> >> On Sat, Apr 1, 2017 at 4:54 AM, Konrad Rzeszutek Wilk 
> >> >> <konrad@xxxxxxxxxx> wrote:
> >> >> > ..snip..
> >> >> >> >> Is there a resource I can read more about why the hypervisor 
> >> >> >> >> needs to
> >> >> >> >> have this M2P mapping for nvdimm support?
> >> >> >> >
> >> >> >> > M2P is basically an array of frame numbers. It's indexed by
> >> >> >> > the host page frame number, or the machine frame number (MFN)
> >> >> >> > in Xen's terminology. The n'th entry records the guest page
> >> >> >> > frame number that is mapped to MFN n. M2P is one of the core
> >> >> >> > data structures used in Xen memory management, and is used to
> >> >> >> > convert MFNs to guest PFNs. A read-only version of M2P is also
> >> >> >> > exposed as part of the ABI to guests. In the previous design
> >> >> >> > discussion, we decided to put the management of NVDIMM into the
> >> >> >> > existing Xen memory management as much as possible, so we need
> >> >> >> > to build M2P for NVDIMM as well.
> >> >> >> >
> >> >> >>
> >> >> >> Thanks, but what I don't understand is why this M2P lookup is needed?
> >> >> >
> >> >> > Xen uses it to construct the EPT page tables for the guests.
> >> >> >
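
(Purely to spell out, for illustration, what that M2P lookup amounts to;
the names below are made up and are not Xen's actual declarations:)

    /* Purely illustrative; not Xen's actual declarations. */

    #define INVALID_GFN  (~0UL)

    /* One entry per host machine frame: entry n holds the guest frame
     * number currently mapped to MFN n (or INVALID_GFN if none). */
    extern unsigned long machine_to_phys[];

    static inline unsigned long mfn_to_gfn(unsigned long mfn)
    {
        return machine_to_phys[mfn];
    }

Xen consults this kind of per-frame table when it constructs the EPT
entries mentioned above.
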
> >> >> >> Does Xen establish this metadata for PCI mmio ranges as well? What 
> >> >> >> Xen
> >> >> >
> >> >> > It doesn't have that (M2P) for PCI MMIO ranges. For those it has
> >> >> > a ranges construct (since those are usually contiguous and given
> >> >> > in ranges to a guest).
> >> >>
> >> >> So, I'm confused again. This patchset / enabling requires both M2P and
> >> >> contiguous PMEM ranges. If the PMEM is contiguous it seems you don't
> >> >> need M2P and can just reuse the MMIO enabling, or am I missing
> >> >> something?
> >> >
> >> > I think I am confusing you.
> >> >
> >> > The patchset (specifically [04/15] "xen/x86: add
> >> > XEN_SYSCTL_nvdimm_pmem_setup to setup host pmem") adds a hypercall
> >> > to tell Xen where on the NVDIMM it can put the M2P array as well as
> >> > the frametables ('struct page').
> >> >
> >> > There is no range support. The reason is that if you break an NVDIMM
> >> > up into various chunks (and then put a filesystem on top of it), then
> >> > figure out which SPAs belong to a given file, and then "expose" that
> >> > file to a guest as an NVDIMM, its SPAs won't be contiguous. Hence the
> >> > hypervisor would need to break the 'ranges' structure down into
> >> > either a bitmap or an M2P, and also store it. That can get quite
> >> > tricky, so you may as well just start with an M2P and 'struct page'.
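
(A rough way to picture the difference between the MMIO-style ranges
construct and a per-frame M2P; the types below are invented for
illustration and are not Xen's:)

    #include <stdint.h>

    /* MMIO-style mapping: one record describes a whole contiguous run,
     * which is fine for a PCI BAR handed to a guest. */
    struct addr_range {
        uint64_t gfn_start;   /* first guest frame */
        uint64_t mfn_start;   /* first machine frame */
        uint64_t nr_frames;   /* length of the contiguous run */
    };

    /* A file on a filesystem sitting on the NVDIMM can be backed by many
     * scattered extents, so a list of such records tends towards one
     * entry per page anyway, which is what M2P already is: */
    extern uint64_t m2p[];    /* m2p[mfn] == gfn */
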
> >>
> >> Ok... but the problem then becomes that the filesystem is free to
> >> change the file-offset to SPA mapping any time it wants. So the M2P
> >> support is broken if it expects static relationships.
> >
> > Can't you flock a file and that will freeze it? Or mlock it, since
> > one is rather mmap-ing it?
> 
> Unfortunately no. This dovetails with the discussion we have been
> having with filesystem folks about the need to call msync() after
> every write. Whenever the filesystem sees a write fault it is free to
> move blocks around in the file, think allocation or copy-on-write
> operations like reflink. The filesystem depends on the application
> calling msync/fsync before it makes the writes from those faults
> durable against crash / power loss. Also, actions like online defrag
> can change these offset to physical address relationships without any
> involvement from the application. There's currently no mechanism to
> lock out this behavior because the filesystem assumes that it can just
> invalidate mappings to make the application re-fault.
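
(To make that concrete, this is the pattern applications are expected to
follow when writing through a DAX mmap; plain POSIX, nothing Xen-specific,
and the helper below is just an illustration:)

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Assumes 'path' is a file on a DAX filesystem that is at least
     * 'len' bytes long. */
    int write_record(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                         fd, 0);
        if (map == MAP_FAILED) {
            close(fd);
            return -1;
        }

        memcpy(map, buf, len);

        /* The write fault above may have allocated or COW'd blocks (e.g.
         * after a reflink); only msync/fsync makes the data, and the new
         * block placement, durable. */
        int ret = msync(map, len, MS_SYNC);

        munmap(map, len);
        close(fd);
        return ret;
    }
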
> 
> >>
> >> > The placement of those data structures is: "v2 patch series relies
> >> > on users/admins in Dom0 instead of Dom0 driver to indicate the
> >> > location to store the frametable and M2P of pmem."
> >> >
> >> > Hope this helps?
> >>
> >> It does, but it still seems we're stuck between either 1/ not needing
> >> M2P if we can pass a whole pmem-namespace through to the guest or 2/
> >> M2P being broken by non-static file-offset to physical address
> >> mappings.
> >
> > Aye. So how can 2/ be fixed? I am assuming you would have the same
> > issue with KVM - if the file is 'moving' underneath (and the
> > file-offset to SPA mapping has changed), won't that affect the EPT
> > and other page entries?
> 
> I don't think KVM has the same issue, but honestly I don't have the
> full mental model of how KVM supports mmap. I've at least been able to
> run a guest where the "pmem" is just dynamic page cache on the host
> side so the physical memory mapping is changing all the time due to
> swap. KVM does not have this third-party M2P mapping table to keep up
> to date so I assume it is just handled by the standard mmap support
> for establishing a guest physical address range and the standard
> mapping-invalidate + remap mechanism just works.

Could it be possible to have a Xen driver that would listen for these
notifications, so that those changes percolate to this driver? This
driver would then make the appropriate hypercalls to update the M2P.

That would solve 2/, I think?
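
To make the idea a bit more concrete, a very rough sketch; every name
below is invented, and there is no such notifier or hypercall today:

    /* Hypothetical sketch only: neither the notifier nor the hypercall
     * below exists today; all names are invented to illustrate the idea. */
    #include <stdio.h>

    struct remap_event {
        unsigned long gfn;        /* guest frame backing the file offset */
        unsigned long old_mfn;    /* machine frame it used to live in */
        unsigned long new_mfn;    /* machine frame it moved to */
        unsigned long nr_frames;  /* length of the run that moved */
    };

    /* Stand-in for an invented hypercall that would rewrite the affected
     * M2P entries (and fix up the guest's EPT entries) inside Xen. */
    static int xen_pmem_update_m2p(const struct remap_event *ev)
    {
        printf("M2P: gfn %#lx: mfn %#lx -> %#lx (%lu frames)\n",
               ev->gfn, ev->old_mfn, ev->new_mfn, ev->nr_frames);
        return 0;
    }

    /* The imagined Dom0 driver hook: called whenever the filesystem moves
     * the blocks backing a guest-visible file offset. */
    static void pmem_remap_notify(const struct remap_event *ev)
    {
        xen_pmem_update_m2p(ev);
    }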
