Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
On Wed, Jan 27, 2016 at 03:16:59AM -0700, Jan Beulich wrote:
> >>> On 26.01.16 at 20:32, <konrad.wilk@xxxxxxxxxx> wrote:
> > On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
> >> >>> On 26.01.16 at 16:57, <haozhong.zhang@xxxxxxxxx> wrote:
> >> > On 01/26/16 08:37, Jan Beulich wrote:
> >> >> >>> On 26.01.16 at 15:44, <konrad.wilk@xxxxxxxxxx> wrote:
> >> >> >> Last year at Linux Plumbers Conference I attended a session dedicated
> >> >> >> to NVDIMM support. I asked the very same question and the Intel guy
> >> >> >> there told me there is indeed something like a partition table meant
> >> >> >> to describe the layout of the memory areas and their contents.
> >> >> >
> >> >> > It is described in detail at pmem.io; look under Documents, see
> >> >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf, the Namespaces
> >> >> > section.
> >> >>
> >> >> Well, that's about how PMEM and PBLK ranges get marked, but not
> >> >> about how use of the space inside a PMEM range is coordinated.
> >> >
> >> > How an NVDIMM is partitioned into pmem and pblk is described by the
> >> > ACPI NFIT table.
> >> > A namespace is to pmem what a partition table is to a disk.
> >>
> >> But I'm talking about sub-dividing the space inside an individual
> >> PMEM range.
> >
> > The namespaces are it.
> >
> > Once you have defined them you get the PMEM range as say /dev/pmem0,
> > and then you can put a filesystem on it (ext4, xfs) - and enable DAX
> > support. DAX just means that the FS will bypass the page cache and
> > write directly to the virtual address.
> >
> > Then one can create giant 'dd' images on this filesystem and pass them
> > to QEMU to expose as an NVDIMM to the guest. Because it is a file, the
> > blocks (or MFNs) backing the contents of the file are most certainly
> > discontiguous.
>
> And what's the advantage of this over PBLK? I.e. why would one
> want to separate PMEM and PBLK ranges if everything gets used
> the same way anyway?

Speed. PBLK emulates hardware by providing a sliding window onto the DIMM.
The OS can only write to a ring buffer with the system address and the
payload (64 bytes I think?) - and the hardware (or firmware) picks it up
and does the writes to the NVDIMM. The only motivation behind this is to
deal with errors.

Normal PMEM writes do not report errors. That is, if the media is busted,
the hardware will engage its remap logic and write somewhere else - until
all of its remap blocks have been exhausted. At that point writes (I
presume, not sure) and reads will report an error - but via an MCE. Part
of this Xen design will be how to handle that :-)

With PBLK I presume the hardware/firmware will read the block back after
writing it - and if there are errors it will report them right away. Which
means you can hook PBLK into RAID setups nicely. It will be slower than
PMEM, but it does give you normal error reporting - that is, until the
MCE->OS->fs error reporting logic gets figured out. The MCE handling code
is being developed right now by Tony Luck on LKML - the last I saw, the
MCE carries the system address, and the MCE code would tag the affected
pages with some bit so that applications would get a signal.

> Jan
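
To make the DAX point above concrete, here is a minimal userspace sketch of
writing through a DAX mapping. It assumes a pre-created, non-empty file on a
DAX-mounted ext4/xfs filesystem backed by /dev/pmem0; the path is hypothetical
and only the mmap/msync pattern is the point:

/* A minimal sketch of the DAX path described above: open a (pre-created)
 * file on a DAX-mounted ext4/xfs filesystem assumed to be backed by
 * /dev/pmem0, map it, and store through the mapping.  The path is
 * hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/pmem0/guest-nvdimm.img";   /* hypothetical */
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
        perror(path);
        return EXIT_FAILURE;
    }

    /* With DAX the mapping is backed directly by NVDIMM pages; there is
     * no page-cache copy that writeback later has to flush to media.   */
    void *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    memcpy(p, "hello, pmem", sizeof("hello, pmem"));

    /* Request durability for the range.  On a DAX mapping this boils
     * down to flushing CPU caches rather than writing back dirty pages. */
    if (msync(p, st.st_size, MS_SYNC) < 0)
        perror("msync");

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

On a non-DAX mount the same code still works, but the stores land in the page
cache first; removing that copy is exactly the difference DAX makes.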
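
For the "applications would get a signal" part at the end, here is a sketch
assuming the existing Linux convention (SIGBUS with si_code BUS_MCEERR_AR or
BUS_MCEERR_AO and the faulting address in si_addr); the fs-level error
plumbing still being worked out on LKML is not modelled here:

/* A sketch of the application-visible end of the MCE path mentioned above,
 * assuming the existing Linux convention: when a poisoned page is consumed,
 * memory-failure handling sends SIGBUS with si_code BUS_MCEERR_AR (or
 * BUS_MCEERR_AO for an asynchronous report) and the bad address in si_addr.
 */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void sigbus_handler(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;

    /* Sketch only: a real handler should stick to async-signal-safe calls. */
    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO)
        fprintf(stderr, "uncorrected memory error at %p\n", si->si_addr);

    _exit(EXIT_FAILURE);    /* the data behind that address is gone */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGBUS, &sa, NULL) < 0) {
        perror("sigaction");
        return EXIT_FAILURE;
    }

    /* ... touch DAX-mapped NVDIMM memory here; consuming poison would
     * land in sigbus_handler with BUS_MCEERR_AR ... */
    pause();
    return 0;
}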