
Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu



On 01/26/16 23:30, Haozhong Zhang wrote:
> On 01/26/16 05:44, Jan Beulich wrote:
> > >>> On 26.01.16 at 12:44, <George.Dunlap@xxxxxxxxxxxxx> wrote:
> > > On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> > >>>>> On 21.01.16 at 15:01, <haozhong.zhang@xxxxxxxxx> wrote:
> > >>> On 01/21/16 03:25, Jan Beulich wrote:
> > >>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> > >>>> > c) hypervisor should manage the PMEM resource pool and partition it
> > >>>> >    among multiple VMs.
> > >>>>
> > >>>> Yes.
> > >>>>
> > >>>
> > >>> But I still do not quite understand this part: why must pmem resource
> > >>> management and partitioning be done in the hypervisor?
> > >>
> > >> Because that's where memory management belongs. And PMEM,
> > >> unlike PBLK, is just another form of RAM.
> > > 
> > > I haven't looked more deeply into the details of this, but this
> > > argument doesn't seem right to me.
> > > 
> > > Normal RAM in Xen is what might be called "fungible" -- at boot, all
> > > RAM is zeroed, and it basically doesn't matter at all what RAM is
> > > given to what guest.  (There are restrictions of course: lowmem for
> > > DMA, contiguous superpages, &c; but within those groups, it doesn't
> > > matter *which* bit of lowmem you get, as long as you get enough to do
> > > your job.)  If you reboot your guest or hand RAM back to the
> > > hypervisor, you assume that everything in it will disappear.  When you
> > > ask for RAM, you can request some parameters that it will have
> > > (lowmem, on a specific node, &c), but you can't request a specific
> > > page that you had before.
> > > 
> > > This is not the case for PMEM.  The whole point of PMEM (correct me if
> > > I'm wrong) is to be used for long-term storage that survives over
> > > reboot.  It matters very much that a guest be given the same PRAM
> > > after the host is rebooted that it was given before.  It doesn't make
> > > any sense to manage it the way Xen currently manages RAM (i.e., that
> > > you request a page and get whatever Xen happens to give you).
> > 
> > Interesting. This isn't the usage model I have been thinking about
> > so far. Having just gone back to the original 0/4 mail, I'm afraid
> > we're really left guessing, and you guessed differently than I did.
> > My understanding of the intent of PMEM so far was that it is a
> > high-capacity alternative to normal RAM, slower than DRAM but much
> > faster than e.g. swapping to disk. I.e. the persistent aspect of it
> > wouldn't matter at all in this case (other than for PBLK,
> > obviously).
> >
> 
> Of course, pmem could be used in the way you thought because of its
> 'RAM' aspect. But I think the more meaningful usage comes from its
> persistent aspect. For example, a journaling file system could store
> its logs in pmem rather than in normal RAM, so that if a power failure
> happens before those in-memory logs are completely written to disk,
> there would still be a chance to recover them from pmem after the next
> boot (rather than abandoning all of them).
> 
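As a rough sketch of that use (for illustration only, not part of this
series): a journaling layer could map a pmem namespace, write a log
record into it, and flush the record to the persistence domain. The
sketch assumes /dev/pmem0 exists and that msync() is enough to make the
data durable on the given platform:

    /* Illustration only: append one journal record to a pmem-backed log. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LOG_SIZE (1UL << 20)   /* 1 MiB log window (arbitrary) */

    static int append_log(const char *rec, size_t len, size_t off)
    {
        if (off + len > LOG_SIZE)
            return -1;

        int fd = open("/dev/pmem0", O_RDWR); /* assumed pmem namespace */
        if (fd < 0)
            return -1;

        char *log = mmap(NULL, LOG_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        close(fd);
        if (log == MAP_FAILED)
            return -1;

        memcpy(log + off, rec, len);        /* place the record in pmem */
        msync(log, LOG_SIZE, MS_SYNC);      /* flush: survive power loss */

        munmap(log, LOG_SIZE);
        return 0;
    }

After a crash, the log window can simply be mapped again and the
surviving records replayed, which is the recovery path described above.
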
> (I'm still writing the design doc, which will include more details of
> the underlying hardware and the software interface of NVDIMM exposed
> by current Linux.)
> 
> > However, thinking through your usage model I have trouble
> > seeing it work in a reasonable way even with virtualization left
> > aside: to my knowledge there's no established protocol for how
> > multiple parties (different versions of the same OS, or even
> > completely different OSes) would arbitrate the use of such memory
> > ranges. And even for a single OS it is, unlike for disks (and
> > hence PBLK), not immediately clear how it would communicate
> > from one boot to the next what information got stored where,
> > or how it would react to some or all of this storage having
> > disappeared (just like a disk which got removed, which - unless
> > it held the boot partition - would normally have rather little
> > effect on the OS coming back up).
> >
> 
> The label storage area is a persistent area on an NVDIMM that can be
> used to store partition information. It's not included in pmem (the
> part that is mapped into the system address space); instead, it can
> only be accessed through the NVDIMM _DSM method [1]. However, what
> contents are stored there and how they are interpreted are left to
> software. One way is to follow the NVDIMM Namespace Specification [2]
> and store an array of labels that describe the start address (from
> base 0 of pmem) and the size of each partition; each such partition is
> called a namespace. On Linux, each namespace is exposed as a
> /dev/pmemXX device.
> 
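For reference, a single label in the on-DIMM label storage area, as
described in [2], looks roughly like the 128-byte structure below. This
is a simplified sketch; field order and sizes should be checked against
the spec itself:

    /* Simplified sketch of one namespace label (see [2] for the
     * authoritative layout). */
    #include <stdint.h>

    struct namespace_label {
        uint8_t  uuid[16];    /* UUID of the owning namespace */
        char     name[64];    /* optional human-readable name */
        uint32_t flags;       /* e.g. read-only / updating / local */
        uint16_t nlabel;      /* number of labels in the namespace */
        uint16_t position;    /* position of this label in that set */
        uint64_t isetcookie;  /* interleave-set cookie for validation */
        uint64_t lbasize;     /* block size (0 for pmem namespaces) */
        uint64_t dpa;         /* start DIMM physical address */
        uint64_t rawsize;     /* size of this range in bytes */
        uint32_t slot;        /* slot index in the label storage area */
        uint32_t unused;
    };                        /* 128 bytes per label */
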
> For virtualization, the (virtual) label storage area of a vNVDIMM and
> the corresponding _DSM method are emulated by QEMU. The virtual label
> storage area is not written to the host one; instead, we can reserve a
> small area on host pmem for it.
> 
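For example, with the vNVDIMM support being proposed for QEMU (the exact
option names below are an assumption and may differ in the final
version), a guest NVDIMM backed by a host pmem device could be created
along these lines:

    qemu-system-x86_64 -machine pc,nvdimm=on -m 2G,slots=2,maxmem=4G \
      -object memory-backend-file,id=mem1,share=on,mem-path=/dev/pmem0,size=1G \
      -device nvdimm,id=nvdimm1,memdev=mem1

(Loading the guest ACPI tables that QEMU builds for such a device is
what the hvmloader change in this series is for.)
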
> Besides namespaces, we can also create DAX file systems on pmem and
> use files to partition it.
>
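For example (assuming a kernel where the chosen file system supports
DAX), the host could carve out per-guest "partitions" as ordinary files
and hand each file to QEMU as the backing store of a vNVDIMM:

    mkfs.ext4 /dev/pmem0
    mount -o dax /dev/pmem0 /mnt/pmem
    truncate -s 16G /mnt/pmem/guest1.img    # one file per guest vNVDIMM

The file /mnt/pmem/guest1.img could then be used as the mem-path of the
memory-backend-file object in the QEMU example above.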

Forgot references:
[1] NVDIMM DSM Interface Examples, 
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2] NVDIMM Namespace Specification, 
http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

> Haozhong
> 
> > > So if Xen is going to use PMEM, it will have to invent an entirely new
> > > interface for guests, and it will have to keep track of those
> > > resources across host reboots.  In other words, it will have to
> > > duplicate all the work that Linux already does.  What do we gain from
> > > that duplication?  Why not just leverage what's already implemented in
> > > dom0?
> > 
> > Indeed, if my guess about the intentions was wrong, then the
> > picture changes completely (also for the points you've made
> > further down).
> > 
> > Jan
> > 
