Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Haozhong Zhang <haozhong.zhang@xxxxxxxxx>
Date: Tue, 26 Jan 2016 23:30:00 +0800
Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>
Delivery-date: Tue, 26 Jan 2016 15:30:15 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>
Mail-followup-to: Jan Beulich <JBeulich@xxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>

On 01/26/16 05:44, Jan Beulich wrote:
> >>> On 26.01.16 at 12:44, <George.Dunlap@xxxxxxxxxxxxx> wrote:
> > On Thu, Jan 21, 2016 at 2:52 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> >>>>> On 21.01.16 at 15:01, <haozhong.zhang@xxxxxxxxx> wrote:
> >>> On 01/21/16 03:25, Jan Beulich wrote:
> >>>> >>> On 21.01.16 at 10:10, <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >>>> > c) hypervisor should manage PMEM resource pool and partition it to
> >>>> >     multiple VMs.
> >>>>
> >>>> Yes.
> >>>>
> >>>
> >>> But I still do not quite understand this part: why must pmem resource
> >>> management and partition be done in hypervisor?
> >>
> >> Because that's where memory management belongs. And PMEM,
> >> other than PBLK, is just another form of RAM.
> > 
> > I haven't looked more deeply into the details of this, but this
> > argument doesn't seem right to me.
> > 
> > Normal RAM in Xen is what might be called "fungible" -- at boot, all
> > RAM is zeroed, and it basically doesn't matter at all what RAM is
> > given to what guest.  (There are restrictions of course: lowmem for
> > DMA, contiguous superpages, &c; but within those groups, it doesn't
> > matter *which* bit of lowmem you get, as long as you get enough to do
> > your job.)  If you reboot your guest or hand RAM back to the
> > hypervisor, you assume that everything in it will disappear.  When you
> > ask for RAM, you can request some parameters that it will have
> > (lowmem, on a specific node, &c), but you can't request a specific
> > page that you had before.
> > 
> > This is not the case for PMEM.  The whole point of PMEM (correct me if
> > I'm wrong) is to be used for long-term storage that survives over
> > reboot.  It matters very much that a guest be given the same PMEM
> > after the host is rebooted that it was given before.  It doesn't make
> > any sense to manage it the way Xen currently manages RAM (i.e., that
> > you request a page and get whatever Xen happens to give you).
> 
> Interesting. This isn't the usage model I have been thinking about
> so far. Having just gone back to the original 0/4 mail, I'm afraid
> we're really left guessing, and you guessed differently than I did.
> My understanding of the intentions of PMEM so far was that this
> is a high-capacity, slower than DRAM but much faster than e.g.
> swapping to disk alternative to normal RAM. I.e. the persistent
> aspect of it wouldn't matter at all in this case (other than for PBLK,
> obviously).
>

Of course, pmem could be used the way you describe because of its
'ram' aspect. But I think the more meaningful usage comes from its
persistent aspect. For example, a journaling file system could keep
its logs in pmem rather than in normal RAM, so that if a power
failure happens before those in-memory logs are completely written
to disk, there is still a chance to restore them from pmem after the
next boot (rather than abandoning all of them).

(I'm still writing the design doc, which will include more details of
the underlying hardware and of the nvdimm software interface exposed
by current Linux.)

> However, thinking through your usage model I have problems
> seeing it work in a reasonable way even with virtualization left
> aside: To my knowledge there's no established protocol on how
> multiple parties (different versions of the same OS, or even
> completely different OSes) would arbitrate using such memory
> ranges. And even for a single OS it is, other than for disks (and
> hence PBLK), not immediately clear how it would communicate
> from one boot to another what information got stored where,
> or how it would react to some or all of this storage having
> disappeared (just like a disk which got removed, which - unless
> it held the boot partition - would normally have pretty little
> effect on the OS coming back up).
>

The label storage area is a persistent area on the NVDIMM that can be
used to store partition information. It is not part of pmem (the part
that is mapped into the system address space). Instead, it can only be
accessed through the NVDIMM _DSM method [1]. However, what is stored
there and how it is interpreted is left to software. One way is to
follow the NVDIMM Namespace Specification [2] and store an array of
labels that describe the start address (relative to base 0 of pmem)
and the size of each partition, which is called a namespace. On Linux,
each namespace is exposed as a /dev/pmemXX device.
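
As a rough illustration of the label idea: the sketch below is a
simplified model of "an array of labels with a start address and a
size", not the on-media label format from [2]. The struct
pmem_label and the helpers label_find() and labels_valid() are names
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified label. This is NOT the layout defined by
 * the NVDIMM Namespace Specification; it only models the two fields
 * mentioned in the text. */
struct pmem_label {
    uint64_t start; /* offset of the namespace from base 0 of pmem */
    uint64_t size;  /* size of the namespace in bytes */
};

/* Return the index of the namespace that contains 'offset',
 * or -1 if no label covers it. */
int label_find(const struct pmem_label *labels, size_t count,
               uint64_t offset)
{
    assert(labels != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Written as a subtraction so start + size cannot overflow. */
        if (offset >= labels[i].start &&
            offset - labels[i].start < labels[i].size)
            return (int)i;
    }
    return -1;
}

/* Return true if every label is non-empty, lies inside a pmem region
 * of 'pmem_size' bytes, and does not overlap any other label. */
bool labels_valid(const struct pmem_label *labels, size_t count,
                  uint64_t pmem_size)
{
    for (size_t i = 0; i < count; i++) {
        if (labels[i].size == 0 ||
            labels[i].start > pmem_size ||
            labels[i].size > pmem_size - labels[i].start)
            return false;

        /* Labels 0..i-1 were already bounds-checked, so the sums
         * below cannot overflow. */
        uint64_t a_end = labels[i].start + labels[i].size;
        for (size_t j = 0; j < i; j++) {
            uint64_t b_end = labels[j].start + labels[j].size;
            if (labels[i].start < b_end && labels[j].start < a_end)
                return false;
        }
    }
    return true;
}
```

The point of keeping the labels outside of pmem itself is that
software can always read this table first and then decide which
ranges belong to which namespace.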

With virtualization, the (virtual) label storage area of a vNVDIMM and
the corresponding _DSM method are emulated by QEMU. The virtual label
storage area is not written to the host's label storage area. Instead,
we can reserve an area on pmem for the virtual one.
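
To make the reservation concrete, here is a minimal sketch of how a
pmem region assigned to a guest could be split into a guest-visible
data part and a reserved part backing the virtual label storage
area. Placing the reserved part at the end of the region is only an
assumption for this example, and struct vnvdimm_layout and
vnvdimm_carve() are hypothetical names, not existing QEMU or Xen
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one host pmem region is split. */
struct vnvdimm_layout {
    uint64_t data_start;  /* start of the part exposed to the guest */
    uint64_t data_size;   /* size of the part exposed to the guest */
    uint64_t label_start; /* start of the reserved virtual label area */
    uint64_t label_size;  /* size of the reserved virtual label area */
};

/* Reserve the last 'label_size' bytes of the region for the virtual
 * label storage area (assumed placement). Returns false if the
 * request cannot be satisfied. */
bool vnvdimm_carve(uint64_t region_start, uint64_t region_size,
                   uint64_t label_size, struct vnvdimm_layout *out)
{
    assert(out != NULL);

    /* The label area must be non-empty and leave room for data. */
    if (label_size == 0 || label_size >= region_size)
        return false;
    /* Reject regions whose end would wrap around the address space. */
    if (region_start > UINT64_MAX - region_size)
        return false;

    out->data_start = region_start;
    out->data_size = region_size - label_size;
    out->label_start = region_start + out->data_size;
    out->label_size = label_size;
    return true;
}
```

With such a split, the emulated _DSM method would read and write
only the reserved range, and the host label storage area stays
untouched.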

Besides namespaces, we can also create DAX file systems on pmem and
use files to partition it.

Haozhong

> > So if Xen is going to use PMEM, it will have to invent an entirely new
> > interface for guests, and it will have to keep track of those
> > resources across host reboots.  In other words, it will have to
> > duplicate all the work that Linux already does.  What do we gain from
> > that duplication?  Why not just leverage what's already implemented in
> > dom0?
> 
> Indeed if my guessing on the intentions was wrong, then the
> picture completely changes (also for the points you've made
> further down).
> 
> Jan
> 

