Re: [Xen-devel] [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
On 01/21/2016 12:47 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 21, 2016 at 12:25:08AM +0800, Xiao Guangrong wrote:
>> On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>>>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>> On 20.01.16 at 12:04, <haozhong.zhang@xxxxxxxxx> wrote:
>>>>>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>>> On 20.01.16 at 06:31, <haozhong.zhang@xxxxxxxxx> wrote:
>>>>>>>> Secondly, the driver implements a convenient block device
>>>>>>>> interface to let software access areas where NVDIMM devices are
>>>>>>>> mapped. The existing vNVDIMM implementation in QEMU uses this
>>>>>>>> interface.
>>>>>>>>
>>>>>>>> As the Linux NVDIMM driver has already done the above, why do we
>>>>>>>> bother to reimplement it in Xen?
>>>>>>>
>>>>>>> See above; a possibility is that we may need a split model (block
>>>>>>> layer parts in Dom0, "normal memory" parts in the hypervisor).
>>>>>>> Iirc the split is being determined by firmware, and hence set in
>>>>>>> stone by the time OS (or hypervisor) boot starts.
>>>>>>
>>>>>> For the "normal memory" parts, do you mean parts that map the host
>>>>>> NVDIMM device's address space range to the guest? I'm going to
>>>>>> implement that part in the hypervisor and expose it as a hypercall
>>>>>> so that it can be used by QEMU.
>>>>>
>>>>> To answer this I need to have my understanding of the partitioning
>>>>> being done by firmware confirmed: if that's the case, then "normal"
>>>>> means the part that doesn't get exposed as a block device (SSD). In
>>>>> any event there's no correlation to guest exposure here.
>>>>
>>>> Firmware does not manage NVDIMM. All the operations on an nvdimm are
>>>> handled by the OS.
>>>>
>>>> Actually, there are lots of things we should take into account if we
>>>> move the NVDIMM management to the hypervisor:
>>>
>>> If you remove the block device part and just deal with the pmem part
>>> then this gets smaller.
>>
>> Yes indeed. But Xen cannot benefit from NVDIMM BLK; I think it is not
>> a long-term plan. :)
>
> Also the _DSM operations - I can't see them being in the hypervisor -
> but only in dom0 - which would have the right software to tickle the
> correct ioctl on /dev/pmem to do the "management" (carve the NVDIMM,
> perform a SMART operation, etc).

Yes, it is reasonable to put it in dom0, and it makes management tools
happy.

>>>> a) ACPI NFIT interpretation
>>>>    A new ACPI table introduced in ACPI 6.0, named NFIT, exports the
>>>>    base information of NVDIMM devices: PMEM info, PBLK info, nvdimm
>>>>    device interleave, vendor info, etc. Let me explain it one by one.
>>>
>>> And it is a static table. As in part of the MADT.
>>
>> Yes, it is, but we need to fetch updated nvdimm info from _FIT in
>> SSDT/DSDT instead if an nvdimm device is hotplugged; please see below.
>>
>>>> PMEM and PBLK are two modes to access NVDIMM devices:
>>>> 1) PMEM can be treated as NV-RAM which is directly mapped into the
>>>>    CPU's address space so that the CPU can read/write it directly.
>>>> 2) as an NVDIMM has huge capacity and the CPU's address space is
>>>>    limited, the NVDIMM only offers two windows which are mapped into
>>>>    the CPU's address space, the data window and the access window, so
>>>>    that the CPU can use these two windows to access the whole NVDIMM
>>>>    device.
>>>> An NVDIMM device may also be interleaved; the interleave info is
>>>> exported as well so that we can calculate the address used to access
>>>> a specific NVDIMM device.
>>>
>>> Right, along with the serial numbers.
>>>
>>>> NVDIMM devices from different vendors can have different functions,
>>>> so the vendor info is exported by NFIT to make the vendor's driver
>>>> work.
>>>
>>> via _DSM right?
>>
>> Yes.
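(As an aside, for readers not familiar with the NFIT layout: below is a
rough, illustrative sketch of walking the NFIT's SPA Range structures to
find the PMEM ranges - i.e. the (base, length, NUMA node) tuples dom0 or
the hypervisor would care about. This is not Xen or QEMU code; the struct
layout follows ACPI 6.0, and the PMEM region GUID below should be
double-checked against the spec.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct nfit_sub_hdr {              /* common header of every NFIT sub-table */
    uint16_t type;                 /* 0 == SPA Range Structure */
    uint16_t length;
} __attribute__((packed));

struct nfit_spa_range {            /* ACPI 6.0 SPA Range Structure (type 0) */
    struct nfit_sub_hdr hdr;
    uint16_t index;
    uint16_t flags;
    uint32_t reserved;
    uint32_t proximity_domain;     /* NUMA node of the range */
    uint8_t  type_guid[16];        /* PMEM vs. PBLK control/data window */
    uint64_t base;                 /* system physical address */
    uint64_t length;
    uint64_t mem_attr;
} __attribute__((packed));

/* GUID of a byte-addressable persistent memory region (verify vs. spec) */
static const uint8_t pmem_guid[16] = {
    0x79, 0xd3, 0xf0, 0x66, 0xf3, 0xb4, 0x74, 0x40,
    0xac, 0x43, 0x0d, 0x33, 0x18, 0xb7, 0x8c, 0xdb
};

void walk_nfit(const uint8_t *nfit, uint32_t len)
{
    uint32_t off = 36 + 4;         /* skip ACPI header + 4 reserved bytes */

    while (off + sizeof(struct nfit_sub_hdr) <= len) {
        const struct nfit_sub_hdr *sub = (const void *)(nfit + off);

        if (sub->type == 0 && sub->length >= sizeof(struct nfit_spa_range)) {
            const struct nfit_spa_range *spa = (const void *)sub;

            if (!memcmp(spa->type_guid, pmem_guid, sizeof(pmem_guid)))
                printf("PMEM: base=0x%llx len=0x%llx node=%u\n",
                       (unsigned long long)spa->base,
                       (unsigned long long)spa->length,
                       spa->proximity_domain);
        }
        if (sub->length == 0)      /* malformed table; avoid an endless loop */
            break;
        off += sub->length;
    }
}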
>>>> b) ACPI SSDT interpretation
>>>>    The SSDT offers the _DSM method which controls the NVDIMM device:
>>>>    label operations, health checks, etc., and hotplug support.
>>>
>>> Sounds like the control domain (dom0) would be in charge of that.
>>
>> Yup. Dom0 is a better place to handle it.
>>
>>>> c) Resource management
>>>>    NVDIMM resource management is challenging because:
>>>>    1) PMEM is huge and a little slower to access than RAM, so it is
>>>>       not suitable to manage it via page structs (I think this is not
>>>>       a big problem in the Xen hypervisor?)
>>>>    2) we need to partition it so it can be used by multiple VMs.
>>>>    3) we need to support PBLK and partition it in the future.
>>>
>>> That all sounds to me like control domain (dom0) decisions. Not the
>>> Xen hypervisor.
>>
>> Sure, so letting dom0 handle this is better; we are on the same
>> page. :)
>>
>>>> d) management tools support
>>>>    S.M.A.R.T.? error detection and recovery?
>>>>
>>>> e) hotplug support
>>>
>>> How does that work? Ah, the _DSM will point to the new ACPI NFIT for
>>> the OS to scan. That would require the hypervisor also reading this
>>> for it to update its data structures.
>>
>> Similar to what you said. The NVDIMM root device in SSDT/DSDT provides
>> a new interface, _FIT, which returns the new NFIT once a new device is
>> hotplugged. And yes, domain 0 is the better place to handle this case
>> too.
>
> That one is a bit difficult. Both the OS and the hypervisor would need
> to know about this (I think?). dom0 since it gets the ACPI event and
> needs to process it. Then the hypervisor needs to be told so it can
> slurp it up.

Can dom0 receive the interrupt triggered by device hotplug? If yes, we
can let dom0 handle everything as on native. If it cannot, dom0 can
interpret ACPI, fetch the irq info, and tell the hypervisor to pass the
irq to dom0 - is that doable?

> However I don't know if the hypervisor needs to know all the details of
> an NVDIMM - or just the starting and ending ranges so that when a guest
> is created and the VT-d is constructed - it can be assured that the
> ranges are valid. I am not an expert on the P2M code - but I think that
> would need to be looked at to make sure it is OK with stitching an
> E820_NVDIMM type "MFN" into a guest PFN.

We had better not use "E820", as it lacks some advantages of ACPI, such
as NUMA, hotplug, and label support (namespaces)...
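(Again just for illustration, not part of this patch: roughly the shape
of interface QEMU or the toolstack in dom0 needs is "stitch this host
pmem MFN range into the guest's physmap". The sketch below reuses the
existing libxc MMIO-mapping helper only as an example of that shape;
whether pmem can really go through this path, or needs the dedicated
hypercall Haozhong mentioned, is exactly the open question. The gfn/mfn
values passed in would come from the NFIT and the domain config and are
made up here.)

#include <stdint.h>
#include <stdio.h>
#include <xenctrl.h>

/* Map nr_frames host frames starting at host_mfn (a pmem range taken
 * from the NFIT) into the guest's physmap starting at guest_gfn. */
int map_pmem_to_guest(uint32_t domid, unsigned long guest_gfn,
                      unsigned long host_mfn, unsigned long nr_frames)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if (!xch)
        return -1;

    /* XEN_DOMCTL_memory_mapping under the hood: adds the mappings
     * guest_gfn + i -> host_mfn + i, i in [0, nr_frames), to the p2m. */
    rc = xc_domain_memory_mapping(xch, domid, guest_gfn, host_mfn,
                                  nr_frames, 1 /* add, not remove */);
    if (rc)
        fprintf(stderr, "mapping pmem range failed: rc=%d\n", rc);

    xc_interface_close(xch);
    return rc;
}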
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel