[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC][PATCH 01/13] tools: introduce some new parameters to set rdm policy



Wei,

Ping... :)

Thanks
Tiejun

On 2015/5/15 9:52, Chen, Tiejun wrote:
On 2015/5/11 22:54, Wei Liu wrote:
On Mon, May 11, 2015 at 01:35:06PM +0800, Chen, Tiejun wrote:
On 2015/5/8 21:04, Wei Liu wrote:
Sorry for the late review.


Really thanks for taking your time :)

On Fri, Apr 10, 2015 at 05:21:52PM +0800, Tiejun Chen wrote:
This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
     rdm = [ 'host, reserve=force/try' ]
Per-device RDM parameter:
     pci = [ 'sbdf, rdm_reserve=force/try' ]

Global RDM parameter allows user to specify reserved regions
explicitly,
e.g. using 'host' to include all reserved regions reported on this
platform
which is good to handle hotplug scenario. In the future this parameter
may be further extended to allow specifying random regions, e.g. even
those belonging to another platform as a preparation for live
migration
with passthrough devices.

'force/try' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'force' means an
immediate error
so VM will be killed, while 'try' allows moving forward with a warning
message thrown out.

Default per-device RDM policy is 'force', while default global RDM
policy
is 'try'. When both policies are specified on a given region,
'force' is
always preferred.

Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
---
  docs/man/xl.cfg.pod.5       | 44 +++++++++++++++++++++++++
  docs/misc/vtd.txt           | 34 ++++++++++++++++++++
  tools/libxl/libxl_create.c  |  5 +++
  tools/libxl/libxl_types.idl | 18 +++++++++++
  tools/libxl/libxlu_pci.c    | 78
+++++++++++++++++++++++++++++++++++++++++++++
  tools/libxl/libxlutil.h     |  4 +++
  tools/libxl/xl_cmdimpl.c    | 21 +++++++++++-
  7 files changed, 203 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 408653f..9ed3055 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -583,6 +583,36 @@ assigned slave device.

  =back

+=item B<rdm=[ "TYPE", "RDM_RESERVE_STRING", ... ]>
+

Shouldn't this be "TYPE,RDM_RESERVE_STRIGN" according to your commit
message? If the only available config is just one string, you probably
don't need a list for this?

Yes, based on that design we don't need a list. So

=item B<rdm=[ "RDM_RESERVE_STRING" ]>


Note that this is still a list (enclosed by "[]"). Maybe you mean

    rdm = "RDM_RESERVE_STRING"

?

Yes, I'll do this.



+(HVM/x86 only) Specifies the information about Reserved Device
Memory (RDM),
+which is necessary to enable robust device passthrough usage. One
example of
+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
+structure on x86 platform.
+Each B<RDM_CHECK_STRING> has the form
C<["TYPE",KEY=VALUE,KEY=VALUE,...> where:
+

RDM_CHECK_STRING?

And here should be corrected like this,

B<RDM_RESERVE_STRING> has the form ...


+=over 4
+
+=item B<"TYPE">
+
+Currently we just have one type. 'host' means all reserved device
memory on
+this platform should be reserved in this VM's pfn space.
+

What are other possible types? If there is only one type then we can

Currently we just have one type and looks that design doesn't make this
clear.

simply ignore the type?

I just think we may introduce something else specific to live
migration in
the future... But I'm really not sure right now.


Fair enough. I was just wondering if there would be any other types. If
so we do need provisioning.

In any case, the "type" argument you proposed is a positional argument
(you require it to be the first element of the spec string").
I think you can just make it a key-value pair to make parsing easier.

Do you mean this statement?

=item B<rdm= "RDM_RESERVE_STRING" >

...

B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:

=over 4

=item B<KEY=VALUE>

Possible B<KEY>s are:

=over 4

=item B<type="STRING">

Currently we just have one type. "host" means all reserved device memory on
this platform should be reserved in this VM's pfn space.

=over 4

=item B<reserve="STRING">
...




+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<reserve="STRING">
+
+Conflict may be detected when reserving reserved device memory in
gfn space.
+'force' means an unsolved conflict leads to immediate VM destroy,
while

Do you mean "immediate VM crash"?

Yes. So I guess I should replace this.


+'try' allows VM moving forward with a warning message thrown out.
'try'
+is default.

Can you please your double quotes for "force", "try" etc.

Sure. Just note we'd like to use "strict"/"relaxed" to replace
"force"/"try"
from next revision according to Jan's suggestion.


No problem.


+
+Note this may be overrided by another sub item, rdm_reserve, in
pci device.
+
  =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>

  Specifies the host PCI devices to passthrough to this guest. Each
B<PCI_SPEC_STRING>
@@ -645,6 +675,20 @@ dom0 without confirmation.  Please use with care.
  D0-D3hot power management states for the PCI device. False (0) by
  default.

+=item B<rdm_check="STRING">
+
+(HVM/x86 only) Specifies the information about Reserved Device
Memory (RDM),
+which is necessary to enable robust device passthrough usage. One
example of
+RDM is reported through ACPI Reserved Memory Region Reporting (RMRR)
+structure on x86 platform.
+
+Conflict may be detected when reserving reserved device memory in
gfn space.
+'force' means an unsolved conflict leads to immediate VM destroy,
while
+'try' allows VM moving forward with a warning message thrown out.
'force'
+is default.
+
+Note this would override another global item, rdm = [''].
+

Note this would override global B<rdm> option.

Fixed.


  =back

  =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..d7434d6 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,40 @@ in the config file:
  To override for a specific device:
      pci = [ '01:00.0,msitranslate=0', '03:00.0' ]

+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, for e.g. USB devices to
perform
+PS2 emulation. The regions of memory used for these devices are
marked
+reserved in the e820 map. When we turn on DMA translation, DMA to
those
+regions will fail. Hence BIOS uses RMRR to specify these regions
along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access
these regions.
+
+While creating a VM we should reserve them in advance, and avoid
any conflicts.
+So we introduce user configurable parameters to specify RDM
resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = [ 'host, reserve=force/try' ]
+

The "force/try" should be called "policy". And then you explain what
policies we have.

Do you mean we should rename this?

rdm = [ 'host, policy=force/try' ]


No, I didn't ask you to rename that.

The above line is an example which should reflect the correct syntax.
"force/try" is not the *actual syntax*, i.e. you won't write that in
your config file.

I meant to changes it to "reserve=POLICY". Then you explain the possible
values of POLICY.


Understood so what about this,

To enable this globally, add "rdm" in the config file:

     rdm = [ 'host, reserve=<POLICY>' ]

Or just for a specific device:

     pci = [ '01:00.0,rdm_reserve=<POLICY>', '03:00.0' ]

Global RDM parameter allows user to specify reserved regions explicitly.
Using "host" to include all reserved regions reported on this platform
which is good to handle hotplug scenario. In the future this parameter
may be further extended to allow specifying random regions, e.g. even
those belonging to another platform as a preparation for live migration
with passthrough devices.

Currently "POLICY" includes two options, "strict" and "relaxed". It
decides how to handle conflict when reserving RDM regions in pfn space.
If conflict ...

This is really a policy but 'reserve' may can reflect our action
explicitly,
right?


+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_reserve=force/try', '03:00.0' ]

And you also can see this.

But anyway, if you're really stick to rename this, I'm going to be
fine as
well but we should ping every one to check this point since this name is
from our previous discussion.

+
+Global RDM parameter allows user to specify reserved regions
explicitly.
+Using 'host' to include all reserved regions reported on this
platform
+which is good to handle hotplug scenario. In the future this
parameter
+may be further extended to allow specifying random regions, e.g. even
+those belonging to another platform as a preparation for live
migration
+with passthrough devices.
+
+'force/try' policy decides how to handle conflict when reserving RDM
+regions in pfn space. If conflict exists, 'force' means an
immediate error
+so VM will be killed, while 'try' allows moving forward with a
warning

Be killed by whom? I think it's hvmloader crashes voluntarily, right?

s/VM will be kille/hvmloader crashes voluntarily


+message thrown out.
+

  Caveat on Conventional PCI Device Passthrough
  ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 98687bd..9ed40d4 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1407,6 +1407,11 @@ static void domcreate_attach_pci(libxl__egc
*egc, libxl__multidev *multidev,
      }

      for (i = 0; i < d_config->num_pcidevs; i++) {
+        /*
+         * If the rdm global policy is 'force' we should override
each device.
+         */
+        if (d_config->b_info.rdm.reserve ==
LIBXL_RDM_RESERVE_FLAG_FORCE)
+            d_config->pcidevs[i].rdm_reserve =
LIBXL_RDM_RESERVE_FLAG_FORCE;
          ret = libxl__device_pci_add(gc, domid,
&d_config->pcidevs[i], 1);
          if (ret < 0) {
              LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 47af340..5786455 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -71,6 +71,17 @@ libxl_domain_type = Enumeration("domain_type", [
      (2, "PV"),
      ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")

+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ])
+
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "force"),
+    (1, "try"),
+    ])

If you don't set init_val, the default value would be "force" (0),
is this

Yes.

want you want?

We have a little bit of complexity here,

"Default per-device RDM policy is 'force', while default global RDM
policy
is 'try'. When both policies are specified on a given region, 'force' is
always preferred."


This is going to be done in actual code anyway.

This type is used both in global and per-device setting, so I envisage

Yes.

this to have an invalid value to start with. Appropriate default values

Sounds I should set this,

+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
+


should be done in libxl_TYPE_setdefault functions. And the logic to
detect conflict and preferences done in your construct function.

What do you think?


+
  libxl_channel_connection = Enumeration("channel_connection", [
      (0, "UNKNOWN"),
      (1, "PTY"),
@@ -356,6 +367,11 @@ libxl_domain_sched_params =
Struct("domain_sched_params",[
      ("budget",       integer, {'init_val':
'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
      ])

+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("type",    libxl_rdm_reserve_type),
+    ("reserve",   libxl_rdm_reserve_flag),
+    ])
+
  libxl_domain_build_info = Struct("domain_build_info",[
      ("max_vcpus",       integer),
      ("avail_vcpus",     libxl_bitmap),
@@ -401,6 +417,7 @@ libxl_domain_build_info =
Struct("domain_build_info",[
      ("kernel",           string),
      ("cmdline",          string),
      ("ramdisk",          string),
+    ("rdm",     libxl_rdm_reserve),
      ("u", KeyedUnion(None, libxl_domain_type, "type",
                  [("hvm", Struct(None, [("firmware",         string),
                                         ("bios",
libxl_bios_type),
@@ -521,6 +538,7 @@ libxl_device_pci = Struct("device_pci", [
      ("power_mgmt", bool),
      ("permissive", bool),
      ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
      ])

  libxl_device_vtpm = Struct("device_vtpm", [
diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..45be0d9 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,8 @@ static int pcidev_struct_fill(libxl_device_pci
*pcidev, unsigned int domain,
  #define STATE_OPTIONS_K 6
  #define STATE_OPTIONS_V 7
  #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_CHECK_FLAG      10
  int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev,
const char *str)
  {
      unsigned state = STATE_DOMAIN;
@@ -143,6 +145,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg,
libxl_device_pci *pcidev, const char *str
                      pcidev->permissive = atoi(tok);
                  }else if ( !strcmp(optkey, "seize") ) {
                      pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdm_reserve") ) {
+                    if ( !strcmp(tok, "force") ) {
+                        pcidev->rdm_reserve =
LIBXL_RDM_RESERVE_FLAG_FORCE;
+                    } else if ( !strcmp(tok, "try") ) {
+                        pcidev->rdm_reserve =
LIBXL_RDM_RESERVE_FLAG_TRY;
+                    } else {
+                        pcidev->rdm_reserve =
LIBXL_RDM_RESERVE_FLAG_FORCE;
+                        XLU__PCI_ERR(cfg, "Unknown PCI RDM
property flag value:"
+                                          " %s, so goes 'force' by
default.",

If this is not an error, you don't need XLU__PCI_ERR.

But I would say we should  just treat this as an error and
abort/exit/report (whatever the parser should do in this case).

In our case we just want to post a message to set a appropriate flag to
recover this behavior like we write here,

                         XLU__PCI_ERR(cfg, "Unknown PCI RDM property
flag
value:"
                                           " %s, so goes 'strict' by
default.",
                                      tok);

I suggest we just abort in this case and not second guess what the admin
wants.

Okay,
                     } else {
                         XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM
property"
                                           " flag: 'strict' or 'relaxed'.",
                                      tok);
                         abort();




This may just be a warning? But I don't we have this sort of definition,
XLU__PCI_WARN, ...

So what LOG format can be adopted here?

Feel free to introduce XLU__PCI_WARN if it turns out to be necessary.

If it goes to abort(), I think XLU__PCI_ERR() should be good.




+                                     tok);
+                    }
                  }else{
                      XLU__PCI_ERR(cfg, "Unknown PCI BDF option:
%s", optkey);
                  }
@@ -167,6 +180,71 @@ parse_error:
      return ERROR_INVAL;
  }

+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const
char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if ( NULL == (buf2 = ptr = strdup(str)) )
+        return ERROR_NOMEM;
+
+    for(tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {

Coding style. I haven't checked what actual style this file uses, but
there is inconsistency in this function by itself.

I just refer to xlu_pci_parse_bdf() to generate xlu_rdm_parse(), and
they are in the same file...

Anyway, I should change this line,

for ( tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++ ) {


+        case STATE_TYPE:
+            if ( *ptr == '\0' || *ptr == ',' ) {
+                state = STATE_CHECK_FLAG;
+                *ptr = '\0';
+                if ( !strcmp(tok, "host") ) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option:
%s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_CHECK_FLAG:
+            if ( *ptr == '=' ) {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if ( strcmp(tok, "reserve") ) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value:
%s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if ( *ptr == ',' || *ptr == '\0' ) {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if ( !strcmp(tok, "force") ) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_FORCE;
+                } else if ( !strcmp(tok, "try") ) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_TRY;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property flag
value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if ( tok != ptr || state != STATE_TERMINAL )
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
  /*
   * Local variables:
   * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 0333e55..80497f8 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -93,6 +93,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs,
const char *const *specs,
   */
  int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev,
const char *str);

+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const
char *str);

  /*
   * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 04faf98..9a58464 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -988,7 +988,7 @@ static void parse_config_data(const char
*config_source,
      const char *buf;
      long l;
      XLU_Config *config;
-    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids,
*vtpms;
+    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids,
*vtpms, *rdms;
      XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian;
      int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
      int pci_power_mgmt = 0;
@@ -1727,6 +1727,23 @@ skip_vfb:
          xlu_cfg_get_defbool(config, "e820_host",
&b_info->u.pv.e820_host, 0);
      }

+    /*
+     * By default our global policy is to query all rdm entries, and
+     * force reserve them.
+     */
+    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST;
+    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_TRY;

This should probably to into the _setdefault function of
libxl_domain_build_info.

Sorry, I just see this

libxl_domain_build_info_init()
     |
     + libxl_rdm_reserve_init(&p->rdm);
    |
    + memset(p, '\0', sizeof(*p));

But this should be generated automatically, right? So how to
implement your
idea? Could you give me a show?


Check libxl_domain_build_info_setdefault.

To use libxl types. You normally do:

   libxl_TYPE_init
   libxl_TYPE_setdefault

   DO STUFF

   libxl_TYPE_dispose

_init and _dispose are auto-generated. _setdefault is not.

So in our case, maybe we can do this,

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f0da7dc..461606c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -100,6 +100,17 @@ static int sched_params_valid(libxl__gc *gc,
      return 1;
  }

+void libxl__device_rdm_setdefault(libxl__gc *gc,
+                                  libxl_domain_build_info *b_info)
+{
+    /*
+     * By default our global policy is to query all rdm entries, and
+     * force reserve them.
+     */
+    b_info->rdm.type = LIBXL_RDM_RESERVE_TYPE_HOST;
+    b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+}
+
  int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                          libxl_domain_build_info *b_info)
  {
@@ -410,6 +421,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                     libxl_domain_type_to_string(b_info->type));
          return ERROR_INVAL;
      }
+
+    libxl__device_rdm_setdefault(gc, b_info);
      return 0;
  }




+    if (!xlu_cfg_get_list (config, "rdm", &rdms, 0, 0)) {
+        if ((buf = xlu_cfg_get_listitem (rdms, 0)) != NULL ) {
+            libxl_rdm_reserve rdm;
+            if (!xlu_rdm_parse(config, &rdm, buf))
+            {
+                b_info->rdm.type = rdm.type;
+                b_info->rdm.reserve = rdm.reserve;
+            }

You only have one rdm in b_info, so there is no need to use a list for
it in config file.


Is this fine?

     if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {

         libxl_rdm_reserve rdm;

         if (!xlu_rdm_parse(config, &rdm, buf)) {
             b_info->rdm.type = rdm.type;

             b_info->rdm.reserve = rdm.reserve;

         }
     }


I think it is fine. But you'd better wait a little bit for other people
to voice their opinions.


Sure.

Thanks
Tiejun

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.