
Re: [Xen-devel] [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM



On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
> On 2015/5/8 22:43, Wei Liu wrote:
> >Sorry for the late review. This series fell through the cracks.
> >
> 
> Thanks for your review.
> 
> >On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
> >>While building a VM, the HVM domain builder provides struct hvm_info_table{}
> >>to help hvmloader. Currently it includes two fields that hvmloader uses to
> >>construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we
> >>should check them to fix any conflict with RAM.
> >>
> >>RMRR can theoretically reside in the address space beyond 4G, but we never
> >>see this in the real world. So in order to avoid breaking the highmem layout
> >
> >How does this break highmem layout?
> 
> In most cases highmem is contiguous, like [4G, ...], but an RMRR can
> theoretically sit beyond 4G, although very rarely. In particular we haven't
> seen this happen in the real world, so we don't want such a rare case to
> break the highmem layout.
> 

The problem is that taking this approach just because it rarely happens
*now* is not future-proof. It needs to be clearly documented somewhere
in the manual (or any other Intel docs) and be referenced in the code.
Otherwise we might end up in a situation where no-one knows how it is
supposed to work and no-one can fix it if it breaks in the future, that
is, if every single device on earth suddenly requires RMRR > 4G (I'm
exaggerating).

Or you can just make it work with highmem. How much more work do you
envisage?

(If my above comment makes no sense do let me know. I only have very
shallow understanding of RMRR)

> >
> >>we don't solve highmem conflicts. Note this means a highmem RMRR could
> >>still be supported if there is no conflict.
> >>
> >
> >Aren't you actively trying to avoid conflict in libxl?
> 
> RMRR is fixed by the BIOS so we can't avoid conflicts. Here we want to adopt
> some good policies to address RMRR. In the case of highmem, that simple
> policy should be enough?
> 

Whatever policy you and the hypervisor maintainers agree on is fine. Just
document it clearly.

> >
> >>But in the case of lowmem, RMRRs may be scattered across the whole RAM
> >>space. Multiple RMRR entries would worsen this and lead to a complicated
> >>memory layout, and then it's hard to extend hvm_info_table{} so that
> >>hvmloader can work it out. So here we're trying to figure out a simple
> >>solution to avoid breaking the existing layout. So when a conflict occurs,
> >>
> >>     #1. Above a predefined boundary (default 2G)
> >>         - move lowmem_end below reserved region to solve conflict;
> >>
> >
> >I hope this "predefined boundary" is user tunable. I will check later in
> >this patch if it is the case.
> 
> Yes. As we clarified in that comments,
> 
> * TODO: It's better to provide a config parameter for this boundary value.
> 
> This means I would provide a follow-up patch to address this, since
> currently I think this is not a big deal?
> 

Yes, please provide a config option to override that. It's reasonable
that a user may want to change it.

> >
> >>     #2 Below a predefined boundary (default 2G)
> >>         - Check force/try policy.
> >>         "force" policy leads to fail libxl. Note when both policies
> >>         are specified on a given region, 'force' is always preferred.
> >>         "try" policy issue a warning message and also mask this entry 
> >> INVALID
> >>         to indicate we shouldn't expose this entry to hvmloader.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
> >>---
> >>  tools/libxc/include/xenctrl.h  |   8 ++
> >>  tools/libxc/include/xenguest.h |   3 +-
> >>  tools/libxc/xc_domain.c        |  40 +++++++++
> >>  tools/libxc/xc_hvm_build_x86.c |  28 +++---
> >>  tools/libxl/libxl_create.c     |   2 +-
> >>  tools/libxl/libxl_dm.c         | 195 +++++++++++++++++++++++++++++++++++++++++
> >>  tools/libxl/libxl_dom.c        |  27 +++++-
> >>  tools/libxl/libxl_internal.h   |  11 ++-
> >>  tools/libxl/libxl_types.idl    |   7 ++
> >>  9 files changed, 303 insertions(+), 18 deletions(-)
> >>
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> >>index 59bbe06..299b95f 100644
> >>--- a/tools/libxc/include/xenctrl.h
> >>+++ b/tools/libxc/include/xenctrl.h
> >>@@ -2053,6 +2053,14 @@ int xc_get_device_group(xc_interface *xch,
> >>                       uint32_t *num_sdevs,
> >>                       uint32_t *sdev_array);
> >>
> >>+struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(xc_interface *xch,
> >>+                   uint32_t flag,
> >>+                   uint16_t seg,
> >>+                   uint8_t bus,
> >>+                   uint8_t devfn,
> >>+                   unsigned int *nr_entries);
> >>+
> >>  int xc_test_assign_device(xc_interface *xch,
> >>                            uint32_t domid,
> >>                            uint32_t machine_sbdf);

[...]

> >
> >>      uint64_t mem_target;         /* Memory target in bytes. */
> >>      uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
> >>      const char *image_file_name; /* File name of the image to load. */
> >>diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> >>index 4f8383e..85b18ea 100644
> >>--- a/tools/libxc/xc_domain.c
> >>+++ b/tools/libxc/xc_domain.c
> >>@@ -1665,6 +1665,46 @@ int xc_assign_device(
> >>      return do_domctl(xch, &domctl);
> >>  }
> >>
> >>+struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(xc_interface *xch,
> >>+                   uint32_t flag,
> >>+                   uint16_t seg,
> >>+                   uint8_t bus,
> >>+                   uint8_t devfn,
> >>+                   unsigned int *nr_entries)
> >>+{
> >>+    struct xen_reserved_device_memory *xrdm = NULL;
> >>+    int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm,
> >>+                                           nr_entries);
> >>+
> >>+    if ( rc < 0 )
> >>+    {
> >>+        if ( errno == ENOBUFS )
> >>+        {
> >>+            if ( (xrdm = malloc(*nr_entries *
> >>+                                sizeof(xen_reserved_device_memory_t))) == NULL )
> >>+            {
> >>+                PERROR("Could not allocate memory.");
> >>+                goto out;
> >>+            }
> >
> >Don't you leak origin xrdm in this case?
> 
> The caller to xc_device_get_rdm always frees this.
> 

I think I misunderstood how this function works. I thought xrdm was
passed in by the caller, which is clearly not the case. Sorry!


In that case, the `if ( rc < 0 )' is not needed because the call should
always return rc < 0. An assert is good enough.
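
I.e., something like this (just a sketch):

    rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm,
                                       nr_entries);
    /* With xrdm == NULL this is only a size inquiry, so it must fail
     * with ENOBUFS. */
    assert(rc < 0 && errno == ENOBUFS);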

> >
> >And, this style is not very good. Shouldn't the caller allocate enough
> >memory before hand?
> 
> Are you saying the caller of xc_device_get_rdm()? If so, no caller knows
> this either.
> 

I see.

> Actually this is just a wrapper around that fundamental hypercall,
> xc_reserved_device_memory_map() in patch #2, and based on that, we always
> have to call it first to inquire how much memory we really need. And this
> is why we have this wrapper, since we don't want to duplicate more code.
> 
> The error handler of this wrapper just handles ENOBUFS, since the caller
> never knows how much memory we should allocate. So we always set
> 'entries = 0' to inquire first.
> 
> Here Jan suggested we may need to figure out a good way to consolidate
> xc_reserved_device_memory_map() and its wrapper, xc_device_get_rdm().
> 
> But in some ways that wrapper is like a static function, so we just need to
> move it into the file associated with its caller, right?
> 

Yes, if there is only one user at the moment, make it a static function.
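
Roughly like this (a sketch of the inquire-then-allocate pattern; names
and the exact signature follow the series but are not verbatim):

    /* (needs <assert.h>, <errno.h>, <stdlib.h>) */
    static struct xen_reserved_device_memory *
    device_get_rdm(xc_interface *xch, uint32_t flag, uint16_t seg,
                   uint8_t bus, uint8_t devfn, unsigned int *nr_entries)
    {
        struct xen_reserved_device_memory *xrdm;
        int rc;

        /* First call with no buffer only reports the required size. */
        *nr_entries = 0;
        rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn,
                                           NULL, nr_entries);
        assert(rc < 0 && errno == ENOBUFS);

        xrdm = malloc(*nr_entries * sizeof(*xrdm));
        if (!xrdm)
            return NULL;

        /* Second call fills the buffer we just allocated. */
        rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn,
                                           xrdm, nr_entries);
        if (rc) {
            free(xrdm);
            xrdm = NULL;
        }
        return xrdm;
    }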

> >
> >>+            rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm,
> >>+                                               nr_entries);
> >>+            if ( rc )
> >>+            {
> >>+                PERROR("Could not get reserved device memory maps.");
> >>+                free(xrdm);
> >>+                xrdm = NULL;
> >>+            }
> >>+        }
> >>+        else {
> >>+            PERROR("Could not get reserved device memory maps.");
> >>+        }
> >>+    }
> >>+
> >>+ out:
> >>+    return xrdm;
> >>+}
> >>+
> >>  int xc_get_device_group(
> >>      xc_interface *xch,
> >>      uint32_t domid,
> >>diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> >>index c81a25b..3f87bb3 100644
> >>--- a/tools/libxc/xc_hvm_build_x86.c
> >>+++ b/tools/libxc/xc_hvm_build_x86.c
> >>@@ -89,19 +89,16 @@ static int modules_init(struct xc_hvm_build_args *args,
> >>  }
> >>
> >>  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> >>-                           uint64_t mmio_start, uint64_t mmio_size)
> >>+                           uint64_t lowmem_end)
> >>  {
> >>      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
> >>          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> >>-    uint64_t lowmem_end = mem_size, highmem_end = 0;
> >>+    uint64_t highmem_end = 0;
> >>      uint8_t sum;
> >>      int i;
> >>
> >>-    if ( lowmem_end > mmio_start )
> >>-    {
> >>-        highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
> >>-        lowmem_end = mmio_start;
> >>-    }
> >>+    if ( mem_size > lowmem_end )
> >>+        highmem_end = (1ull<<32) + (mem_size - lowmem_end);
> >>
> >>      memset(hvm_info_page, 0, PAGE_SIZE);

[...]

> >
> >>+                                   struct xc_hvm_build_args *args)
> >
> >This function does more than "checking", so a better name is needed.
> >
> >May be you should split this function to one "build" function and one
> >"check" function? What do you think?
> 
> We'd better keep this as one big function since that keeps our policy
> understandable, but I agree we need to rename it, e.g.
> 
> libxl__domain_device_construct_rdm()
> 
> construct = check + build :)

I'm fine with this.

> 
> >
> >>+{
> >>+    int i, j, conflict;
> >>+    libxl_ctx *ctx = libxl__gc_owner(gc);
> >
> >You can just use CTX macro.

[...]
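
For reference, CTX is just shorthand for libxl__gc_owner(gc), so the
call site below becomes simply:

    xrdm = xc_device_get_rdm(CTX->xch, LIBXL_RDM_RESERVE_TYPE_HOST,
                             0, 0, 0, &nr_all_rdms);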

> >
> >>+    if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && !d_config->num_pcidevs)
> >>+        return 0;
> >>+
> >>+    /* Collect all RDM info if any exists. */
> >>+    xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST,
> >>+                             0, 0, 0, &nr_all_rdms);
> >>+    if (!nr_all_rdms)
> >>+        return 0;
> >>+    d_config->rdms = libxl__calloc(gc, nr_all_rdms,
> >>+                                   sizeof(libxl_device_rdm));
> >
> >Note that if you use "gc" here the allocated memory will be, well,
> >garbage collected at some point. If you don't want them to be gc'ed you
> >should use NOGC.
> 
> Sorry, what does 'garbage collected' mean?
> 

That means the memory allocated with gc will be freed at some point by
GC_FREE, because those memory regions are meant to be temporary and used
internally.

When entering a libxl public API function (those starting with
libxl_), that function calls GC_INIT to initialise a garbage collector.
When that function exits, it calls GC_FREE to free all memory allocated
with that gc.

Since d_config is very likely to be used by libxl users (xl, libvirt
etc), you probably don't want to fill it with gc-allocated memory.
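
In other words (a sketch of the idiom, not code from this series):

    /* gc: freed automatically by GC_FREE when the public API call
     * returns; only suitable for internal, temporary data. */
    tmp = libxl__calloc(gc, nr, sizeof(*tmp));

    /* NOGC: survives GC_FREE; ownership passes to the caller, which
     * must free it. Use this for anything stored in d_config. */
    d_config->rdms = libxl__calloc(NOGC, nr, sizeof(*d_config->rdms));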

> >
> >>+    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
> >>+
> >>+    /* Query all RDM entries in this platform */
> >>+    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
> >>+        d_config->num_rdms = nr_all_rdms;
> >>+        for (i = 0; i < d_config->num_rdms; i++) {
> >>+            d_config->rdms[i].start =
> >>+                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
> >>+            d_config->rdms[i].size =
> >>+                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
> >>+            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
> >>+        }
> >>+    } else {
> >>+        d_config->num_rdms = 0;
> >>+    }
> >
> >And you should move d_config->rdms = libxl__calloc inside that `if'.
> >That is, don't allocate memory if you don't need it.
> 
> We can't, since in all cases we need to preallocate this first; we then
> handle it according to our policy.
> 

How would it ever be used again if you set d_config->num_rdms to 0? How
do you know the exact size of your array again?

> >
> >>+    free(xrdm);
> >>+
> >>+    /* Query RDM entries per-device */
> >>+    for (i = 0; i < d_config->num_pcidevs; i++) {
> >>+        unsigned int nr_entries = 0;
> 
> Maybe I should restate this,
>       unsigned int nr_entries;
> 
> >>+        bool new = true;
> >>+        seg = d_config->pcidevs[i].domain;
> >>+        bus = d_config->pcidevs[i].bus;
> >>+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
> >>+        nr_entries = 0;
> >
> >You've already initialised this variable.
> 
> We need to set this as zero to start.
> 

Either of the two works for me. I just don't want redundant
initialisation.

> >
> >>+        xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_NONE,
> >>+                                 seg, bus, devfn, &nr_entries);
> >>+        /* No RDM associated with this device. */
> >>+        if (!nr_entries)
> >>+            continue;
> >>+
> >>+        /* Need to check whether this entry is already saved in the array.
> >>+         * This could come from two cases:
> >>+         *
> >>+         *   - user may configure to get all RMRRs in this platform, which
> >>+         * is already queried before this point
> >
> >Formatting.
> 
> Are you saying this?

I mean you need to move "is already..." to the right to align with the
previous line.

> 
> +        /* Need to check whether this entry is already saved in the array.
> 
> =>

The CODING_STYLE in libxl doesn't seem to enforce this, so you can just
follow other examples.

>         /*
>          * Need to check whether this entry is already saved in the array.
>          * This could come from two cases:
> 
> >
> >>+         *   - or two assigned devices may share one RMRR entry
> >>+         *
> >>+         * different policies may be configured on the same RMRR due to above
> >>+         * two cases. We choose a simple policy to always favor stricter policy
> >>+         */
> >>+        for (j = 0; j < d_config->num_rdms; j++) {
> >>+            if (d_config->rdms[j].start ==
> >>+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
> >>+             {
> >>+                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_FORCE)
> >>+                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
> >>+                new = false;
> >>+                break;
> >>+            }
> >>+        }
> >>+
> >>+        if (new) {
> >>+            if (d_config->num_rdms > nr_all_rdms - 1) {
> >>+                LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Exceed rdm array boundary!\n");
> >
> >LOG(ERROR, ...)
> 
> Fixed.
> 
> >
> >>+                free(xrdm);
> >>+                return -1;
> >
> >Please use goto out idiom.
> 
> We just have two different 'return -1' sites so I'm not sure it's worth
> doing this.
> 

Yes, please comply with libxl idiom.
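
The idiom looks roughly like this (a sketch; note that libxl functions
normally return ERROR_* codes rather than -1):

    int rc;

    ...
    if (d_config->num_rdms > nr_all_rdms - 1) {
        LOG(ERROR, "Exceed rdm array boundary!");
        rc = ERROR_FAIL;
        goto out;
    }
    ...
    rc = 0;

out:
    free(xrdm);
    return rc;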

> >
> >>+            }
> >>+
> >>+            /*
> >>+             * This is a new entry.
> >>+             */
> >
> >/* This is a new entry. */
> 
> Fixed.
> 
> >
> >>+            d_config->rdms[d_config->num_rdms].start =
> >>+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
> >>+            d_config->rdms[d_config->num_rdms].size =
> >>+                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
> >>+            d_config->rdms[d_config->num_rdms].flag = d_config->pcidevs[i].rdm_reserve;
> >>+            d_config->num_rdms++;
> >
> >Does this work? I don't see you reallocate memory.
> 
> As I replied above, we always preallocate this at the beginning.
> 

Ah, OK.

But please don't do this. It's hard to see that you don't overrun the
buffer. Please allocate memory only when you need it.
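
For instance, grow the array only when a new entry is actually added (a
sketch; assumes NOGC as discussed above, since d_config outlives the gc):

    d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
                                    (d_config->num_rdms + 1) *
                                    sizeof(*d_config->rdms));
    d_config->rdms[d_config->num_rdms].start =
        (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
    d_config->rdms[d_config->num_rdms].size =
        (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
    d_config->rdms[d_config->num_rdms].flag = d_config->pcidevs[i].rdm_reserve;
    d_config->num_rdms++;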

> >
> >>+        }
> >>+        free(xrdm);
> >
> >Bug: you free xrdm several times.
> 
> There is no conflict.
> 
> What I do is free it each time I finish calling
> xc_device_get_rdm().
> 

OK. I misread. Sorry.


[...]

> 
> >
> >>+    /* Next step is to check and avoid potential conflict between RDM entries
> >>+     * and guest RAM. To avoid intrusive impact to existing memory layout
> >>+     * {lowmem, mmio, highmem} which is passed around various function blocks,
> >>+     * below conflicts are not handled, which are rare and handling them would
> >>+     * lead to a more scattered layout:
> >>+     *  - RMRR in highmem area (>4G)
> >>+     *  - RMRR lower than a defined memory boundary (e.g. 2G)
> >>+     * Otherwise for conflicts between the boundary and 4G, we'll simply move
> >>+     * lowmem end below the reserved region to solve the conflict.
> >>+     *
> >>+     * If a conflict is detected on a given RMRR entry, an error will be
> >>+     * returned if 'force' policy is specified, or the conflict is treated
> >>+     * as a warning if 'try' policy is specified, and we also mark this
> >>+     * entry INVALID so as not to expose it to hvmloader.
> >>+     *
> >>+     * Firstly we should check the case of rdm < 4G because we may need to
> >>+     * expand highmem_end.
> >
> >Is this strategy agreed in previous discussion? How future-proof is this
> 
> Yes, this is based on that design.
> 

OK.

[...]

> >>
> >>  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> >>-              libxl_domain_build_info *info,
> >>+              libxl_domain_config *d_config,
> >>                libxl__domain_build_state *state)
> >>  {
> >>      libxl_ctx *ctx = libxl__gc_owner(gc);
> >>      struct xc_hvm_build_args args = {};
> >>      int ret, rc = ERROR_FAIL;
> >>+    libxl_domain_build_info *const info = &d_config->b_info;
> >>+    uint64_t rdm_mem_boundary, mmio_start;

I didn't mention this in the first pass: you seem to have inserted some
tabs? We use spaces to indent.


Wei.
