|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
> On 2015/5/8 22:43, Wei Liu wrote:
> >Sorry for the late review. This series fell through the crack.
> >
>
> Thanks for your review.
>
> >On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
> >>While building a VM, HVM domain builder provides struct hvm_info_table{}
> >>to help hvmloader. Currently it includes two fields to construct guest
> >>e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> >>check them to fix any conflict with RAM.
> >>
> >>RMRR can reside in address space beyond 4G theoretically, but we never
> >>see this in real world. So in order to avoid breaking highmem layout
> >
> >How does this break highmem layout?
>
> In most cases highmen is always continuous like [4G, ...] but RMRR is
> theoretically beyond 4G but very rarely. Especially we don't see this
> happened in real world. So we don't want to such a case breaking the
> highmem.
>
The problem is we take this approach just because this rarely happens
*now* is not future proof. It needs to be clearly documented somewhere
in the manual (or any other Intel docs) and be referenced in the code.
Otherwise we might end up in a situation that no-one knows how it is
supposed to work and no-one can fix it if it breaks in the future, that
is, every single device on earth requires RMRR > 4G overnight (I'm
exaggerating).
Or you can just make it works with highmem. How much more work do you
envisage?
(If my above comment makes no sense do let me know. I only have very
shallow understanding of RMRR)
> >
> >>we don't solve highmem conflict. Note this means highmem rmrr could still
> >>be supported if no conflict.
> >>
> >
> >Aren't you actively trying to avoid conflict in libxl?
>
> RMRR is fixed by BIOS so we can't aovid conflict. Here we want to adopt some
> good policies to address RMRR. In the case of highmemt, that simple policy
> should be enough?
>
Whatever policy you and HV maintainers agree on. Just clearly document
it.
> >
> >>But in the case of lowmem, RMRR probably scatter the whole RAM space.
> >>Especially multiple RMRR entries would worsen this to lead a complicated
> >>memory layout. And then its hard to extend hvm_info_table{} to work
> >>hvmloader out. So here we're trying to figure out a simple solution to
> >>avoid breaking existing layout. So when a conflict occurs,
> >>
> >> #1. Above a predefined boundary (default 2G)
> >> - move lowmem_end below reserved region to solve conflict;
> >>
> >
> >I hope this "predefined boundary" is user tunable. I will check later in
> >this patch if it is the case.
>
> Yes. As we clarified in that comments,
>
> * TODO: Its batter to provide a config parameter for this boundary value.
>
> This mean I would provide a patch address this since currently I just think
> this is not a big deal?
>
Yes please provide a config option to override that. It's reasonable
that user wants to change that.
> >
> >> #2 Below a predefined boundary (default 2G)
> >> - Check force/try policy.
> >> "force" policy leads to fail libxl. Note when both policies
> >> are specified on a given region, 'force' is always preferred.
> >> "try" policy issue a warning message and also mask this entry
> >> INVALID
> >> to indicate we shouldn't expose this entry to hvmloader.
> >>
> >>Signed-off-by: Tiejun Chen <tiejun.chen@xxxxxxxxx>
> >>---
> >> tools/libxc/include/xenctrl.h | 8 ++
> >> tools/libxc/include/xenguest.h | 3 +-
> >> tools/libxc/xc_domain.c | 40 +++++++++
> >> tools/libxc/xc_hvm_build_x86.c | 28 +++---
> >> tools/libxl/libxl_create.c | 2 +-
> >> tools/libxl/libxl_dm.c | 195
> >> +++++++++++++++++++++++++++++++++++++++++
> >> tools/libxl/libxl_dom.c | 27 +++++-
> >> tools/libxl/libxl_internal.h | 11 ++-
> >> tools/libxl/libxl_types.idl | 7 ++
> >> 9 files changed, 303 insertions(+), 18 deletions(-)
> >>
> >>diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> >>index 59bbe06..299b95f 100644
> >>--- a/tools/libxc/include/xenctrl.h
> >>+++ b/tools/libxc/include/xenctrl.h
> >>@@ -2053,6 +2053,14 @@ int xc_get_device_group(xc_interface *xch,
> >> uint32_t *num_sdevs,
> >> uint32_t *sdev_array);
> >>
> >>+struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(xc_interface *xch,
> >>+ uint32_t flag,
> >>+ uint16_t seg,
> >>+ uint8_t bus,
> >>+ uint8_t devfn,
> >>+ unsigned int *nr_entries);
> >>+
> >> int xc_test_assign_device(xc_interface *xch,
> >> uint32_t domid,
> >> uint32_t machine_sbdf);
[...]
> >
> >> uint64_t mem_target; /* Memory target in bytes. */
> >> uint64_t mmio_size; /* Size of the MMIO hole in bytes. */
> >> const char *image_file_name; /* File name of the image to load. */
> >>diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> >>index 4f8383e..85b18ea 100644
> >>--- a/tools/libxc/xc_domain.c
> >>+++ b/tools/libxc/xc_domain.c
> >>@@ -1665,6 +1665,46 @@ int xc_assign_device(
> >> return do_domctl(xch, &domctl);
> >> }
> >>
> >>+struct xen_reserved_device_memory
> >>+*xc_device_get_rdm(xc_interface *xch,
> >>+ uint32_t flag,
> >>+ uint16_t seg,
> >>+ uint8_t bus,
> >>+ uint8_t devfn,
> >>+ unsigned int *nr_entries)
> >>+{
> >>+ struct xen_reserved_device_memory *xrdm = NULL;
> >>+ int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn,
> >>xrdm,
> >>+ nr_entries);
> >>+
> >>+ if ( rc < 0 )
> >>+ {
> >>+ if ( errno == ENOBUFS )
> >>+ {
> >>+ if ( (xrdm = malloc(*nr_entries *
> >>+ sizeof(xen_reserved_device_memory_t))) ==
> >>NULL )
> >>+ {
> >>+ PERROR("Could not allocate memory.");
> >>+ goto out;
> >>+ }
> >
> >Don't you leak origin xrdm in this case?
>
> The caller to xc_device_get_rdm always frees this.
>
I think I misunderstood how this function works. I thought xrdm was
passed in by caller, which is clearly not the case. Sorry!
In that case, the `if ( rc < 0 )' is not needed because the call should
always return rc < 0. An assert is good enough.
> >
> >And, this style is not very good. Shouldn't the caller allocate enough
> >memory before hand?
>
> Are you saying the caller to xc_device_get_rdm()? If so, any caller don't
> know this, too.
>
I see.
> Actually this is just a wrapper of that fundamental hypercall,
> xc_reserved_device_memory_map() in patch #2, and based on that, we always
> have to first call this to inquire how much memory we really need. And this
> is why we have this wrapper since we don't want to duplicate more codes.
>
> One error handler of this wrapper is just handling ENOBUFS since the caller
> never know how much memory we should allocate. So oftentimes we always set
> 'entries = 0' to inquire firstly.
>
> Here Jan suggested we may need to figure out a good way to consolidate
> xc_reserved_device_memory_map() and its wrapper, xc_device_get_rdm().
>
> But in some ways, that wrapper likes a static function so we just need to
> move this into that file associated to its caller, right?
>
Yes, if there is only one user at the moment, make a static function.
> >
> >>+ rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn,
> >>xrdm,
> >>+ nr_entries);
> >>+ if ( rc )
> >>+ {
> >>+ PERROR("Could not get reserved device memory maps.");
> >>+ free(xrdm);
> >>+ xrdm = NULL;
> >>+ }
> >>+ }
> >>+ else {
> >>+ PERROR("Could not get reserved device memory maps.");
> >>+ }
> >>+ }
> >>+
> >>+ out:
> >>+ return xrdm;
> >>+}
> >>+
> >> int xc_get_device_group(
> >> xc_interface *xch,
> >> uint32_t domid,
> >>diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> >>index c81a25b..3f87bb3 100644
> >>--- a/tools/libxc/xc_hvm_build_x86.c
> >>+++ b/tools/libxc/xc_hvm_build_x86.c
> >>@@ -89,19 +89,16 @@ static int modules_init(struct xc_hvm_build_args *args,
> >> }
> >>
> >> static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> >>- uint64_t mmio_start, uint64_t mmio_size)
> >>+ uint64_t lowmem_end)
> >> {
> >> struct hvm_info_table *hvm_info = (struct hvm_info_table *)
> >> (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> >>- uint64_t lowmem_end = mem_size, highmem_end = 0;
> >>+ uint64_t highmem_end = 0;
> >> uint8_t sum;
> >> int i;
> >>
> >>- if ( lowmem_end > mmio_start )
> >>- {
> >>- highmem_end = (1ull<<32) + (lowmem_end - mmio_start);
> >>- lowmem_end = mmio_start;
> >>- }
> >>+ if ( mem_size > lowmem_end )
> >>+ highmem_end = (1ull<<32) + (mem_size - lowmem_end);
> >>
> >> memset(hvm_info_page, 0, PAGE_SIZE);
[...]
> >
> >>+ struct xc_hvm_build_args *args)
> >
> >This function does more than "checking", so a better name is needed.
> >
> >May be you should split this function to one "build" function and one
> >"check" function? What do you think?
>
> We'd better keep this big one since this can make our policy understandable,
> but I agree we need to rename this like,
>
> libxl__domain_device_construct_rdm()
>
> construct = check + build :)
I'm fine with this.
>
> >
> >>+{
> >>+ int i, j, conflict;
> >>+ libxl_ctx *ctx = libxl__gc_owner(gc);
> >
> >You can just use CTX macro.
[...]
> >
> >>+ if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && !d_config->num_pcidevs)
> >>+ return 0;
> >>+
> >>+ /* Collect all rdm info if exist. */
> >>+ xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST,
> >>+ 0, 0, 0, &nr_all_rdms);
> >>+ if (!nr_all_rdms)
> >>+ return 0;
> >>+ d_config->rdms = libxl__calloc(gc, nr_all_rdms,
> >>+ sizeof(libxl_device_rdm));
> >
> >Note that if you use "gc" here the allocated memory will be, well,
> >garbage collected at some point. If you don't want them to be gc'ed you
> >should use NOGC.
>
> Sorry, what does that mean by 'garbage collected'?
>
That means the memory allocated with gc will be freed at some point by
GC_FREE, because those memory regions are meant to be temporary and used
internally.
When entering a libxl public API function (those start wtih
libxl_), that function calls GC_INIT to initialise a garbage collector.
When that function exits, it calls GC_FREE to free all memory allocated
with that gc.
Since d_config is very likely to be used by libxl user (xl, libvirt
etc), you probably don't want to fill it with gc allocated memory.
> >
> >>+ memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
> >>+
> >>+ /* Query all RDM entries in this platform */
> >>+ if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
> >>+ d_config->num_rdms = nr_all_rdms;
> >>+ for (i = 0; i < d_config->num_rdms; i++) {
> >>+ d_config->rdms[i].start =
> >>+ (uint64_t)xrdm[i].start_pfn <<
> >>XC_PAGE_SHIFT;
> >>+ d_config->rdms[i].size =
> >>+ (uint64_t)xrdm[i].nr_pages <<
> >>XC_PAGE_SHIFT;
> >>+ d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
> >>+ }
> >>+ } else {
> >>+ d_config->num_rdms = 0;
> >>+ }
> >
> >And you should move d_config->rdms = libxl__calloc inside that `if'.
> >That is, don't allocate memory if you don't need it.
>
> We can't since in all cases we need to preallocate this, and then we will
> handle this according to our policy.
>
How would it ever be used again if you set d_config->num_rdms to 0? How
do you know the exact size of your array again?
> >
> >>+ free(xrdm);
> >>+
> >>+ /* Query RDM entries per-device */
> >>+ for (i = 0; i < d_config->num_pcidevs; i++) {
> >>+ unsigned int nr_entries = 0;
>
> Maybe I should restate this,
> unsigned int nr_entries;
>
> >>+ bool new = true;
> >>+ seg = d_config->pcidevs[i].domain;
> >>+ bus = d_config->pcidevs[i].bus;
> >>+ devfn = PCI_DEVFN(d_config->pcidevs[i].dev,
> >>d_config->pcidevs[i].func);
> >>+ nr_entries = 0;
> >
> >You've already initialised this variable.
>
> We need to set this as zero to start.
>
Either of the tow works for me. Just don't want redundant
initialisation.
> >
> >>+ xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_NONE,
> >>+ seg, bus, devfn, &nr_entries);
> >>+ /* No RDM to associated with this device. */
> >>+ if (!nr_entries)
> >>+ continue;
> >>+
> >>+ /* Need to check whether this entry is already saved in the array.
> >>+ * This could come from two cases:
> >>+ *
> >>+ * - user may configure to get all RMRRs in this platform, which
> >>+ * is already queried before this point
> >
> >Formatting.
>
> Are you saying this?
I mean you need to move "is already..." to the right go align with
previous line.
>
> + /* Need to check whether this entry is already saved in the array.
>
> =>
The CODING_STYLE in libxl doesn't seem to enforce this, so you can just
follow other examples.
> /*
>
> * Need to check whether this entry is already saved in the array.
> * This could come from two cases:
>
> >
> >>+ * - or two assigned devices may share one RMRR entry
> >>+ *
> >>+ * different policies may be configured on the same RMRR due to
> >>above
> >>+ * two cases. We choose a simple policy to always favor stricter
> >>policy
> >>+ */
> >>+ for (j = 0; j < d_config->num_rdms; j++) {
> >>+ if (d_config->rdms[j].start ==
> >>+ (uint64_t)xrdm[0].start_pfn <<
> >>XC_PAGE_SHIFT)
> >>+ {
> >>+ if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_FORCE)
> >>+ d_config->rdms[j].flag =
> >>d_config->pcidevs[i].rdm_reserve;
> >>+ new = false;
> >>+ break;
> >>+ }
> >>+ }
> >>+
> >>+ if (new) {
> >>+ if (d_config->num_rdms > nr_all_rdms - 1) {
> >>+ LIBXL__LOG(CTX, LIBXL__LOG_ERROR, "Exceed rdm array
> >>boundary!\n");
> >
> >LOG(ERROR, ...)
>
> Fixed.
>
> >
> >>+ free(xrdm);
> >>+ return -1;
> >
> >Please use goto out idiom.
>
> We just have two 'return -1' differently so I'm not sure its worth doing
> this.
>
Yes, please comply with libxl idiom.
> >
> >>+ }
> >>+
> >>+ /*
> >>+ * This is a new entry.
> >>+ */
> >
> >/* This is a new entry. */
>
> Fixed.
>
> >
> >>+ d_config->rdms[d_config->num_rdms].start =
> >>+ (uint64_t)xrdm[0].start_pfn <<
> >>XC_PAGE_SHIFT;
> >>+ d_config->rdms[d_config->num_rdms].size =
> >>+ (uint64_t)xrdm[0].nr_pages <<
> >>XC_PAGE_SHIFT;
> >>+ d_config->rdms[d_config->num_rdms].flag =
> >>d_config->pcidevs[i].rdm_reserve;
> >>+ d_config->num_rdms++;
> >
> >Does this work? I don't see you reallocate memory.
>
> Like I replied above we always preallocate this at the beginning.
>
Ah, OK.
But please don't do this. It's hard to see you don't overrun the
buffer. Please allocate memory only when you need it.
> >
> >>+ }
> >>+ free(xrdm);
> >
> >Bug: you free xrdm several times.
>
> No any conflict.
>
> What I did is that I would free once I finish to calling every
> xc_device_get_rdm().
>
OK. I misread. Sorry.
[...]
>
> >
> >>+ /* Next step is to check and avoid potential conflict between RDM
> >>entries
> >>+ * and guest RAM. To avoid intrusive impact to existing memory layout
> >>+ * {lowmem, mmio, highmem} which is passed around various function
> >>blocks,
> >>+ * below conflicts are not handled which are rare and handling them
> >>would
> >>+ * lead to a more scattered layout:
> >>+ * - RMRR in highmem area (>4G)
> >>+ * - RMRR lower than a defined memory boundary (e.g. 2G)
> >>+ * Otherwise for conflicts between boundary and 4G, we'll simply move
> >>lowmem
> >>+ * end below reserved region to solve conflict.
> >>+ *
> >>+ * If a conflict is detected on a given RMRR entry, an error will be
> >>+ * returned.
> >>+ * If 'force' policy is specified. Or conflict is treated as a warning
> >>if
> >>+ * 'try' policy is specified, and we also mark this as INVALID not to
> >>expose
> >>+ * this entry to hvmloader.
> >>+ *
> >>+ * Firstly we should check the case of rdm < 4G because we may need to
> >>+ * expand highmem_end.
> >
> >Is this strategy agreed in previous discussion? How future-proof is this
>
> Yes, this is based on that design.
>
OK.
[...]
> >>
> >> int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> >>- libxl_domain_build_info *info,
> >>+ libxl_domain_config *d_config,
> >> libxl__domain_build_state *state)
> >> {
> >> libxl_ctx *ctx = libxl__gc_owner(gc);
> >> struct xc_hvm_build_args args = {};
> >> int ret, rc = ERROR_FAIL;
> >>+ libxl_domain_build_info *const info = &d_config->b_info;
> >>+ uint64_t rdm_mem_boundary, mmio_start;
I didn't mention this in the first pass. You seem to have inserted some
tabs? We use space to indent.
Wei.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |