
Re: [Xen-devel] (v2) Design proposal for RMRR fix



On Tue, Jan 13, 2015 at 11:03 AM, Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
>> Right; so the "report" in this case is "report to the guest".
>>
>> As I said, I think that's confusing terminology; after all, we want to
>> report to the guest all holes that we make, and only the holes that we
>> make.  The question isn't then which ones we report, but which ones we
>> make holes for. :-)
>
> originally I used 'report' to describe the hypercall in which the hypervisor
> composes the actual information about RMRRs, so it can be 'report to libxl' or
> 'report to the guest' depending on who invokes that hypercall.

Yes, that's what I thought "report" meant; but I think that term
predisposes one to think of one specific way the system would work:
i.e., libxl asks Xen for the RMRRs, Xen filters out which ones to give
it, and libxl passes on all RMRRs reported to it by Xen.  Since I think
libxl is the right place to "filter" RMRRs, we shouldn't think about
this as Xen "reporting" RMRRs, but as libxl "querying" RMRRs, and then
choosing which ones to pass in (perhaps all of them, perhaps a
subset).
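
To make the "query, then choose" split concrete, here is a minimal Python
sketch.  It is not libxl code: the Rmrr type, the select_rmrrs() helper, the
policy names and the device labels are all made up for illustration, and the
two regions are borrowed from the example quoted further down in this mail.

```python
from typing import Iterable, List, NamedTuple


class Rmrr(NamedTuple):
    start: int   # first byte of the reserved region
    end: int     # last byte of the region (inclusive)
    device: str  # illustrative device label, not a real Xen identifier


def select_rmrrs(host_rmrrs: Iterable[Rmrr], policy: str,
                 assigned: Iterable[str] = ()) -> List[Rmrr]:
    """Toolstack-side choice of which queried RMRRs to pass to the builder.

    "host"     -> pass every RMRR the host has (the "rmrr-host" idea)
    "assigned" -> pass only RMRRs of devices assigned to this guest
    """
    wanted = set(assigned)
    if policy == "host":
        return list(host_rmrrs)
    if policy == "assigned":
        return [r for r in host_rmrrs if r.device in wanted]
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical result of querying the hypervisor for host RMRRs.
host = [
    Rmrr(0xe0000, 0xeffff, "D1"),
    Rmrr(0xa0000000, 0xa37fffff, "D2"),
]

# The toolstack, not Xen, decides what the guest ends up with.
print(select_rmrrs(host, "assigned", assigned=["D2"]))
```

The point of the sketch is only that the hypervisor side returns everything
and the filtering decision sits in one toolstack function.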

>> So for this discussion, maybe "rmrr-host" (meaning, copy RMRRs from
>> the host) or "rmrr-sel" (meaning, specify a selection of RMRRs, which
>> may be from this host, or even another host)?
>
> the counterpart of 'rmrr-host' gives me feeling of 'rmrr-guest'. :-)

Think about the "e820_host" option for a bit (which tries to make the
guest's e820 memory regions look like the host's) and maybe it will
make more sense. :-)

>> Well it will have an impact on the overall design of the code; but
>> you're right, if RMRRs really can (and will) be anywhere in memory,
>> then the domain builder will need to know what RMRRs are going to be
>> reserved for this VM and avoid populating those.  If, on the other
>> hand, we can make some fairly small assumptions about where there will
>> not be any RMRRs, then we can get away with handling everything in
>> hvmloader.
>
> I'm not sure such fairly small assumptions can be made. For example,
> host RMRR may include one or several regions in host PCI MMIO
> space (say >3G), then hvmloader has to understand such knowledge
> to avoid allocating them for guest PCI MMIO.

Yes, I'm talking here about Jan's idea of having the domain builder in
libxc do the minimal amount of work to get hvmloader to run, and then
having hvmloader populate the rest of the address space. So the
comparison is:

1. Both libxc and hvmloader know about RMRRs.  libxc uses this
information to avoid placing hvmloader over an RMRR region;
hvmloader uses the information to populate the memory map and place
the MMIO ranges such that neither overlaps with RMRRs.

2. Only hvmloader knows about RMRRs.  libxc places hvmloader in a
location in RAM basically guaranteed never to overlap with RMRRs;
hvmloader uses the information to populate the memory map and place
the MMIO ranges such that neither overlaps with RMRRs.

#2 is only possible if we can find a region of the physical address
space almost guaranteed never to overlap with RMRRs.  Otherwise, we
may have to fall back to #1.
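
As a rough illustration of the check that decides between the two options,
here is a small Python sketch.  The load window below is an assumption picked
purely for the example, not a statement about where hvmloader really lives,
and the helper names are hypothetical; the three ranges are the ones from the
example quoted later in this mail.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive address ranges share at least one byte."""
    return a_start <= b_end and b_start <= a_end


def placement_is_safe(load_start, load_end, rmrrs):
    """True if the hvmloader image window avoids every RMRR."""
    return not any(overlaps(load_start, load_end, s, e) for s, e in rmrrs)


# Assumed load window, for illustration only: 1MB up to (but excluding) 2MB.
LOAD_START, LOAD_END = 0x100000, 0x1fffff

# Reserved ranges as (start, inclusive end).
rmrrs = [
    (0xe0000, 0xeffff),
    (0x40000000, 0x40003fff),
    (0xa0000000, 0xa37fffff),
]

# If this holds for every host we care about, #2 is viable; otherwise libxc
# needs the RMRR list as well (#1).
use_option_2 = placement_is_safe(LOAD_START, LOAD_END, rmrrs)
print(use_option_2)
```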

>> > and state my major intention again. I don't think the preparation (i.e.
>> > detecting conflicts and making holes) for device assignment should be
>> > a blocking failure.  Throwing a warning should be enough (i.e. in libxc).
>> > We should let the actual device assignment path make the final call based
>> > on the admin's configuration (default 'fail' w/ 'warn' override). Based on that
>> > policy I think 'report-all' (making holes for all host RMRRs) is an
>> > acceptable approach, w/ small impact on possibly more warning
>> > messages (actually not bad to help admin understand the hotplug
>> > possibility on this platform) and show more reserved regions to the
>> > end user (but he shouldn't make any assumption here). :-)
>>
>> I don't really understand what you're talking about here.
>>
>> When the libxc domain builder runs, there is *no* guest memory mapped.
>> So if it has the RMRRs, then it can *avoid* conflict; and if it
>> doesn't have the RMRRs, it can't even *detect* conflict.  So there is
>> no reason for libxc to either give a warning, or cause a failure.
>
> not all the conflicts can or will be avoided. e.g. USB may report a
> region conflicting with guest BIOS which is a hard conflict. another
> example (from one design option) is that we may want to keep
> current cross-component structure (one lowmem + one highmem)
> so conflict in the middle (e.g. 2G) is a problem (avoiding it will break
> lowmem or make lowmem too small).

Ah, this is the missing bit of information.  So can you expand on this
a bit -- are you saying that the guest BIOS has a fixed place in RAM
where it must be loaded, and that area can't be changed?  And that
furthermore, for some reason, this may overlap with RMRRs for
passed-through devices?

>> I'm also not clear what assumptions "he" may be making: you mean, the
>> existence of an RMRR in the e820 map shouldn't be taken to mean that
>> he has a specific device assigned?  No, indeed, he should not make
>> such an assumption. :-)
>
> I meant 'he' shouldn't make assumption on how many reserved regions
> should exist in e820 based on exposed devices. Jan has a concern exposing
> more reserved regions in e820 than necessary is not a good thing. I'm
> trying to convince him it should be fine. :-)

Right -- well there is a level of practicality here: if in fact many
operating systems ignore the e820 map and base their ideas on what
devices are present, then we would have to try to work around that.

But since this is actually done by the OS and not the driver, in the
absence of any major OSes that actually behave this way, it seems to
me like taking the simpler option of assuming that the guest OS will
honor the e820 map should be OK.

> Then I hope you understand now about our discussion in libxl/xen/
> hvmloader, based on the fact that conflict may not be avoided.
> That's the major open in original discussion with Jan. I'd like to
> give an example of the flow here per Jan's suggestion, starting
> from domain builder after reserved regions have been specified
> by high level libxl.
>
> Let's take a synthetic platform w/ two devices, each reported
> with one RMRR reserved region:
>         (D1): [0xe0000, 0xeffff] in <1MB area
>         (D2): [0xa0000000, 0xa37fffff] in ~2.75G area
>
> The guest is configured with 4G memory, and assigned with D2.
> due to libxl policy (say for migration and hotplug) in total 3
> ranges are reported:
>         (hotplug): [0xe0000, 0xeffff] in <1MB area in this node
>         (migration): [0x40000000, 0x40003fff] in ~1G area in another node
>         (static-assign): [0xa0000000, 0xa37fffff] in ~2.75G area in this node
>
> let's use xenstore to save such information (assume accessible to both
> domain builder and hvmloader?)

IIRC xenstore is available to hvmloader, but *not* to the libxc domain
builder.  (I could be wrong about that.)

>
> STEP-1. domain builder
>
> say the default layout w/o reserved regions would be:
>         lowmem:         [0, 0xbfffffff]
>         mmio hole:      [0xc0000000, 0xffffffff]
>         highmem:        [0x100000000, 0x140000000]
>
> domain builder then queries reserved regions from xenstore,
> and tries to avoid conflicts.
>
> For [0xa0000000, 0xa37fffff], it can be avoided by reducing
> lowmem to 0xa0000000 and increasing highmem:
>         lowmem:         [0, 0x9fffffff]
>         mmio hole:      [0xa0000000, 0xffffffff]
>         highmem:        [0x100000000, 0x160000000]
>
>
> For [0x40000000, 0x40003fff], leave it as a conflict, since either we
> reduce lowmem to 1G, which is not nice to a guest which doesn't use
> highmem, or we have to break lowmem into two chunks, so more
> structure changes are required.

Why can we not just leave that area of memory unpopulated?
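
What I have in mind is roughly the following, as a Python sketch rather than
actual domain builder code; punch_holes() is a hypothetical helper, and the
ranges use inclusive ends as in the quoted layout.

```python
def punch_holes(ram, reserved):
    """Return the RAM ranges with every reserved range cut out.

    Both arguments are lists of (start, end) pairs with inclusive ends.
    """
    out = []
    for start, end in ram:
        pieces = [(start, end)]
        for r_start, r_end in sorted(reserved):
            remaining = []
            for s, e in pieces:
                if r_end < s or r_start > e:
                    remaining.append((s, e))          # no overlap, keep as is
                    continue
                if s < r_start:
                    remaining.append((s, r_start - 1))  # part below the hole
                if r_end < e:
                    remaining.append((r_end + 1, e))    # part above the hole
            pieces = remaining
        out.extend(pieces)
    return out


lowmem = [(0x0, 0x9fffffff)]
hole = [(0x40000000, 0x40003fff)]
for s, e in punch_holes(lowmem, hole):
    print(hex(s), hex(e))
```

For the ~1G range this leaves [0, 0x3fffffff] and [0x40004000, 0x9fffffff]
populated.  The 16k that is cut out would presumably have to be populated
somewhere else (e.g. in highmem) if the guest is to keep its full size; the
sketch does not do that.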

> For [0xe0000, 0xeffff], leave it as a conflict (w/ guest BIOS)

And we can't move the guest BIOS in any way?

So in your example here, libxc continues to do the main work of laying
out the address space, and hvmloader only has to deal with laying out
the MMIO regions.
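
For reference, the lowmem/highmem arithmetic in the quoted STEP-1 can be
sketched like this in Python.  layout() is a hypothetical helper, not the
libxc interface, and it uses exclusive end addresses, whereas the quoted
layout writes lowmem and the MMIO hole with inclusive ends.

```python
GIB = 1 << 30
FOUR_GIB = 4 * GIB


def layout(total_ram, mmio_start):
    """Split guest RAM around an MMIO hole of [mmio_start, 4G).

    Returns (lowmem, hole, highmem), each a (start, end) pair with an
    exclusive end.  RAM that does not fit below mmio_start goes above 4G.
    """
    low_size = min(total_ram, mmio_start)
    high_size = total_ram - low_size
    lowmem = (0, low_size)
    hole = (mmio_start, FOUR_GIB)
    highmem = (FOUR_GIB, FOUR_GIB + high_size)
    return lowmem, hole, highmem


# Default layout for a 4G guest, then the layout after lowering the start of
# the hole to 0xa0000000 so the ~2.5G reserved range falls inside it.
for mmio_start in (0xc0000000, 0xa0000000):
    low, hole, high = layout(4 * GIB, mmio_start)
    print([hex(x) for x in low], [hex(x) for x in hole], [hex(x) for x in high])
```

With the hole starting at 0xa0000000 the 512M taken away from lowmem shows up
as highmem ending at 0x160000000, which matches the numbers in the quoted
example.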

What do you think of Jan's idea, of changing things so that libxc only
does just enough work to set up hvmloader, and then hvmloader
populates the guest memory -- avoiding putting either RAM or MMIO
regions into RMRRs?

 -George
