
Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps



On 2014/10/27 18:17, Jan Beulich wrote:
On 27.10.14 at 09:09, <tiejun.chen@xxxxxxxxx> wrote:
On 2014/10/24 22:56, Jan Beulich wrote:
On 24.10.14 at 09:34, <tiejun.chen@xxxxxxxxx> wrote:
We need to check and reserve all reserved device memory maps in the e820
to avoid any potential guest memory conflict.

Currently, if we can't insert RDM entries directly, we may need to handle
several ranges as follows:
a. Fixed Ranges --> BUG()
   lowmem_reserved_base-0xA0000: reserved by BIOS implementation,
   BIOS region,
   RESERVED_MEMBASE ~ 0x100000000,

This seems conceptually wrong to me, and I said so before:
Depending on host characteristics this approach may mean you're
going to be unable to build any HVM guests. Minimally there needs
to be a way to avoid these checks (resulting in devices associated
with RMRRs not being assignable to such a guest). I'm therefore

I just use 'err' to indicate whether these fixed ranges overlap an RMRR,

+    /* Overlap with any of these fixed ranges may prevent the guest from working. */
+    if ( err )
+    {
+        printf("Guest can't work with some reserved device memory overlap!\n");
+        BUG();
+    }
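
(A possible softening along the lines Jan suggests, sketched here purely for illustration; 'rdm_conflict' is a hypothetical flag, not something in the series:)

    /* Sketch: instead of BUG()ing the domain build, warn and remember that the
     * overlapping RMRR could not be honoured, so that assigning the associated
     * device can later be refused (or allowed via an explicit operator override). */
    if ( err )
    {
        printf("WARNING: reserved device memory overlaps a fixed range; "
               "the associated device may not be assignable.\n");
        rdm_conflict = 1;   /* hypothetical flag consumed by the assignment path */
    }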

As I understand it, these fixed ranges aren't like RAM, which we can
safely move out of any RMRR overlap. And in practice it's rare for an
RMRR to overlap those fixed ranges.

Again - one of my systems has RMRRs in the Ex000 range, which
certainly risks overlapping with the BIOS image should that one be
larger than 64k. Plus with RMRRs being in that region, I can
certainly see (physical) systems with small enough BIOS images
to place RMRRs even in the low Fx000 range, which then quite
certainly would overlap with the (virtual) BIOS range.

But I can remove the BUG() if you insist on this point.

Whether removing the BUG() here is correct and/or sufficient to
address my concern I can't immediately tell. What I insist on is that

Okay.

_no matter_ what RMRRs a physical host has, it should not prevent
the creation of guests (the worst that may result is that passing
through certain devices doesn't work anymore, and even then the
operator needs to be given a way of circumventing this if (s)he
knows that the device won't access the range post-boot, or if it's
being deemed acceptable for it to do so).

As we know, only legacy USB and GFX need these RMRR ranges. In particular, I believe USB needs much less than 1M of space, so it may be possible for it to be placed below 1M. But I think we can ask the BIOS to reallocate such ranges upwards, as on my real platform:

RMRR region: base_addr ab80a000 end_address ab81dfff

I don't know what platform you're using; maybe it's a legacy machine? But anyway, it should be feasible to update the BIOS, and we could even ask BIOS vendors to make this a normal rule in the future.

For GFX, the RMRR is often dozens of MB,

RMRR region: base_addr ad000000 end_address af7fffff

So it shouldn't overlap the range below 1M.
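
(For concreteness, the sizes of the two RMRRs quoted in this thread; just a throwaway calculation, not part of the series:)

    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        /* RMRR ranges as reported by "xl dmesg | grep RMRR" on this platform. */
        uint64_t usb_base = 0xab80a000, usb_end = 0xab81dfff;  /* legacy USB */
        uint64_t gfx_base = 0xad000000, gfx_end = 0xaf7fffff;  /* GFX        */

        /* end_address is inclusive, so the size is end - base + 1. */
        printf("USB RMRR: %" PRIu64 " KiB\n", (usb_end - usb_base + 1) >> 10);  /* 80 KiB */
        printf("GFX RMRR: %" PRIu64 " MiB\n", (gfx_end - gfx_base + 1) >> 20);  /* 40 MiB */
        return 0;
    }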


+            /* If we fall in the last RAM:Hole range */
+            else if ( end < next_start &&
+                      rdm_start > start &&
+                      rdm_end < next_start &&
+                      type == E820_RAM )
+            {
+                if ( do_insert )
+                {
+                    memmove(&e820[j+1], &e820[j],
+                            (sum_nr - j) * sizeof(struct e820entry));
+
+                    e820[j].size = rdm_start - e820[j].addr;
+                    e820[j].type = E820_RAM;
+
+                    e820[j+1].addr = rdm_start;
+                    e820[j+1].size = rdm_end - rdm_start;
+                    e820[j+1].type = E820_RESERVED;
+                    next_e820_entry_index++;
+                }
+                insert++;
+            }

This if-else-if series looks horrible - is there really no way to consolidate
it? Also, other than punching holes in the E820 map you don't seem to

I know this is ugly, but as you know there's no general rule we can
exploit here. An RMRR can start anywhere, so we have to handle all of
these scenarios:

1. The RMRR just lies amid the remaining e820 entries.
2. The RMRR is already at the end of the map.
3. The RMRR coincides with one RAM range.
4. The RMRR is aligned with the start of one RAM range.
5. The RMRR is aligned with the end of one RAM range.
6. The RMRR lies entirely within one RAM range.
7. The RMRR falls in the last RAM:Hole range.

So if you think we're handling these correctly, maybe we can keep this
approach for now and optimize it once we have a better idea.
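
(For what it's worth, a single clip/split pass per e820 entry, in the spirit of the e820_change_range_type() approach Jan mentions, can cover most of these cases without enumerating them. The sketch below is illustration only; the entry layout, constants and helper name are assumptions, not the actual hvmloader code:)

    #include <stdint.h>

    struct e820entry {
        uint64_t addr;
        uint64_t size;
        uint32_t type;
    };

    #define E820_RAM       1
    #define E820_RESERVED  2

    /* Punch one reserved hole [rdm_start, rdm_end) into a single RAM entry 'e',
     * splitting it as needed.  Expands 'e' into 1..3 entries in 'out' and
     * returns how many were written. */
    static unsigned int punch_rdm_hole(const struct e820entry *e,
                                       uint64_t rdm_start, uint64_t rdm_end,
                                       struct e820entry *out)
    {
        uint64_t start = e->addr, end = e->addr + e->size;
        uint64_t cs = rdm_start > start ? rdm_start : start;  /* clipped start */
        uint64_t ce = rdm_end < end ? rdm_end : end;          /* clipped end   */
        unsigned int n = 0;

        if ( e->type != E820_RAM || cs >= ce )
        {
            out[n++] = *e;                       /* no overlap: keep entry as-is */
            return n;
        }
        if ( start < cs )                        /* leading RAM piece  */
            out[n++] = (struct e820entry){ start, cs - start, E820_RAM };
        out[n++] = (struct e820entry){ cs, ce - cs, E820_RESERVED };
        if ( ce < end )                          /* trailing RAM piece */
            out[n++] = (struct e820entry){ ce, end - ce, E820_RAM };
        return n;
    }

Applying this to every original entry for each RMRR would handle cases 3-7 above uniformly; cases 1 and 2 (no overlap with any RAM entry) then reduce to inserting a standalone RESERVED entry.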

I understand that there are various cases to be considered, but
that's no different elsewhere. For example, look at
xen/arch/x86/e820.c:e820_change_range_type() which gets

I don't think that circumstance is the same as our requirement.

Here we are trying to insert multiple different entries, each covering a different range.

Anyway, I can take a further look at whether we can improve this.

away with quite a bit shorter an if/else-if sequence.

be doing anything here. And the earlier tools side patches didn't do
anything about this either. Consequently, at the time where it may
become necessary to establish the 1:1 mapping in the P2M, there'll
be the RAM mapping still there, causing the device assignment to fail.

But I already set these ranges to p2m_access_n, and as you can see I also
reserve these ranges in the e820 table. So although the RAM mapping is
still there, there is no actual access.
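
(For context only: the p2m_access_n marking referred to here is in patch 8, which isn't quoted in this thread. A rough illustration of the general idea from the toolstack side, using the libxc memory-access call; the series itself may well do this differently and elsewhere:)

    #include <inttypes.h>
    #include <xenctrl.h>

    /* Illustrative only: mark the guest-physical pages covering one RMRR as
     * not-accessible, so that stray guest accesses to the reserved hole trap
     * instead of touching the range.  rdm_end is exclusive here. */
    static int deny_rmrr_range(xc_interface *xch, uint32_t domid,
                               uint64_t rdm_start, uint64_t rdm_end)
    {
        uint64_t first_pfn = rdm_start >> 12;
        uint64_t last_pfn  = (rdm_end - 1) >> 12;
        uint32_t nr = (uint32_t)(last_pfn - first_pfn + 1);

        return xc_set_mem_access(xch, domid, XENMEM_access_n, first_pfn, nr);
    }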

That's being done in patch 8, but we're talking about patch 6 here.
Also - what size are the RMRRs in your case? The USB ones I know

RMRR region: base_addr ab80a000 end_address ab81dfff
RMRR region: base_addr ad000000 end_address af7fffff

of are typical single or very few page ones, so having the guest
lose that amount of memory may be tolerable. But if the ranges can
get any larger than a couple of pages, or if there can reasonably be
a larger amount of them (like could be the case on e.g. multi-node
systems), simply hiding that memory may not be well received by
our users.

Customers may accept losing dozens of MB, but I'm not sure.


RMRR range:

root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace# xl dmesg |
grep RMRR
(XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:679:   RMRR region: base_addr ab80a000 end_address ab81dfff
(XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:679:   RMRR region: base_addr ad000000 end_address af7fffff
root@tchen0-Shark-Bay-Client-platform:/home/tchen0/workspace#

Without my patch:

(d4) E820 table:
(d4)  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(d4)  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(d4)  HOLE: 00000000:000a0000 - 00000000:000e0000
(d4)  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(d4)  [03]: 00000000:00100000 - 00000000:ab80a000: RAM
(d4)  [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED
(d4)  [05]: 00000000:ab81e000 - 00000000:ad000000: RAM
(d4)  [06]: 00000000:ad000000 - 00000000:af800000: RESERVED

Where would this reserved range come from when your patches
aren't in place?

(d4)  HOLE: 00000000:af800000 - 00000000:fc000000
(d4)  [07]: 00000000:fc000000 - 00000001:00000000: RESERVED


With my patch:

(d2)  f0000-fffff: Main BIOS
(d2) E820 table:
(d2)  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(d2)  [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
(d2)  HOLE: 00000000:000a0000 - 00000000:000e0000
(d2)  [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
(d2)  [03]: 00000000:00100000 - 00000000:ab80a000: RAM
(d2)  [04]: 00000000:ab80a000 - 00000000:ab81e000: RESERVED
(d2)  [05]: 00000000:ab81e000 - 00000000:ad000000: RAM
(d2)  [06]: 00000000:ad000000 - 00000000:af800000: RESERVED

And this already answers what I asked above: you shouldn't be blindly
hiding 40MB from the guest.

If we don't reserve these RMRR ranges, the guest may create a 1:1 mapping over them. Then it will affect the device's usage in another VM, or the device's usage may corrupt these ranges in another VM.

Yes, we really need a policy for this. So please tell me what you expect.

Thanks
Tiejun
