On Tue, 2010-09-14 at 17:42 +0100, Jeremy Fitzhardinge wrote: 
> On 09/14/2010 02:07 AM, Ian Campbell wrote:
> > On Mon, 2010-09-13 at 23:51 +0100, Jeremy Fitzhardinge wrote:
> >> On 09/13/2010 02:17 PM, Dan Magenheimer wrote:
> >>>> As a side-effect, it also works for dom0.  If you set dom0_mem on the
> >>>> Xen command line, then nr_pages is limited to that value, but the kernel
> >>>> can still see the system's real E820 map, and therefore adds all the
> >>>> system's memory to its own balloon driver, potentially allowing dom0 to
> >>>> expand up to take all physical memory.
> >>>>
> >>>> However, this may cause bad side-effects if your system memory is much
> >>>> larger than your dom0_mem, especially if you use a 32-bit dom0.  I may
> >>>> need to add a kernel command line option to limit the max initial
> >>>> balloon size to mitigate this...
> >>> I would call this dom0 functionality a bug.  I think both Citrix
> >>> and Oracle use dom0_mem as a normal boot option for every
> >>> installation and, while I think both employ heuristics to choose
> >>> a larger dom0_mem for larger physical memory, I don't think it
> >>> grows large enough for, say, >256GB physical memory, to accommodate
> >>> the necessarily large number of page tables.
> >>>
> >>> So, I'd vote for NOT allowing dom0 to balloon up to physical
> >>> memory if dom0_mem is specified, and possibly a kernel command
> >>> line option that allows it to grow beyond.  Or, possibly, no
> >>> option and never allow dom0 memory to grow beyond dom0_mem
> >>> unless (possibly) it grows with hot-plug.
> >> Yes, it's a bit of a problem.  The trouble is that the kernel can't
> >> really distinguish the two cases; either way, it sees a Xen-supplied
> >> xen_start_info->nr_pages as the amount of initial memory available, and
> >> an E820 table referring to more RAM beyond that.
> >>
> >> I guess there are three options:
> >>
> >>    1. add a "xen_maxmem" (or something) kernel parameter to override
> >>       space specified in the E820 table
> >>    2. ignore E820 if it's a privileged domain
> > As it stands I don't think it is currently possible to boot any domain 0
> > kernel pre-ballooned other than by using the native mem= option.
> >
> > I think the Right Thing to do would be for privileged domains to combine
> > the results of XENMEM_machine_memory_map (real e820) and
> > XENMEM_memory_map (pseudo-physical "e820") by clamping the result of
> > XENMEM_machine_memory_map at the maximum given in XENMEM_memory_map (or
> > taking some sort of union).
> Does the dom0 domain builder bother to set a pseudo-phys E820?

I thought the default with XENMEM_memory_map was to construct a fake
0..startinfo->nr_pages size e820, which would have been sensible, but it
turns out that's not what happens. In fact XENMEM_memory_map will return
ENOSYS in that case and guests are expected to construct the fake e820.
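
For illustration, the guest-side fallback amounts to very little. This is
a minimal standalone sketch, not the actual kernel code: the struct, the
constants and the name fake_e820_from_nr_pages are made up here, and 4K
pages are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4K pages */
#define E820_RAM   1

/* Simplified stand-in for an e820 entry (illustrative layout only). */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Hypothetical helper: build a one-entry map covering 0..nr_pages, for
 * the case where the hypervisor does not supply a pseudo-physical map.
 * Returns the number of entries written (0 or 1).
 */
size_t fake_e820_from_nr_pages(uint64_t nr_pages,
                               struct e820_entry *map, size_t max_entries)
{
    if (max_entries == 0 || nr_pages == 0)
        return 0;

    map[0].addr = 0;
    map[0].size = nr_pages << PAGE_SHIFT;
    map[0].type = E820_RAM;
    return 1;
}
```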

> > However, although I think that the Right Thing, I don't think having
> > domain 0 cut off its e820 at nr_pages unless overridden by mem= would be
> > a problem in practice and it certainly wins in terms of complexity of
> > reconciling XENMEM_memory_map and XENMEM_machine_memory_map.
> Indeed.  I think adding a general 32x limit between base and max size will
> prevent a completely unusable system, and then just suggest using mem=
> to control that more precisely (esp for dom0).

Sounds reasonable.
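
To make sure we mean the same thing, here is roughly how I read the
combination of the two ideas, as a standalone sketch rather than a patch.
All names (max_ram_pages, clamp_e820_ram, BALLOON_RATIO) are invented for
illustration, 4K pages are assumed, and letting a non-zero mem= value
simply override the 32x default is my assumption about how the two would
interact. Note that it clamps by address, not by amount of RAM, and
leaves non-RAM entries alone.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12   /* assumption: 4K pages */
#define E820_RAM      1
#define E820_RESERVED 2
#define BALLOON_RATIO 32   /* the "32x" factor discussed above */

/* Simplified stand-in for an e820 entry (illustrative layout only). */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Upper bound, in pages, on the RAM the domain may see.
 * mem_pages == 0 means "no mem= given"; otherwise mem= wins (assumption).
 */
uint64_t max_ram_pages(uint64_t nr_pages, uint64_t mem_pages)
{
    if (mem_pages != 0)
        return mem_pages;
    return nr_pages * BALLOON_RATIO;
}

/*
 * Clamp RAM entries in place so that none extends past limit_pages.
 * RAM entries entirely above the limit are dropped, entries straddling
 * it are trimmed. Returns the new number of entries.
 */
size_t clamp_e820_ram(struct e820_entry *map, size_t n, uint64_t limit_pages)
{
    uint64_t limit = limit_pages << PAGE_SHIFT;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        struct e820_entry e = map[i];

        if (e.type == E820_RAM) {
            if (e.addr >= limit)
                continue;                  /* wholly above the limit */
            if (e.size > limit - e.addr)
                e.size = limit - e.addr;   /* trim the tail */
        }
        map[out++] = e;
    }
    return out;
}
```

With nr_pages = 32768 (128MB) and no mem=, the limit comes out at 4GB, so
a RAM region starting at 4GB would be dropped from the map entirely.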


