Re: [PATCH v2] x86/domain: adjust limitation on shared_info allocation below 4G
On Thu, Feb 05, 2026 at 09:29:53AM +0100, Jan Beulich wrote:
> On 04.02.2026 17:46, Roger Pau Monné wrote:
> > On Wed, Feb 04, 2026 at 04:08:21PM +0100, Jan Beulich wrote:
> >> On 04.02.2026 15:52, Roger Pau Monné wrote:
> >>> On Wed, Feb 04, 2026 at 03:06:52PM +0100, Jan Beulich wrote:
> >>>> On 04.02.2026 13:25, Roger Pau Monne wrote:
> >>>>> The limitation of shared_info being allocated below 4G to fit in
> >>>>> the start_info field only applies to 32-bit PV guests. On 64-bit
> >>>>> PV guests the start_info field is 64 bits wide. HVM guests don't
> >>>>> use start_info at all.
> >>>>>
> >>>>> Drop the restriction in arch_domain_create() and instead free and
> >>>>> re-allocate the page from memory below 4G if needed in switch_compat(),
> >>>>> when the guest is set to run in 32bit PV mode.
> >>>>>
> >>>>> Fixes: 3cadc0469d5c ("x86_64: shared_info must be allocated below 4GB
> >>>>> as it is advertised to 32-bit guests via a 32-bit machine address field
> >>>>> in start_info.")
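To make the above a bit more concrete, here's a rough sketch of the
approach (the helper name is made up and details such as refreshing any
existing mappings of the page are omitted; the actual patch does the
free/re-allocate directly in switch_compat()):

    static int relocate_shared_info_below_4g(struct domain *d)
    {
        void *new_si;

        /* Nothing to do if the page already sits below 4G. */
        if ( virt_to_maddr(d->shared_info) < (1UL << 32) )
            return 0;

        /* MEMF_bits(32) restricts the allocation to addresses below 4G. */
        new_si = alloc_xenheap_pages(0, MEMF_bits(32));
        if ( !new_si )
            return -ENOMEM;

        /* Assumes this runs before the guest itself uses the page. */
        memcpy(new_si, d->shared_info, PAGE_SIZE);
        free_xenheap_page(d->shared_info);
        d->shared_info = new_si;

        return 0;
    }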
> >>>>
> >>>> The tag is here because there is the (largely theoretical?)
> >>>> possibility for a system to have no memory at all left below 4Gb,
> >>>> in which case creation of a PV64 or non-shadow HVM guest would
> >>>> needlessly fail?
> >>>
> >>> It's the kind of issue we discovered when using strict domain NUMA
> >>> node placement. At that point the toolstack would exhaust all memory
> >>> on node 0 and, by doing that, inadvertently consume all memory below
> >>> 4G.
> >>
> >> Right, and hence also my "memory: arrange to conserve on DMA reservation",
> >> where I'm still fighting with myself as to what to do with the comments you
> >> gave there.
> >
> > Better fighting with yourself rather than fighting with me I guess ;).
> >
> > That change would conflict with what we currently do on XenServer,
> > because we don't (yet) special-case the memory below 4G so that it is
> > excluded from the per-node free memory accounting.
> >
> > What would happen when you append the MEMF_no_dma flag as proposed in
> > your commit, but the caller is also passing MEMF_exact_node with
> > target node 0? AFAICT the allocation would still refuse to use the
> > low 4G pool.
>
> Yes, DMA-ability is intended to take higher priority than exact-node
> requests. Another node would then need choosing by the toolstack.
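As a minimal illustration of that precedence (the wrapper below is
hypothetical; MEMF_no_dma, MEMF_exact_node and MEMF_node() are existing
flags, and the point is only that a failing exact-node allocation leaves
the caller, ultimately the toolstack, to pick another node rather than
fall back to the low-4G pool):

    static struct page_info *alloc_on_node_no_dma(struct domain *d,
                                                  nodeid_t node)
    {
        unsigned int memflags = MEMF_no_dma | MEMF_exact_node |
                                MEMF_node(node);

        /* Fails (returns NULL) if the node only has DMA-able memory left. */
        return alloc_domheap_pages(d, 0, memflags);
    }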
>
> > Also, your commit should be expanded to avoid staking claims that
> > would drain the DMA pool, as then populate_physmap() won't be able to
> > allocate from there?
>
> Except that upstream claims aren't node-specific yet, so they could be
> fulfilled by taking memory from other nodes?
That's likely to change at some point, but yes, they are not
node-specific yet.
> Aiui the problem would only occur if that DMA-able memory was the only
> memory left in the system.
Indeed, in that scenario the toolstack would be allowed to stake claims
that cover that DMA memory, yet populate_physmap() wouldn't be able to
consume those claims.
I think there are two items that need to be done before we can append
MEMF_no_dma to populate_physmap() allocations:
 * DMA memory must not be reachable by claims.
 * DMA memory must be reported to the toolstack, so it can account for
   it separately from free memory.
The last point could also be solved by subtracting the DMA memory from
the `free_pages` value returned to the toolstack.
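Roughly along these lines (a sketch only: reportable_free_pages() and
dma_pool_free_pages() are hypothetical names, while avail_domheap_pages()
exists today):

    static unsigned long reportable_free_pages(void)
    {
        unsigned long free_pages = avail_domheap_pages();
        /* Hypothetical: pages held back as DMA memory. */
        unsigned long dma_pages = dma_pool_free_pages();

        /* Only report memory populate_physmap() could actually consume
         * once MEMF_no_dma is appended there. */
        return free_pages > dma_pages ? free_pages - dma_pages : 0;
    }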
Thanks, Roger.