Re: [PATCH v2] x86/domain: adjust limitation on shared_info allocation below 4G
On 05.02.2026 15:08, Roger Pau Monné wrote:
> On Thu, Feb 05, 2026 at 09:29:53AM +0100, Jan Beulich wrote:
>> On 04.02.2026 17:46, Roger Pau Monné wrote:
>>> On Wed, Feb 04, 2026 at 04:08:21PM +0100, Jan Beulich wrote:
>>>> On 04.02.2026 15:52, Roger Pau Monné wrote:
>>>>> On Wed, Feb 04, 2026 at 03:06:52PM +0100, Jan Beulich wrote:
>>>>>> On 04.02.2026 13:25, Roger Pau Monne wrote:
>>>>>>> The limitation of shared_info being allocated below 4G, so that its
>>>>>>> machine address fits in the corresponding start_info field, only
>>>>>>> applies to 32bit PV guests. On 64bit PV guests that field is 64 bits
>>>>>>> wide. HVM guests don't use start_info at all.
>>>>>>>
>>>>>>> Drop the restriction in arch_domain_create() and instead free and
>>>>>>> re-allocate the page from memory below 4G if needed in switch_compat(),
>>>>>>> when the guest is set to run in 32bit PV mode.
>>>>>>>
>>>>>>> Fixes: 3cadc0469d5c ("x86_64: shared_info must be allocated below 4GB
>>>>>>> as it is advertised to 32-bit guests via a 32-bit machine address field
>>>>>>> in start_info.")
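(Purely for illustration, roughly what such a re-allocation in
switch_compat() could look like. This is a sketch only, not the actual
patch: the helper name is made up, and the unsharing of the old page as
well as the error handling are deliberately simplified.)

/* Sketch only: re-home shared_info below 4G for a 32bit PV guest. */
static int relocate_shared_info_below_4g(struct domain *d)
{
    void *new_info;

    /* Nothing to do if the page already lives below 4G. */
    if ( virt_to_maddr(d->shared_info) < (1UL << 32) )
        return 0;

    new_info = alloc_xenheap_pages(0, MEMF_bits(32));
    if ( !new_info )
        return -ENOMEM;

    /* Preserve whatever was already written to the old page. */
    memcpy(new_info, d->shared_info, PAGE_SIZE);

    /*
     * The real code would need to properly unshare and free the old
     * page before re-sharing the new one; simplified here.
     */
    free_xenheap_page(d->shared_info);
    d->shared_info = new_info;
    share_xen_page_with_guest(virt_to_page(new_info), d, SHARE_rw);

    return 0;
}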
>>>>>>
>>>>>> The tag is here because there is the (largely theoretical?) possibility
>>>>>> for a system to have no memory at all left below 4Gb, in which case
>>>>>> creation of a PV64 or non-shadow HVM guest would needlessly fail?
>>>>>
>>>>> It's kind of an issue we discovered when using strict domain NUMA node
>>>>> placement. At that point the toolstack would exhaust all memory on
>>>>> node 0 and by doing that inadvertently consume all memory below 4G.
>>>>
>>>> Right, and hence also my "memory: arrange to conserve on DMA reservation",
>>>> where I'm still fighting with myself as to what to do with the comments you
>>>> gave there.
>>>
>>> Better fighting with yourself rather than fighting with me I guess ;).
>>>
>>> That change would be controversial with what we currently do on
>>> XenServer, because we don't (yet) special case the memory below 4G to
>>> not account for it in the per-node amount of free memory.
>>>
>>> What would happen when you append the MEMF_no_dma flag as proposed in
>>> your commit, but the caller is also passing MEMF_exact_node with
>>> target node 0? AFAICT the allocation would still refuse to use the
>>> low 4G pool.
>>
>> Yes, DMA-ability is intended to take higher priority than exact-node
>> requests. Another node would then need choosing by the toolstack.
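(To illustrate the interaction, not actual code: with MEMF_no_dma
appended, a populate_physmap style allocation with an exact-node
request for node 0 would boil down to something like the below, and
would fail once the only free memory on node 0 sits below the DMA
boundary. The wrapper name is invented.)

/*
 * Sketch: an order-0, exact-node allocation with the DMA pool excluded.
 * When the requested node has nothing but DMA-able memory left, this
 * returns NULL instead of dipping into the low pool, and the toolstack
 * has to retry on another node (or drop the exact-node request).
 */
static struct page_info *populate_one_page(struct domain *d, nodeid_t node)
{
    return alloc_domheap_pages(d, 0,
                               MEMF_no_dma | MEMF_exact_node |
                               MEMF_node(node));
}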
>>
>>> Also, your commit should be expanded to avoid staking claims that
>>> would drain the DMA pool, as then populate_physmap() won't be able to
>>> allocate from there?
>>
>> Except that upstream claims aren't node-specific yet, so could be
>> fulfilled by taking memory from other nodes?
>
> That's likely to change at some point, but yes, they are not node
> specific yet.
>
>> Aiui the problem would
>> only occur if that DMA-able memory was the only memory left in the
>> system.
>
> Indeed, in that scenario the toolstack will be allowed to make claims that
> cover that DMA memory, yet populate physmap won't be able to consume
> those claims.
It would be (following said patch of mine), but only in order-0 chunks.
Which would make ...
> I think there are two items that need to be done for us to append
> MEMF_no_dma to populate physmap allocations:
>
> * DMA memory is not reachable by claims.
> * DMA memory must be reported to the toolstack, so it can account for
> it separately from free memory.
>
> The last point could also be solved by subtracting the DMA memory from the
> `free_pages` value returned to the toolstack.
... any of this more difficult. We don't want to completely prevent its
use; we only want to (heuristically) limit it.
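(Sketch of the free_pages adjustment mentioned above, for illustration
only: the helper name is made up, and whether avail_domheap_pages_region()
and dma_bitsize are the right building blocks, or whether -1 is the
proper "any node" argument here, would need checking.)

/*
 * Sketch: what XEN_SYSCTL_physinfo could report instead of the raw
 * avail_domheap_pages() value, so the toolstack doesn't size claims
 * or allocations against memory the DMA heuristic will withhold.
 */
static unsigned long reported_free_pages(void)
{
    unsigned long total = avail_domheap_pages();
    /* Free memory below the DMA boundary (dma_bitsize), on any node. */
    unsigned long dma = avail_domheap_pages_region(-1, 0, dma_bitsize);

    return total > dma ? total - dma : 0;
}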
Jan