Re: [PATCH v2] x86/domain: adjust limitation on shared_info allocation below 4G
On Thu, Feb 05, 2026 at 09:29:53AM +0100, Jan Beulich wrote:
> On 04.02.2026 17:46, Roger Pau Monné wrote:
> > On Wed, Feb 04, 2026 at 04:08:21PM +0100, Jan Beulich wrote:
> >> On 04.02.2026 15:52, Roger Pau Monné wrote:
> >>> On Wed, Feb 04, 2026 at 03:06:52PM +0100, Jan Beulich wrote:
> >>>> On 04.02.2026 13:25, Roger Pau Monne wrote:
> >>>>> The limitation of shared_info being allocated below 4G to fit in
> >>>>> the start_info field only applies to 32-bit PV guests. On 64-bit
> >>>>> PV guests the start_info field is 64 bits wide. HVM guests don't
> >>>>> use start_info at all.
> >>>>>
> >>>>> Drop the restriction in arch_domain_create() and instead free and
> >>>>> re-allocate the page from memory below 4G if needed in switch_compat(),
> >>>>> when the guest is set to run in 32bit PV mode.
> >>>>>
> >>>>> Fixes: 3cadc0469d5c ("x86_64: shared_info must be allocated below 4GB
> >>>>> as it is advertised to 32-bit guests via a 32-bit machine address field
> >>>>> in start_info.")
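To make the above a bit more concrete, here's a rough sketch of the
approach (the helper name is made up and details such as refreshing any
existing mappings of the page are omitted; the actual patch does the
free/re-allocate directly in switch_compat()):

    static int relocate_shared_info_below_4g(struct domain *d)
    {
        void *new_si;

        /* Nothing to do if the page already sits below 4G. */
        if ( virt_to_maddr(d->shared_info) < (1UL << 32) )
            return 0;

        /* MEMF_bits(32) restricts the allocation to addresses below 4G. */
        new_si = alloc_xenheap_pages(0, MEMF_bits(32));
        if ( !new_si )
            return -ENOMEM;

        /* Assumes this runs before the guest itself uses the page. */
        memcpy(new_si, d->shared_info, PAGE_SIZE);
        free_xenheap_page(d->shared_info);
        d->shared_info = new_si;

        return 0;
    }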
> >>>>
> >>>> The tag is here because there is the (largely theoretical?)
> >>>> possibility for a system to have no memory at all left below 4Gb,
> >>>> in which case creation of a PV64 or non-shadow HVM guest would
> >>>> needlessly fail?
> >>>
> >>> It's the kind of issue we discovered when using strict domain NUMA
> >>> node placement. At that point the toolstack would exhaust all memory
> >>> on node 0 and, by doing that, inadvertently consume all memory below
> >>> 4G.
> >>
> >> Right, and hence also my "memory: arrange to conserve on DMA reservation",
> >> where I'm still fighting with myself as to what to do with the comments you
> >> gave there.
> >
> > Better fighting with yourself rather than fighting with me I guess ;).
> >
> > That change would conflict with what we currently do on XenServer,
> > because we don't (yet) special-case the memory below 4G so that it is
> > excluded from the per-node free memory accounting.
> >
> > What would happen when you append the MEMF_no_dma flag as proposed in
> > your commit, but the caller is also passing MEMF_exact_node with
> > target node 0? AFAICT the allocation would still refuse to use the
> > low 4G pool.
>
> Yes, DMA-ability is intended to take higher priority than exact-node
> requests. Another node would then need choosing by the toolstack.
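As a minimal illustration of that precedence (the wrapper below is
hypothetical; MEMF_no_dma, MEMF_exact_node and MEMF_node() are existing
flags, and the point is only that a failing exact-node allocation leaves
the caller, ultimately the toolstack, to pick another node rather than
fall back to the low-4G pool):

    static struct page_info *alloc_on_node_no_dma(struct domain *d,
                                                  nodeid_t node)
    {
        unsigned int memflags = MEMF_no_dma | MEMF_exact_node |
                                MEMF_node(node);

        /* Fails (returns NULL) if the node only has DMA-able memory left. */
        return alloc_domheap_pages(d, 0, memflags);
    }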
>
> > Also, your commit should be expanded to avoid staking claims that
> > would drain the DMA pool, as then populate_physmap() won't be able to
> > allocate from there?
>
> Except that upstream claims aren't node-specific yet, so they could be
> fulfilled by taking memory from other nodes?
That's likely to change at some point, but yes, they are not
node-specific yet.
> Aiui the problem would only occur if that DMA-able memory was the only
> memory left in the system.
Indeed, in that scenario the toolstack would be allowed to stake claims
that cover that DMA memory, yet populate_physmap() wouldn't be able to
consume those claims.
I think there are two items that need to be done before we can append
MEMF_no_dma to populate_physmap() allocations:
 * DMA memory must not be reachable by claims.
 * DMA memory must be reported to the toolstack, so it can account for
   it separately from free memory.
The last point could also be solved by subtracting the DMA memory from
the `free_pages` value returned to the toolstack.
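Roughly along these lines (a sketch only: reportable_free_pages() and
dma_pool_free_pages() are hypothetical names, while avail_domheap_pages()
exists today):

    static unsigned long reportable_free_pages(void)
    {
        unsigned long free_pages = avail_domheap_pages();
        /* Hypothetical: pages held back as DMA memory. */
        unsigned long dma_pages = dma_pool_free_pages();

        /* Only report memory populate_physmap() could actually consume
         * once MEMF_no_dma is appended there. */
        return free_pages > dma_pages ? free_pages - dma_pages : 0;
    }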
Thanks, Roger.