
Re: Excluding init_on_free for pages for initial balloon down (Xen)


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 2 Mar 2026 12:05:57 +0100
  • Cc: Jürgen Groß <jgross@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, "David Hildenbrand (Arm)" <david@xxxxxxxxxx>
  • Delivery-date: Mon, 02 Mar 2026 11:05:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.03.2026 12:01, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 02, 2026 at 09:40:29AM +0100, David Hildenbrand (Arm) wrote:
>> On 3/2/26 07:36, Jürgen Groß wrote:
>>> On 01.03.26 16:04, Marek Marczykowski-Górecki wrote:
>>>> Hi,
>>>>
>>>> Some time ago I made a change to disable scrubbing pages that are
>>>> ballooned out during system boot. I'll paste the whole commit message as
>>>> it's relevant here:
>>>>
>>>>      197ecb3802c0 xen/balloon: add runtime control for scrubbing ballooned out pages
>>>>
>>>>      Scrubbing pages on initial balloon down can take some time, especially
>>>>      in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
>>>>      started with memory= significantly lower than maxmem=, all the extra
>>>>      pages will be scrubbed before returning to Xen. But since most of them
>>>>      weren't used at all at that point, Xen needs to populate them first
>>>>      (from populate-on-demand pool). In nested virt case (Xen inside KVM)
>>>>      this slows down the guest boot by 15-30s with just 1.5GB needed to be
>>>>      returned to Xen.
>>>>
>>>>      Add runtime parameter to enable/disable it, to allow initially disabling
>>>>      scrubbing, then enable it back during boot (for example in initramfs).
>>>>      Such usage relies on assumption that a) most pages ballooned out during
>>>>      initial boot weren't used at all, and b) even if they were, very few
>>>>      secrets are in the guest at that time (before any serious userspace
>>>>      kicks in).
>>>>
>>>>      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
>>>>      enabled by default), controlling default value for the new runtime
>>>>      switch.
>>>>
>>>> Now, I face the same issue with init_on_free/init_on_alloc (not sure
>>>> which one applies here, probably the latter one), which several
>>>> distributions enable by default. The result is (see timestamps):
>>>>
>>>>      [2026-02-24 01:12:55] [    7.485151] xen:balloon: Waiting for initial ballooning down having finished.
>>>>      [2026-02-24 01:14:14] [   86.581510] xen:balloon: Initial ballooning down finished.
>>>>
>>>> But here the situation is a bit more complicated:
>>>> init_on_free/init_on_alloc applies to any pages, not just those for
>>>> balloon driver. I see two approaches to solve the issue:
>>>> 1. Similar to xen_scrub_pages=, add a runtime switch for
>>>>     init_on_free/init_on_alloc, then force them off during boot, and
>>>>     re-enable early in initramfs.
>>>> 2. Somehow adjust balloon driver to bypass init_on_alloc when ballooning
>>>>     a page out.
>>>>
>>>> The first approach is likely easier to implement, but also has some
>>>> drawbacks: it may result in some kernel structures that are allocated
>>>> early to remain with garbage data in uninitialized places. While it may
>>>> not matter during early boot, such structures may survive for quite some
>>>> time, and maybe an attacker can use them later on to exploit some other
>>>> bug. This wasn't really a concern with xen_scrub_pages, as those pages
>>>> were immediately ballooned out.
>>>>
>>>> The second approach sounds architecturally better, and maybe
>>>> init_on_alloc could be always bypassed during balloon out? The balloon
>>>> driver can scrub the page on its own already (which is enabled by
>>>> default). That of course assumes the issue is only about init_on_alloc,
>>>> not init_on_free (or both) - which I haven't really confirmed yet...
>>>> If going this way, I see the balloon driver does basically
>>>> alloc_page(GFP_BALLOON), where GFP_BALLOON is:
>>>>
>>>>      /* When ballooning out (allocating memory to return to Xen) we don't really
>>>>         want the kernel to try too hard since that can trigger the oom killer. */
>>>>      #define GFP_BALLOON \
>>>>          (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>>>>
>>>> Would that be about adding some new flag here? Or maybe there is already
>>>> one for this purpose?
>>>
>>> There doesn't seem to be a flag for that.
>>>
>>> But I think adding a new flag __GFP_NO_INIT and testing that in
>>> want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
>>> would be a sensible approach.
>>
>> People argued against such flags in the past, because it will simply get
>> abused by arbitrary drivers that want to be smart.
> 
> Could it be named differently to discourage such usage? Maybe
> __GFP_BALLOON_OUT ?
> 
>> Whatever leaves the buddy shall be zeroed out. If double-zeroing
>> happens, the latter could be optimized out by checking
>> something like user_alloc_needs_zeroing().
>>
>> See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
>> avoid double-zeroing.
> 
> It isn't just reducing double-zeroing to single zeroing. It's about
> avoiding zeroing such pages at all. If a domU is started with
> populate-on-demand, many (sometimes most) of its pages are populated in
> EPT.

ITYM "unpopulated in EPT"?

Jan

> The idea of PoD is to start a guest with a high static memory size but
> a low actual allocation, and fake it until the balloon driver kicks in
> and makes the domU really not use more pages than it has. When the
> balloon driver returns those pages to the hypervisor, normally it would
> just take unallocated pages one by one and make Linux not use them. But
> if _any_ zeroing is happening, each page first needs to be mapped into
> the guest by the hypervisor (one trip through EPT), just to be removed
> from it a moment later...
> 
>>>> Any opinions?
>>>
>>> You are aware of the "init_on_alloc" boot parameter? So if this is fine
>>> for you, you could just use approach 1 above without any kernel patches
>>> needed.
>>
>> I don't think init_on_alloc can be enabled after boot. IIUC, 1) would
>> require a runtime switch.
> 
> Indeed.
> 
