Re: Excluding init_on_free for pages for initial balloon down (Xen)
On 02.03.2026 12:01, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 02, 2026 at 09:40:29AM +0100, David Hildenbrand (Arm) wrote:
>> On 3/2/26 07:36, Jürgen Groß wrote:
>>> On 01.03.26 16:04, Marek Marczykowski-Górecki wrote:
>>>> Hi,
>>>>
>>>> Some time ago I made a change to disable scrubbing pages that are
>>>> ballooned out during system boot. I'll paste the whole commit message
>>>> as it's relevant here:
>>>>
>>>>     197ecb3802c0 xen/balloon: add runtime control for scrubbing
>>>>                  ballooned out pages
>>>>
>>>>     Scrubbing pages on initial balloon down can take some time,
>>>>     especially in nested virtualization case (nested EPT is slow).
>>>>     When HVM/PVH guest is started with memory= significantly lower
>>>>     than maxmem=, all the extra pages will be scrubbed before
>>>>     returning to Xen. But since most of them weren't used at all at
>>>>     that point, Xen needs to populate them first (from
>>>>     populate-on-demand pool). In nested virt case (Xen inside KVM)
>>>>     this slows down the guest boot by 15-30s with just 1.5GB needed
>>>>     to be returned to Xen.
>>>>
>>>>     Add runtime parameter to enable/disable it, to allow initially
>>>>     disabling scrubbing, then enable it back during boot (for example
>>>>     in initramfs). Such usage relies on assumption that a) most pages
>>>>     ballooned out during initial boot weren't used at all, and b)
>>>>     even if they were, very few secrets are in the guest at that time
>>>>     (before any serious userspace kicks in).
>>>>
>>>>     Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT
>>>>     (also enabled by default), controlling default value for the new
>>>>     runtime switch.
>>>>
>>>> Now, I face the same issue with init_on_free/init_on_alloc (not sure
>>>> which one applies here, probably the latter one), which several
>>>> distributions enable by default. The result is (see timestamps):
>>>>
>>>> [2026-02-24 01:12:55] [    7.485151] xen:balloon: Waiting for initial ballooning down having finished.
>>>> [2026-02-24 01:14:14] [   86.581510] xen:balloon: Initial ballooning down finished.
>>>>
>>>> But here the situation is a bit more complicated:
>>>> init_on_free/init_on_alloc applies to any pages, not just those for
>>>> the balloon driver. I see two approaches to solve the issue:
>>>> 1. Similar to xen_scrub_pages=, add a runtime switch for
>>>>    init_on_free/init_on_alloc, then force them off during boot, and
>>>>    re-enable early in initramfs.
>>>> 2. Somehow adjust the balloon driver to bypass init_on_alloc when
>>>>    ballooning a page out.
>>>>
>>>> The first approach is likely easier to implement, but also has some
>>>> drawbacks: it may result in some kernel structures that are allocated
>>>> early remaining with garbage data in uninitialized places. While that
>>>> may not matter during early boot, such structures may survive for
>>>> quite some time, and maybe an attacker can use them later on to
>>>> exploit some other bug. This wasn't really a concern with
>>>> xen_scrub_pages, as those pages were immediately ballooned out.
>>>>
>>>> The second approach sounds architecturally better, and maybe
>>>> init_on_alloc could always be bypassed during balloon out? The
>>>> balloon driver can scrub the page on its own already (which is
>>>> enabled by default). That of course assumes the issue is only about
>>>> init_on_alloc, not init_on_free (or both) - which I haven't really
>>>> confirmed yet... If going this way, I see the balloon driver does
>>>> basically alloc_page(GFP_BALLOON), where GFP_BALLOON is:
>>>>
>>>> /* When ballooning out (allocating memory to return to Xen) we don't really
>>>>    want the kernel to try too hard since that can trigger the oom killer. */
>>>> #define GFP_BALLOON \
>>>>     (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>>>>
>>>> Would that be about adding some new flag here? Or maybe there is
>>>> already one for this purpose?
>>>
>>> There doesn't seem to be a flag for that.
>>>
>>> But I think adding a new flag __GFP_NO_INIT and testing that in
>>> want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
>>> would be a sensible approach.
>>
>> People argued against such flags in the past, because they will simply
>> get abused by arbitrary drivers that want to be smart.
>
> Could it be named differently to discourage such usage? Maybe
> __GFP_BALLOON_OUT ?
>
>> Whatever leaves the buddy shall be zeroed out. If double-zeroing
>> happens, the latter could get optimized out by checking something like
>> user_alloc_needs_zeroing().
>>
>> See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
>> avoid double-zeroing.
>
> It isn't just reducing double-zeroing to single zeroing. It's about
> avoiding zeroing such pages at all. If a domU is started with
> populate-on-demand, many (sometimes most) of its pages are populated in
> EPT.

ITYM "unpopulated in EPT"?

Jan

> The idea of PoD is to start the guest with a high static memory size,
> but a low actual allocation, and fake it until the balloon driver kicks
> in and makes the domU really not use more pages than it has. When the
> balloon driver tries to return those pages to the hypervisor, normally
> it would just take unallocated pages one by one and make Linux not use
> them. But if _any_ zeroing is happening, each page first needs to be
> mapped into the guest by the hypervisor (one trip through EPT), just to
> be removed from it a moment later...
>
>>>> Any opinions?
>>>
>>> You are aware of the "init_on_alloc" boot parameter? So if this is
>>> fine for you, you could just use approach 1 above without any kernel
>>> patches needed.
>>
>> I don't think init_on_alloc can be enabled after boot. IIUC, 1) would
>> require a runtime switch.
>
> Indeed.