[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/xen: fix balloon target initialization for PVH dom0


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Wed, 2 Apr 2025 16:23:02 +0200
  • Autocrypt: addr=jgross@xxxxxxxx; keydata= xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAHNH0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT7CwHkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPzsBNBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAHCwF8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHfw==
  • Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, x86@xxxxxxxxxx, "H. Peter Anvin" <hpa@xxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
  • Delivery-date: Wed, 02 Apr 2025 14:23:09 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.04.25 13:36, Roger Pau Monne wrote:
PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to
dom0 are re-used as scratch memory to map foreign and grant pages.  Such
logic relies on reporting those unpopulated ranges as RAM to Linux, and
mark them as reserved.  This way Linux creates the underlying page
structures required for metadata management.

Such approach works fine on PV because the initial balloon target is
calculated using specific Xen data, that doesn't take into account the
memory type changes described above.  However on HVM and PVH the initial
balloon target is calculated using get_num_physpages(), and that function
does take into account the unpopulated RAM regions used as scratch space
for remote domain mappings.

This leads to PVH dom0 having an incorrect initial balloon target, which
causes malfunction (excessive memory freeing) of the balloon driver if the
dom0 memory target is later adjusted from the toolstack.

Fix this by using xen_released_pages to account for any pages that are part
of the memory map, but are already unpopulated when the balloon driver is
initialized.  This accounts for any regions used for scratch remote
mappings.

Take the opportunity to unify PV with PVH/HVM guests regarding the usage of
get_num_physpages(), as that avoids having to add different logic for PV vs
PVH in both balloon_add_regions() and arch_xen_unpopulated_init().

Much like a6aa4eb994ee, the code in this changeset should have been part of
38620fc4e893.

Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if 
available')
Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
I think it's easier to unify the PV and PVH/HVM paths here regarding the
usage of get_num_physpages(), as otherwise the fix needs to add further PV
vs HVM divergences in both balloon_add_regions() and
arch_xen_unpopulated_init(), but it also has a higher risk of breaking PV
in subtle ways.
---
  arch/x86/xen/enlighten.c |  7 +++++++
  drivers/xen/balloon.c    | 19 +++++++++++--------
  2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 43dcd8c7badc..651bb206434c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -466,6 +466,13 @@ int __init arch_xen_unpopulated_init(struct resource **res)
                        xen_free_unpopulated_pages(1, &pg);
                }
+ /*
+                * Account for the region being in the physmap but unpopulated.
+                * The value in xen_released_pages is used by the balloon
+                * driver to know how much of the physmap is unpopulated and
+                * set an accurate initial memory target.
+                */
+               xen_released_pages += xen_extra_mem[i].n_pfns;
                /* Zero so region is not also added to the balloon driver. */
                xen_extra_mem[i].n_pfns = 0;
        }
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 163f7f1d70f1..085d418ee6da 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -698,7 +698,15 @@ static void __init balloon_add_regions(void)
                for (pfn = start_pfn; pfn < extra_pfn_end; pfn++)
                        balloon_append(pfn_to_page(pfn));
- balloon_stats.total_pages += extra_pfn_end - start_pfn;
+               /*
+                * Extra regions are accounted for in the physmap, but need
+                * decreasing from current_pages to balloon down the initial
+                * allocation, because they are already accounted for in
+                * total_pages.
+                */
+               BUG_ON(extra_pfn_end - start_pfn >=
+                      balloon_stats.current_pages);

Maybe instead of crashing the system disable ballooning and print some
diagnostics why this happened?

+               balloon_stats.current_pages -= extra_pfn_end - start_pfn;
        }
  }
@@ -711,13 +719,8 @@ static int __init balloon_init(void) pr_info("Initialising balloon driver\n"); -#ifdef CONFIG_XEN_PV
-       balloon_stats.current_pages = xen_pv_domain()
-               ? min(xen_start_info->nr_pages - xen_released_pages, max_pfn)
-               : get_num_physpages();
-#else
-       balloon_stats.current_pages = get_num_physpages();
-#endif
+       BUG_ON(xen_released_pages >= get_num_physpages());

Again, I'd rather just disable ballooning instead of crashing the system.

+       balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
        balloon_stats.target_pages  = balloon_stats.current_pages;
        balloon_stats.balloon_low   = 0;
        balloon_stats.balloon_high  = 0;

Other than that I think your approach is fine.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.