[PATCH 0/2] xen/mm: Fix offlining pages to avoid corrupting the heap
This series fixes a bug where offlining pages could lead to unaligned
buddies being merged back onto the free list. The result is a chain of
events that can corrupt the heap and trigger a Xen panic after a few
allocations and frees.
For example, an MCE caused by faulty RAM may mark pages as offline.
When a buddy containing offlined pages is freed, those pages
are moved to dedicated isolated page lists.
reserve_offline_page() lacks alignment checks and may grow adjacent
healthy spans into unaligned buddies that violate the fundamental buddy
invariant: buddies of a given order must be aligned to their size.
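The invariant fits in a one-line predicate: a buddy of order n that starts at MFN m is aligned only when the low n bits of m are zero. The sketch below is illustrative only. `buddy_is_aligned()` is a hypothetical helper and does not come from xen/common/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not Xen code): a buddy of order 'order' covers
 * 2^order pages starting at 'mfn'. It satisfies the buddy invariant
 * only if 'mfn' is a multiple of 2^order, i.e. its low 'order' bits
 * are all zero.
 */
static bool buddy_is_aligned(unsigned long mfn, unsigned int order)
{
    assert(order < 8 * sizeof(unsigned long)); /* keep the shift defined */
    return (mfn & ((1UL << order) - 1)) == 0;
}
```

For example, an order-2 buddy at MFN 4 and an order-1 buddy at MFN 6 both pass the check. An order-1 buddy at MFN 5 fails it.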
Consider a valid order-2 buddy (4 pages, MFN 4 to MFN 7) with this layout:
+---------------+-----------------+-----------------+----------------+
| head page | tail page 1 | tail page 2 | tail page 3 |
+---------------+-----------------+-----------------+----------------+
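When the head page (MFN 4) is offlined, reserve_offline_page() isolates it
and regroups the tail pages into an unaligned order-1 buddy (MFN 5 and 6)
and a single page (MFN 7):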
+---------------+-----------------+-----------------+----------------+
| offlined page | head page with a tail page | single page |
+---------------+-----------------+-----------------+----------------+
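The following simplified model of the splitting step makes the failure concrete. It is written from the behavior described here and is not copied from Xen. `split_unaligned()` and `struct chunk` are hypothetical names, and page indices are relative to the head of the freed buddy. The model grows each healthy run one order at a time for as long as no offlined page is in the way, and it never checks that the start of the run is aligned to the grown order.

```c
#include <assert.h>
#include <stdbool.h>

/* A free chunk handed back to the heap: first page index and order. */
struct chunk {
    unsigned int start;
    unsigned int order;
};

/*
 * Simplified model (not Xen code) of splitting a freed order-'head_order'
 * buddy around offlined pages. Offlined pages are skipped (isolated); each
 * healthy run is grown one order at a time while the next half contains no
 * offlined page. Note what is missing: no check that 'cur' is aligned to
 * the order it grows to. Returns the number of chunks stored in 'out'.
 */
static unsigned int split_unaligned(const bool *offlined,
                                    unsigned int head_order,
                                    struct chunk *out)
{
    unsigned int end = 1U << head_order, cur = 0, n = 0;

    assert(head_order < 16);
    while (cur < end) {
        unsigned int order = 0;

        if (offlined[cur]) {
            cur++;                      /* isolated, never re-added */
            continue;
        }
        while (order < head_order) {
            unsigned int next = order + 1, i;

            if (cur + (1U << next) > end)
                break;                  /* would run past the buddy */
            for (i = cur + (1U << order); i < cur + (1U << next); i++)
                if (offlined[i])
                    break;
            if (i != cur + (1U << next))
                break;                  /* blocked by an offlined page */
            order = next;               /* grown without an alignment check */
        }
        out[n].start = cur;
        out[n].order = order;
        n++;
        cur += 1U << order;
    }
    return n;
}
```

With only index 0 offlined and a head order of 2, the model returns an order-1 chunk at index 1 and an order-0 chunk at index 3. This matches the unaligned layout in the diagrams above.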
This unaligned layout leads to a Xen panic, as demonstrated by the test case:
1. When a single page is allocated from this region, MFN 7 is handed out:
MFN 4 MFN 5 MFN 6 MFN 7
+---------------+-----------------+-----------------+----------------+
| offlined page | head page tail page | allocated page |
| | Unaligned buddies are | |
| | an invariant violation! | |
+---------------+-----------------+-----------------+----------------+
2. When MFN 7 is freed, the predecessor merge in free_heap_pages()
kicks in, merging MFN 7 with its naturally aligned predecessor MFN 6:
MFN 4 MFN 5 MFN 6 MFN 7
+---------------+-----------------+-----------------+
| offlined page | head page tail page |
| | Unaligned buddies are |
| | an invariant violation! |
+---------------+-----------------+-----------------+----------------+
| head page tail page |
+-----------------+----------------+
As shown, MFN 6 is now double-freed, as it is part of two free buddies:
- As the tail page of the unaligned order-1 buddy starting at MFN 5.
- As the head page of the aligned order-1 buddy starting at MFN 6.
3. The next allocations hand out MFN 7 again and then MFN 6 as well.
Due to the double-free, MFN 6 remains on the free list as the tail
page of the unaligned buddy at MFN 5, even though its PGC_status is
now set to in-use:
MFN 4 MFN 5 MFN 6 MFN 7
+---------------+-----------------+-----------------+
| offlined page | head page tail page |
| | Unaligned buddies are |
| | an invariant violation! |
+---------------+-----------------+-----------------+----------------+
| in-use page | in-use page |
+-----------------+----------------+
4. When the next page from this region is allocated, get_free_page()
returns the buddy head MFN 5. For an order-0 allocation,
alloc_heap_pages() splits the buddy and hands out MFN 6; for an order-1
allocation, it hands out the whole buddy. Either way, the allocator
checks the PGC_status of each page it hands out and expects it not to
be in-use. Because MFN 6 is already in use, Xen panics (example panic
log):
pg[0] MFN 842adc c=0x4000000000000000 o=0 v=0 t=0
Xen BUG at common/page_alloc.c:1324
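Steps 1 to 4 can be replayed on a toy page array. This is a hedged sketch, not Xen code. `struct page`, `model_free()` and `model_alloc()` are hypothetical. The model tracks only the order of each chunk head and does not model free lists, scrubbing, zones or NUMA nodes.

- `model_free()` applies the buddy merge rule. The partner of an order-n chunk at MFN m is m XOR 2^n, and that partner is the predecessor when bit n of m is set.
- `model_alloc()` halves a chunk by keeping the lower half free and continuing with the upper half. This matches step 4, where an order-0 allocation from the buddy at MFN 5 yields MFN 6. It then checks that every page it hands out is still free. That is the check behind the panic log above.

The sketch assumes, as step 2 implies, that the tail page MFN 6 looks like a free order-0 chunk when MFN 7 is freed.

```c
#include <assert.h>

enum page_state { PAGE_FREE, PAGE_INUSE, PAGE_OFFLINED };

/* Toy page descriptor: state plus the order recorded on chunk heads. */
struct page {
    enum page_state state;
    unsigned int order;
};

/*
 * Free the order-'order' chunk at 'mfn' and merge it with its buddy while
 * the buddy is a free chunk of the same order. Returns the head MFN of the
 * merged chunk and stores its order in '*merged_order'.
 */
static unsigned long model_free(struct page *pages, unsigned long nr,
                                unsigned long mfn, unsigned int order,
                                unsigned int max_order,
                                unsigned int *merged_order)
{
    assert(mfn + (1UL << order) <= nr);
    for (unsigned long i = 0; i < (1UL << order); i++)
        pages[mfn + i].state = PAGE_FREE;

    while (order < max_order) {
        unsigned long mask = 1UL << order;
        unsigned long partner = mfn ^ mask;   /* predecessor if mfn & mask */

        if (partner >= nr || pages[partner].state != PAGE_FREE ||
            pages[partner].order != order)
            break;
        if (mfn & mask)
            mfn = partner;                    /* predecessor becomes head */
        order++;
    }
    pages[mfn].order = order;
    *merged_order = order;
    return mfn;
}

/*
 * Allocate 2^order pages from the free chunk of order 'chunk_order' at
 * 'head': halve the chunk, keeping the lower half free and continuing with
 * the upper half. Returns the first allocated MFN, or -1 if a page to be
 * handed out is not free (where Xen hits BUG()); '*bad_mfn' names it.
 */
static long model_alloc(struct page *pages, unsigned long head,
                        unsigned int chunk_order, unsigned int order,
                        unsigned long *bad_mfn)
{
    unsigned long mfn = head;

    assert(order <= chunk_order);
    while (chunk_order != order) {
        chunk_order--;
        pages[mfn].order = chunk_order;       /* lower half stays free */
        mfn += 1UL << chunk_order;
    }
    for (unsigned long i = 0; i < (1UL << order); i++) {
        if (pages[mfn + i].state != PAGE_FREE) {
            *bad_mfn = mfn + i;
            return -1;
        }
    }
    for (unsigned long i = 0; i < (1UL << order); i++)
        pages[mfn + i].state = PAGE_INUSE;
    pages[mfn].order = order;
    return (long)mfn;
}
```

Replaying the steps with this model gives the following sequence:

1. Freeing MFN 7 merges it with MFN 6 into an order-1 chunk at MFN 6.
2. Two order-0 allocations from that chunk return MFN 7 and then MFN 6.
3. Any further allocation from the stale buddy at MFN 5, of order 0 or order 1, reaches MFN 6, which is already in use.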
I reproduced this panic while running intensive NUMA claim tests
combined with page offlining. The test case in this series
demonstrates the cascading corruption that leads to the panic without
having to crash a real Xen instance to test for the bug.
Running the test produces the following output (trimmed):
$ make -C tools/tests/native test TARGETS=offline-unaligned | grep -v ' xen/'
| The buddy #5 is not aligned to order-1!
| <0>pg[0] MFN 00006 c=0x8000000000000001 o=1213 v=0 t=0
| xen/common/page_alloc.c:1324: WE INVOKED a XEN BUG in alloc_heap_pages()
The second patch fixes the root cause and updates the test case to
serve as a regression test.
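For illustration only, this is one way an alignment guard could look in a simplified split model. A run may grow to the next order only if its start is a multiple of that order's size. This is a hedged sketch, not the change made by the second patch, and `split_aligned()` and `struct chunk` are hypothetical. The freed buddy's head is itself aligned to head_order, so checking the relative index is equivalent to checking the MFN for every order up to head_order.

```c
#include <assert.h>
#include <stdbool.h>

/* A free chunk handed back to the heap: first page index and order. */
struct chunk {
    unsigned int start;
    unsigned int order;
};

/*
 * Sketch (not the actual patch): split a freed order-'head_order' buddy
 * around offlined pages, growing a healthy run to the next order only if
 * its start stays naturally aligned to that order. Indices are relative
 * to the (aligned) buddy head. Returns the number of chunks in 'out'.
 */
static unsigned int split_aligned(const bool *offlined,
                                  unsigned int head_order,
                                  struct chunk *out)
{
    unsigned int end = 1U << head_order, cur = 0, n = 0;

    assert(head_order < 16);
    while (cur < end) {
        unsigned int order = 0;

        if (offlined[cur]) {
            cur++;                      /* isolated, never re-added */
            continue;
        }
        while (order < head_order) {
            unsigned int next = order + 1, i;

            if (cur & ((1U << next) - 1))
                break;                  /* growing would break alignment */
            if (cur + (1U << next) > end)
                break;                  /* would run past the buddy */
            for (i = cur + (1U << order); i < cur + (1U << next); i++)
                if (offlined[i])
                    break;
            if (i != cur + (1U << next))
                break;                  /* blocked by an offlined page */
            order = next;
        }
        assert((cur & ((1U << order) - 1)) == 0);  /* buddy invariant */
        out[n].start = cur;
        out[n].order = order;
        n++;
        cur += 1U << order;
    }
    return n;
}
```

With only index 0 offlined, the sketch yields a single page at index 1 and an aligned order-1 chunk at index 2. In the example above, that is MFN 5 on its own and MFN 6 to 7 together, instead of an unaligned order-1 chunk at MFN 5.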
This series is based on the native test environment v3 for NUMA claims:
https://lists.xen.org/archives/html/xen-devel/2026-05/msg01163.html
It in turn depends on the NUMA claim sets v7 series:
https://lists.xen.org/archives/html/xen-devel/2026-05/msg00363.html
You can pull the series with dependencies for review and testing:
$ git pull git@xxxxxxxxxx:bernhardkaindl/xen.git offline-unaligned-buddies-v1
$ make -C tools/tests/native TARGETS=offline-unaligned test
Fixes: e4865c2315 ('Page offline support in Xen side')
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@xxxxxxxxxx>
Bernhard Kaindl (2):
tools/tests/native: Test for Xen Panic after memory offlining
xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash
tools/tests/native/offline-unaligned.c | 79 ++++++++++++++++++++++++++
xen/common/page_alloc.c | 5 ++
2 files changed, 84 insertions(+)
create mode 100644 tools/tests/native/offline-unaligned.c
--
2.39.5