
[PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()



We currently initialize the memmap such that PG_reserved is set and the
refcount of the page is 1. In virtio-mem code, we have to manually clear
that PG_reserved flag to make memory offlining with partially hotplugged
memory blocks possible: has_unmovable_pages() would otherwise bail out on
such pages.

We want to avoid PG_reserved where possible and move to typed pages
instead. In addition, we want to further enlighten memory offlining code
about PG_offline: offline pages in an online memory section. One example
is handling managed page count adjustments in a cleaner way during memory
offlining.

So let's initialize the pages with PG_offline instead of PG_reserved.
generic_online_page()->__free_pages_core() will now clear that flag before
handing that memory to the buddy.

Note that the page refcount is still 1 and would forbid offlining of such
memory except when special care is taken during GOING_OFFLINE, as
currently only implemented by virtio-mem.

With this change, we can now get non-PageReserved() pages in the XEN
balloon list. From what I can tell, that can already happen via
decrease_reservation(), so that should be fine.

HV-balloon should not really observe a change: partially onlined memory
blocks still cannot get surprise-offlined, because the refcount of these
PageOffline() pages is 1.

Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
hotplugged pages are now PageOffline() instead of PageReserved() before
they are handed over to the buddy.

We'll leave the ZONE_DEVICE case alone for now.

Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
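Not part of the commit message: to illustrate the page life cycle
described above, here is a minimal userspace sketch. It is a toy model
only. struct toy_page and the toy_*() helpers are hypothetical names that
do not exist in the kernel, and the offlining check is reduced to "every
page has a refcount of 0", ignoring page migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant bits of struct page (assumption). */
struct toy_page {
	bool offline;	/* models the PageOffline() page type */
	bool reserved;	/* models PG_reserved */
	int refcount;	/* models the page refcount */
};

/*
 * Models memmap initialization of hotplugged, !ZONE_DEVICE memory with
 * this patch applied: PageOffline() instead of PageReserved(), refcount 1.
 */
static void toy_init_hotplugged(struct toy_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		pages[i].offline = true;
		pages[i].reserved = false;
		pages[i].refcount = 1;
	}
}

/*
 * Models generic_online_page()->__free_pages_core() for hotplugged
 * memory: clear PageOffline() and set the refcount to 0 before the page
 * is handed to the buddy.
 */
static void toy_online(struct toy_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		assert(!pages[i].reserved);
		pages[i].offline = false;
		pages[i].refcount = 0;
	}
}

/*
 * Simplified offlining check: a range can only be offlined if no page
 * still holds a reference. Pages a driver kept PageOffline() with a
 * refcount of 1 therefore block offlining.
 */
static bool toy_can_offline(const struct toy_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pages[i].refcount != 0)
			return false;
	return true;
}
```

In this model, a block where a driver onlined only some of the pages
cannot be offlined until the driver drops the reference on the remaining
PageOffline() pages, which corresponds to what virtio-mem does during
GOING_OFFLINE.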
 drivers/hv/hv_balloon.c     |  5 ++---
 drivers/virtio/virtio_mem.c | 18 ++++++++++++------
 drivers/xen/balloon.c       |  9 +++++++--
 include/linux/page-flags.h  | 12 +++++-------
 mm/memory_hotplug.c         | 16 ++++++++++------
 mm/mm_init.c                | 10 ++++++++--
 mm/page_alloc.c             | 32 +++++++++++++++++++++++---------
 7 files changed, 67 insertions(+), 35 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index e000fa3b9f978..c1be38edd8361 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -693,9 +693,8 @@ static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg)
                if (!PageOffline(pg))
                        __SetPageOffline(pg);
                return;
-       }
-       if (PageOffline(pg))
-               __ClearPageOffline(pg);
+       } else if (!PageOffline(pg))
+               return;
 
        /* This frame is currently backed; online the page. */
        generic_online_page(pg, 0);
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index a3857bacc8446..b90df29621c81 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1146,12 +1146,16 @@ static void virtio_mem_set_fake_offline(unsigned long pfn,
        for (; nr_pages--; pfn++) {
                struct page *page = pfn_to_page(pfn);
 
-               __SetPageOffline(page);
-               if (!onlined) {
+               if (!onlined)
+                       /*
+                        * Pages that have not been onlined yet were initialized
+                        * to PageOffline(). Remember that we have to route them
+                        * through generic_online_page().
+                        */
                        SetPageDirty(page);
-                       /* FIXME: remove after cleanups */
-                       ClearPageReserved(page);
-               }
+               else
+                       __SetPageOffline(page);
+               VM_WARN_ON_ONCE(!PageOffline(page));
        }
        page_offline_end();
 }
@@ -1166,9 +1170,11 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn,
        for (; nr_pages--; pfn++) {
                struct page *page = pfn_to_page(pfn);
 
-               __ClearPageOffline(page);
                if (!onlined)
+                       /* generic_online_page() will clear PageOffline(). */
                        ClearPageDirty(page);
+               else
+                       __ClearPageOffline(page);
        }
 }
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index aaf2514fcfa46..528395133b4f8 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -146,7 +146,8 @@ static DECLARE_WAIT_QUEUE_HEAD(balloon_wq);
 /* balloon_append: add the given page to the balloon. */
 static void balloon_append(struct page *page)
 {
-       __SetPageOffline(page);
+       if (!PageOffline(page))
+               __SetPageOffline(page);
 
        /* Lowmem is re-populated first, so highmem pages go at list tail. */
        if (PageHighMem(page)) {
@@ -412,7 +413,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 
                xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]);
 
-               /* Relinquish the page back to the allocator. */
+               /*
+                * Relinquish the page back to the allocator. Note that
+                * some pages, including ones added via xen_online_page(), might
+                * not be marked reserved; free_reserved_page() will handle that.
+                */
                free_reserved_page(page);
        }
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f04fea86324d9..e0362ce7fc109 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -30,16 +30,11 @@
  * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
  *   to read/write these pages might end badly. Don't touch!
  * - The zero page(s)
- * - Pages not added to the page allocator when onlining a section because
- *   they were excluded via the online_page_callback() or because they are
- *   PG_hwpoison.
  * - Pages allocated in the context of kexec/kdump (loaded kernel image,
  *   control pages, vmcoreinfo)
  * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
  *   not marked PG_reserved (as they might be in use by somebody else who does
  *   not respect the caching strategy).
- * - Pages part of an offline section (struct pages of offline sections should
- *   not be trusted as they will be initialized when first onlined).
  * - MCA pages on ia64
  * - Pages holding CPU notes for POWER Firmware Assisted Dump
  * - Device memory (e.g. PMEM, DAX, HMM)
@@ -1021,6 +1016,10 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy)
  * The content of these pages is effectively stale. Such pages should not
  * be touched (read/write/dump/save) except by their owner.
  *
+ * When a memory block gets onlined, all pages are initialized with a
+ * refcount of 1 and PageOffline(). generic_online_page() will
+ * take care of clearing PageOffline().
+ *
  * If a driver wants to allow to offline unmovable PageOffline() pages without
  * putting them back to the buddy, it can do so via the memory notifier by
  * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
@@ -1028,8 +1027,7 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy)
  * pages (now with a reference count of zero) are treated like free pages,
  * allowing the containing memory block to get offlined. A driver that
  * relies on this feature is aware that re-onlining the memory block will
- * require to re-set the pages PageOffline() and not giving them to the
- * buddy via online_page_callback_t.
+ * require not giving them to the buddy via generic_online_page().
  *
  * There are drivers that mark a page PageOffline() and expect there won't be
  * any further access to page content. PFN walkers that read content of random
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 27e3be75edcf7..0254059efcbe1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn)
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
  * and resizing the pgdat/zone data to span the added pages. After this
- * call, all affected pages are PG_reserved.
+ * call, all affected pages are PageOffline().
  *
  * All aligned pageblocks are initialized to the specified migratetype
  * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related
@@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 
        move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
-       for (i = 0; i < nr_pages; i++)
-               SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
+       for (i = 0; i < nr_pages; i++) {
+               struct page *page = pfn_to_page(pfn + i);
+
+               __ClearPageOffline(page);
+               SetPageVmemmapSelfHosted(page);
+       }
 
        /*
         * It might be that the vmemmap_pages fully span sections. If that is
@@ -1959,9 +1963,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
         * Don't allow to offline memory blocks that contain holes.
         * Consequently, memory blocks with holes can never get onlined
         * via the hotplug path - online_pages() - as hotplugged memory has
-        * no holes. This way, we e.g., don't have to worry about marking
-        * memory holes PG_reserved, don't need pfn_valid() checks, and can
-        * avoid using walk_system_ram_range() later.
+        * no holes. This way, we don't have to worry about memory holes,
+        * don't need pfn_valid() checks, and can avoid using
+        * walk_system_ram_range() later.
         */
        walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages,
                              count_system_ram_pages_cb);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index feb5b6e8c8875..c066c1c474837 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -892,8 +892,14 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone
 
                page = pfn_to_page(pfn);
                __init_single_page(page, pfn, zone, nid);
-               if (context == MEMINIT_HOTPLUG)
-                       __SetPageReserved(page);
+               if (context == MEMINIT_HOTPLUG) {
+#ifdef CONFIG_ZONE_DEVICE
+                       if (zone == ZONE_DEVICE)
+                               __SetPageReserved(page);
+                       else
+#endif
+                               __SetPageOffline(page);
+               }
 
                /*
                 * Usually, we want to mark the pageblock MIGRATE_MOVABLE,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e0c8a8354be36..039bc52cc9091 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1225,18 +1225,23 @@ void __free_pages_core(struct page *page, unsigned int order,
         * When initializing the memmap, __init_single_page() sets the refcount
         * of all pages to 1 ("allocated"/"not free"). We have to set the
         * refcount of all involved pages to 0.
+        *
+        * Note that hotplugged memory pages are initialized to PageOffline().
+        * Pages freed from memblock might be marked as reserved.
         */
-       prefetchw(p);
-       for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-               prefetchw(p + 1);
-               __ClearPageReserved(p);
-               set_page_count(p, 0);
-       }
-       __ClearPageReserved(p);
-       set_page_count(p, 0);
-
        if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
            unlikely(context == MEMINIT_HOTPLUG)) {
+               prefetchw(p);
+               for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
+                       prefetchw(p + 1);
+                       VM_WARN_ON_ONCE(PageReserved(p));
+                       __ClearPageOffline(p);
+                       set_page_count(p, 0);
+               }
+               VM_WARN_ON_ONCE(PageReserved(p));
+               __ClearPageOffline(p);
+               set_page_count(p, 0);
+
                /*
                 * Freeing the page with debug_pagealloc enabled will try to
                 * unmap it; some archs don't like double-unmappings, so
@@ -1245,6 +1250,15 @@ void __free_pages_core(struct page *page, unsigned int order,
                debug_pagealloc_map_pages(page, nr_pages);
                adjust_managed_page_count(page, nr_pages);
        } else {
+               prefetchw(p);
+               for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
+                       prefetchw(p + 1);
+                       __ClearPageReserved(p);
+                       set_page_count(p, 0);
+               }
+               __ClearPageReserved(p);
+               set_page_count(p, 0);
+
                /* memblock adjusts totalram_pages() ahead of time. */
                atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
        }
-- 
2.45.1




 

