[PATCH v8 0/6] Device tree based NUMA support for Arm - Part#2
(The Arm device tree based NUMA support patch set contains 35 patches. To make things easier for reviewers, I have split them into 3 parts:
1. Preparation. I have re-sorted the patch series and moved independent patches to the head of the series - merged in [1].
2. Move generically usable code from x86 to common - this series.
3. Add new code to support Arm.

This series only contains the patches of the second part. As the whole NUMA series has been reviewed for 1 round in [2], this series is v8.)

Xen's memory allocation and scheduler modules are NUMA aware. However, only x86 has implemented the architecture APIs to support NUMA. Arm has been providing a set of fake architecture APIs to stay compatible with the NUMA-aware memory allocator and scheduler.

An Arm system worked well as a single-node NUMA system with these fake APIs, because we didn't have multi-node NUMA systems on Arm. But in recent years, more and more Arm devices support multi-node NUMA.

So now we have a new problem. When Xen runs on these Arm devices, it still treats them as single-node SMP systems. The NUMA affinity capability of Xen's memory allocator and scheduler becomes meaningless, because they rely on input data that does not reflect the real NUMA layout.

Xen still thinks the access time to all of the memory is the same for all CPUs. However, Xen may allocate memory to a VM from different NUMA nodes with different access speeds. This difference can be amplified by workloads inside the VM, causing performance instability and timeouts.

So in this patch series, we implement a set of NUMA APIs that use the device tree to describe the NUMA layout. We reuse most of the x86 NUMA code to create and maintain the mapping between memory and CPUs, and to create the distance matrix between any two NUMA nodes. Except for ACPI and some x86-specific code, we have moved the rest to common.
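To illustrate the kind of data the allocator and scheduler depend on, here is a minimal sketch of an address-to-node map plus a node distance matrix. It is not the code from this series: every toy_* name, the 1 GiB granularity, the two-node layout and the 10/20 distance values are assumptions made up for the example. Only the general idea (shift a physical address to index a node map, then look up a per-node-pair distance) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  nodeid_t;   /* assumption: node IDs fit in 8 bits */
typedef uint64_t paddr_t;    /* assumption: 64-bit physical addresses */

#define TOY_NR_NODES      2
#define TOY_MEMNODE_SHIFT 30 /* hypothetical: one map entry per 1 GiB */

/* Hypothetical layout: node 0 owns [0, 2 GiB), node 1 owns [2 GiB, 4 GiB). */
static const nodeid_t toy_memnodemap[] = { 0, 0, 1, 1 };

/* Hypothetical distance matrix: 10 = local access, 20 = remote access. */
static const uint8_t toy_distance[TOY_NR_NODES][TOY_NR_NODES] = {
    { 10, 20 },
    { 20, 10 },
};

/* Map a physical address to the node that owns it. */
static nodeid_t toy_phys_to_nid(paddr_t addr)
{
    size_t idx = (size_t)(addr >> TOY_MEMNODE_SHIFT);

    /* Addresses beyond the described memory are a caller bug here. */
    assert(idx < sizeof(toy_memnodemap) / sizeof(toy_memnodemap[0]));
    return toy_memnodemap[idx];
}

/* Relative access cost from one node to another. */
static uint8_t toy_node_distance(nodeid_t from, nodeid_t to)
{
    assert(from < TOY_NR_NODES && to < TOY_NR_NODES);
    return toy_distance[from][to];
}

/* True if memory at addr is local to a CPU sitting on cpu_node. */
static bool toy_is_local(nodeid_t cpu_node, paddr_t addr)
{
    return toy_phys_to_nid(addr) == cpu_node;
}
```

With a single-node map every address would resolve to node 0 and every access would look local, which is the situation described above for Arm with the fake APIs.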
In the next stage, when we implement ACPI based NUMA for Arm64, we may move the ACPI NUMA code to common too, but at the current stage we keep it x86 only.

This patch series has been tested and booted well on one Arm64 NUMA machine and one HPE x86 NUMA machine.

[1] https://lists.xenproject.org/archives/html/xen-devel/2022-06/msg00499.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01903.html

---
v7 -> v8:
 1. Rebase code to resolve merge conflict.
 2. Change "of shift too small" to "or shift too small".
 3. Add numa_fw_nid_name setting in srat_parse_regions after acpi_table_parse succeeded.

v6 -> v7:
 1. Restore %d for nodeid_t in dump_numa.
 2. Use sizeof(page_num_node) for page_num_node size in memset.
 3. Add description for using min(PADDR_BITS, BITS_PER_LONG - 1) to calculate the shift when only one node is in the system.
 4. Use %pd for domain print in dump_numa.
 5. Add __init for arch_numa_unavailable.
 6. Use __ro_after_init for mem_hotplug.
 7. Use "???" instead of "NONAME" for unset numa_fw_nid_name.
 8. Fix code-style.

v5 -> v6:
 1. Revert arch_numa_broken to arch_numa_disabled, as acpi_numa can be set to -1 by users. So acpi_numa < 0 does not mean a broken firmware.
 2. Replace numa_scan_node with numa_process_nodes in commit log.
 3. Limit the scope of page_num_node, vnuma and page of numa_setup function.
 4. Use memset to init page_num_node instead of for_each_online_node.
 5. Use %u instead of %d for nodeid_t and j in numa_setup print messages.
 6. Use min(PADDR_BITS, BITS_PER_LONG - 1) to calculate the shift when only one node is in the system.
 7. Drop the macro: node_to_first_cpu(node)
 8. Use arch_numa_unavailable to replace arch_numa_disabled for acpi_numa <= 0.
 9. Remove Kconfig for HAS_NUMA_NODE_FWID.
10. Use numa_fw_nid_name for NUMA implementation to set their fw NUMA node name for print messages.

v4 -> v5:
 1. Use arch_numa_broken instead of arch_numa_disabled for acpi_numa < 0 check, because arch_numa_disabled might include the acpi_numa < 0 (init failed) and acpi_numa == 0 (no data or data not initialized) cases.
 2. Use nodeid_t instead of uint8_t for memnodemap.
 3. Restore to use typeof(*memnodemap) for _memnodemap, this will avoid the further adjustments for _memnodemap's type.
 4. Use __ro_after_init for numa_off.
 5. Use pointer-to-const for proper function parameters.
 6. Use unsigned int for variables that are not really used for node ID.
 7. Fix code comments code-style and adjust the length.
 8. Fix code-styles.
 9. Rename numa_scan_nodes to numa_process_nodes.
10. Defer introducing arch_numa_disabled for acpi_numa <= 0, and remove the init_as_disable parameter of arch_numa_disabled.
11. Fix typo "expandsion".
12. Fix indentation for l1tf_safe_maddr.
13. Remove double blank lines.
14. Add a space between for_each_node_mask and '('. Add a space between page_list_for_each and '('.
15. Use bool for nodes_cover_memory return value.
16. Use a plain "int ret" to record compute_hash_shift return value.
17. Add a blank line before the function's main "return".
18. Add new Kconfig option HAS_NUMA_NODE_FWID to common/Kconfig.

v3 -> v4:
 1. Add init_as_disable as arch_numa_disabled parameter in the patch where it is used.
 2. Drop unnecessary "else" from arch_numa_setup, and fix its indentation.
 3. Restore compute_hash_shift's return value to int.
 4. Remove unnecessary parentheses for macros.
 5. Use unsigned int for proper variables.
 6. Fix some code-style.
 7. Move arch_get_ram_range function comment to header file.
 8. Use bool for found, and add a new "err" for the return value of arch_get_ram_range.
 9. Use -ENODATA instead of -EINVAL for non-RAM type ranges.
10. Use bool as return value for functions that only return 0/1 or 0/-EINVAL.
11. Move mem_hotplug to a proper place in mm.h
12. Remove useless "size" in numa_scan_nodes.
13. Add CONFIG_HAS_NUMA_NODE_FWID to gate printing the mapping between node id and architectural node id (fw node id).

v2 -> v3:
 1. Drop enumeration of numa status.
 2. Use helpers to get/update acpi_numa.
 3. Insert spaces among parameters of strncmp in numa_setup.
 4. Drop helpers to access mem_hotplug. Export mem_hotplug for all arch.
 5. Remove acpi.h from common/numa.c.
 6. Rename acpi_scan_nodes to numa_scan_nodes.
 7. Replace u8 by uint8_t for memnodemap.
 8. Use unsigned int for memnode_shift and adjust related functions (compute_hash_shift, populate_memnodemap) to use correct types for return values or parameters.
 9. Use nodeid_t for nodeid and node numbers.
10. Use __read_mostly and __ro_after_init for appropriate variables.
11. Adjust the __read_mostly and __initdata location for some variables.
12. Convert from plain int to unsigned for cpuid and other proper
13. Remove unnecessary change items in history.
14. Rename arch_get_memory_map to arch_get_ram_range.
15. Use -ENOENT instead of -ENODEV to indicate end of memory map.
16. Add description to code comment that arch_get_ram_range returns RAM range in [start, end) format.
17. Rename bad_srat to numa_fw_bad.
18. Rename node_to_pxm to numa_node_to_arch_nid.
19. Merge patch#7 and #8 into patch#6.
20. Move NR_NODE_MEMBLKS from x86/acpi.h to common/numa.h
22. Use 2-64 for node range.

v1 -> v2:
 1. Refine the commit messages of several patches.
 2. Merge v1 patch#9,10 into one patch. Introduce the new functions in the same patch where they are first used.
 3. Fold if ( end > mem_hotplug ) to mem_hotplug_update_boundary, in this case, we can drop mem_hotplug_boundary.
 4. Remove fw_numa, use enumeration to replace numa_off and acpi_numa.
 5. Correct return value of srat_disabled.
 6. Introduce numa_enabled_with_firmware.
 7. Refine the justification of using !node_data[nid].node_spanned_pages.
 8. Use ASSERT to replace VIRTUAL_BUG_ON in phys_to_nid.
 9. Adjust the conditional expression for ASSERT.
10. Move MAX_NUMNODES from xen/numa.h to asm/numa.h for x86.
11. Use conditional macro to gate MAX_NUMNODES for other architectures.
12. Use arch_get_memory_map to replace arch_get_memory_bank_range and arch_get_memory_bank_number.
13. Remove the !start || !end check, because the caller guarantees these two pointers will not be NULL.
14. Add code comment for numa_update_node_memblks to explain: Assumes all memory regions belonging to a single node are in one chunk. Holes between them will be included in the node.
15. Use a single patch instead of several patches to move x86 SRAT code to common.
16. Export node_to_pxm to keep pxm information in NUMA scan nodes error messages.
17. Change the code style to target file's Xen code-style.
18. Adjust some __init and __initdata for some functions and variables.
19. Replace CONFIG_ACPI_NUMA by CONFIG_NUMA. Replace "SRAT" texts.
20. Turn numa_scan_nodes to static.
21. Change NR_NUMA_NODES upper bound from 4095 to 255.

Wei Chen (6):
  xen/x86: Provide helpers for common code to access acpi_numa
  xen/x86: move generically usable NUMA code from x86 to common
  xen/x86: Use ASSERT instead of VIRTUAL_BUG_ON for phys_to_nid
  xen/x86: use arch_get_ram_range to get information from E820 map
  xen/x86: move NUMA process nodes nodes code from x86 to common
  xen: introduce a Kconfig option to configure NUMA nodes number

 xen/arch/Kconfig                 |  11 +
 xen/arch/x86/include/asm/acpi.h  |   2 -
 xen/arch/x86/include/asm/mm.h    |   2 -
 xen/arch/x86/include/asm/numa.h  |  61 +--
 xen/arch/x86/include/asm/setup.h |   1 -
 xen/arch/x86/mm.c                |   2 -
 xen/arch/x86/numa.c              | 441 +----------------
 xen/arch/x86/smpboot.c           |   2 +-
 xen/arch/x86/srat.c              | 336 ++-----------
 xen/common/Makefile              |   1 +
 xen/common/numa.c                | 803 +++++++++++++++++++++++++++++++
 xen/common/page_alloc.c          |   2 +
 xen/include/xen/mm.h             |   2 +
 xen/include/xen/numa.h           |  96 +++-
 14 files changed, 961 insertions(+), 801 deletions(-)
 create mode 100644 xen/common/numa.c

--
2.25.1