
Re: [Xen-devel] arm64: Approach for DT based NUMA and issues





On 26/11/16 06:59, Vijay Kilari wrote:
Hi,

Hi Vijay,

This mail is mixing two distinct problems:
        1) Making Xen NUMA-aware
        2) Making DOM0 NUMA-aware

As mentioned in another part of this thread, those problems should be tackled one by one rather than together.

I will focus on problem 1) while answering this e-mail.

   Below is a basic write-up on DT-based NUMA feature support for the arm64
platform. I have attempted to get NUMA support working; however, I face the
issues below and would like to discuss them. Please let me know your comments.
I have yet to look at ACPI support.

DT based NUMA support for arm64 platform
========================================
For Xen to boot on a NUMA arm64 platform, it needs to parse the
CPU and memory nodes of the DT. Here I would
like to discuss the DT-based boot mechanism and the issues
related to it.

1) Parsing CPU and Memory nodes:
---------------------------------------------------

The NUMA information associated with CPU and memory nodes is passed in the DT
using the numa-node-id u32 integer property. More information about the NUMA
binding is available in the Linux kernel at Documentation/devicetree/bindings/numa.txt
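For reference, a minimal DT fragment following that binding might look like the sketch below (addresses, sizes, and node ids are made up for illustration):

```dts
/* Illustrative only: addresses, sizes and node ids are invented. */
memory@80000000 {
        device_type = "memory";
        reg = <0x0 0x80000000 0x0 0x80000000>;
        numa-node-id = <0>;
};

memory@900000000 {
        device_type = "memory";
        reg = <0x9 0x00000000 0x0 0x80000000>;
        numa-node-id = <1>;
};

cpus {
        #address-cells = <2>;
        #size-cells = <0>;

        cpu@0 {
                device_type = "cpu";
                compatible = "arm,armv8";
                reg = <0x0 0x0>;
                numa-node-id = <0>;
        };
};
```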

Similar to the Linux kernel, the cpu and memory nodes of the DT are parsed,
and the numa-node-id information is populated in the cpu_parsed and
memory_parsed nodemask_t masks.

When booting in UEFI mode, UEFI passes memory information to Dom0
using the EFI memory descriptor table and deletes the memory nodes
from the host DT. However, to fetch the memory NUMA node ids, the memory DT
nodes must not be deleted by the EFI stub.

ISSUE: When the memory nodes are _NOT_ deleted from the host DT by the EFI stub,
Xen processes them [xen/arch/arm/bootfdt.c, early_scan_node()],
which adds the memory ranges to the bootinfo.mem structure, thereby creating
duplicate entries, and initialization eventually fails.

Possible solution: while adding a new memory region to bootinfo.mem, check for
duplicate entries and back off if the entry is already present from the UEFI
memory info table.

I think we should take a different approach. I actually like the approach suggested by Andre in [1], which is: if the UEFI memory map exists (i.e. bootinfo.mem is already filled), then the DT is only used to get the NUMA node information.


2) Parsing CPU nodes:
---------------------------------
The CPU nodes are parsed to extract the numa-node-id info for each CPU, and
cpu_nodemask is populated.

The MPIDR register value is read for each CPU, and cpu_to_node[] is populated.

To emphasise here, cpu_to_node[] will be indexed using the Xen CPU id and not the MPIDR. They can be different, and Xen does not keep track of the MPIDR except in very few places.


3) Parsing Memory nodes:
--------------------------------------
For all the DT memory nodes in the flattened DT, the start address, size,
and numa-node-id value are extracted and stored in node_memblk_range[],
which is of type struct node.

Each bootinfo.mem entry from UEFI is verified against node_memblk_range[], and
NODE_DATA is populated with the start PFN, end PFN, and node id.

Populating memnodemap:

The memnodemap[] array is allocated from the heap and, using the NODE_DATA
structure, populated with the node id for each page index.

This memnodemap info is used by the memory allocator to fetch the memory node
id for a given page by calling phys_to_nid().

ISSUE: phys_to_nid() is called by the memory allocator before memnodemap[]
is initialized.

Since memnodemap[] is allocated from the heap, the boot allocator must be
initialized first. But the boot allocator needs phys_to_nid(), which is not
available until memnodemap[] is initialized, so there is a chicken-and-egg
situation during initialization. To overcome this, phys_to_nid() should rely
on node_memblk_range[] to get the node id until memnodemap[] is initialized.

Looking at the code, boot_allocator() does not need phys_to_nid() until the end. So it would be perfectly fine to use alloc_boot_pages() to allocate memnodemap.


4) Generating memory nodes for DOM0
---------------------------------------------------------
Linux kernel device drivers that use devm_kzalloc() try to allocate memory
from the local memory node. So Dom0 needs to have memory allocated on all
the available nodes of the system.

Ex: the SMMU driver for a device on node 1 tries to allocate memory
on node 1.

ISSUE:
 - Dom0's memory should be split across all the available memory nodes
   of the system, and memory DT nodes should be generated accordingly.
 - The memory DT nodes generated by Xen for Dom0 should include the
   numa-node-id property.

If you drop the numa-node-id property from every node, DOM0 will not try to use NUMA. Is there any specific reason not to do that?

Those properties could be re-introduced later on when vNUMA is brought up.

Regards,

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg02499.html

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

