RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Sent: September 9, 2021 6:32
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: Julien Grall <julien@xxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> xen-devel@xxxxxxxxxxxxxxxxxxxx; Bertrand Marquis <Bertrand.Marquis@xxxxxxx>;
> Jan Beulich <jbeulich@xxxxxxxx>
> Subject: RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
>
> On Wed, 8 Sep 2021, Wei Chen wrote:
> > > > >>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > > > >>>      return 0;
> > > > >>>  }
> > > > >>>
> > > > >>> +/* Callback for parsing of the memory regions affinity */
> > > > >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > > > >>> +                                paddr_t start, paddr_t size)
> > > > >>> +{
> > > > >>> +    struct node *nd;
> > > > >>> +    paddr_t end;
> > > > >>> +    int i;
> > > > >>> +
> > > > >>> +    if ( srat_disabled() )
> > > > >>> +        return -EINVAL;
> > > > >>> +
> > > > >>> +    end = start + size;
> > > > >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > > > >>> +    {
> > > > >>> +        dprintk(XENLOG_WARNING,
> > > > >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    /* It is fine to add this area to the nodes data it will be used later */
> > > > >>> +    i = conflicting_memblks(start, end);
> > > > >>> +    /* No conflicting memory block, we can save it for later usage */;
> > > > >>> +    if ( i < 0 )
> > > > >>> +        goto save_memblk;
> > > > >>> +
> > > > >>> +    if ( memblk_nodeid[i] == node ) {
> > > > >>> +        /*
> > > > >>> +         * Overlaps with other memblk in the same node, warning here.
> > > > >>> +         * This memblk will be merged with conflicted memblk later.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_WARNING
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64
> > > > >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end,
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +    } else {
> > > > >>> +        /*
> > > > >>> +         * Conflict with memblk in other node, this is an error.
> > > > >>> +         * The NUMA information is invalid, NUMA will be turn off.
> > > > >>> +         */
> > > > >>> +        printk(XENLOG_ERR
> > > > >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > > > >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > > > >>> +               node, start, end, memblk_nodeid[i],
> > > > >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> > > > >>> +        bad_srat();
> > > > >>> +        return -EINVAL;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +save_memblk:
> > > > >>> +    nd = &nodes[node];
> > > > >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > > > >>> +        nd->start = start;
> > > > >>> +        nd->end = end;
> > > > >>> +    } else {
> > > > >>> +        if ( start < nd->start )
> > > > >>> +            nd->start = start;
> > > > >>> +        if ( nd->end < end )
> > > > >>> +            nd->end = end;
> > > > >>> +    }
> > > > >>> +
> > > > >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > > > >>> +           node, start, end);
> > > > >>> +
> > > > >>> +    node_memblk_range[num_node_memblks].start = start;
> > > > >>> +    node_memblk_range[num_node_memblks].end = end;
> > > > >>> +    memblk_nodeid[num_node_memblks] = node;
> > > > >>> +    num_node_memblks++;
> > > > >>
> > > > >> Is it possible to have non-contiguous ranges of memory for a single
> > > > >> NUMA node?
> > > > >>
> > > > >> Looking at the DT bindings and Linux implementation, it seems possible.
> > > > >> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
> > > > >> but nodes couldn't.
> > > > >
> > > > > Yes, you're right. I copied this code from x86 ACPI NUMA. Does ACPI
> > > > > allow non-contiguous ranges of memory for a single NUMA node too?
> > > >
> > > > I couldn't find any restriction for ACPI. Although, I only briefly
> > > > looked at the spec.
> > > >
> > > > > If yes, I think this will affect x86 ACPI NUMA too. In the next
> > > > > version, we plan to merge dtb_numa_memory_affinity_init and
> > > > > acpi_numa_memory_affinity_init into a neutral function, so we can fix
> > > > > them at the same time.
> > > > >
> > > > > If not, maybe we have to keep the diversity for DT and ACPI here.
> > > >
> > > > I am not entirely sure what you mean. Are you saying if ACPI doesn't
> > > > allow non-contiguous ranges of memory, then we should keep the
> > > > implementation separated?
> > > >
> > > > If so, then I disagree with that. It is fine to have code that supports
> > > > more than what a firmware table supports. The main benefit is less code
> > > > and therefore less long term maintenance (with the current solution we
> > > > would need to check both the ACPI and DT implementation if there is a
> > > > bug in one).
> > >
> > > Yes, I agree.
> >
> > I am looking for a way to address this comment. The current "nodes"
> > array does not consider the situation where the memory addresses of
> > different NUMA nodes can be interleaved.
> >
> > This code exists in the x86 NUMA implementation. I think it may be based
> > on an early version of the Linux x86 NUMA implementation. In recent Linux
> > code, neither ACPI/numa/srat.c [1] nor the x86 NUMA code [2] uses "nodes"
> > to record NUMA memory address boundaries. They don't depend on "nodes"
> > for sanity checks.
> >
> > To fix it properly, we would have to upgrade the x86 NUMA driver. That
> > would have a significant impact on Xen-x86, and I think it is out of the
> > scope of this series. Can we create another thread to discuss it?
> >
> > Or could you suggest some simple way we could fix it?
>
> It looks like we would have to replace all the node->start / node->end
> checks with node_memblk_range checks. There are a few of them in
> valid_numa_range, conflicting_memblks, cutoff_node, nodes_cover_memory.
> It wouldn't be trivial.
>
> Although I do think that non-contiguous memory for NUMA nodes is
> important to support, the patch series is already 40 patches. I don't
> think it is a good idea to add other significant changes to it. I
> wouldn't upgrade the x86 NUMA driver now. If we can't find a better way,
> we can proceed as you are doing in this version, with the known gap that
> we can't deal with non-contiguous memory for NUMA nodes, and fix it with
> a follow-up series later. In that case we would want to have an explicit
> check for non-contiguous memory for NUMA nodes in
> dtb_numa_memory_affinity_init and error out if found.
>

Yes, I think this may be a more appropriate method at present. I will add
some code to do an explicit check and give a warning/error (a rough,
untested sketch of the kind of check I have in mind is at the end of this
mail).

> > Also, on Linux, the NUMA implementation for x86 is different from the
> > Arm64 and RISC-V implementation. [3]
> >
> > [1] https://github.com/torvalds/linux/blob/master/drivers/acpi/numa/srat.c
> > [2] https://github.com/torvalds/linux/blob/master/arch/x86/mm/numa.c
> > [3] https://github.com/torvalds/linux/blob/master/drivers/base/arch_numa.c
>
> In general, I like the idea of sharing code as much as possible between
> architectures (x86, ARM, etc.) and between DT/ACPI because it makes the
> code maintenance easier and one might even gain certain features for
> free.
>
> However, the exercise of sharing code shouldn't take significant
> additional effort. In fact, it should decrease the overall effort:
> instead of writing new code one just takes existing code and moves it to
> common.
>
> In this instance, I think it would be good to be able to share the NUMA
> initialization code between x86/ARM and ACPI/DT if it doesn't cause
> extra effort.
>
> Here the extra effort that my previous comment might cause doesn't
> derive from x86/ARM or DT/ACPI code sharing. It derives from the fact
> that our existing code doesn't deal with non-contiguous memory for NUMA
> nodes, unfortunately. That is something we need to find a way to cope
> with anyway.

Yes. I didn't post the links above to argue for keeping Arm and x86
separate, though I am not sure exactly why Linux keeps them separate. For
Xen, I think we can still try to share code between Arm/x86 and DT/ACPI.

Cheers,
Wei Chen
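For what it's worth, here is the rough sketch referred to above. It is
untested and only meant to illustrate the idea: dt_numa_check_contiguous()
is a placeholder name, while nodes[], memory_nodes_parsed, node_isset() and
bad_srat() are the existing symbols used by the quoted code. The motivation
is that struct node records a single [start, end) span per node, so if, for
example, node 0 has banks at 0x0-0x40000000 and 0x80000000-0xc0000000 while
node 1 owns 0x40000000-0x80000000, merging node 0's banks into one span
would make node 0 appear to cover node 1's memory.

    /*
     * Untested, illustrative sketch only (not part of the posted patch).
     * struct node can only describe one contiguous [start, end) span per
     * node, so until non-contiguous node memory is really supported,
     * refuse any block that would leave a hole in the span recorded so
     * far for this node.
     */
    static int __init dt_numa_check_contiguous(nodeid_t node,
                                               paddr_t start, paddr_t end)
    {
        const struct node *nd = &nodes[node];

        /* First block for this node: nothing recorded yet to compare with. */
        if ( !node_isset(node, memory_nodes_parsed) )
            return 0;

        /* Accept blocks that touch or overlap the currently recorded span. */
        if ( start <= nd->end && end >= nd->start )
            return 0;

        printk(XENLOG_ERR
               "DT: NUMA node %u has non-contiguous memory, NUMA disabled\n",
               node);
        bad_srat();
        return -EINVAL;
    }

dtb_numa_memory_affinity_init() could then call this just before the
save_memblk label and return the error straight away.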