
Re: [Xen-devel] [PATCH v6 00/10] vnuma introduction



Hi! Another new series!

On Fri, Jul 18, 2014 at 01:49:59AM -0400, Elena Ufimtseva wrote:
[...]
> Current problems:
> 
> Warning on CPU bringup on other node
> 
>     The CPUs in the guest which belong to different NUMA nodes are configured
>     to share the same L2 cache and are thus considered to be siblings, but they
>     are not on the same node. One can see the following WARNING during boot:
> 
> [    0.022750] SMP alternatives: switching to SMP code
> [    0.004000] ------------[ cut here ]------------
> [    0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:303 topology_sane.isra.8+0x67/0x79()
> [    0.004000] sched: CPU #1's smt-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
> [    0.004000] Modules linked in:
> [    0.004000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc8+ #43
> [    0.004000]  0000000000000000 0000000000000009 ffffffff813df458 ffff88007abe7e60
> [    0.004000]  ffffffff81048963 ffff88007abe7e70 ffffffff8102fb08 ffffffff00000100
> [    0.004000]  0000000000000001 ffff8800f6e13900 0000000000000000 000000000000b018
> [    0.004000] Call Trace:
> [    0.004000]  [<ffffffff813df458>] ? dump_stack+0x41/0x51
> [    0.004000]  [<ffffffff81048963>] ? warn_slowpath_common+0x78/0x90
> [    0.004000]  [<ffffffff8102fb08>] ? topology_sane.isra.8+0x67/0x79
> [    0.004000]  [<ffffffff81048a13>] ? warn_slowpath_fmt+0x45/0x4a
> [    0.004000]  [<ffffffff8102fb08>] ? topology_sane.isra.8+0x67/0x79
> [    0.004000]  [<ffffffff8102fd2e>] ? set_cpu_sibling_map+0x1c9/0x3f7
> [    0.004000]  [<ffffffff81042146>] ? numa_add_cpu+0xa/0x18
> [    0.004000]  [<ffffffff8100b4e2>] ? cpu_bringup+0x50/0x8f
> [    0.004000]  [<ffffffff8100b544>] ? cpu_bringup_and_idle+0x1d/0x28
> [    0.004000] ---[ end trace 0e2e2fd5c7b76da5 ]---
> [    0.035371] x86: Booted up 2 nodes, 2 CPUs
> 
> The workaround is to specify cpuid in the config file and not use SMT. But
> soon I will come up with some other acceptable solution.
> 

I've also encountered this. I suspect that even if you disable SMT with
cpuid in the config file, the CPU topology in the guest might still be wrong.
What do hwloc-ls and lscpu show? Do you see any weird topology, like one
core belonging to one node while three belong to another? (I suspect not,
because your vcpus are already pinned to a specific node.)
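
If it helps, here's a throwaway program I use to dump what the guest
actually reports; it's only a sketch (not part of the series) that reads
the standard topology files under /sys/devices/system/cpu, the same ones
lscpu consults:

/* Throwaway sketch: print per-CPU topology as seen by the guest kernel. */
#include <stdio.h>

static void show(int cpu, const char *what)
{
        char path[128], buf[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, what);
        f = fopen(path, "r");
        if (!f)
                return;         /* CPU not present/online, just skip it */
        if (fgets(buf, sizeof(buf), f))
                printf("cpu%d %s: %s", cpu, what, buf);
        fclose(f);
}

int main(void)
{
        int cpu;

        /* With the 1 core : 1 socket hack below, every vcpu should list
         * only itself in thread_siblings_list. */
        for (cpu = 0; cpu < 64; cpu++) {
                show(cpu, "physical_package_id");
                show(cpu, "core_id");
                show(cpu, "thread_siblings_list");
        }
        return 0;
}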

What I did was to manipulate the various "id"s in the Linux kernel so that
each vcpu ends up with a 1 core : 1 cpu : 1 socket mapping. In that case the
guest scheduler cannot make any assumptions about individual CPUs sharing
caches with each other.
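
For context, the check that fires is topology_sane() in
arch/x86/kernel/smpboot.c (the smpboot.c:303 in the trace above).
Paraphrasing from memory from a 3.15-ish tree, it is roughly:

static bool
topology_sane(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o, const char *name)
{
        int cpu1 = c->cpu_index, cpu2 = o->cpu_index;

        /* Siblings are expected to share a node; if they don't, warn once
         * and make the caller skip the sibling link. */
        return !WARN_ONCE(cpu_to_node(cpu1) != cpu_to_node(cpu2),
                "sched: CPU #%d's %s-sibling CPU #%d is not on the same node! "
                "[node: %d != %d]. Ignoring dependency.\n",
                cpu1, name, cpu2, cpu_to_node(cpu1), cpu_to_node(cpu2));
}

set_cpu_sibling_map() (visible in the trace) ends up here whenever two CPUs
match on phys_proc_id/cpu_core_id/llc_id, so giving every vcpu its own core,
socket and llc_id means the check is never reached.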

In any case, we already manipulate various IDs of CPU0; I don't see any
harm in manipulating the other CPUs as well.

Thoughts?

P.S. I'm benchmarking your v5; tell me if you're interested in the
results.

Wei.

(This patch should be applied to Linux and is by no means suitable for
upstream as-is.)
---8<---
From be2b33088e521284c27d6a7679b652b688dba83d Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@xxxxxxxxxx>
Date: Tue, 17 Jun 2014 14:51:57 +0100
Subject: [PATCH] XXX: CPU topology hack!

Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 arch/x86/xen/smp.c   |   17 +++++++++++++++++
 arch/x86/xen/vnuma.c |    2 ++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..89656fe 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -81,6 +81,15 @@ static void cpu_bringup(void)
        cpu = smp_processor_id();
        smp_store_cpu_info(cpu);
        cpu_data(cpu).x86_max_cores = 1;
+       cpu_physical_id(cpu) = cpu;
+       cpu_data(cpu).phys_proc_id = cpu;
+       cpu_data(cpu).cpu_core_id = cpu;
+       cpu_data(cpu).initial_apicid = cpu;
+       cpu_data(cpu).apicid = cpu;
+       per_cpu(cpu_llc_id, cpu) = cpu;
+       if (numa_cpu_node(cpu) != NUMA_NO_NODE)
+               cpu_data(cpu).phys_proc_id = numa_cpu_node(cpu);
+
        set_cpu_sibling_map(cpu);
 
        xen_setup_cpu_clockevents();
@@ -326,6 +335,14 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 
        smp_store_boot_cpu_info();
        cpu_data(0).x86_max_cores = 1;
+       cpu_physical_id(0) = 0;
+       cpu_data(0).phys_proc_id = 0;
+       cpu_data(0).cpu_core_id = 0;
+       per_cpu(cpu_llc_id, 0) = 0;
+       cpu_data(0).initial_apicid = 0;
+       cpu_data(0).apicid = 0;
+       if (numa_cpu_node(0) != NUMA_NO_NODE)
+               per_cpu(x86_cpu_to_node_map, 0) = numa_cpu_node(0);
 
        for_each_possible_cpu(i) {
                zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
index a02f9c6..418ced2 100644
--- a/arch/x86/xen/vnuma.c
+++ b/arch/x86/xen/vnuma.c
@@ -81,7 +81,9 @@ int __init xen_numa_init(void)
        setup_nr_node_ids();
        /* Setting the cpu, apicid to node */
        for_each_cpu(cpu, cpu_possible_mask) {
+               /* Use cpu id as apicid */
                set_apicid_to_node(cpu, cpu_to_node[cpu]);
+               cpu_data(cpu).initial_apicid = cpu;
                numa_set_node(cpu, cpu_to_node[cpu]);
                cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
        }
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

