
Re: [Xen-devel] [PATCH] x86: correct socket_cpumask allocation for AP



On 07/08/2015 12:17 PM, Dario Faggioli wrote:
On Wed, 2015-07-08 at 16:38 +0100, Jan Beulich wrote:
On 08.07.15 at 17:11, <dario.faggioli@xxxxxxxxxx> wrote:
On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
On 08.07.15 at 11:36, <chao.p.peng@xxxxxxxxxxxxxxx> wrote:
@@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
  static void smp_store_cpu_info(int id)
  {
      struct cpuinfo_x86 *c = cpu_data + id;
+    unsigned int socket;
      *c = boot_cpu_data;
      if ( id != 0 )
+    {
          identify_cpu(c);
+        socket = cpu_to_socket(id);
+        if ( !socket_cpumask[socket] )
+        {
+            socket_cpumask[socket] = secondary_socket_cpumask;
+            secondary_socket_cpumask = NULL;
I don't think this will build with small enough NR_CPUS.
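For readers following the hunk above, here is a standalone toy model of the handoff it introduces: a mask is pre-allocated before the CPU comes up, and the CPU claims it only if its socket has no mask yet. This is a sketch under stated assumptions, not the Xen code: mask_t, MAX_SOCKETS, prepare_spare() and claim_mask() are names invented for the illustration, and calloc() stands in for whatever allocator the hypervisor uses.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SOCKETS 4   /* hypothetical bound, for this model only */

/* Stand-in for cpumask_t: one bit per CPU. */
typedef struct { unsigned long bits; } mask_t;

static mask_t *socket_mask[MAX_SOCKETS]; /* NULL until a CPU claims the slot */
static mask_t *spare_mask;               /* allocated before the CPU starts */

/* Make sure a spare mask exists. Returns 0 on success, -1 on failure. */
static int prepare_spare(void)
{
    if (spare_mask == NULL)
        spare_mask = calloc(1, sizeof(*spare_mask));
    return spare_mask != NULL ? 0 : -1;
}

/*
 * Called once the CPU's socket is known. If the socket has no mask yet,
 * ownership of the spare moves to the socket slot and the spare pointer
 * is cleared, mirroring the two assignments in the hunk.
 * Returns 0 on success, -1 if no mask could be attached.
 */
static int claim_mask(unsigned int cpu, unsigned int socket)
{
    if (socket >= MAX_SOCKETS || cpu >= sizeof(unsigned long) * CHAR_BIT)
        return -1;

    if (socket_mask[socket] == NULL) {
        if (spare_mask == NULL)
            return -1;          /* nothing pre-allocated: refuse, don't crash */
        socket_mask[socket] = spare_mask;
        spare_mask = NULL;
    }

    socket_mask[socket]->bits |= 1UL << cpu;
    return 0;
}
```

In this model a second CPU on an already-populated socket leaves the spare untouched, so it can be reused for the next CPU bringup. A CPU arriving on an empty socket with no spare gets an error instead of dereferencing NULL.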

And it *does* *not* fix the issue on my box.
I.e. bad analysis (albeit it seemed correct to me)

Same here. I triple-checked that the patch really was applied, and it
is, but it's still crashing with the same dump as the one in Osstest's
failure, as reported by Ian.

  _and_ new code not tested.

Looking at it again, Osstest and I are probably seeing a different
issue from the one Boris is facing. I haven't seen Boris' Oops, so I
can't be sure, but in my case this happens in set_cpu_sibling_map(),
called from smp_prepare_cpus() on the boot CPU, not during secondary
CPU bringup.

I see it from start_secondary():

...
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    16
(XEN) RIP:    e008:[<ffff82d080189051>] set_cpu_sibling_map+0x65/0x37e
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000010
(XEN) rdx: 0000000000000010   rsi: 00000033bc7c8400   rdi: 0000000000000010
(XEN) rbp: ffff83083be0fec0   rsp: ffff83083be0fe60   r8: ffff83083be0fe88
(XEN) r9:  0000000000014000   r10: ffff82cfffdfb0f0   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000009   r14: 0000000000000010
(XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000000426e0
(XEN) cr3: 00000000bd897000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83083be0fe60:
(XEN)    000000103be0fec0 0000000000000046 0000001000000000 0000008400000000
(XEN)    00000000000426e0 000000103caed000 0000000000000009 0000000000000000
(XEN)    0000000000000000 0000000000000009 0000000000000010 0000000000000010
(XEN)    ffff83083be0ff10 ffff82d08018958a 0000000000000000 0000001000000000
(XEN)    0000000000000000 0000000000000001 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000010 ffff8300bdce0000 00000033bc7c8400 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d080189051>] set_cpu_sibling_map+0x65/0x37e
(XEN)    [<ffff82d08018958a>] start_secondary+0x220/0x277
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000043ffef063 ffffffffffffffff
(XEN)  L3[0x000] = 000000043ffee063 ffffffffffffffff
(XEN)  L2[0x000] = 000000043ffed063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 16:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
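
For reference, error_code=0002 together with a faulting linear address of 0 reads as a supervisor-mode write to a not-present page, i.e. a store through a NULL pointer. Below is a minimal sketch of decoding the architectural x86 page-fault error-code bits. The PFEC_* macro names and describe_pfec() are made up for this illustration and are not claimed to match Xen's own definitions.

```c
#include <stdio.h>

/* Architectural x86 page-fault error-code bits (names are illustrative). */
#define PFEC_PRESENT (1u << 0) /* 0: page not present, 1: protection fault */
#define PFEC_WRITE   (1u << 1) /* 0: read access,      1: write access     */
#define PFEC_USER    (1u << 2) /* 0: supervisor mode,  1: user mode        */
#define PFEC_RSVD    (1u << 3) /* reserved bit set in a paging entry       */
#define PFEC_INSN    (1u << 4) /* fault caused by an instruction fetch     */

/* Write a one-line description of the error code into buf. */
static int describe_pfec(unsigned int ec, char *buf, size_t len)
{
    const char *access = (ec & PFEC_INSN)  ? "instruction fetch"
                       : (ec & PFEC_WRITE) ? "write"
                       :                     "read";

    return snprintf(buf, len, "%s %s, %s page%s",
                    (ec & PFEC_USER) ? "user" : "supervisor",
                    access,
                    (ec & PFEC_PRESENT) ? "present" : "not-present",
                    (ec & PFEC_RSVD) ? ", reserved bit set" : "");
}
```

Calling describe_pfec(0x0002, buf, sizeof(buf)) fills buf with "supervisor write, not-present page", which matches the panic above.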

I think it has to do with the fact that I've got CPU #0 on socket #1,
while Boris' (and perhaps Chao's too) test boxes have it on socket #0.

I'd be happy to test patches on my box, if that helps (although, I'm
about to leave right now, so that will be tomorrow).

Regards,
Dario


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel