Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Tue, Nov 24, 2015 at 2:34 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
> Indeed, and I think I had said so. The algorithm does, however, tell
> us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both
> shifts being zero), which for the whole system results in sockets 1,
> 3, and 5 unused. While not explicitly excluded, I'm not sure how far
> we should go in expecting all kinds of odd configurations (along those
> lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS /
> MAX_LOCAL_APIC, which for big systems is 4 times the number of
> CPUs we support).

That's why I thought it reasonable to substitute MAX_APICS for nr_sockets
in sizing the socket_cpumask array.

> Taking it to set_nr_sockets(), a pretty basic assumption is broken by
> the above way of presenting topology: We would have to have more
> sockets than there are CPUs. I would have wanted to check what
> e.g. Linux does here, but there doesn't seem to be any support of
> CAT (and hence any need for per-socket data) there.

I looked at Linux, and there is no per-socket bookkeeping, AFAICT.

> (I am, btw, now also confused by you saying that e.g. for a 3-CPU
> config things work. If the topology data gets presented in similar
> ways in that case, I can't see why you wouldn't run into the same
> problem. Unless memory corruption occurs silently in one case, but
> "loudly" in the other.)

For 3, 6 and 12 CPUs, Fusion presents a completely different topology,
with 3-core sockets numbered consecutively starting with 0.

> Bottom line - for the moment I do not see a reasonable way of
> dealing with that situation. The closest I could see would be what
> we iirc had temporarily during the review cycles of the initial CAT
> series: A command line option to specify the number of sockets. Or
> make all accesses to socket_cpumask[] conditional upon PSR being
> enabled (which would have the bad side effect of making future
> uses for other purposes more cumbersome), or go through and
> range check the socket number on all of those accesses.

Could we avoid the issue by replacing socket_cpumask array with a list
or hashtable, indexed by socket ID?

> Chao, could you - inside Intel - please check whether there are
> any assumptions on the respective CPUID leaf output that aren't
> explicitly stated in the SDM right now (like resulting in contiguous
> socket numbers), and ask for them getting made explicit (if there
> are any), or it being made explicit that no assumptions at all are
> to be made at all on the presented values (in which case we'd
> have to consume MADT parsing data in set_nr_sockets(), e.g.
> by replacing num_processors there with one more than the
> maximum APIC ID of any non-disabled CPU)?

I suppose the key is whether Intel has encoded such assumptions in the
BIOS reference code, or has otherwise communicated them to AMI et al.

--Ed

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
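For illustration, here is a minimal standalone C sketch of the sizing
problem discussed above. It is not Xen code: the helper names
(socket_id, sockets_by_count, sockets_by_max_id, socket_in_range) are
hypothetical, and the count-based formula is only an assumption about
how a "number of CPUs" style sizing would behave. It contrasts sizing a
per-socket array from the CPU count with sizing it from the largest
socket ID seen, and shows the kind of range check suggested for
socket_cpumask[] accesses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration only: hypothetical helpers, not Xen's set_nr_sockets()
 * or its socket_cpumask handling.
 */

/* Socket ID derived from an APIC ID: shift right by the combined
 * core and thread shift (both zero in the failing configuration). */
static unsigned int socket_id(unsigned int apic_id, unsigned int shift)
{
    return apic_id >> shift;
}

/* Count-based sizing (assumed form): only correct if sockets are
 * numbered contiguously starting at 0. */
static unsigned int sockets_by_count(size_t nr_cpus,
                                     unsigned int cpus_per_socket)
{
    assert(cpus_per_socket > 0);
    return (unsigned int)((nr_cpus + cpus_per_socket - 1) / cpus_per_socket);
}

/* ID-based sizing: one more than the largest socket ID actually seen,
 * so sparse numbering is covered at the cost of unused slots. */
static unsigned int sockets_by_max_id(const unsigned int *apic_ids,
                                      size_t nr_cpus, unsigned int shift)
{
    unsigned int max = 0;

    assert(nr_cpus > 0);
    for (size_t i = 0; i < nr_cpus; i++) {
        unsigned int s = socket_id(apic_ids[i], shift);

        if (s > max)
            max = s;
    }
    return max + 1;
}

/* Range check to apply before indexing a per-socket array. */
static int socket_in_range(unsigned int socket, unsigned int nr_sockets)
{
    return socket < nr_sockets;
}
```

Assuming APIC IDs 0, 2, 4, 6 for the four CPUs (consistent with CPU 3
having APIC ID 6 and sockets 1, 3 and 5 being unused) and a shift of
zero, sockets_by_count(4, 1) gives 4 while socket_id(6, 0) is 6, so the
range check fails. sockets_by_max_id() gives 7, which covers socket 6
at the cost of three unused slots.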