Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
>>> On 23.11.15 at 17:36, <eswierk@xxxxxxxxxxxxxxxxxx> wrote:
> I instrumented detect_extended_topology() and ran again with 4 CPUs.
> [...]
> (XEN) smp_store_cpu_info id=3
> (XEN) detect_extended_topology cpuid_count op=0xb count=0 eax=0x0 ebx=0x1 ecx=0x100 edx=0x6
> (XEN) detect_extended_topology initial_apicid=6 core_plus_mask_width=0 core_level_siblings=1
> (XEN) detect_extended_topology cpuid_count op=0xb count=1 eax=0x0 ebx=0x1 ecx=0x201 edx=0x6
> (XEN) detect_extended_topology ht_mask_width=0 core_plus_mask_width=0 core_select_mask=0x0 core_level_siblings=1
> [...]
> If cpuid 0xb returned 1 rather than 0 in eax[4:0], we would get
> consecutively-numbered physical processor IDs.
>
> But the only requirement I see in the IA SDM (vol 2A, table 3-17) is that
> the eax[4:0] value yield unique IDs, not necessarily consecutive. Likewise,
> while the examples in vol 3A sec 8.9 show physical IDs numbered
> consecutively, the algorithms do not assume this is the case.

Indeed, and I think I had said so. The algorithm does, however, tell us that
with the above output CPU 3 (APIC ID 6) is on socket 6 (both shifts being
zero), which for the whole system leaves sockets 1, 3, and 5 unused (see the
sketch below). While such a layout is not explicitly excluded, I'm not sure
how far we should go in expecting all kinds of odd configurations (along
those lines we e.g. have a limit on the largest APIC ID we allow:
MAX_APICS / MAX_LOCAL_APIC, which for big systems is 4 times the number of
CPUs we support).

Taking it to set_nr_sockets(), a pretty basic assumption is broken by the
above way of presenting the topology: we would end up with more sockets than
there are CPUs. I would have wanted to check what e.g. Linux does here, but
there doesn't seem to be any support for CAT (and hence any need for
per-socket data) there.

(I am, btw, now also confused by you saying that e.g. a 3-CPU configuration
works. If the topology data gets presented in a similar way in that case, I
can't see why you wouldn't run into the same problem - unless memory
corruption occurs silently in one case, but "loudly" in the other.)

Bottom line - for the moment I do not see a reasonable way of dealing with
that situation. The closest I could see would be what we IIRC had
temporarily during the review cycles of the initial CAT series: a command
line option to specify the number of sockets. Alternatively we could make
all accesses to socket_cpumask[] conditional upon PSR being enabled (which
would have the bad side effect of making future uses for other purposes
more cumbersome), or go through and range check the socket number on all of
those accesses.

Chao, could you - inside Intel - please check whether there are any
assumptions on the respective CPUID leaf output that aren't explicitly
stated in the SDM right now (like resulting in contiguous socket numbers),
and either ask for them to be made explicit (if there are any), or for it
to be made explicit that no assumptions at all are to be made about the
presented values? In the latter case we'd have to consume MADT parsing data
in set_nr_sockets(), e.g. by replacing num_processors there with one more
than the maximum APIC ID of any non-disabled CPU.

Jan
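
For reference, a minimal standalone sketch (plain C, not Xen code) of the
arithmetic discussed above: the APIC IDs and the zero shift width are taken
from the boot log quoted in the message; everything else is made up for
illustration. It shows how, with core_plus_mask_width = 0, APIC ID 6 maps to
socket 6, so any per-socket array dimensioned by the CPU count (such as
socket_cpumask[]) would be indexed out of bounds.

    /*
     * Illustrative sketch only (not Xen code): derive the socket number
     * from the APIC ID using the CPUID leaf 0xB shift width, as in the
     * boot log quoted above, and compare the resulting range with the
     * number of CPUs.
     */
    #include <stdio.h>

    int main(void)
    {
        /* APIC IDs reported for the 4 CPUs (0, 2, 4, 6 per the log). */
        unsigned int apicid[] = { 0, 2, 4, 6 };
        unsigned int num_processors = sizeof(apicid) / sizeof(apicid[0]);

        /*
         * CPUID.0BH returned eax[4:0] = 0 at both the SMT and the core
         * level, so core_plus_mask_width (the shift that strips the
         * in-package bits off the APIC ID) is 0.
         */
        unsigned int core_plus_mask_width = 0;
        unsigned int max_socket = 0;

        for (unsigned int i = 0; i < num_processors; i++)
        {
            unsigned int socket = apicid[i] >> core_plus_mask_width;

            printf("CPU%u: APIC ID %u -> socket %u\n", i, apicid[i], socket);
            if (socket > max_socket)
                max_socket = socket;
        }

        /*
         * A per-socket array sized from the CPU count (4) would here be
         * indexed with socket numbers up to 6, i.e. out of bounds - the
         * corruption discussed in this thread.
         */
        printf("sockets needed: %u, CPUs: %u\n",
               max_socket + 1, num_processors);
        return 0;
    }

The final printf also shows the kind of upper bound the message suggests as
an alternative: one more than the largest APIC ID of any non-disabled CPU,
rather than num_processors.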