Re: [Xen-devel] Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
On Tue, Nov 24, 2015 at 03:34:45AM -0700, Jan Beulich wrote:
> >>> On 23.11.15 at 17:36, <eswierk@xxxxxxxxxxxxxxxxxx> wrote:
> > I instrumented detect_extended_topology() and ran again with 4 CPUs.
> >[...]
> > (XEN) smp_store_cpu_info id=3
> > (XEN) detect_extended_topology cpuid_count op=0xb count=0 eax=0x0 ebx=0x1 ecx=0x100 edx=0x6
> > (XEN) detect_extended_topology initial_apicid=6 core_plus_mask_width=0 core_level_siblings=1
> > (XEN) detect_extended_topology cpuid_count op=0xb count=1 eax=0x0 ebx=0x1 ecx=0x201 edx=0x6
> > (XEN) detect_extended_topology ht_mask_width=0 core_plus_mask_width=0 core_select_mask=0x0 core_level_siblings=1
> >[...]
> > If cpuid 0xb returned 1 rather than 0 in eax[4:0], we would get
> > consecutively-numbered physical processor IDs.
> >
> > But the only requirement I see in the IA SDM (vol 2A, table 3-17) is that
> > the eax[4:0] value yield unique IDs, not necessarily consecutive. Likewise
> > while the examples in vol 3A sec 8.9 show physical IDs numbered
> > consecutively, the algorithms do not assume this is the case.
>
> Indeed, and I think I had said so. The algorithm does, however, tell
> us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both
> shifts being zero), which for the whole system results in sockets 1,
> 3, and 5 unused. While not explicitly excluded, I'm not sure how far
> we should go in expecting all kinds of odd configurations (along those
> lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS /
> MAX_LOCAL_APIC, which for big systems is 4 times the number of
> CPUs we support).
>
> Taking it to set_nr_sockets(), a pretty basic assumption is broken by
> the above way of presenting topology: We would have to have more
> sockets than there are CPUs. I would have wanted to check what
> e.g. Linux does here, but there doesn't seem to be any support of
> CAT (and hence any need for per-socket data) there.

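To make the arithmetic behind the point quoted above concrete, here is a minimal sketch, not Xen's actual code. It assumes the socket ID is simply the APIC ID shifted right by core_plus_mask_width, and that the four CPUs report APIC IDs 0, 2, 4 and 6 (inferred from "sockets 1, 3, and 5 unused"; only APIC ID 6 appears in the log). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not Xen's implementation):
 * the package (socket) ID is the APIC ID with the SMT and core
 * bits shifted out.
 */
unsigned int apicid_to_socket(unsigned int apicid,
                              unsigned int core_plus_mask_width)
{
    return apicid >> core_plus_mask_width;
}

/* Highest socket ID produced by a set of APIC IDs for a given shift. */
unsigned int max_socket_id(const unsigned int *apicids, size_t n,
                           unsigned int core_plus_mask_width)
{
    unsigned int max = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        unsigned int s = apicid_to_socket(apicids[i], core_plus_mask_width);

        if (s > max)
            max = s;
    }
    return max;
}
```

With the assumed APIC IDs { 0, 2, 4, 6 } and a shift of 0, max_socket_id() yields 6, so any array sized by the CPU count (4) is too small to index by socket. With a shift of 1 the same IDs map to sockets 0 through 3.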
Actually, I checked the Linux code when I was implementing this, but no
such support exists there. The current Linux CAT patch supports only
system-level rather than per-socket allocation, so it doesn't need
per-socket data either. People are requesting per-socket support, so
Linux will need to solve this problem eventually. But at this time, we
don't have any reference.

>
> (I am, btw, now also confused by you saying that e.g. for a 3-CPU
> config things work. If the topology data gets presented in similar
> ways in that case, I can't see why you wouldn't run into the same
> problem. Unless memory corruption occurs silently in one case, but
> "loudly" in the other.)
>
> Bottom line - for the moment I do not see a reasonable way of
> dealing with that situation. The closest I could see would be what
> we iirc had temporarily during the review cycles of the initial CAT
> series: A command line option to specify the number of sockets. Or
> make all accesses to socket_cpumask[] conditional upon PSR being
> enabled (which would have the bad side effect of making future
> uses for other purposes more cumbersome), or go through and
> range check the socket number on all of those accesses.
>
> Chao, could you - inside Intel - please check whether there are
> any assumptions on the respective CPUID leaf output that aren't
> explicitly stated in the SDM right now (like resulting in contiguous
> socket numbers), and ask for them getting made explicit (if there
> are any), or it being made explicit that no assumptions at all are
> to be made at all on the presented values

Actually, there is already such a statement in the SDM (vol 3, ch 8.9.1):
"The value of valid APIC_IDs need not be contiguous across package
boundary or core boundaries".

> (in which case we'd
> have to consume MADT parsing data in set_nr_sockets(), e.g.
> by replacing num_processors there with one more than the
> maximum APIC ID of any non-disabled CPU)?

Even with this, we still have a problem in the hotplug case: the
inserted CPU may have an APIC_ID bigger than the maximum APIC_ID seen
here.

But let's get back to the real world. Most machines that support CAT
should have contiguous SOCKET_IDs, so this is not a problem for them.
Given that CAT is the only feature that uses this, I guess this
suggestion might be better than the other solutions in practice.

Chao
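The suggestion discussed above (size the per-socket data by one more than the maximum APIC ID of any non-disabled CPU) and the hotplug caveat can be sketched roughly as follows. This is an illustration only: struct madt_cpu, nr_sockets_from_madt() and socket_in_range() are hypothetical names, not existing Xen interfaces, and the range check stands in for the "range check the socket number" alternative mentioned in the quoted mail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one MADT processor entry. */
struct madt_cpu {
    unsigned int apicid;
    bool enabled;
};

/*
 * Upper bound on the number of sockets: one more than the largest
 * APIC ID of any non-disabled CPU. A socket ID is an APIC ID shifted
 * right, so it can never exceed that APIC ID. Returns 0 if no CPU is
 * enabled.
 */
unsigned int nr_sockets_from_madt(const struct madt_cpu *cpus, size_t n)
{
    unsigned int max = 0;
    bool any = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!cpus[i].enabled)
            continue;
        any = true;
        if (cpus[i].apicid > max)
            max = cpus[i].apicid;
    }
    return any ? max + 1 : 0;
}

/*
 * Range check for the hotplug case: a CPU that was disabled at boot
 * and is inserted later may map to a socket beyond the boot-time bound.
 */
bool socket_in_range(unsigned int socket, unsigned int nr_sockets)
{
    return socket < nr_sockets;
}
```

For enabled APIC IDs 0, 2, 4, 6 plus a disabled entry with APIC ID 8, the bound comes out as 7. Socket 6 passes the check, while a later hotplug of the APIC ID 8 CPU (socket 8 with a zero shift) fails it and would need to be rejected or handled by resizing.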