Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy
On 09/02/2015 07:58 AM, Juergen Gross wrote:
> On 08/31/2015 06:12 PM, Boris Ostrovsky wrote:
>> On 08/20/2015 02:16 PM, Juergen Groß wrote:
>>> On 08/18/2015 05:55 PM, Dario Faggioli wrote:
>>>> Hey everyone,
>>>>
>>>> So, as a followup of what we were discussing in this thread:
>>>>
>>>>  [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest
>>>>  http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg03241.html
>>>>
>>>> I started looking in more detail at scheduling domains in the Linux
>>>> kernel. Now, that thread was about CPUID and vNUMA, and their weird
>>>> way of interacting, while the thing I'm proposing here is completely
>>>> independent of them both.
>>>>
>>>> In fact, no matter whether vNUMA is supported and enabled, and no
>>>> matter whether CPUID is reporting accurate, random, meaningful or
>>>> completely misleading information, I think we should do something
>>>> about how scheduling domains are built. The fact is, unless we use
>>>> 1:1 pinning that is immutable across the whole guest lifetime,
>>>> scheduling domains should not be constructed, in Linux, by looking
>>>> at *any* topology information, because that just does not make any
>>>> sense when vCPUs move around.
>>>>
>>>> Let me state this again (hoping to make myself as clear as possible):
>>>> no matter how good a shape we get CPUID support into, and no matter
>>>> how beautifully and consistently it will interact with vNUMA,
>>>> licensing requirements and whatever else, it will always be possible
>>>> for vCPU #0 and vCPU #3 to be scheduled on two SMT threads at time
>>>> t1, and on two different NUMA nodes at time t2. Hence, the Linux
>>>> scheduler really should not skew its load balancing logic toward
>>>> either of those two situations, as neither of them can be considered
>>>> correct (since nothing is!).
>>>>
>>>> For now, this only covers the PV case. The HVM case shouldn't be any
>>>> different, but I haven't looked at how to make the same thing happen
>>>> there as well.
>>>>
>>>> OVERALL DESCRIPTION
>>>> ===================
>>>> What this RFC patch does is, in the Xen PV case, configure the
>>>> scheduling domains in such a way that there is only one of them,
>>>> spanning all the pCPUs of the guest.
>>>>
>>>> Note that the patch deals directly with scheduling domains, and there
>>>> is no need to alter the masks that will then be used for building and
>>>> reporting the topology (via CPUID, /proc/cpuinfo, /sysfs, etc.). That
>>>> is the main difference between it and the patch proposed by Juergen
>>>> here:
>>>>
>>>>  http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05088.html
>>>>
>>>> This means that when, in future, we will fix CPUID handling and make
>>>> it comply with whatever logic or requirements we want, that won't
>>>> have any

I didn't mean to just set has_mp to zero unconditionally (for Xen, or any
other, guest). We'd need to have some logic as to when to set it to false.

-boris

>> Also, it seems to me that Xen guests would not be the only ones having
>> to deal with topology inconsistencies due to migrating VCPUs. Don't
>> KVM guests, for example, have the same problem? And if yes, perhaps we
>> should try solving it in a non-Xen-specific way (especially given that
>> both of those patches look pretty simple and thus are presumably easy
>> to integrate into common code).
>
> Indeed. I'll have a try.
>
>> And, as George already pointed out, this should be an optional feature
>> --- if a guest spans physical nodes and VCPUs are pinned then we don't
>> always want flat topology/domains.
>
> Yes, it might be a good idea to be able to keep some of the topology
> levels. I'll modify my patch to make this command line selectable.
>
> Juergen
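
The flattening idea being discussed maps onto the kernel's
set_sched_topology() interface, which lets architecture code replace the
default SMT/MC/DIE topology table with its own set of levels. The snippet
below is only an illustrative sketch, not the RFC patch itself: the
flat_topology_mask() helper, the "VCPU" level name, and the
flat_sched_topology= command line parameter (and its default of on) are
all invented here to show how a single-level, command line selectable
topology could be wired up.

  /*
   * Illustrative sketch only: register a single scheduling-domain level
   * spanning all online CPUs, unless "flat_sched_topology=off" was passed
   * on the command line. Names and default are hypothetical.
   */
  #include <linux/cpumask.h>
  #include <linux/init.h>
  #include <linux/sched.h>	/* set_sched_topology(), SD_INIT_NAME() */
  #include <linux/string.h>

  static bool flat_sched_topology = true;

  static int __init parse_flat_sched_topology(char *arg)
  {
  	if (arg && !strcmp(arg, "off"))
  		flat_sched_topology = false;
  	return 0;
  }
  early_param("flat_sched_topology", parse_flat_sched_topology);

  /*
   * Every CPU reports the same mask, so the scheduler builds one domain
   * covering all (v)CPUs instead of the usual SMT/MC/DIE hierarchy.
   */
  static const struct cpumask *flat_topology_mask(int cpu)
  {
  	return cpu_online_mask;
  }

  static struct sched_domain_topology_level flat_topology[] = {
  	{ flat_topology_mask, SD_INIT_NAME(VCPU) },
  	{ NULL, },
  };

  static void __init maybe_flatten_sched_topology(void)
  {
  	if (flat_sched_topology)
  		set_sched_topology(flat_topology);
  }

Whatever the exact hook, something like maybe_flatten_sched_topology()
would have to run before the scheduler builds its domains; the actual
patches presumably do this from Xen-specific setup code. Keeping the
behaviour behind a parameter reflects Boris's and George's point above
that the flattening should stay optional, e.g. for guests that span
physical NUMA nodes with pinned vCPUs.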