
Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy



On 09/23/2015 05:36 AM, Juergen Gross wrote:
> On 09/22/2015 06:22 PM, George Dunlap wrote:
>> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>>> One other thing I just discovered: there are other consumers of the
>>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>>
>>> I think we would want to avoid any optimizations based on those in
>>> drivers as well, not only in the scheduler.
>>
>> I'm beginning to lose the thread of the discussion here a bit.
>>
>> Juergen / Dario, could one of you summarize your two approaches, and the
>> (alleged) advantages and disadvantages of each one?
> 
> Okay, I'll have a try:
> 
> The problem we want to solve:
> -----------------------------
> 
> The Linux kernel is gathering cpu topology data during boot via the
> CPUID instruction on each processor coming online. This data is
> primarily used in the scheduler to decide to which cpu a thread should
> be migrated when this seems to be necessary. There are other users of
> the topology information in the kernel (e.g. some drivers try to do
> optimizations like core-specific queues/lists).
> 
> When started in a virtualized environment the obtained data is next to
> useless or even wrong, as it reflects only the state at the time the
> system booted. The hypervisor's scheduling of the (v)cpus changes the
> topology beneath the feet of the Linux kernel without this being
> reflected in the gathered topology information. So any decisions
> taken based on that data will be clueless and possibly just wrong.
> 
> The minimal solution is to change the topology data in the kernel in a
> way that all cpus are regarded as equal regarding their relation to each
> other (e.g. when migrating a thread to another cpu no cpu is preferred
> as a target).
> 
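
A minimal sketch of what "all cpus are regarded as equal" means, written
as a toy Python model rather than kernel code. The names Cpu, distance,
flatten and preferred_targets are hypothetical, and the choice of "one
socket, every vcpu its own core" is only an assumption made for the
illustration, not something taken from either patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    cpu_id: int
    core_id: int
    socket_id: int


def distance(a, b):
    """Toy topology distance: 0 = same cpu, 1 = SMT sibling,
    2 = same socket, 3 = different socket."""
    if a.cpu_id == b.cpu_id:
        return 0
    if a.socket_id == b.socket_id and a.core_id == b.core_id:
        return 1
    if a.socket_id == b.socket_id:
        return 2
    return 3


def flatten(cpus):
    """Assumed flat view: one socket, every vcpu its own core."""
    return [Cpu(c.cpu_id, core_id=c.cpu_id, socket_id=0) for c in cpus]


def preferred_targets(cpus, src_id):
    """Return the ids of the cpus closest to src_id (ties included)."""
    src = next(c for c in cpus if c.cpu_id == src_id)
    others = [c for c in cpus if c.cpu_id != src_id]
    best = min(distance(src, c) for c in others)
    return sorted(c.cpu_id for c in others if distance(src, c) == best)


# What the guest saw at boot: two cores with two threads each.
boot_view = [Cpu(0, 0, 0), Cpu(1, 0, 0), Cpu(2, 1, 0), Cpu(3, 1, 0)]
flat_view = flatten(boot_view)
```

With the boot-time view, preferred_targets(boot_view, 0) singles out cpu 1
as the supposed SMT sibling. With the flat view every other cpu ties, so
no migration target is preferred.
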
> The topology information of the CPUID instruction is, however, even
> accessible from user mode and might be used for licensing purposes by
> any user program (e.g. by limiting the software to run on a specific
> number of cores or sockets). So just mangling the data returned by
> CPUID in the hypervisor does not seem to be a general solution, although
> we might want to offer it at least as an option in the future.
> 
> In the future we might want to support either dynamic topology updates
> or be able to tell the kernel to use some of the topology data, e.g.
> when pinning vcpus.
> 
> 
> Solution 1 (Dario):
> -------------------
> 
> Don't use the CPUID-derived topology information in the Linux scheduler,
> but let it use a simple "flat" topology by setting its own scheduler
> domain data under Xen.
> 
> Advantages:
> + very clean solution regarding the scheduler interface
> + scheduler decisions are based on a minimal data set
> + small patch
> 
> Disadvantages:
> - covers the scheduler only, drivers still use the "wrong" data
> - a little bit hacky regarding some NUMA architectures (needs either a
>   hook in the code dealing with that architecture or multiple scheduler
>   domain data overwrites)
> - future enhancements will make the solution less clean (either need
>   duplicating scheduler domain data or some new hooks in scheduler
>   domain interface)
> 
> 
> Solution 2 (Juergen):
> ---------------------
> 
> When booted as a Xen guest modify the topology data built during boot
> resulting in the same simple "flat" topology as in Dario's solution.
> 
> Advantages:
> + the simple topology is seen by all consumers of topology data as the
>   data itself is modified accordingly
> + small patch
> + future enhancements rather easy by selecting which data to modify
> 
> Disadvantages:
> - interface to scheduler not as clean as in Dario's approach
> - scheduler decisions are based on multiple layers of topology data
>   where one layer would be enough to describe the topology
> 
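
The main difference between the two solutions is which consumers end up
seeing the flat data. A rough sketch of that difference, again as a toy
Python model: Kernel, scheduler_view, driver_view, solution_1 and
solution_2 are all hypothetical names, and the sibling lookup only stands
in for real consumers such as topology_sibling_cpumask().

```python
def smt_siblings(core_of, cpu):
    """All cpus that share a core with `cpu`, according to `core_of`."""
    return sorted(c for c, core in core_of.items() if core == core_of[cpu])


class Kernel:
    def __init__(self, core_of):
        self.topology = dict(core_of)  # boot-time data shared by all consumers
        self.sched_override = None     # scheduler-private data, if any

    def scheduler_view(self, cpu):
        # The scheduler uses its own data when it has been given some.
        data = self.topology if self.sched_override is None else self.sched_override
        return smt_siblings(data, cpu)

    def driver_view(self, cpu):
        # Drivers always read the shared topology data.
        return smt_siblings(self.topology, cpu)


def flat(core_of):
    """Assumed flat layout: every cpu is alone on its own core."""
    return {cpu: cpu for cpu in core_of}


def solution_1(kernel):
    """Scheduler-only override; the shared data stays untouched."""
    kernel.sched_override = flat(kernel.topology)


def solution_2(kernel):
    """Modify the shared data itself, so every consumer sees it."""
    kernel.topology = flat(kernel.topology)
```

Starting from {0: 0, 1: 0, 2: 1, 3: 1}, solution_1 leaves driver_view(0)
at [0, 1] while scheduler_view(0) becomes [0]. After solution_2 both
views return [0].
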
> 
> Dario, are you okay with this summary?

Thanks -- that's very helpful.

 -George
