
Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy

To: George Dunlap <george.dunlap@xxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>
From: Juergen Gross <jgross@xxxxxxxx>
Date: Wed, 23 Sep 2015 06:36:39 +0200
Cc: "Luis R. Rodriguez" <mcgrof@xxxxxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Delivery-date: Wed, 23 Sep 2015 04:36:52 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 09/22/2015 06:22 PM, George Dunlap wrote:
> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>> One other thing I just discovered: there are other consumers of the
>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>
>> I think we would want to avoid any optimizations based on those in
>> drivers as well, not only in the scheduler.
>
> I'm beginning to lose the thread of the discussion here a bit.
>
> Juergen / Dario, could one of you summarize your two approaches, and the
> (alleged) advantages and disadvantages of each one?

Okay, I'll have a try:

The problem we want to solve:
-----------------------------

The Linux kernel gathers cpu topology data during boot via the
CPUID instruction on each processor coming online. This data is
primarily used by the scheduler to decide which cpu a thread should
be migrated to when a migration seems necessary. There are other users
of the topology information in the kernel (e.g. some drivers try to do
optimizations like core-specific queues/lists).

When started in a virtualized environment the obtained data is next to
useless or even wrong, as it reflects only the state at the time the
system was booted. The hypervisor's scheduling of the (v)cpus changes
the topology beneath the feet of the Linux kernel without this being
reflected in the gathered topology information. So any decision
based on that data is uninformed and possibly just wrong.

The minimal solution is to change the topology data in the kernel so
that all cpus are regarded as equal in their relation to each
other (e.g. when migrating a thread to another cpu, no cpu is preferred
as a target).

The topology information of the CPUID instruction is, however, also
accessible from user mode and might be used for licensing purposes by
any user program (e.g. by limiting the software to run on a specific
number of cores or sockets). So just mangling the data returned by
CPUID in the hypervisor seems not to be a general solution, although we
might want to do it at least optionally in the future.
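
To illustrate the user mode point, here is a minimal sketch of how an
unprivileged program could read one topology-related field directly.
It assumes an x86 build with GCC or Clang, which provide <cpuid.h> and
__get_cpuid(); the helper name cpuid_logical_per_package() is made up
for this example and is not taken from any patch in this thread.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang wrapper around the x86 CPUID instruction */

/*
 * Hypothetical helper: read CPUID leaf 1 from user mode.
 * Returns 0 if leaf 1 is not available, 1 if the HTT flag (EDX bit 28)
 * is clear, otherwise EBX[23:16], the maximum number of addressable
 * logical processor IDs in the package.
 */
static unsigned int cpuid_logical_per_package(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(edx & (1u << 28)))
        return 1;
    return (ebx >> 16) & 0xffu;
}
```

No kernel involvement is needed for this call, so whatever the kernel
does with its own topology data, such a program keeps seeing what the
hypervisor lets CPUID return.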

In the future we might want to support either dynamic topology updates
or be able to tell the kernel to use some of the topology data, e.g.
when pinning vcpus.


Solution 1 (Dario):
-------------------

Don't use the CPUID derived topology information in the Linux scheduler,
but let it use a simple "flat" topology by setting its own scheduler
domain data under Xen.

Advantages:
+ very clean solution regarding the scheduler interface
+ scheduler decisions are based on a minimal data set
+ small patch

Disadvantages:
- covers the scheduler only, drivers still use the "wrong" data
- a little bit hacky regarding some NUMA architectures (needs either a
  hook in the code dealing with that architecture or multiple scheduler
  domain data overwrites)
- future enhancements will make the solution less clean (they need
  either duplicated scheduler domain data or some new hooks in the
  scheduler domain interface)


Solution 2 (Juergen):
---------------------

When booted as a Xen guest modify the topology data built during boot
resulting in the same simple "flat" topology as in Dario's solution.

Advantages:
+ the simple topology is seen by all consumers of topology data as the
  data itself is modified accordingly
+ small patch
+ future enhancements rather easy by selecting which data to modify

Disadvantages:
- interface to scheduler not as clean as in Dario's approach
- scheduler decisions are based on multiple layers of topology data
  where one layer would be enough to describe the topology
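
The difference between the two solutions can be sketched with a toy
model. This is plain user-space C, not kernel code: struct cpu_topo,
the 8-vcpu guest and all function names are invented for illustration,
and the choice of what "flat" means here (every cpu is its own core,
all cpus share one package) is an assumption, not what either patch
necessarily does.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8   /* hypothetical guest with 8 vcpus */

/* Toy stand-in for per-cpu topology data, one bit per cpu. */
struct cpu_topo {
    uint32_t thread_siblings;  /* cpus sharing a core with this cpu */
    uint32_t core_siblings;    /* cpus sharing a package with this cpu */
};

/* "Boot time" data: 2 packages x 2 cores x 2 threads. */
static void topo_init_from_boot(struct cpu_topo topo[NR_CPUS])
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        topo[cpu].thread_siblings = 0x3u << (cpu & ~1);
        topo[cpu].core_siblings   = 0xfu << (cpu & ~3);
    }
}

/* Solution 1 style: the scheduler ignores topo[] and uses one flat
 * span covering all cpus; topo[] itself stays untouched. */
static uint32_t sched_flat_span(void)
{
    return (1u << NR_CPUS) - 1;
}

/* Solution 2 style: rewrite the data itself so every consumer sees
 * the flat layout. */
static void topo_flatten(struct cpu_topo topo[NR_CPUS])
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        topo[cpu].thread_siblings = 1u << cpu;
        topo[cpu].core_siblings   = (1u << NR_CPUS) - 1;
    }
}

/* A non-scheduler consumer, e.g. a driver picking per-core peers. */
static uint32_t driver_view(const struct cpu_topo topo[NR_CPUS], int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return topo[cpu].thread_siblings;
}
```

With only sched_flat_span() in place, driver_view() still returns the
boot-time sibling mask; after topo_flatten() it returns just the cpu
itself, which is the "seen by all consumers" property listed above.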


Dario, are you okay with this summary?

Juergen
