
Re: [Xen-devel] [PATCH 13/60] xen/sched: move some per-vcpu items to struct sched_unit



On 02.07.19 11:05, Jan Beulich wrote:
On 02.07.2019 10:44, Juergen Gross wrote:
On 02.07.19 10:27, Jan Beulich wrote:
On 02.07.2019 10:14, Juergen Gross wrote:
On 02.07.19 09:54, Jan Beulich wrote:
And again - if someone pins every vCPU to a single pCPU, that last
such pinning operation will be what takes long-term effect. Aiui all
vCPU-s in the unit will then be pinned to that one pCPU, i.e.
they'll either all compete for the one pCPU's time, or only one of
them will ever get scheduled.

No, that's not how it works. Let's say we have a system with the
following topology and core scheduling active:

cpu0: core 0, thread 0
cpu1: core 0, thread 1
cpu2: core 1, thread 0
cpu3: core 1, thread 1

Then any even numbered vcpu will only ever be scheduled on cpu0 or cpu2,
while any odd numbered vcpu will only run on cpu1 or cpu3.

So virtual cores get scheduled on physical cores. Virtual thread 0 will
only run on physical thread 0 and the associated virtual thread 1 will
run on the associated physical thread 1 of the same physical core.

Pinning a virtual thread 1 to a physical thread 0 is not possible (in
reality only the virtual core is being pinned to the physical core).
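
A minimal sketch of that mapping rule (hypothetical helper, not Xen
code; it simply assumes two threads per core as in the example above):

/*
 * With core scheduling active, the thread index of a vcpu within its
 * unit determines which physical threads it can ever run on; the
 * scheduler only picks the core.
 */
static bool vcpu_may_run_on(unsigned int vcpu_id, unsigned int pcpu)
{
    const unsigned int threads_per_core = 2;

    /* Even vcpus -> cpu0/cpu2 (thread 0), odd vcpus -> cpu1/cpu3 (thread 1). */
    return (vcpu_id % threads_per_core) == (pcpu % threads_per_core);
}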

But that's what existing guests may be doing. You may want to
take a look at our old, non-pvops kernels, in particular the
functionality provided by their drivers/xen/core/domctl.c. Yes,
{sys,dom}ctl-s aren't supposed to be used by the kernel, but to
achieve the intended effects I saw no way around (ab)using them.
(I mean this to be taken as an example only - I realize that the
code there wouldn't work on modern Xen without updating, due to
the sysctl/domctl interface version that needs setting.)

First - speaking of "guests" is a little bit misleading here. This is
restricted to dom0.

So when you want to use such a dom0 kernel with Xen 4.13 or later you'd
need to stay with cpu scheduling. Any recent kernel will run just fine
as dom0 with core scheduling active.

Right, but such recent kernels have (afaict) no solution to some of
the problems that the (ab)use of the {sys,dom}ctl-s was meant to
address.

With SCHEDOP_pin_override this should all be doable. The kernel would
need to execute the SCHEDOP_pin_override code on a suitable vcpu
(ideally with vcpu-id == pcpu-id).
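
A minimal sketch of such a dom0-side use, assuming Linux-style
hypercall wrappers and headers (illustration only, error handling
omitted):

#include <xen/interface/sched.h>

/* Temporarily pin the calling vCPU to the given physical CPU; must be
 * executed on the vCPU that is to be pinned. */
static int pin_current_vcpu(int pcpu)
{
    struct sched_pin_override pin = { .pcpu = pcpu };

    return HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin);
}

/* A negative pcpu value undoes a previous override. */
static int unpin_current_vcpu(void)
{
    struct sched_pin_override pin = { .pcpu = -1 };

    return HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin);
}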

I'd prefer new hypervisor interfaces for that purpose, though.


--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -438,11 +438,11 @@ static inline cpumask_t* cpupool_domain_cpumask(struct domain *d)
  * * The hard affinity is not a subset of soft affinity
  * * There is an overlap between the soft and hard affinity masks
  */
-static inline int has_soft_affinity(const struct vcpu *v)
+static inline int has_soft_affinity(const struct sched_unit *unit)
 {
-    return v->soft_aff_effective &&
-           !cpumask_subset(cpupool_domain_cpumask(v->domain),
-                           v->cpu_soft_affinity);
+    return unit->soft_aff_effective &&
+           !cpumask_subset(cpupool_domain_cpumask(unit->vcpu->domain),
+                           unit->cpu_soft_affinity);
 }

Okay, at the moment there looks to be a 1:1 relationship between sched
units and vCPU-s. This would - at this point of the series - invalidate most of
my earlier comments. However, in patch 57 I don't see how this unit->vcpu
mapping would get broken, and I can't seem to identify any other patch
where this might be happening. Looking at the github branch I also get the
impression that the struct vcpu * pointer out of struct sched_unit survives
until the end of the series, which doesn't seem right to me.

It is right. The vcpu pointer in the sched_unit is pointing to the first
vcpu of the unit at the end of the series. Further vcpus are found via
v->next_in_list.
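
For illustration, finding all vcpus of a unit could then look like the
sketch below (the unit membership test via v->sched_unit is meant as an
example only):

static void visit_unit_vcpus(const struct sched_unit *unit)
{
    struct vcpu *v;

    /* Start at the unit's first vcpu and follow the existing per-domain
     * linkage until a vcpu of another unit (or the list end) is reached. */
    for ( v = unit->vcpu; v != NULL && v->sched_unit == unit;
          v = v->next_in_list )
    {
        /* act on v */
    }
}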

I'm afraid this sets us up for misunderstanding and misuse. I don't
think there should be a straight struct vcpu * out of struct sched_unit.

That was the most effective way to do it. What are you suggesting?

An actual list, i.e. with a struct list_head. That'll make it obvious that
more than one vCPU might be associated with a unit. That's all the more
important since the ability to associate more than one appears only quite
late in the series, i.e. there may be further instances like the code above,
and it would require a careful audit (rather than the compiler finding such
instances) to determine all places where using the first vCPU in a unit
isn't really what was meant.

TBH I don't see how this would help at all.

Instead of using the vcpu pointer I'd have to use the list head.
Why is that different to a plain pointer regarding finding the places
where using the first vcpu was wrong?

Take the example above: Is it correct to act on just the first vCPU?
I guess _here_ it is, but the same pattern could be found elsewhere.
If, from the beginning, you use a clearly identifiable list construct,
then it'll be obvious to you as the writer and to reviewers that by
the end of the series there may be multiple entities that need dealing
with - we'd see list_first*() or for_each*() constructs right away
(and you wouldn't be able to circumvent their use in a way that
wouldn't trigger "don't open-code" comments).
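
A sketch of the kind of construct meant here (field and macro names are
invented for illustration only):

struct sched_unit {
    struct list_head vcpus;        /* all vcpus belonging to this unit */
    /* ... */
};

struct vcpu {
    struct list_head unit_elem;    /* linkage on the unit's vcpus list */
    /* ... */
};

/* Callers have to iterate explicitly, so the 1:n relationship is
 * visible at every use site. */
#define for_each_unit_vcpu(unit, v) \
    list_for_each_entry(v, &(unit)->vcpus, unit_elem)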

Would you be fine with just renaming the pointer to "vcpu_list"?
This would avoid the need to introduce another vcpu list in struct vcpu,
and I could re-use the already existing list as is done today.

Well, yes, this would at least make it obvious at use sites that there
may be more than one of them.

Thanks.


It should be noted that I named the pointer just "vcpu" in order to
avoid lots of reformatting due to longer lines, though.

I can appreciate this aspect, but as said I'm afraid it would be
misleading (and potentially inviting for mis-use).

Okay, let's live with some more multi-line statements then.


Juergen

