[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths



On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote:
> > > > On 09.11.17 at 12:01, <raistlin@xxxxxxxx> wrote:
> > 
> > pCPU1
> > =====
> > current == vCPU1
> > context_switch(next == idle)
> > !! __context_switch() is skipped
> > vcpu_migrate(vCPU1)
> > anything_that_uses_or_touches_context()
> > 
> > So, it must be anything_that_uses_or_touches_context() --knowing it
> > will be reading or touching the pCPU context-- that syncs the
> > state,
> > before using or touching it (which is, e.g., what Jan's patch
> > does).
> 
> Well, generally after the vcpu_migrate(vCPU1) above we expect
> nothing to happen at all on the pCPU, until another (non-idle)
> vCPU gets scheduled onto it. 
>
Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec-
trace (which is what I wanted to do in my email already)?

pCPU1
=====
current == vCPU1
context_switch(next == idle)
!! __context_switch() is skipped
anything_that_uses_or_touches_context()

Point being, is the underlying and general "issue" here, really bound
to migrations, or is it something intrinsic of lazy context switch? I'm
saying it's the latter.

That being said, sure it makes sense to assume that, if we migrated the
vCPU from pCPU1 to pCPU2, it's highly unlikely that it will resume the
execution on pCPU1, and hence there is no point in leaving its context
there, and we could very well sync. It's a reasonable best-effort
measure, but can we rely on it for correctness? I don't think we can.

And generalizing the idea enough that we could then rely on it for
correctness, tends to be close enough to not doing lazy context switch
at all, I think.

> The problem is with tasklet / softirq
> (and hence also RCU) work. 
>
Yes.

> Tasklets already take care of this by
> calling sync_local_execstate() before calling the handler. But
> for softirqs this isn't really an option; I'm surprised to see that
> tasklet code does this independently of what kind of tasklet that
> is. 
>
Good point. Weird indeed.

> Softirq tasklets aren't used very often, so I wonder if we have
> a latent bug here. Otoh, if this was actually fine, adding a similar
> call to rcu_do_batch() (or its caller) would be an option, I think.
> 
We can have a look.

What about the effect on performance, though?

I mean, assuming that lazy context switch does a good job, with respect
to that, by avoiding synching in enough case where it is actually not
necessary, how do things change if we start to sync at any softirq,
even when the handler would have not required that?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.