
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.



On Thu, Mar 19, 2015 at 07:53:35AM +0000, Jan Beulich wrote:
> >>> On 17.03.15 at 16:38, <konrad.wilk@xxxxxxxxxx> wrote:
> > There is a race when we clear STATE_SCHED in the softirq,
> > which allows 'raise_softirq_for' (on another CPU, or on
> > the one running the softirq) to schedule the dpci.
> > 
> > Specifically this can happen when the other CPU receives
> > an interrupt, calls 'raise_softirq_for', and puts the dpci
> > on its per-cpu list (the same dpci structure). Note that
> > this could also happen on the same physical CPU; however,
> > for simplicity the explanation will assume two CPU actors.
> > 
> > The result is two 'dpci_softirq' instances running at the
> > same time (on different CPUs): one CPU is executing
> > hvm_dirq_assist (having cleared STATE_SCHED and set STATE_RUN),
> > while the other is trying to call:
> > 
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >             BUG();
> > 
> > Since STATE_RUN is already set, it would end badly.
> > 
> > The reason we can hit this is when an interrupt's
> > affinity is set over multiple CPUs.
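> > 
> > In simplified form (not the exact code - details and exact names
> > are abbreviated here), the two paths that race look like this:
> > 
> >    /* dpci_softirq(), per CPU, for each dpci taken off its list: */
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >        BUG();
> >    /*
> >     * Once STATE_SCHED is cleared below, raise_softirq_for() on
> >     * another CPU may set it again and queue the same pirq_dpci on
> >     * that CPU's per-cpu list while we are still in STATE_RUN.
> >     */
> >    if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
> >        hvm_dirq_assist(d, pirq_dpci);
> >    clear_bit(STATE_RUN, &pirq_dpci->state);
> > 
> >    /* raise_softirq_for(), called from the interrupt path: */
> >    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> >        return;                /* already queued somewhere - done */
> >    /* (with interrupts off) queue on this CPU's per-cpu list */
> >    list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> >    raise_softirq(HVM_DPCI_SOFTIRQ);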
> > 
> > Potential solutions:
> > 
> > a) Instead of the BUG() we can put the dpci back on the per-cpu
> > list to be dealt with later (when the softirqs are run again).
> > Putting the 'dpci' back on the per-cpu list effectively spins
> > until the bad condition clears.
> > 
> > b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> > to detect the 'STATE_RUN' bit being set and still schedule the dpci.
> > The BUG() check in dpci_softirq would be replaced with a spin until
> > 'STATE_RUN' has been cleared. The dpci would still not
> > be scheduled when the STATE_SCHED bit was set.
> > 
> > c) Only schedule the dpci when the state is completely clear
> > (neither STATE_SCHED nor STATE_RUN). It would spin if STATE_RUN is set
> > (as processing is in progress and will finish). If STATE_SCHED is set
> > (so it hasn't run yet) we won't spin and just exit (sketched below).
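> > 
> > A rough sketch of what c) would look like in raise_softirq_for
> > (never implemented; illustration only):
> > 
> >    for ( ;; )
> >    {
> >        unsigned long old = cmpxchg(&pirq_dpci->state, 0,
> >                                    1 << STATE_SCHED);
> > 
> >        if ( old == 0 )
> >            break;                 /* clean state - go schedule it */
> >        if ( old & (1 << STATE_SCHED) )
> >            return;                /* already scheduled - just exit */
> >        /* Only STATE_RUN is set - spin until the softirq clears it. */
> >        cpu_relax();
> >    }
> >    /* ... put pirq_dpci on the per-cpu list and raise the softirq ... */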
> > 
> > Down-sides of the solutions:
> > 
> > a) Live-lock of the CPU. We could be finishing a dpci, then adding
> > the dpci back, exiting, and then processing the dpci once more, and
> > so on. We would eventually stop, as TIMER_SOFTIRQ would be set, which
> > will cause SCHEDULER_SOFTIRQ to be set as well, and we would exit this loop.
> > 
> > Interestingly, the old ('tasklet') code used this mechanism.
> > If the function assigned to the tasklet was running, the softirq
> > that ran said function (hvm_dirq_assist) would be responsible for
> > putting the tasklet back on the per-cpu list. This allowed
> > having a running tasklet and a 'to-be-scheduled' tasklet
> > at the same time.
> > 
> > b) is similar to a) - instead of re-entering dpci_softirq
> > we loop inside the softirq waiting for the correct condition to
> > arrive. As it does not let us unwedge ourselves (the other
> > softirqs are not called while we loop), it is less preferable.
> > 
> > c) can cause a dead-lock if the interrupt comes in while we are
> > processing the dpci in the softirq - but only if this happens on the
> > same CPU. We would be looping in raise_softirq waiting for STATE_RUN
> > to be cleared, while the softirq that was to clear it has been
> > preempted by our interrupt handler.
> > 
> > As such, this patch - which implements a) - is the best candidate
> > for this quagmire.
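> > 
> > For reference, a) boils down to roughly the following in dpci_softirq,
> > in place of the BUG() (simplified - see the patch itself for the
> > exact code):
> > 
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >    {
> >        unsigned long flags;
> > 
> >        /* Another CPU still owns STATE_RUN - put it back and retry. */
> >        local_irq_save(flags);
> >        list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> >        local_irq_restore(flags);
> > 
> >        raise_softirq(HVM_DPCI_SOFTIRQ);
> >        continue;
> >    }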
> > 
> > Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> > Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> So I now agree that in the state we're in this is the most reasonable
> fix.

Thank you.

Are you OK with me checking it in?

> My reservations against the extra logic introduced earlier (and
> being fixed here) stand though: From an abstract perspective the
> IRQ and softirq logic alone should be sufficient to deal with the
> needs we have. The complications really result from the desire to
> use a per-CPU list of hvm_pirq_dpci-s, which I still think a simpler
> alternative should be found for (after all real hardware doesn't use
> such lists).
> 
> A first thought would be to put them all on a per-domain list and
> have a cpumask tracking which CPUs they need servicing on. The
> downside of this would be (apart from again not being a proper
> equivalent of how actual hardware handles this) that - the softirq
> handler not having any other context - domains needing servicing
> would also need to be tracked in some form (in order to avoid
> having to iterate over all of them), and a per-CPU list would be
> undesirable for the exact same reasons. Yet a per-CPU
> domain-needs-service bitmap doesn't seem very attractive either,
> i.e. this would need further thought (also to make sure such an
> alternative model doesn't become even more involved than what
> we have now).

HA! (yes, I completely agree on - "complex" == "unpleasant")

Perhaps we can brainstorm some of this at XenHackathon in Shanghai?
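
Just so we're picturing the same shape - something like the below is
what I have in mind (illustration only; these fields do not exist and
the names are made up on the spot):

    struct hvm_pirq_dpci {
        ...
        struct list_head dom_list;     /* on the domain's list instead
                                          of a per-CPU one */
        cpumask_t pending_cpus;        /* CPUs this dpci still needs
                                          servicing on */
        ...
    };

and the open question is exactly the one you raise: how dpci_softirq,
with no other context than the CPU it runs on, finds which domains
(and which of their dpci-s) have a pending bit for that CPU without
iterating over all of them.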
> 
> Jan
