[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v5 3/3] xen/arm: vgic_migrate_irq: do not race against GIC_IRQ_GUEST_MIGRATING



On Fri, 31 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/03/17 00:47, Stefano Stabellini wrote:
> > On Fri, 3 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 01/03/17 22:15, Stefano Stabellini wrote:
> > > > A potential race condition occurs when vgic_migrate_irq is called a
> > > > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > > > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
> > > 
> > > Hmmm, vgic_migrate_irq will bail out before accessing inflight list if
> > > GIC_IRQ_GUEST_MIGRATING is already set:
> > > 
> > > /* migrating already in progress, no need to do anything */
> > > if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status )
> > >   return;
> > > 
> > > And test_bit is atomic. So I don't understand what is the corruption
> > > problem
> > > you mention.
> > 
> > The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
> > and vgic_migrate_irq is called to move the irq again, even though the
> > first migration is not complete yet.
> 
> What you described is not a data corruption to me.

No, it is not, thanks to the previous two patches. The commit
description needs an update.


> The host IRQ will be routed
> to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong
> pCPU, it will be slower but we are capable to handle that.
> 
> The use case you describe would only happen if a guest is trying to change the
> routing multiple times while an interrupt is pending. So to be honest, a sane
> guest would not do that. But this would only affect stupid guest.
> 
> So I don't think this is worth to support considering how this patch will
> increase the code complexity in a component that is already a nightmare to
> handle.

I think we have to fix this because it is not predictable. Latency could
be much higher, depending on who wins the race. It also uses more Xen
resources -- the time that Xen spends to send and to handle SGIs could
be used for something  else. I think it is more important to be
predictable than correct. Especially given that a sane guest shouldn't
do this, I prefer to refuse a "nested" migration we cannot handle (even
though it is a mistake) than provide unreliable latency.

In other words, I think we need to fix this one way or another, but we
might be able to come up with a better fix than this.



> > This could happen:
> > 
> > 
> >       CPU 0                                    CPU 1
> >   gic_update_one_lr
> >   test_and_clear_bit MIGRATING
> >   read target (old)
> >                                             write target (new)
> >                                             vgic_migrate_irq
> >                                               test_bit MIGRATING
> >                                               irq_set_affinity (new)
> >                                               return
> >   irq_set_affinity (old)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.