[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 3/3] xen/arm: vgic_migrate_irq: do not race against GIC_IRQ_GUEST_MIGRATING
On Fri, 31 Mar 2017, Julien Grall wrote: > Hi Stefano, > > On 30/03/17 00:47, Stefano Stabellini wrote: > > On Fri, 3 Mar 2017, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 01/03/17 22:15, Stefano Stabellini wrote: > > > > A potential race condition occurs when vgic_migrate_irq is called a > > > > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case, > > > > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr. > > > > > > Hmmm, vgic_migrate_irq will bail out before accessing inflight list if > > > GIC_IRQ_GUEST_MIGRATING is already set: > > > > > > /* migrating already in progress, no need to do anything */ > > > if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status ) > > > return; > > > > > > And test_bit is atomic. So I don't understand what is the corruption > > > problem > > > you mention. > > > > The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set > > and vgic_migrate_irq is called to move the irq again, even though the > > first migration is not complete yet. > > What you described is not a data corruption to me. No, it is not, thanks to the previous two patches. The commit description needs an update. > The host IRQ will be routed > to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong > pCPU, it will be slower but we are capable to handle that. > > The use case you describe would only happen if a guest is trying to change the > routing multiple times while an interrupt is pending. So to be honest, a sane > guest would not do that. But this would only affect stupid guest. > > So I don't think this is worth to support considering how this patch will > increase the code complexity in a component that is already a nightmare to > handle. I think we have to fix this because it is not predictable. Latency could be much higher, depending on who wins the race. It also uses more Xen resources -- the time that Xen spends to send and to handle SGIs could be used for something else. I think it is more important to be predictable than correct. Especially given that a sane guest shouldn't do this, I prefer to refuse a "nested" migration we cannot handle (even though it is a mistake) than provide unreliable latency. In other words, I think we need to fix this one way or another, but we might be able to come up with a better fix than this. > > This could happen: > > > > > > CPU 0 CPU 1 > > gic_update_one_lr > > test_and_clear_bit MIGRATING > > read target (old) > > write target (new) > > vgic_migrate_irq > > test_bit MIGRATING > > irq_set_affinity (new) > > return > > irq_set_affinity (old) _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |