Re: [Xen-devel] [PATCH] xen/arm: fix rank/vgic locks inversion bug
Hi Stefano,

On 17/12/2016 02:13, Stefano Stabellini wrote:
> On Fri, 16 Dec 2016, Stefano Stabellini wrote:
>> On Fri, 16 Dec 2016, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> On 16/12/16 00:08, Stefano Stabellini wrote:
>>>> On Thu, 15 Dec 2016, Julien Grall wrote:
>>>>> On 15/12/2016 01:04, Stefano Stabellini wrote:
>>>>>> The locking order is: first rank lock, then vgic lock. The order is
>>>>>> respected everywhere, except for gic_update_one_lr.
>>>>>> gic_update_one_lr is called with the vgic lock held, but it calls
>>>>>> vgic_get_target_vcpu, which tries to obtain the rank lock. This can
>>>>>> cause deadlocks. We already have a version of vgic_get_target_vcpu
>>>>>> that doesn't take the rank lock: __vgic_get_target_vcpu. Also, the
>>>>>> only routines that modify the target vcpu are vgic_store_itargetsr
>>>>>> and vgic_store_irouter. They call vgic_migrate_irq, which already
>>>>>> takes the vgic lock. Solve the lock inversion problem by not taking
>>>>>> the rank lock in gic_update_one_lr (calling __vgic_get_target_vcpu
>>>>>> instead of vgic_get_target_vcpu).
>>>>>
>>>>> If I look at the callers of gic_update_one_lr, the function
>>>>> gic_clear_lrs will not take the rank lock. So from my understanding
>>>>> nobody will take the rank lock here. However, __vgic_get_target_vcpu
>>>>> has an ASSERT to check whether the rank lock has been taken. So who
>>>>> is taking the lock for gic_update_one_lr now?
>>>>
>>>> I should have removed the ASSERT - nobody is taking the rank lock now
>>>> on the gic_update_one_lr code path.
>>>
>>> Please either keep this ASSERT or update it, but do not drop it: we
>>> need to prevent anyone from calling this function without any lock
>>> taken, at least in debug builds. The current callers (see
>>> vgic_{disable,enable}_irqs) are calling the function with the rank
>>> lock taken, so we need to keep this behavior at least for them.
>>>
>>>>>> We make this safe by placing modifications to rank->vcpu within
>>>>>> regions protected by the vgic lock.
>>>>>
>>>>> This looks unsafe to me. The vgic lock is per-vcpu, but rank->vcpu
>>>>> could be written/read by any vCPU. So you will never protect
>>>>> rank->vcpu with this lock. Did I miss anything?
>>>>
>>>> The vgic lock is per-vcpu, but it is always the vgic lock of the old
>>>> (old in the sense: before the interrupt is migrated) vcpu that is
>>>> taken. On one hand, any vcpu can read/write to itargetsr; those
>>>> operations are protected by the rank lock. However, vgic_migrate_irq
>>>> and writes to rank->vcpu are also protected by the vgic lock of the
>>>> old vcpu (first the rank lock is taken, then the vgic lock of the
>>>> old vcpu).
>>>
>>> I don't think this is true. Any vCPU read/write to itargetsr will be
>>> protected by the vgic lock of the vCPU in rank->vcpu[offset]. For an
>>> inflight IRQ, the migration will happen when the guest has EOIed the
>>> IRQ. Until that happens, the guest may have written multiple times to
>>> ITARGETSR.
>>>
>>> For instance, let's say the guest is requesting to migrate the IRQ
>>> from vCPU A to vCPU B. The vgic lock of A will be taken in
>>> vgic_store_itargetsr; if the interrupt is inflight, vgic_migrate_irq
>>> will set a flag. Now the guest could decide to migrate the interrupt
>>> to vCPU C before the interrupt has been EOIed. In this case, the vgic
>>> lock of B will be taken. So we have a mismatch between the lock taken
>>> in gic_update_one_lr and vgic_migrate_irq.
>>>
>>> I think the only safe way to fully protect rank->vcpu is taking the
>>> rank lock. It never changes and is accessible from anywhere.
>>
>> When I submitted this series, I considered this scenario, and I
>> thought that the existing GIC_IRQ_GUEST_MIGRATING flag should be
>> enough to protect it, because when GIC_IRQ_GUEST_MIGRATING is set,
>> vgic_migrate_irq actually returns immediately, without doing anything.
>> Either GIC_IRQ_GUEST_MIGRATING is not set, and it's the first irq
>> migration, or GIC_IRQ_GUEST_MIGRATING is set, and vgic_migrate_irq
>> just returns.
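For reference, the inversion at the heart of this thread reduces to a
classic ABBA deadlock. Below is a minimal standalone sketch (plain
pthreads, not Xen code; the lock names, thread bodies, and iteration
counts are all illustrative): itargetsr_path takes the locks in the
documented order, rank then vgic, while maintenance_path mimics
gic_update_one_lr calling vgic_get_target_vcpu and takes them in the
opposite order.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rank_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vgic_lock = PTHREAD_MUTEX_INITIALIZER;

/* Documented order, as in vgic_store_itargetsr: rank lock, then vgic lock. */
static void *itargetsr_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&rank_lock);
        pthread_mutex_lock(&vgic_lock);
        pthread_mutex_unlock(&vgic_lock);
        pthread_mutex_unlock(&rank_lock);
    }
    return NULL;
}

/* Inverted order, as in gic_update_one_lr -> vgic_get_target_vcpu. */
static void *maintenance_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&vgic_lock);
        pthread_mutex_lock(&rank_lock);
        pthread_mutex_unlock(&rank_lock);
        pthread_mutex_unlock(&vgic_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, itargetsr_path, NULL);
    pthread_create(&t2, NULL, maintenance_path, NULL);
    pthread_join(t1, NULL);   /* with enough iterations, rarely returns */
    pthread_join(t2, NULL);
    puts("finished without deadlocking (this run got lucky)");
    return 0;
}

Build with "gcc -O2 -pthread abba.c" (the file name is arbitrary). Once
each thread holds its first lock while waiting for the other's, both
block forever - which is exactly the window the patch is trying to
close.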
> However, as I was writing this explanation to you in more detail, I
> realized that it is not actually safe. Concurrent accesses to
> rank->vcpu, and concurrent calls to irq_set_affinity, can result in
> the physical affinity actually being set to the intermediate pcpu
> rather than the final one. As a matter of fact, one of these races
> affects the code today, even without this patch series.

Because irq_set_affinity is not called protected by the same lock (e.g.
rank or vgic), right?

[...]

> 2) We run gic_update_one_lr and vgic_store_itargetsr in parallel
> safely and locklessly. There might be a way to do it, but it is not
> easy; I haven't found it yet.

Correct me if I am wrong: there is no restriction on writing to
IROUTER/ITARGETSR while an IRQ is pending. So irq_set_affinity could be
called once, at the beginning of vgic_migrate_irq. We may receive the
interrupt on the wrong physical CPU at the beginning, but that would be
the same when writing into IROUTER/ITARGETSR. This would remove the
need to get the rank lock in gic_update_one_lr.

Did I miss anything?
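A rough standalone sketch of that ordering, with stub types and helpers
standing in for the real Xen definitions (the pending_irq layout, the
migrating field, and this irq_set_affinity are placeholders, not the
actual declarations):

#include <stdbool.h>
#include <stdio.h>

struct pending_irq {
    bool migrating;   /* placeholder for the GIC_IRQ_GUEST_MIGRATING bit */
};

/* Stub: the real function reprograms the physical distributor. */
static void irq_set_affinity(int virq, int new_pcpu)
{
    printf("virq %d now routed to pcpu %d\n", virq, new_pcpu);
}

static void vgic_migrate_irq(struct pending_irq *p, int virq, int new_pcpu)
{
    /*
     * Route the physical interrupt to the new pCPU right away. A short
     * window where it fires on the old pCPU is no worse than what the
     * guest can already cause by writing ITARGETSR/IROUTER while the
     * IRQ is pending.
     */
    irq_set_affinity(virq, new_pcpu);

    /* A migration is already in flight; nothing more to do here. */
    if (p->migrating)
        return;

    /*
     * Mark the IRQ as migrating. The virtual side still completes at
     * EOI time in gic_update_one_lr, but that path no longer needs the
     * rank lock to re-derive the physical target.
     */
    p->migrating = true;
}

int main(void)
{
    struct pending_irq p = { false };

    vgic_migrate_irq(&p, 27, 3);   /* migrate virq 27 to pcpu 3 */
    return 0;
}

The tradeoff is only that transient window on the wrong pCPU, which, as
noted above, a guest can already provoke through ITARGETSR/IROUTER
writes.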
Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel