[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] ITS emulation race conditions
Hi, On 10/04/17 00:12, André Przywara wrote: > Hi, > > I wanted to run some ideas on how to prevent the race conditions we are > facing with the ITS emulation and removing devices and/or LPIs. > I think Stefano's idea of tagging a discarded LPI is the key, but still > some details are left to be solved. > I think we are dealing with two issues: > 1) A guest's DISCARD can race with in incoming LPI. > Ideally DISCARD would remove the host LPI -> vLPI connection (in the > host_lpis table), so any new incoming (host) LPI would simply be > discarded very early in gicv3_do_LPI() without ever resolving into a > virtual LPI. Now while this removal is atomic, we could have just missed > an incoming LPI, so the old virtual LPI would still traverse down the > VGIC with a "doomed" virtual LPI ID. > I wonder if that could be solved by a "crosswise" check: > - The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then* > removes the host_lpis connectin. > - do_LPI() reads the host_lpis() array, *then* checks the UNMAPPED tag > in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that). > With this setup the DISCARD handler can assume that no new virtual LPI > instances enter the VGIC afterwards. > > 2) An unmapped LPI might still be in use by the VGIC, foremost might > still be in an LR. > Tagging the pending_irq should solve this in general. Whenever a VGIC > function finds the UNMAPPED tag, it does not process the vIRQ, but > retires it. For simplicity we might limit this to handling when a VCPU > exists and an LR gets cleaned up: If we hit a tagged pending_irq here, > we clean the LR, remove the pending_irq from all lists and signal the > ITS emulation that this pending_irq is now ready to be removed (by > calling some kind of cleanup_lpi() function, for instance). > The ITS code can then remove the struct pending_irq from the radix tree. > MAPD(V=0) is now using this to tag all still mapped events as > "UNMAPPED", the counting down the "still alive" pending_irqs in > cleanup_lpi() until they reach zero. At this point it should be safe to > free the pend_irq array. > > Does that sound like a plan? > Do I miss something here? Probably yes, hence I am asking ;-) So there are two issues with this: - For doing the LPI cleanup, we would need to call a virtual ITS or virtual LPI specific function directly from gic.c. This looks like bad coding style, as it breaks the abstraction (calling a virtual LPI/ITS specific function from within gic.c) - If we have a "DISCARD; MAPTI" sequence, targeting the same vLPI, while this vLPI is still in an LR (but not cleaned up until both commands have been handled), we end up with using the wrong pending_irq struct (the new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI). Can this create problems? I see two possibilities: a) the old vLPI is still pending in the LR: as the new LR would be pristine, the code in update_one_lr() would just clean the QUEUED bit, but not do anything further. The LR would be kept as used by this vLPI, with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU this would make the new vLPI pending. b) the old vLPI has been EOIed: the new LR would be pristine, the code in update_one_lr() would try to clean it up anyway, which should not hurt, AFAICT. So if there is no new LPI triggered, a) would be wrong, but b) right. Is that correct so far? Now if there is a new LPI, a) would be correct, though the priority might be different (though I am not sure this is an issue). b) should work anyway, since the cleanup code checks for a new pending condition, so would (re-)inject the (new) vLPI. Julien does not seem to be a fan of this tagging idea, as this might create more subtle failures that we don't see at the moment (which I can understand). So I would like to know how to proceed here: I) stick to the tag, and fix that particular case above by checking for the "inflight && ENABLED" condition and clearing the LR if the pending_irq does not state it's pending anymore II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag, where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING. NO_LONGER_PENDING would be checked and cleared by update_one_lr(), UNMAPPED by vgic_vcpu_inject_irq(). That would leave the issue how to call the pend_irq_cleanup() function from update_one_lr() without breaking abstraction III) Revert to the "check for NULL" solution. MMh, this is nasty stuff, any suggestions? Cheers, Andre. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |