Re: [Xen-devel] ITS emulation race conditions
On 11/04/17 10:24, Julien Grall wrote:

Hi Stefano,

On 04/11/2017 12:01 AM, Stefano Stabellini wrote:

On Mon, 10 Apr 2017, Andre Przywara wrote:

Hi,

On 10/04/17 00:12, André Przywara wrote:

Hi,

I wanted to run some ideas on how to prevent the race conditions we are facing with the ITS emulation when removing devices and/or LPIs. I think Stefano's idea of tagging a discarded LPI is the key, but some details are still left to be solved. I think we are dealing with two issues:

1) A guest's DISCARD can race with an incoming LPI. Ideally DISCARD would remove the host LPI -> vLPI connection (in the host_lpis table), so any new incoming (host) LPI would simply be discarded very early in gicv3_do_LPI(), without ever resolving into a virtual LPI. Now while this removal is atomic, we could have just missed an incoming LPI, so the old virtual LPI would still traverse down the VGIC with a "doomed" virtual LPI ID. I wonder if that could be solved by a "crosswise" check:
- The DISCARD handler *first* tags the pending_irq as "UNMAPPED", *then* removes the host_lpis connection.
- do_LPI() reads the host_lpis array, *then* checks the UNMAPPED tag in the corresponding pending_irq (or lets vgic_vcpu_inject_irq() do that).
With this setup the DISCARD handler can assume that no new virtual LPI instances enter the VGIC afterwards.

2) An unmapped LPI might still be in use by the VGIC, foremost it might still be in an LR. Tagging the pending_irq should solve this in general. Whenever a VGIC function finds the UNMAPPED tag, it does not process the vIRQ, but retires it. For simplicity we might limit this to the point where a VCPU exits and an LR gets cleaned up: if we hit a tagged pending_irq here, we clean the LR, remove the pending_irq from all lists and signal the ITS emulation that this pending_irq is now ready to be removed (by calling some kind of cleanup_lpi() function, for instance). The ITS code can then remove the struct pending_irq from the radix tree.

MAPD(V=0) now uses this to tag all still mapped events as "UNMAPPED", then counts down the "still alive" pending_irqs in cleanup_lpi() until they reach zero. At this point it should be safe to free the pend_irq array.

Does that sound like a plan? Do I miss something here? Probably yes, hence I am asking ;-)
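To make the ordering in point 1) concrete, here is a minimal, self-contained sketch. Plain C11 sequentially consistent atomics stand in for whatever barriers the real code would use, and the LPI_UNMAPPED flag, the table layout and the two functions are simplified stand-ins, not the actual Xen structures or interfaces:

/*
 * Sketch of the "crosswise" check: DISCARD tags first and unlinks second,
 * the interrupt path resolves first and checks the tag second, so an LPI
 * that slips through the unlink is still caught by the tag.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NR_HOST_LPIS  32
#define LPI_UNMAPPED  (1u << 0)            /* hypothetical "UNMAPPED" tag */

struct pending_irq {
    atomic_uint status;                    /* bit 0: LPI_UNMAPPED */
    uint32_t virq;
};

/* host LPI -> pending_irq mapping; NULL means "not mapped" */
static struct pending_irq *_Atomic host_lpis[NR_HOST_LPIS];

/* DISCARD handler: tag first, then break the host -> virtual link. */
void discard_lpi(unsigned int host_lpi)   /* host_lpi < NR_HOST_LPIS */
{
    struct pending_irq *p = atomic_load(&host_lpis[host_lpi]);

    if ( !p )
        return;

    atomic_fetch_or(&p->status, LPI_UNMAPPED);   /* step 1: tag */
    atomic_store(&host_lpis[host_lpi], NULL);    /* step 2: unlink */
}

/* Interrupt path: resolve first, then honour the tag. */
void do_lpi(unsigned int host_lpi)
{
    struct pending_irq *p = atomic_load(&host_lpis[host_lpi]);

    if ( !p )                                    /* already unlinked: drop early */
        return;

    if ( atomic_load(&p->status) & LPI_UNMAPPED )
        return;                                  /* doomed vLPI: retire it */

    printf("injecting vLPI %u\n", p->virq);      /* would inject into the VGIC */
}

With seq_cst accesses, do_lpi() can only observe a non-NULL host_lpis entry with a clear tag if it ran entirely before the DISCARD, which is the guarantee the paragraph above relies on.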
So there are two issues with this:

- For doing the LPI cleanup, we would need to call a virtual ITS or virtual LPI specific function directly from gic.c. This looks like bad coding style, as it breaks the abstraction (calling a virtual LPI/ITS specific function from within gic.c).

This is just code organization, I am not worried about it. We might have to register cleanup functions. The real problem to solve is below.

- If we have a "DISCARD; MAPTI" sequence targeting the same vLPI, while this vLPI is still in an LR (but not cleaned up until both commands have been handled), we end up using the wrong pending_irq struct (the new one). (I assume the UNMAPPED tag would be cleared upon the new MAPTI.)

It looks like "DISCARD; MAPTI" would be a problem even if we go with "GIC: Add checks for NULL pointer pending_irq's", because instead of a NULL pending_irq, we could get the new pending_irq, the one backing the same vlpi but a different eventid.

Can this create problems? I see two possibilities:
a) the old vLPI is still pending in the LR: as the new pending_irq would be pristine, the code in update_one_lr() would just clear the QUEUED bit, but not do anything further. The LR would be kept as used by this vLPI, with the state still set to GICH_LR_PENDING. Upon re-entering the VCPU this would make the new vLPI pending.
b) the old vLPI has been EOIed: the new pending_irq would be pristine, the code in update_one_lr() would try to clean it up anyway, which should not hurt, AFAICT.

This cannot be allowed to happen. We have to keep a consistent internal state at all times: we cannot change the vlpi mapping in pend_lpi_tree before the old vlpi is completely discarded. We cannot have a vlpi marked as "UNMAPPED", but still alive in an LR, while vlpi_to_pending already returns the new vlpi.

Well, in hardware LPIs are handled by the re-distributor, and the ITS only acts as a walker to find the mapping (DeviceID, EventID) -> LPI. In real hardware, an LPI may still be received afterwards if the re-distributor has already injected it. The OS will receive the LPI and have to deal with it, potentially redirecting it to the new interrupt handler rather than the old one. The problem is the same here: an LPI may be in an LR whilst the DISCARD happens. I think this is too late and the guest should receive it.

We have a number of possible approaches, I'll mark them with [letter].

[a] On DISCARD we do everything we can to remove the vlpi from everywhere:
- if the pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register as pending (not active, see below), remove it from the LR (see the code in gic_restore_pending_irqs, under "No more free LRs: find a lower priority irq to evict")
- remove the pending_irq from inflight
- remove the pending_irq from pend_lpi_tree
At this stage there should be no more traces of the vlpi/pending_irq anywhere.

What do we do if a "DISCARD; MAPTI" pair of commands is issued while the old vlpi is still ACTIVE in an LR? ACTIVE means that the guest is still handling the irq and hasn't even EOIed it yet. I know that physical LPIs have no active state, but what about virtual LPIs? I am tempted to say that in any case, for simplicity, we could just remove the vlpi from the LR ("evict") anyway. I don't think this case should happen with a well behaved guest anyhow.

But the other issue is that "DISCARD; MAPTI" could be done on a different vcpu, compared to the one having the old vlpi in an LR. In this case, we would have to send an SGI to the right pcpu and clear the LR, which is a royal pain in the neck because the other vcpu might not even be running. Also, another MAPTI could come in on a different pcpu while we haven't completed the first LR removal. There might be races. Let's explore other options.

[b] On DISCARD we would have to:
- if the pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR register:
  - mark it as UNMAPPED (this check can be done from another vcpu)
  - remove it from inflight (it could also be done in gic_update_one_lr)
Then when the guest EOIs the interrupt, in gic_update_one_lr (sketched below):
- clear the LR
- if UNMAPPED:
  - clear UNMAPPED and return
- remove the old pending_irq from pend_lpi_tree (may not be necessary?)
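As a rough illustration of the EOI-time path in [b], here is a self-contained sketch. The flag name, the helpers and the function itself are hypothetical stand-ins, not the real gic_update_one_lr(), and whether the pend_lpi_tree removal belongs inside this branch is exactly the open question above:

/* Sketch of the EOI-time handling in approach [b]; stubs only. */
#include <stdbool.h>
#include <stdint.h>

#define GIC_IRQ_GUEST_UNMAPPED  4          /* hypothetical flag bit */

struct pending_irq {
    unsigned long status;                  /* flag bits */
    uint32_t virq;
    /* list membership (inflight, lr_queue) omitted in this sketch */
};

/* stand-ins for the real helpers */
static void clear_lr(int lr) { (void)lr; /* would clear ICH_LR<n> */ }
static void list_del_inflight(struct pending_irq *p) { (void)p; /* unlink from inflight */ }
static void lpi_cleanup(struct pending_irq *p) { (void)p; /* drop from pend_lpi_tree */ }

/* non-atomic stand-in for the real test_and_clear_bit() */
static inline bool test_and_clear_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    bool old = (*addr & mask) != 0;

    *addr &= ~mask;
    return old;
}

void update_one_lr_on_eoi(int lr, struct pending_irq *p)
{
    clear_lr(lr);                          /* the LR is no longer in use */

    if ( test_and_clear_bit(GIC_IRQ_GUEST_UNMAPPED, &p->status) )
    {
        /*
         * The vlpi was DISCARDed while it sat in the LR: retire it and
         * let the ITS emulation free/unlink the pending_irq.
         */
        list_del_inflight(p);
        lpi_cleanup(p);
        return;
    }

    /* normal EOI handling (requeue if still pending, etc.) would go here */
}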
Now the problem with this approach, like you pointed out, is: what do we do if the guest issues a MAPTI before EOIing the interrupt?

[c] The simplest thing to do is to ignore it (or report an error back to the guest) and printk a warning if the old vlpi is still UNMAPPED. It could be a decent solution for Xen 4.9.

I don't think this is a decent solution for Xen 4.9. It could easily be triggered, and it is possible to have "DISCARD; MAPTI" in the queue.

[d] Otherwise, we can try to handle it properly.

On DISCARD:
- if the pending_irq is in lr_queue, remove it from the list
- if the vlpi is in an LR:
  - mark it as UNMAPPED
  - remove it from inflight
On MAPTI, if the old pending_irq is UNMAPPED:
- clear UNMAPPED in the old pending_irq
- add UNMAPPED to the new pending_irq
- remove the old pending_irq from pend_lpi_tree
- add the new pending_irq to pend_lpi_tree
Then when the guest EOIs the interrupt, in gic_update_one_lr:
- clear the LR
- if UNMAPPED:
  - clear UNMAPPED and return

So who is removing the pending_irq from the radix tree? If you do it in both MAPTI and gic_update_one_lr, you will end up with a potential race condition where MAPTI adds the new one to the radix tree and gic_update_one_lr removes it from the radix tree, so the mapping will disappear. It was not easily solvable in the migration case, and I don't think this is different here. Also, the new MAPTI may use a different collection (vCPU ID), so how do you protect that correctly?

Thinking a bit more, I think there is another race with your suggestion, between gic_update_one_lr and vgic_vcpu_inject_irq. If I understand your suggestion correctly, UNMAPPED will clear the LRs and return. However, you may have received a new interrupt just before clearing the LRs. Do you expect vgic_vcpu_inject_irq to clear UNMAPPED? If so, how will the function know whether this is the new pending_irq or the previous one that should be ignored?

Among these approaches, [b]+[c] would be OK for 4.9. Or [d]. [a] looks simple but it is actually difficult to do correctly.

So I would like to know how to proceed here:
I) stick to the tag, and fix that particular case above by checking for the "inflight && ENABLED" condition and clearing the LR if the pending_irq does not state that it is pending anymore
II) introduce a NO_LONGER_PENDING tag in addition to the UNMAPPED tag, where a new MAPTI just clears UNMAPPED_LPI, but keeps NO_LONGER_PENDING. NO_LONGER_PENDING would be checked and cleared by update_one_lr(), UNMAPPED by vgic_vcpu_inject_irq(). That would leave the issue of how to call the pend_irq_cleanup() function from update_one_lr() without breaking the abstraction.

If MAPTI is called with a different eventid but the same vlpi, wouldn't vlpi_to_pending return the wrong pending_irq struct in gic_update_one_lr, causing problems for both I) and II)? The flags would not be set.

III) Revert to the "check for NULL" solution.

Wouldn't irq_to_pending potentially return the new pending_irq instead of NULL in gic_update_one_lr? This is a problem. For example, the new pending_irq is not in the inflight list, but gic_update_one_lr will try to remove it from it. I think this approach is error prone: we should always get the right pending_irq struct, or one that is appropriately marked with a flag, or NULL. In this case, we would still have to mark the new pending_irq as "UNMAPPED" to tell gic_update_one_lr to do nothing and return. It would end up very similar to [d]. We might as well do [d].

I think you still have to handle the NULL case, because you may receive a spurious interrupt from the host whilst cleaning up the mapping. The pLPI <-> vLPI mapping may still exist, but when you get into vgic_vcpu_inject_irq, irq_to_pending will return NULL as the entry does not exist anymore in the radix tree.
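For illustration, a minimal sketch of the defensive checks being discussed for the injection path, tolerating both a NULL lookup (a spurious host LPI arriving during cleanup) and a pending_irq already tagged as unmapped. The names, the flag and the stub lookup are assumptions, not the real vgic_vcpu_inject_irq() or irq_to_pending():

/* Sketch of a NULL/UNMAPPED tolerant injection path; stand-ins only. */
#include <stdint.h>
#include <stdio.h>

#define GIC_IRQ_GUEST_UNMAPPED  4          /* hypothetical flag bit */

struct pending_irq {
    unsigned long status;
    uint32_t virq;
};

struct vcpu;                               /* opaque in this sketch */

/* stub for the radix-tree lookup; the real one may return NULL during cleanup */
static struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int virq)
{
    (void)v; (void)virq;
    return NULL;                           /* models "already removed" */
}

void vgic_vcpu_inject_irq_sketch(struct vcpu *v, unsigned int virq)
{
    struct pending_irq *p = irq_to_pending(v, virq);

    if ( !p )
    {
        /* Spurious LPI: the entry is already gone from the radix tree. */
        printf("dropping spurious vLPI %u\n", virq);
        return;
    }

    if ( p->status & (1UL << GIC_IRQ_GUEST_UNMAPPED) )
        return;                            /* being discarded: retire it */

    /* ... normal injection (set pending, add to inflight, kick the vcpu) ... */
}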
Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel