Xen project Mailing List

Re: [Xen-devel] [PATCH] x86/IRQ: prevent vector sharing within IO-APICs

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Mon, 14 Nov 2011 16:29:34 +0000

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 14 Nov 2011 08:30:11 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > On 14/11/11 14:20, Jan Beulich wrote: >> I'm having some more fundamental problem with this original change >> of yours: The assignment of the new vector happens in the context >> of move_native_irq(), which gets called *after* doing the manual >> EOI (and hence also *after* the vector != desc->arch.vector check). >> How does that do what it is supposed to? >> >> (Below the [untested] patch that I would propose to do what I described >> above, pending your clarification regarding the original change.) >> >> Jan > > Are you sure? I was under the impression that for safety, you have to > move the IRQ before ack'ing it at the local APIC, to prevent another IRQ > coming in while you are updating the structures. Just take another look at end_level_ioapic_irq(). > The steps (as far as I remember from debugging this issue) are: > > 1) Scheduler decides move a vcpu to a different pcpu. This implies > moving all bound IRQs. It sets irq_desc->arch->pending_mask to the > pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING > 2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING > set, the ack handler rewrites the RTE to point to the new pcpu:vector, > and updates the irq_desc a bit. This rewriting necessarily happens > before sending an EOI. One would think so, but according to my reading of the code that's not the case. And it really can't, as otherwise io_apic_level_ack_pending() would always return true (and hence move_native_irq() would never get called). > 3) When the next irq appears (on the new pcpu), the code cleans up the > old vector_irq reference for the old pcpu, and tweaks some of irq_desc. But the crucial thing is that desc->arch.vector gets written during step 2, and hence the condition around the EOI can never be true unless an interrupt surfaces in the window between the EOI in step 2 and the disabling of it in move_native_irq(). Was it just this special case that your original change addressed (the change description doesn't hint in that direction). > An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR > IPI which looks through all pending cleanups on the current pcpu and > cleans them up. > > Anyway, it was certainly the case that I saw the RTE being updated > before the EOI was sent, which is why I had to change the > hw_irq_controler.end interface to include the vector from the pending > EOI stack, so end_level_ioapic_irq could EOI the new vector if it was > different from the vector the lapic saw. That's another thing I would want to revert: There really shouldn't be a need to pass in the old vector, as it is getting stored in __assign_irq_vector(). If the value wasn't valid anymore at the point we'd need it, we'd have a more severe problem - the vector gets marked as not in use by smp_irq_move_cleanup_interrupt(), and hence it wouldn't be valid anymore to play with it (read: EOI it) as it may have got re-used already. As vector_irq[] gets updated there too, that should result in "No irq handler for vector ...", but it would not make it into end_level_ioapic_irq() in such a case. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.