[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/IRQ: prevent vector sharing within IO-APICs



>>> On 14.11.11 at 16:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/11/11 14:20, Jan Beulich wrote:
>> I'm having some more fundamental problem with this original change
>> of yours: The assignment of the new vector happens in the context
>> of move_native_irq(), which gets called *after* doing the manual
>> EOI (and hence also *after* the vector != desc->arch.vector check).
>> How does that do what it is supposed to?
>>
>> (Below the [untested] patch that I would propose to do what I described
>> above, pending your clarification regarding the original change.)
>>
>> Jan
> 
> Are you sure?  I was under the impression that for safety, you have to
> move the IRQ before ack'ing it at the local APIC, to prevent another IRQ
> coming in while you are updating the structures.

Just take another look at end_level_ioapic_irq().

> The steps (as far as I remember from debugging this issue) are:
> 
> 1) Scheduler decides move a vcpu to a different pcpu.  This implies
> moving all bound IRQs.  It sets irq_desc->arch->pending_mask to the
> pcpu(s) we wish to move to, and sets irq_desc->static |= IRQ_MOVE_PENDING
> 2) When an irq appears (on the original pcpu) which has IRQ_MOVE_PENDING
> set, the ack handler rewrites the RTE to point to the new pcpu:vector,
> and updates the irq_desc a bit.  This rewriting necessarily happens
> before sending an EOI.

One would think so, but according to my reading of the code that's
not the case. And it really can't, as otherwise
io_apic_level_ack_pending() would always return true (and hence
move_native_irq() would never get called).

> 3) When the next irq appears (on the new pcpu), the code cleans up the
> old vector_irq reference for the old pcpu, and tweaks some of irq_desc.

But the crucial thing is that desc->arch.vector gets written during step 2,
and hence the condition around the EOI can never be true unless an
interrupt surfaces in the window between the EOI in step 2 and the
disabling of it in move_native_irq(). Was it just this special case that
your original change addressed (the change description doesn't hint in
that direction).

> An alternative to 3) can occur when using the IRQ_MOVE_CLEANUP_VECTOR
> IPI which looks through all pending cleanups on the current pcpu and
> cleans them up.
> 
> Anyway, it was certainly the case that I saw the RTE being updated
> before the EOI was sent, which is why I had to change the
> hw_irq_controler.end interface to include the vector from the pending
> EOI stack, so end_level_ioapic_irq could EOI the new vector if it was
> different from the vector the lapic saw.

That's another thing I would want to revert: There really shouldn't be
a need to pass in the old vector, as it is getting stored in
__assign_irq_vector(). If the value wasn't valid anymore at the point
we'd need it, we'd have a more severe problem - the vector gets
marked as not in use by smp_irq_move_cleanup_interrupt(), and
hence it wouldn't be valid anymore to play with it (read: EOI it) as
it may have got re-used already. As vector_irq[] gets updated there
too, that should result in "No irq handler for vector ...", but it would
not make it into end_level_ioapic_irq() in such a case.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.