[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x

On 28/03/2013 10:50, Jan Beulich wrote:
>>>> On 27.03.13 at 17:27, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 27/03/2013 15:51, Marek Marczykowski wrote:
>>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>>> memory corruption causing these effects.
>>>> Ok, will try something like this.
>>> Nothing interesting here...
>>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump 
>> information).
>> Even in the case where we hit the original assertion?
>> If so, then all I can thing is that the move_pending flag for that
>> specific GSI has been corrupted in memory somehow.
> No, I think the flag is legitimately set after resume, and gets
> looked at the after the first SCI got signaled (which would
> trigger the pending affinity change to be carried out that was
> initiated in the suspend path). The problem is a more
> fundamental one: irq_move_cleanup_interrupt() (in unstable
> terms) includes the legacy vectors, so if, upon encountering the
> move_cleanup_count for IRQ 9 (or any legacy IRQ) execution
> doesn't make it all the way through to carrying out the cleanup,
> the loop, once in the legacy vector range, will re-encounter the
> same IRQ, find move_cleanup_count non-zero again, and thus
> tries to do something here.
> Hence I think skipping the legacy vector range here is indeed
> necessary, even outside the suspend/resume scenario (see
> below). Another alternative would be to invalidate the
> vector_irq[] entries for legacy vectors handled through the
> Jan
> x86: irq_move_cleanup_interrupt() must ignore legacy vectors
> Since the main loop in the function includes legacy vectors, and since
> vector_irq[] gets set up for legacy vectors regardless of whether those
> get handled through the IO-APIC, it must not do anything on this vector
> range. In fact, we should never get here for IRQs not handled through
> the IO-APIC, so add a respective warning at once (could probably as
> well be an ASSERT()).
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Under what circumstances would we have any vectors 0xe0-0xef programmed
into the IOAPIC?  I cant think of any offhand.

As far as I am aware, it is not valid for any PIC interrupts to ever be
up for moving, as they should only be delivered to the BSP.

In addition to the check you have, the scope of the loop should probably
be reduced.  We should never be considering to move any vector larger
than LAST_HIPRIORITY_VECTOR, which I believe are all LAPIC interrupts,
making 8 useless iterations of the loop.  I would also suggest that it
is an ASSERT rather than a WARN, but that leaves us not fixing the bug
at hand, as we have already verified that vector 0xe9 is not programmed
into the IOAPIC.


> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -625,6 +625,12 @@ void irq_move_cleanup_interrupt(struct c
>          if ((int)irq < 0)
>              continue;
> +        if ( vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR )
> +        {
> +            WARN_ON(!IO_APIC_IRQ(irq));
> +            continue;
> +        }
> +
>          desc = irq_to_desc(irq);
>          if (!desc)
>              continue;

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.