
Re: [Xen-devel] HPET stack overflow, and general problems with do_IRQ()

On 16/08/2013 08:53, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:

>>>> On 15.08.13 at 22:21, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> Hello,
>> I have finally managed to get a full stack dump from affected hardware.
>> The logs can be found here (including hypervisor with debugging symbols):
>> http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz
>> The interesting log file is xen.pcpu0.stack.log
>> By my count (grepping for e008 as CS), there are 8 exception frames
>> on the Xen stack (all stack page 6)
>> However, because of the early ack() at the LAPIC, and disabling of
>> interrupts, the vectors (in order of interrupts arriving) are
>> c1, 99, b1, b9, a9, a1, 91, 89
> So these are all HPET interrupts as it seems to me. You said the
> box just has 8 of them, so the fundamental problem is not the
> general handling of interrupts that you talk about below, but the
> fact that _all_ these channels are bound to CPU0: That's an
> insane side effect of the way channel management works when
> there are (potentially) more CPUs than channels. So _I_ think
> this is what needs fixing.
> That's even more so because the above sequence would be impossible
> for guest interrupts (which don't get EOI-ed immediately, and
> interrupts don't get re-enabled on that path either). Hence in the
> discussion here we need to only be concerned with interrupts that
> Xen uses for itself: timer, console, iommu, and HPET. Out of these,
> timer and console - going through the IO-APIC - are safe from this
> because of how io_apic.c implements the ->ack()/->end() pairs.
> Both IOMMU implementations ack their IRQs in the LAPIC only in
> ->end(). And that's what I suggested to switch HPET to too. And
> contrary to what I said about this earlier, disabling interrupts in the
> ->end() handler isn't even necessary, as it already gets called with
> them disabled.
> So we have two possible fixes to the HPET, either of which is
> very likely to deal with the problem on its own.

Additionally, with per-vcpu stacks we could have a larger per-cpu irq stack.
It would be easier to grow that without 'wasting' memory. Although I think
Jan's arguments above do make sense.

 -- Keir

> Jan
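
The nesting argument in the quoted text can be illustrated with a toy model. This is only a sketch, not Xen code: the names prio_class() and nested_frames() are made up for illustration, and the model assumes the worst case, in which every vector arrives while all handlers entered so far are still running with interrupts enabled. It relies on the LAPIC rule that a vector is held pending while a vector of the same or a higher priority class (upper four bits of the vector) is still in service, that is, not yet EOI-ed.

```c
#include <stddef.h>

/* Priority class of a vector: its upper four bits. */
static unsigned int prio_class(unsigned int vector)
{
    return vector >> 4;
}

/*
 * Toy model (hypothetical helper, not part of Xen): count how many
 * exception frames end up stacked if each vector in arrival[] shows up
 * while all previously entered handlers are still running.
 *
 * early_eoi != 0: the handler EOIs at the LAPIC before re-enabling
 *                 interrupts, so nothing is in service and any vector
 *                 can nest.
 * early_eoi == 0: the EOI is deferred to the end of the handler, so only
 *                 a strictly higher priority class can nest.
 */
static unsigned int nested_frames(const unsigned int *arrival, size_t n,
                                  int early_eoi)
{
    unsigned int depth = 0;
    unsigned int in_service = 0; /* highest in-service class, 0 = none */

    for (size_t i = 0; i < n; i++) {
        if (early_eoi || prio_class(arrival[i]) > in_service) {
            depth++;
            if (!early_eoi)
                in_service = prio_class(arrival[i]);
        }
        /* Otherwise the vector stays pending until the handler EOIs. */
    }
    return depth;
}
```

Fed the vectors in the order reported above (c1, 99, b1, b9, a9, a1, 91, 89), the model stacks all 8 frames with the early EOI, but only 1 with the EOI deferred, because none of the later vectors has a higher priority class than c1.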
