Re: [Xen-devel] Kernel 3.7.[12] - irq 16: nobody cared

Hi Jan,

On 16/01/2013 2:23 AM, Jan Beulich wrote:
> On 15.01.13 at 04:27, Steven Haigh <netwiz@xxxxxxxxx> wrote:
>> irq 16: nobody cared (try booting with the "irqpoll" option)
>> Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
>> Call Trace:
>>    <IRQ>  [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
>>    [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
>>    [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
>>    [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
>>    [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
>>    [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
>>    [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
>>    [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
>>    <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>    [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>    [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>    [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
>>    [<ffffffff810169b1>] ? default_idle+0x50/0x8a
>>    [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
>>    [<ffffffff8148160e>] ? rest_init+0x72/0x74
>>    [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
>>    [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
>>    [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
>>    [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
>> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
>> Disabling IRQ #16
>>
>> I have tried booting with the irqpoll option on the kernel boot line,
>> but the same problem occurs.
>>
>> It seems disk throughput almost drops dead when this happens - as the
>> SATA controller seems to go into some different mode of operation. It
>> also seems like this has only happened recently - I was using builds of
>> 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>>
>> Has anyone else seen this in recent kernel releases? I'm not quite sure
>> how to try and track this down.
>
> First of all, you'll want to clarify whether this problem is present
> _only_ when running under Xen, or also when running the same
> kernel without Xen underneath. This is primarily because the
> output you provided shows that IRQ 16 actually has a handler,
> just that it apparently ignores the interrupts (and that's nothing
> that Xen controls).

I'm not 100% sure how to do this. I haven't been able to find a way to trigger the problem on demand; it just happens, seemingly at random. Part of the difficulty with running the system without the hypervisor in place is that I can't replicate the kind of workload that would normally trigger the problem.

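
One check that can be run the same way with and without the hypervisor is to look at which drivers are registered on the line in /proc/interrupts, since the trace above shows mv_interrupt [sata_mv] attached to IRQ 16 and a line shared with another device can deliver interrupts that the listed handler does not claim. The following is a minimal sketch only: the function name parse_interrupts and the SAMPLE text are made up for illustration and are not taken from the affected machine.

```python
def parse_interrupts(text):
    """Parse the contents of /proc/interrupts.

    Returns {label: (total_count, description)}, where label is the IRQ
    number or name as a string (for example "16" or "ERR").
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; some rows (e.g. ERR) have fewer.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(fields))
    return table


# Hypothetical sample, NOT output from the machine discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
 16:       1200        300   xen-pirq-ioapic-level  sata_mv, ehci_hcd:usb1
 19:         10          0   xen-pirq-ioapic-level  ahci
ERR:          0
"""
```

On a live system, something like parse_interrupts(open("/proc/interrupts").read())["16"] would give the total count and the names registered on IRQ 16. Taking two readings a few seconds apart shows whether the count is still advancing.
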
> Then, if this is a Xen-only problem, you will want to provide full
> hypervisor and kernel (boot) logs, the hypervisor one including
> debug key 'i' output, and the kernel one once with and once
> without Xen.
>
> Finally you'll want to clarify whether, when updating the kernel,
> you also updated the hypervisor (and if so, try the known good
> and known bad kernels on identical hypervisors).
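
The log collection requested above could be scripted roughly as follows. This is a sketch under assumptions, not a verified procedure: it assumes the xl toolstack is in use and the script runs as root in Dom0, and the output file names and the collect helper are arbitrary choices for illustration. It only captures the kernel log of the currently running boot, so the log without Xen would need a separate native boot.

```python
import subprocess

# (output file, command) pairs, run in order. `xl debug-keys i` asks the
# hypervisor to dump its interrupt bindings to its console ring, which
# `xl dmesg` then reads back. File names are arbitrary for this sketch.
PLAN = [
    (None, ["xl", "debug-keys", "i"]),
    ("xen-dmesg.txt", ["xl", "dmesg"]),
    ("kernel-dmesg.txt", ["dmesg"]),
]


def collect(plan=PLAN, run=subprocess.run):
    """Run each command in order and save its stdout where a file is named."""
    for filename, argv in plan:
        result = run(argv, capture_output=True, text=True, check=True)
        if filename is not None:
            with open(filename, "w") as f:
                f.write(result.stdout)
```

Calling collect() with no arguments runs the three commands and writes the two files into the current directory. The run parameter exists only so the command sequence can be checked without a Xen host.
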

I have been running Xen 4.2.1 for a while, and have used multiple kernel versions with it. Sadly, I don't have an archive of the RPMs that I used (even though I built them!). I've only really noticed this happening in the last month, while running kernel 3.7.1 and later.

On the off chance that it helps, today I moved the card from one x16 PCIe slot to the second one on the mainboard. This has moved the card from IRQ 16 to IRQ 19. So far the problem has not recurred; however, as it is a seemingly random occurrence, there is no guarantee that it is solved. I've tried loading up the I/O by doing a resync of the RAID6 array (2 of whose drives are on the sata_mv card) as well as hammering I/O in the DomUs (rather random stuff), but I still have no reliable way to force the problem to occur :(
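
Because the problem cannot be forced, one way to notice a recurrence on the new line is to scan the kernel log periodically for the two messages that appear in the report above. A minimal sketch, assuming the message wording stays as quoted in this thread; the function name find_irq_failures is made up for illustration.

```python
import re

# Patterns taken from the two log lines quoted earlier in this thread.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def find_irq_failures(log_text):
    """Return a sorted list of IRQ numbers the kernel reported or disabled."""
    irqs = set()
    for line in log_text.splitlines():
        for pattern in (NOBODY_CARED, DISABLING):
            match = pattern.search(line)
            if match:
                irqs.add(int(match.group(1)))
    return sorted(irqs)
```

Feeding it the output of dmesg from a cron job and alerting on a non-empty result would flag the event on IRQ 19 as readily as on IRQ 16.
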

I'm open to any suggestions :)

Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

Xen-devel mailing list


