
Re: [Xen-devel] Kernel 3.7.[12] - irq 16: nobody cared

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Steven Haigh <netwiz@xxxxxxxxx>
Date: Wed, 16 Jan 2013 04:15:38 +1100
Cc: xen-devel@xxxxxxxxxxxxx
Delivery-date: Tue, 15 Jan 2013 18:09:34 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi Jan,

On 16/01/2013 2:23 AM, Jan Beulich wrote:

On 15.01.13 at 04:27, Steven Haigh <netwiz@xxxxxxxxx> wrote:

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
Call Trace:
   <IRQ>  [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
   [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
   [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
   [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
   [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
   [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
   [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
   [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
   <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
   [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
   [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
   [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
   [<ffffffff810169b1>] ? default_idle+0x50/0x8a
   [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
   [<ffffffff8148160e>] ? rest_init+0x72/0x74
   [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
   [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
   [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
   [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
handlers:
[<ffffffffa012edd9>] mv_interrupt [sata_mv]
Disabling IRQ #16

I have tried booting with the irqpoll option on the kernel boot line,
but the same problem occurs.

It seems disk throughput almost drops dead when this happens - as the
SATA controller seems to go into some different mode of operation. It
also seems like this has only happened recently - I was using builds of
3.6.x as my Xen Dom0 kernel with no signs of this problem.

Has anyone else seen this in recent kernel releases? I'm not quite sure
how to try and track this down.

First of all, you'll want to clarify whether this problem is present
_only_ when running under Xen, or also when running the same
kernel without Xen underneath. This is primarily because the
output you provided shows that IRQ 16 actually has a handler,
just that it apparently ignores the interrupts (and that's nothing
that Xen controls).

I'm not 100% sure how to do this. I haven't been able to find a method
to cause the problem to happen... It just does - and it seems random
when it does happen. Part of the problem with running the system without
the hypervisor in place is that I can't replicate any kind of workload
that would normally trigger the problem.
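Since the failure appears at random, one way to at least timestamp and
correlate it with workload is to scan saved kernel logs for the report
itself. The following is only a rough sketch: it assumes the log text
has the same shape as the trace quoted above (an "irq N: nobody cared"
line, later a "handlers:" line followed by one line per handler). The
function name find_reports is made up for illustration and is not part
of any kernel or Xen tooling.

```python
import re

# Matches the kernel's report line, e.g. 'irq 16: nobody cared (...)'.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Matches a handler line, e.g. '[<ffffffffa012edd9>] mv_interrupt [sata_mv]'.
# The trailing '[module]' part is optional (built-in handlers have none).
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)(?:\s+\[(\S+)\])?")


def find_reports(log_text):
    """Return a list of (irq, [(handler, module_or_None), ...]) tuples.

    Hypothetical helper: it only understands the log layout shown in
    this thread and ignores everything else in the log.
    """
    reports = []
    current = None        # report currently being filled in, if any
    in_handlers = False   # True once the 'handlers:' line has been seen
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            current = (int(match.group(1)), [])
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if "handlers:" in line:
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER.search(line)
            if handler:
                current[1].append((handler.group(1), handler.group(2)))
            else:
                # First non-handler line ends this report.
                in_handlers = False
                current = None
    return reports
```

Run against the output of dmesg (or a saved copy of the kernel log),
this lists each affected IRQ together with the handlers registered on
it, which makes it easier to note exactly when a report appeared and
what the DomUs were doing at the time.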

Then, if this is a Xen-only problem, you will want to provide full
hypervisor and kernel (boot) logs, the hypervisor one including
debug key 'i' output, and the kernel one once with and once
without Xen.

Finally you'll want to clarify whether, when updating the kernel,
you also updated the hypervisor (and if so, try the known good
and known bad kernels on identical hypervisors).

I have been running Xen 4.2.1 for a while - and used multiple kernel
versions with it. Sadly, I don't have an archive of the RPMs that I used
(even though I built them!). I've only really noticed this happening in
the last month - when I've been running kernel 3.7.1+

On the off chance today, I have moved the card from one 16x PCIe slot to
the second one on the mainboard. This has moved the card from IRQ16 to
IRQ19. As of yet, I haven't had the problem occur - however as it is a
seemingly random occurrence, there is no guarantee that the problem is
solved. I've tried loading up the i/o by doing a resync of the RAID6 (of
which, 2 drives are on the sata_mv card) as well as hammering i/o in the
DomUs (rather random stuff), but still no reliable way to force the
problem to occur :(
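To confirm which IRQ line the card now sits on, and whether it shares
that line with another device, the text of /proc/interrupts can be
summarised per driver. This is a minimal sketch and makes assumptions:
it expects the usual layout (a header row of CPU names, then one row per
IRQ with one count per CPU, followed by the chip name and a
comma-separated list of drivers), and irq_counts is a name invented
here for illustration.

```python
def irq_counts(interrupts_text, name):
    """Map IRQ label -> total count over all CPUs for rows listing `name`.

    Hypothetical helper; `interrupts_text` is assumed to be the full
    contents of /proc/interrupts read at one point in time.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:ncpus]
        # Skip rows that do not carry one numeric count per CPU.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        # Drivers sharing a line are separated by ", " in the last column.
        devices = [field.rstrip(",") for field in fields[ncpus:]]
        if name in devices:
            result[label.strip()] = sum(int(c) for c in counts)
    return result
```

Calling it on two reads of /proc/interrupts taken a few seconds apart,
with name set to "sata_mv", shows how fast the count for that line is
rising. A count that stops rising while the RAID6 is still busy would be
consistent with the line having been disabled, though that is an
inference rather than something the counter proves by itself.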


I'm open to any suggestions :)

--
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299


