Re: [Xen-devel] Kernel 3.7.[12] - irq 16: nobody cared
Hi Jan,

On 16/01/2013 2:23 AM, Jan Beulich wrote:
> On 15.01.13 at 04:27, Steven Haigh <netwiz@xxxxxxxxx> wrote:
>> irq 16: nobody cared (try booting with the "irqpoll" option)
>> Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
>> Call Trace:
>>  <IRQ>  [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
>>  [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
>>  [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
>>  [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
>>  [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
>>  [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
>>  [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
>>  [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
>>  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>>  [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
>>  [<ffffffff810169b1>] ? default_idle+0x50/0x8a
>>  [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
>>  [<ffffffff8148160e>] ? rest_init+0x72/0x74
>>  [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
>>  [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
>>  [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
>>  [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
>> handlers:
>> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
>> Disabling IRQ #16
>>
>> I have tried booting with the irqpoll option on the kernel boot line,
>> but the same problem occurs. It seems disk throughput almost drops dead
>> when this happens - as the SATA controller seems to go into some
>> different mode of operation.
>>
>> It also seems like this has only happened recently - I was using builds
>> of 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>>
>> Has anyone else seen this in recent kernel releases? I'm not quite sure
>> how to try and track this down.
>
> First of all, you'll want to clarify whether this problem is present
> _only_ when running under Xen, or also when running the same kernel
> without Xen underneath. This is primarily because the output you
> provided shows that IRQ 16 actually has a handler, just that it
> apparently ignores the interrupts (and that's nothing that Xen
> controls).

I'm not 100% sure how to do this. I haven't been able to find a method to
cause the problem to happen... It just does - and it seems random when it
does happen. Part of the problem with running the system without the
hypervisor in place is that I can't replicate any kind of workload that
would normally trigger the problem.

> Then, if this is a Xen-only problem, you will want to provide full
> hypervisor and kernel (boot) logs, the hypervisor one including debug
> key 'i' output, and the kernel one once with and once without Xen.
>
> Finally you'll want to clarify whether, when updating the kernel, you
> also updated the hypervisor (and if so, try the known good and known
> bad kernels on identical hypervisors).

I have been running Xen 4.2.1 for a while - and used multiple kernel
versions with it. Sadly, I don't have an archive of the RPMs that I used
(even though I built them!). I've only really noticed this happening in
the last month - when I've been running kernel 3.7.1+

On the off chance today, I have moved the card from one 16x PCIe slot to
the second one on the mainboard. This has moved the card from IRQ16 to
IRQ19. As of yet, I haven't had the problem occur - however as it is a
seemingly random occurrence, there is no guarantee that the problem is
solved.

I've tried loading up the i/o by doing a resync of the RAID6 (of which, 2
drives are on the sata_mv card) as well as hammering i/o in the DomUs
(rather random stuff), but still no reliable way to force the problem to
occur :(

I'm open to any suggestions :)

--
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
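
Since the failure is intermittent, one way to tell whether the line is affected after the move from IRQ 16 to IRQ 19 might be to compare two snapshots of /proc/interrupts and see whether the counter for the sata_mv line still advances under load. The following is only a minimal sketch, not something posted in this thread: the helper names parse_interrupts and irq_delta are hypothetical, and the sample tables are made-up illustrations of the usual /proc/interrupts layout, not output from the machine discussed here.

```python
# Hypothetical helpers for comparing two /proc/interrupts snapshots.
# The sample data below is invented for illustration only.

SAMPLE_BEFORE = """\
           CPU0       CPU1
 16:       1000        200   IO-APIC-fasteoi   sata_mv
 19:         50          0   IO-APIC-fasteoi   ehci_hcd:usb1
"""

SAMPLE_AFTER = """\
           CPU0       CPU1
 16:       1000        200   IO-APIC-fasteoi   sata_mv
 19:         80          5   IO-APIC-fasteoi   ehci_hcd:usb1
"""


def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row has one column per CPU
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue  # skip anything that is not an "<irq>: ..." row
        label, _, rest = line.partition(":")
        tokens = rest.split()
        counts = []
        # Consume at most one numeric column per CPU; the remaining
        # tokens are the interrupt chip name and the attached handlers.
        while tokens and len(counts) < ncpu and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(tokens))
    return table


def irq_delta(before, after, irq):
    """Number of interrupts counted on `irq` between two snapshots."""
    return after[irq][0] - before[irq][0]


before = parse_interrupts(SAMPLE_BEFORE)
after = parse_interrupts(SAMPLE_AFTER)
for irq in ("16", "19"):
    print(irq, after[irq][1], "delta:", irq_delta(before, after, irq))
```

On a live system the two snapshots would come from reading /proc/interrupts some seconds apart while the disks are busy. A counter that stops advancing during I/O would be consistent with the line having been disabled, though it would not by itself explain why.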