
Re: [Xen-devel] Kernel aio bug in Debian 2.6.32-5-xen kernel?



On Thu, 2012-04-26 at 12:23 +0100, Stefano Stabellini wrote:
> On Thu, 26 Apr 2012, Ian Campbell wrote:
> > On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> > > Recently I pulled in new changesets from xen- and qemu-unstable, and
> > > when creating a PV guest I'm getting errors like the stack trace
> > > below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> > > in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> > > using a config file so I can see if it is qemu's aio?
> > > 
> > > The config file is attached, for reference.
> > 
> > Which revision of the Debian kernel is this?
> > 
> > It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> > Xen.git -- it's certainly not impossible that there were latent AIO bugs
> > in there and Stefano has been fixing these sorts of things in recent
> > kernels too. So it's very possible we need to backport some fix.
> 
> Right.
> 
> 
> > > [  408.127439] BUG: unable to handle kernel paging request at af00003e
> > > [  408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
> > > [  408.138726] *pdpt = 0000000033232027 *pde = 0000000000000000
> > > [  408.144532] Oops: 0000 [#1] SMP
> > > [  408.147825] last sysfs file: /sys/devices/vif-1-0/uevent
> > > [  408.153200] Modules linked in: xt_physdev iptable_filter ip_tables
> > > x_tables xen_evtchn xenfs bridge stp loop snd_p]
> > > [  408.194797]
> > > [  408.196359] Pid: 1942, comm: qemu-system-i38 Not tainted
> > > (2.6.32-5-xen-686 #1) PowerEdge R710
> > > [  408.204938] EIP: 0061:[<c10941f8>] EFLAGS: 00010286 CPU: 0
> > > [  408.210485] EIP is at set_page_dirty+0x1e/0x4a
> > > [  408.214991] EAX: af000006 EBX: 00000000 ECX: c4ad7680 EDX: 41000001
> > > [  408.221317] ESI: c4ad7680 EDI: f4f0c54c EBP: f3353200 ESP: f33bfdb8
> > > [  408.227644]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> > > [  408.233104] Process qemu-system-i38 (pid: 1942, ti=f33be000
> > > task=f3de50c0 task.ti=f33be000)
> > > [  408.241508] Stack:
> > > [  408.243588]  c10944f0 00000000 f4f0c500 c10d991a f33533c8 f3353200
> > > f4f0c500 c10dc048
> > > [  408.251128] <0> 00000001 00001000 00000000 c10dcc44 00000001
> > > c1006767 00000000 00000000
> > > [  408.259189] <0> d4646070 00000000 f2f0869c f3ff4900 00000000
> > > 0000000c 00001000 00000000
> > > [  408.267509] Call Trace:
> > > [  408.270025]  [<c10944f0>] ? set_page_dirty_lock+0x22/0x30
> > > [  408.275486]  [<c10d991a>] ? bio_set_pages_dirty+0x22/0x2f
> > > [  408.280944]  [<c10dc048>] ? dio_bio_submit+0x3c/0x57
> > > [  408.285970]  [<c10dcc44>] ? __blockdev_direct_IO+0x903/0xaed
> > > [  408.291691]  [<c1006767>] ? xen_restore_fl_direct_end+0x0/0x1
> > > [  408.297500]  [<f62a2494>] ? ext3_direct_IO+0xed/0x18d [ext3]
> > > [  408.303219]  [<f62a2e2b>] ? ext3_get_block+0x0/0xd1 [ext3]
> > > [  408.308764]  [<c1090687>] ? generic_file_aio_read+0xf9/0x57b
> > > [  408.314483]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
> > > [  408.320376]  [<c1006770>] ? check_events+0x8/0xc
> > > [  408.325056]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
> > > [  408.330949]  [<c109058e>] ? generic_file_aio_read+0x0/0x57b
> > > [  408.336584]  [<c10e3725>] ? aio_rw_vect_retry+0x61/0x122
> > > [  408.341955]  [<c10e45fa>] ? aio_run_iocb+0x61/0xef
> > > [  408.346809]  [<c10e4ec9>] ? sys_io_submit+0x409/0x49c
> > > [  408.351923]  [<c1008f9c>] ? syscall_call+0x7/0xb
> > > [  408.356600] Code: c3 f6 00 10 75 04 f0 80 08 10 31 c0 c3 89 c1 8b
> > > 40 10 8b 11 f7 c2 00 00 01 00 74 07 b8 ec 71 3d
> > > [  408.375492] EIP: [<c10941f8>] set_page_dirty+0x1e/0x4a SS:ESP 
> > > 0069:f33bfdb8
> > > [  408.382512] CR2: 00000000af00003e
> > > [  408.385894] ---[ end trace 9ce48eb2f06897bf ]---
>  
> This looks like a classic direct_IO/AIO not working bug: it could be
> because the m2p_override is not working correctly or it might not even
> be present at all in this kernel (it went upstream in 2.6.38).
> It only started showing now because qemu-xen-traditional switched to
> O_DIRECT.

This kernel had VM_FOREIGN, PageForeign, etc. rather than the
m2p_override. Could it be that we need to extend VM_FOREIGN to cover
grant-mapped pages?

That's actually a fair chunk of dev work, not just a simple backport.

However, this kernel does have blktap, so why is qemu-based AIO being
used at all?
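
For anyone following along, the access pattern under discussion looks
roughly like the sketch below: open with O_DIRECT and read into an aligned
buffer, so the kernel does I/O straight into the caller's pages rather than
through the page cache. This is an illustration only, not qemu's code:
`read_direct`, `_read_once` and `BLOCK` are hypothetical names, the
4096-byte alignment is an assumption, the read is synchronous rather than
submitted via io_submit as in the trace, and the buffer is ordinary
anonymous memory, so it is not expected to reproduce the oops.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def _read_once(path, flags, length):
    """Open `path` with `flags` and read up to `length` bytes into an aligned buffer."""
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT generally requires.
        buf = mmap.mmap(-1, length)
        try:
            n = os.readv(fd, [buf])
            return buf[:n]  # copy out before the mapping is closed
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_direct(path, length=BLOCK):
    """Return (data, used_o_direct); falls back to a buffered read if O_DIRECT fails."""
    aligned = -(-length // BLOCK) * BLOCK  # round up to a multiple of BLOCK
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    if o_direct:
        try:
            return _read_once(path, os.O_RDONLY | o_direct, aligned)[:length], True
        except OSError:
            pass  # e.g. a filesystem that rejects O_DIRECT
    return _read_once(path, os.O_RDONLY, aligned)[:length], False
```

The second element of the returned tuple says whether the O_DIRECT path
was actually taken, which is worth checking before drawing any conclusion
from a test run.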

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

