
Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!



On Wed, Mar 09, 2011 at 07:58:39AM +0100, Andreas Olsowski wrote:
> 
> I encountered this bug in 2010 when Xen 4.0 was released, around the
> time development on 2.6.31 was halted.

That is interesting data. Can you give more details on which 2.6.31
kernel and which hypervisor you are using? Have you tried moving the
hypervisor up to Xen 4.1.0-rc7-pre, for example?
> 
> That is why I stuck with 2.6.31 when everyone else went with 2.6.32,
> because I determined 2.6.32 was not stable, and I'm guessing it still
> isn't today.
> 
> The bug occurs on 2.6.32 Xen kernels (maybe even newer ones) and

Good point. Let me check 2.6.38.
> is distribution-unrelated: I was running Debian 5.0 then, I am
> running 6.0 testing now, and I have even tried compiling all the
> userland stuff myself.

<nods> The overnight tests on Fedora Core 13 showed the same failure
at loop 97.
> 
> This error can be encountered during a number of different actions:
> 1.) any action with lvm (start, stop, create, delete)
> 2.) while starting multipathd (restarting too, of course)
> 
> Sometimes the box just hangs there and no further device
> interactions are possible. This is where I got my syslog entry from.
> Other times, processes like vgchange just ... hang, for no
> particular reason.
> 
> Back in 2010 I had to use a serial console on the server and things
> like that to see the whole error.
> If you tried to use anything that did something with the device
> mapper, it would just ... hang: xm list, for example.
> 
> 
> My guess is that everything one does with the device mapper can and
> will trigger this sooner or later.
> 
> Does anybody have any kind of insight on what the problem may be?

The problem is that when a user application dies, its page-table is
discarded (which involves unpinning the pages) and we let the pages be
re-used. When another application is launched we construct a new
page-table (and pin it), and when that application exits we unpin and
discard again.
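
To make that lifecycle concrete, here is a minimal sketch of the
pin/unpin pair expressed with the mmuext_op hypercall interface. It is
illustrative only, not a copy of the mmu.c code, and the helper names
pin_new_pgd()/unpin_old_pgd() are made up for the example:

#include <linux/bug.h>
#include <xen/interface/xen.h>      /* struct mmuext_op, MMUEXT_* */
#include <asm/xen/hypercall.h>      /* HYPERVISOR_mmuext_op() */

/* New process: hand the freshly built top-level pagetable page
 * (L4 on x86-64) to Xen for validation, and pin it. */
static void pin_new_pgd(unsigned long pgd_mfn)
{
        struct mmuext_op op = {
                .cmd      = MMUEXT_PIN_L4_TABLE,
                .arg1.mfn = pgd_mfn,
        };

        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
                BUG();          /* Xen refused the pin */
}

/* Process exit: unpin so the frames may be freed and re-used. */
static void unpin_old_pgd(unsigned long pgd_mfn)
{
        struct mmuext_op op = {
                .cmd      = MMUEXT_UNPIN_TABLE,
                .arg1.mfn = pgd_mfn,
        };

        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
                BUG();
}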

The issue is that during construction (the application has just
forked) we encounter a page that used to belong to a now-discarded
page-table, and Xen (rightly) tells us that we are trying to pin an
already-pinned page.

Pinning here is the process of letting Xen inspect the pagetable so
that it can verify that no entries point at machine addresses belonging
to the hypervisor or to another guest.
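
The function named in the backtrace below, pin_pagetable_pfn(), is
(roughly, going from memory of the 2.6.32 source) just a thin wrapper
around that hypercall, and its BUG() is the one firing at mmu.c:1872.
RAX in the register dump is 0xffffffea, i.e. -22 (-EINVAL), which would
be consistent with Xen rejecting the mmuext_op:

/* needs <asm/xen/page.h> for pfn_to_mfn(), plus the headers above */
static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
{
        struct mmuext_op op;

        op.cmd = cmd;
        op.arg1.mfn = pfn_to_mfn(pfn);  /* guest pfn -> machine frame */
        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
                BUG();                  /* e.g. the frame is already pinned */
}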

Back to the problem: Xen tells us that we are trying to pin an
already-pinned page _way_ after we have discarded (or thought we had
discarded) the old page-tables, so finding the culprit of _why_ we
missed a page is difficult, as it happened in the past. Jeremy went
over the code that deals with construction/deconstruction with a
fine-toothed comb and made sure there are no races, but there is
obviously something amiss.
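
For reference, the call chain in the trace below (handle_mm_fault ->
__pte_alloc -> pin_pagetable_pfn) goes through the allocation hook that
pins a freshly allocated PTE page whenever the owning pagetable is
itself pinned. A rough, from-memory sketch of that hook in the 2.6.32
pvops code, for illustration rather than as a verbatim excerpt:

static void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn,
                             unsigned level)
{
        struct page *page = pfn_to_page(pfn);

        /* Only pagetables that are already pinned need their new
         * pieces pinned as well. */
        if (PagePinned(virt_to_page(mm->pgd))) {
                SetPagePinned(page);

                if (!PageHighMem(page)) {
                        /* Pagetable pages must be read-only to the guest. */
                        make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));

                        if (level == PT_PTE)
                                /* This is the pin Xen rejects when the frame
                                 * is still marked pinned from an old,
                                 * supposedly discarded pagetable. */
                                pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
                }
        }
}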

> 
> 
> ------------
> Here is the relevant part of my syslog from when I ran "/etc/init.d/multipath-tools restart":
> 
> Mar  9 00:24:10 memoryana multipathd: mpatha: stop event checker
> thread (140606587918080)
> Mar  9 00:24:10 memoryana multipathd: mpathb: stop event checker
> thread (140606587885312)
> Mar  9 00:24:10 memoryana multipathd: mpathc: stop event checker
> thread (140606587852544)
> Mar  9 00:24:10 memoryana kernel: ------------[ cut here ]------------
> Mar  9 00:24:10 memoryana kernel: kernel BUG at arch/x86/xen/mmu.c:1872!
> Mar  9 00:24:10 memoryana kernel: invalid opcode: 0000 [#1] SMP
> Mar  9 00:24:10 memoryana kernel: last sysfs file: 
> /sys/devices/pci0000:00/0000:00:07.0/0000:04:00.1/host3/rport-3:0-2/target3:0:2/3:0:2:0/state
> Mar  9 00:24:10 memoryana kernel: CPU 1
> Mar  9 00:24:10 memoryana kernel: Modules linked in: dm_round_robin
> dm_multipath qla2xxx
> Mar  9 00:24:10 memoryana kernel: Pid: 10662, comm: multipath-tools
> Not tainted 2.6.32.28-xen0 #4 PowerEdge R610
> Mar  9 00:24:10 memoryana kernel: RIP: e030:[<ffffffff8100d471>]
> [<ffffffff8100d471>] pin_pagetable_pfn+0x31/0x60
> Mar  9 00:24:10 memoryana kernel: RSP: e02b:ffff8800c3101df8
> EFLAGS: 00010282
> Mar  9 00:24:10 memoryana kernel: RAX: 00000000ffffffea RBX:
> ffff8800cc4c3400 RCX: 0000000000000003
> Mar  9 00:24:10 memoryana kernel: RDX: 0000000000000000 RSI:
> 0000000000000001 RDI: ffff8800c3101df8
> Mar  9 00:24:10 memoryana kernel: RBP: ffff8800c3135b60 R08:
> 00003ffffffff000 R09: ffff880000000000
> Mar  9 00:24:10 memoryana kernel: R10: 0000000000007ff0 R11:
> 0000000000000246 R12: 00000000000cc302
> Mar  9 00:24:10 memoryana kernel: R13: 0000000000000000 R14:
> ffff8800c374cc60 R15: ffff8800c374cc60
> Mar  9 00:24:10 memoryana kernel: FS:  00007f60add15700(0000)
> GS:ffff880028055000(0000) knlGS:0000000000000000
> Mar  9 00:24:10 memoryana kernel: CS:  e033 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Mar  9 00:24:10 memoryana kernel: CR2: 00007f60ad841876 CR3:
> 00000000cef79000 CR4: 0000000000002660
> Mar  9 00:24:10 memoryana kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Mar  9 00:24:10 memoryana kernel: DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Mar  9 00:24:10 memoryana kernel: Process multipath-tools (pid:
> 10662, threadinfo ffff8800c3100000, task ffff8800cc01cbc0)
> Mar  9 00:24:10 memoryana kernel: Stack:
> Mar  9 00:24:10 memoryana kernel: 0000000000000000 00000000008e8302
> ffff8800cc4c3400 ffff8800c3135b60
> Mar  9 00:24:10 memoryana kernel: <0> 00000000000cc302
> ffffffff810b0382 00007f60ad841876 ffff8800c30b4c10
> Mar  9 00:24:10 memoryana kernel: <0> 00000000000100e0
> 0000000000000000 ffff8800c374cc60 ffffffff810b3595
> Mar  9 00:24:10 memoryana kernel: Call Trace:
> Mar  9 00:24:10 memoryana kernel: [<ffffffff810b0382>] ?
> __pte_alloc+0xf2/0x120
> Mar  9 00:24:10 memoryana kernel: [<ffffffff810b3595>] ?
> handle_mm_fault+0xa45/0xab0
> Mar  9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ?
> page_fault+0x25/0x30
> Mar  9 00:24:10 memoryana kernel: [<ffffffff8153d21a>] ?
> error_exit+0x2a/0x60
> Mar  9 00:24:10 memoryana kernel: [<ffffffff8101481d>] ?
> retint_restore_args+0x5/0x6
> Mar  9 00:24:10 memoryana kernel: [<ffffffff81038631>] ?
> do_page_fault+0x121/0x3c0
> Mar  9 00:24:10 memoryana kernel: [<ffffffff812a2e0d>] ?
> __put_user_4+0x1d/0x30
> Mar  9 00:24:10 memoryana kernel: [<ffffffff8153cfe5>] ?
> page_fault+0x25/0x30
> Mar  9 00:24:10 memoryana kernel: Code: 57 c7 75 00 00 48 89 f0 89
> 3c 24 74 27 48 89 44 24 08 48 89 e7 be 01 00 00 00 31 d2 41 ba f0 7f
> 00 00 e8 d3 be ff ff 85 c0 74 04 <0f> 0b eb fe 48 83 c4 28 c3 48 89
> f7 e8 6e f7 ff ff 48 83 f8 ff
> Mar  9 00:24:10 memoryana kernel: RIP  [<ffffffff8100d471>]
> pin_pagetable_pfn+0x31/0x60
> Mar  9 00:24:10 memoryana kernel: RSP <ffff8800c3101df8>
> Mar  9 00:24:10 memoryana kernel: ---[ end trace f4eae184c1a9f532 ]---
> Mar  9 00:24:11 memoryana multipathd: --------shut down-------
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

