Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860!
All Xen 4.1.0 tests were done on server1 (netcatarina). All but one test with Xen 4.0.1 were done on server2 (memoryana); why I had to rerun one of the tests for server2 on server1 is explained below. Here are my test results:

======================================================
Kernel 2.6.32.28 without XEN:
about 50 successful runs of Teck Choon Giam's "test.sh" script
(modified to handle 10 test volumes and to sleep 2 seconds; a rough sketch of the loop is shown below)
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 restarted successfully
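Roughly, the modified loop looks like the following simplified sketch; the volume group name, LV size and loop count are placeholders here, not the exact values from the script, and the lvcreate/lvremove cycle is a simplification:

#!/bin/bash
# Simplified sketch of the modified test.sh loop: cycle through
# 10 test LVs per iteration, sleeping 2 seconds between steps.
# VG name, LV size and iteration count are placeholders.
VG=testvg
for loop in $(seq 1 100); do
    for vol in $(seq 1 10); do
        echo "loop ${loop} volume ${vol}"
        lvcreate -L 1G -n "test${vol}" "${VG}" || exit 1
        sleep 2
        lvremove -f "/dev/${VG}/test${vol}" || exit 1
    done
done
# The multipathd restart, multipath module load/unload and lvm2
# restart checks listed above were done separately, outside this loop.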
======================================================
Kernel 2.6.38 without XEN:
about 20 successful runs of "test.sh"
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 restarted successfully

======================================================
Kernel 2.6.32.28 with XEN 4.0.1:
At about loop 2 for volume 7 of "test.sh" it stopped doing ... well, anything. There was no output on the screen and neither a syslog nor a dmesg entry. I left it hanging for about 15 minutes until I decided to write this one off as a side effect of the same underlying problem. All lvm2 tools stopped working and I couldn't shut the server down; killing the hanging process ended it properly. I did a cold reset of the server, as I wanted to see the discussed BUG again, but I failed here. It would seem my server2 has some kind of addressing error:

pci 0000:04:00.1: BAR 6: address space collision of device ....

0000:04:00.1 is one of my QLogic HBAs, and since I use centralized FC storage ... who knows what side effects happened here. Interestingly enough, I had no problems with kernel 2.6.38 on this machine. So I downgraded server1, which had never shown this message, to Xen 4.0.1 and ran the test: after 2 loops at volume 5 I hit "kernel BUG at arch/x86/xen/mmu.c" again.

======================================================
Kernel 2.6.38 with XEN 4.0.1:
100 runs of test.sh without error
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 stop/start ok

======================================================
Kernel 2.6.32.28 with XEN 4.1.0-rc7:
It booted at first; then it crashed after only 5 iterations of "test.sh": http://pastebin.com/uNL7ehZ8
Later, after having booted 2.6.38 on this server to test it with Xen 4.1, I encountered a different error at boot time: "BUG: unable to handle kernel paging request at ffff8800cc3e5f48". I only have pictures of it:
http://141.39.208.101/err1.png
http://141.39.208.101/err2.png
I then did a cold boot of the server, as this has proven to make it boot in the past. When this did not help, I stopped the test.sh running on my other server, because the hang came when lvm2 was started and the servers use shared storage. Apparently this helped; the server booted fine after another cold reset. After that I hit an error again at loop 10 of "test.sh", but not the "kernel BUG at arch/x86/xen/mmu.c"; it was again "BUG: unable to handle kernel paging request", this time at ffff8800cc61ce010:
http://141.39.208.101/err3.png
http://141.39.208.101/err4.png

======================================================
Kernel 2.6.38 with XEN 4.1.0-rc7:
100 runs of test.sh without error
multipathd restarted successfully
multipath module loaded/unloaded successfully
lvm2 stop/start ok

======================================================
Summary
======================================================
So that's two different errors I have encountered: one is the "kernel BUG at arch/x86/xen/mmu.c", the other is "BUG: unable to handle kernel paging request". Both only occur with 2.6.32 when running under either Xen 4.0.1 or 4.1; on its own the kernel works fine. Kernel 2.6.38 ran fine on both hypervisors as well as on its own.

One other issue occurred that I did not expect: with the same .config (make oldconfig), 2.6.38 left my screen black after loading the kernel, on both hypervisors. The servers worked just fine, I just did not see any output on their VGA ports.

I hope this information helps you hunt this bug down, as it effectively makes the "default" Xen unusable in server situations where the device mapper is involved. It is puzzling to me why no one noticed it last year; am I the only one running Xen on server hardware (Dell R610, R710 and 2950) with centralized storage (FibreChannel or iSCSI) and using it as a production environment? Is multipathing two links to centralized storage and using LVM2 to split it up for virtual machines running on two or more servers really such a rare thing to find Xen running on?

Btw, who is currently working on the Remus implementation?

If you should need any more testing from me, feel free to ask.

Best regards.

--
Andreas Olsowski