[Xen-devel] Re: kernel BUG at arch/x86/xen/mmu.c:1872
2011/4/10 MaoXiaoyun <tinnycloud@xxxxxxxxxxx>:
> Hi Konrad & Jeremy:
>
> I think we have finally located the missing patch for this commit.
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=c97f681f138039425c87f35ea46a92385d81e70e
> which works.
>
> We tested commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=221c64dbf860d37f841f40893bddf8d804aa55bd
> with which the server crashed.
>
> Later I found the comments for this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
>
> So it looks like this fix is not applied on 2.6.32.36. Could you
> take a look at this?
>
> Many thanks.
>
> =====================================================
>>Hi Konrad & Jeremy:
>>
>> I'd like to open this BUG in a new thread, since the old thread is too
>> long to read easily.
>>
>> We recently wanted to upgrade our kernel to 2.6.32, but unfortunately
>> we ran into a kernel crash bug.
>>Our test case is simple: start 24 win2003 HVMs on our physical machine, and
>> reboot each HVM every 15 minutes. The kernel crashes within half an hour
>> (that is, the crash happens when the VMs start for the second time).
>>
>>We took our testing further and tested different kernel versions.
>>2.6.32.10
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=d945b014ac5df9592c478bf9486d97e8914aab59
>>2.6.32.11
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27f948a3bf365a5bc3d56119637a177d41147815
>>2.6.32.12
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ba739f9abd3f659b907a824af1161926b420a2ce
>>2.6.32.13
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=f6fe6583b77a49b569eef1b66c3d761eec2e561b
>>2.6.32.15
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27ed1b0e0dae5f1d5da5c76451bc84cb529128bd
>>2.6.32.21
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=69e50db231723596ed8ef9275d0068d6697f466a
>>
>>We basically saw three different results.
>>
>>i1) grant table issue
>>The host still functions, but xm dmesg shows abnormal log messages.
>>Please refer to the attached grant table log.
>>
>>i2) kernel crash at a different place
>>The host dies during the test; after reboot, we see nothing abnormal in
>> /var/log/messages.
>>
>>i3) kernel BUG at arch/x86/xen/mmu.c:1872
>>The host dies during the test; after reboot, we see the crash log in messages.
>> Refer to the attached log of 2.6.32.36.
>>
>>The test results fall into two classes:
>>
>>1) 2.6.32.10
>>30 machines were involved in the test: three had issue (i1), two had issue
>> (i2), and *none* had issue (i3).
>>The other machines have run the tests successfully so far, for more than 8 hours.
>>
>>2) 2.6.32.11 or later versions
>>Each version was tested on 10 machines, and all machines crashed in
>> less than half an hour.
>>
>>Conclusion:
>>1) The grant table issue exists in all kernel versions.
>>2) The kernel crash at a different place may exist in all kernel versions, but
>> it does not happen frequently (2 out of 30).
>>3) Issue i3) is the major difference we observe. From the tests, it looks
>> like it was introduced between versions
>>2.6.32.10 and 2.6.32.11.
>>
>>Hope this helps to locate the bug.
>>Many thanks.
>>
>>
>
Hi,

Sorry, this mmu-related BUG has troubled me for a very long time... I really want to "kill" this BUG, but my knowledge of kernel hacking and/or Xen is very limited. While waiting for Jeremy or Konrad or others...

Many thanks for spending time tracking down this mmu-related BUG.

I have backported the commit from
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
to the 2.6.32.36 PVOPS kernel, and the patch is attached. I don't know whether I backported it correctly, nor whether it affects anything else.

I am currently testing the 2.6.32.36 PVOPS kernel with this patch applied and with CONFIG_DEBUG_PAGEALLOC unset. I am now running testcrash.sh for 1000 loop cycles, since I was unable to reproduce this mmu BUG 1872 within 100 cycles.

Please note that when CONFIG_DEBUG_PAGEALLOC is unset, I can easily reproduce this mmu BUG 1872 within fewer than 50 testcrash.sh loop cycles with PVOPS kernels 2.6.32.24 through 2.6.32.36. I am now testing with this backported patch to see whether I can still reproduce this mmu BUG...

...

Kindest regards,
Giam Teck Choon
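The loop counts above can be turned into a rough calculation. Unpatched kernels crash in fewer than 50 cycles, and the patched kernel showed no reproduction in 100 cycles, which is why a 1000-cycle run was started. The Python sketch below is an illustration only and is not part of the attached testcrash.sh. It assumes that on a kernel which still has the bug, each testcrash.sh cycle independently triggers it with probability of about 1/50. That rate is loosely inferred from "within fewer than 50 cycles" and was not measured. The function names `survival_probability` and `cycles_for_confidence` are hypothetical.

```python
import math

# ASSUMPTION (hypothetical): on a kernel that still has the bug, each
# testcrash.sh cycle independently triggers mmu BUG 1872 with probability
# ~1/50, loosely inferred from "reproducible within <50 cycles".
P_CRASH_PER_CYCLE = 1 / 50


def survival_probability(cycles: int, p: float = P_CRASH_PER_CYCLE) -> float:
    """Probability that a still-buggy kernel runs `cycles` cycles without crashing."""
    return (1.0 - p) ** cycles


def cycles_for_confidence(confidence: float, p: float = P_CRASH_PER_CYCLE) -> int:
    """Smallest n such that a still-buggy kernel survives n cycles with
    probability at most 1 - confidence."""
    # Solve (1 - p) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for n in (100, 1000):
        print(f"{n:>5} clean cycles: P(no crash | bug still present) = "
              f"{survival_probability(n):.2e}")
    print("clean cycles needed for 99% confidence:", cycles_for_confidence(0.99))
```

Under these assumptions, a still-buggy kernel would survive 100 cycles roughly 13% of the time. A clean 100-cycle run is therefore suggestive but not conclusive. About 228 clean cycles are needed for 99% confidence. A still-buggy kernel would survive 1000 cycles with a probability of only about 1.7e-9. The real per-cycle rate may differ, and cycles may not be independent, so these figures are indicative only.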
Attachment: vmalloc__eagerly_clear_ptes_on_vunmap.patch
Attachment: testcrash.sh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel