Re: [Xen-devel] [SPAM] Re: kernel BUG at arch/x86/xen/mmu.c:1860! - ideas.
On Fri, Mar 25, 2011 at 11:57 AM, Teck Choon Giam <giamteckchoon@xxxxxxxxx> wrote:
> On Thu, Mar 24, 2011 at 7:57 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
>> On Wed, Mar 16, 2011 at 12:40:01PM -0400, Konrad Rzeszutek Wilk wrote:
>>> > > - turn on CONFIG_DEBUG_PAGEALLOC
>>> > > - turn on CONFIG_DEBUG_LIST
>>> > > - turn on CONFIG_DEBUG_KMEMLEAK
>>> > > - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG
>>> > > - turn on CONFIG_SLUB_DEBUG_ON
>>> > >
>>> > > And see if anything starts coming out.
>>> > >
>>> >
>>> > Thanks a lot for both of you spending time to do so. It isn't easy as I
>>> > believe this is something related to kernel 2.6.32.x and just wondering is
>>> > there something related to *sched_domains? I read recent mails in LKML
>>>
>>> Hmmm.. no idea.
>>> > about rebuild_sched_domains consider dangerous issues... and that is about
>>> > recent kernels but won't know what recent kernels that refer to... ...
>>> >
>>> > I will do those config changes in one of my test server when time permit
>>> > and
>>> > will post results/output here when done.
>>>
>>> OK. Thank you!
>>
>> I've been using Jermey's latest tree: 2.6.32.32 (there is even a 2.6.32.33)
>> and I can't hit this bug anymore. Would appreciate your input if you still
>> see this.
>>
>
> This is a report back to you that unfortunately I still able to hit this bug
> :(
>
> git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
> git branch = xen/next-2.6.32
> git commit = 4306ea8f6db3d83a5a2bbfe5448dd78e6846475a
>
> Thanks.
>
> Kindest regards,
> Giam Teck Choon
>

Maybe this is good news ;)

This is my report on the kernel configuration options suggested by Konrad and Jeremy. I think I have found either the cause or at least a way to prevent this BUG from happening, so that Konrad and Jeremy have fewer places to look. Sorry, this will be a little lengthy, and sorry for my poor English.
I am using the following:

git url = git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
git branch = xen/next-2.6.32
git commit = df3a5560166da5a05de93f2fc36b718cc43c6c3c
hg_root = http://xenbits.xensource.com/xen-4.0-testing.hg
hg_changeset = 21465

With my old kernel config, I still hit this BUG easily with testcrash loop 100. In fact, I mostly hit it with a loop count below 30. My two test servers are each set up with at least 20 x 5GB LVs, so each loop cycle does at least 20 lvcreate/lvremove snapshot and mount/umount operations.

With the configuration changes suggested by Konrad and Jeremy, I am unable to reproduce this BUG with testcrash.sh loop 1000 on either of my two test servers. A short summary follows:

> - turn on CONFIG_DEBUG_PAGEALLOC

Ok, set.

> - turn on CONFIG_DEBUG_LIST

Already set originally.

> - turn on CONFIG_DEBUG_KMEMLEAK

I don't think I can enable this on x86_64, as there isn't an option for it in the x86_64 arch. However, I can see this option in the x86_32 arch, so I guess it depends on x86_32. Anyway, I don't think this is important for my case... ... why... read on... ... :P

> - turn on CONFIG_JBD_DEBUG, CONFIG_JBD2_DEBUG

Ok, set.

> - turn on CONFIG_SLUB_DEBUG_ON

Ok, set. I needed to change from CONFIG_SLAB to CONFIG_SLUB for this, which also sets CONFIG_SLUB_DEBUG=y besides CONFIG_SLUB_DEBUG_ON=y.

So from the testcrash results on my two servers, I know the difference must be related to the kernel configuration changes, and one of them is what prevents hitting this BUG. I am now setting one of the mentioned CONFIG options at a time and running the same testcrash again, to determine which single option stops this BUG from triggering. The results are below, all using my old config as the base with *only one CONFIG option changed at a time*, running testcrash loop 100:

With CONFIG_DEBUG_PAGEALLOC=y:
Result : I think this is the one that prevents hitting this BUG, as one of my test servers has already passed testcrash loop cycle 100... ... now testing testcrash loop 10000 :P

With CONFIG_SLUB=y and CONFIG_SLUB_DEBUG=y:
Result : CRASH

With CONFIG_SLUB=y, CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y:
Result : CRASH

With CONFIG_JBD_DEBUG=y:
Result : CRASH

With CONFIG_JBD2_DEBUG=y:
Result : CRASH

Can others who hit this same BUG confirm that their kernel config does not have CONFIG_DEBUG_PAGEALLOC set? I think most production servers will not have this option enabled by default. If so, can you test with CONFIG_DEBUG_PAGEALLOC=y instead? Sorry, I am still in the testing phase for this configuration, and hopefully one of my servers can pass testcrash with loop 10000 (am I crazy? LOL).

If this is really the case (I hope), then I guess there must be some conditional difference for CONFIG_DEBUG_PAGEALLOC, as without it set the BUG is hit but with it set the BUG is not (at least as of composing this reply/report)... ... I will report back my final testcrash loop 10000 result when it finishes... ... keeping fingers crossed!!!

Can anyone test the 2.6.38 PVOPS tree with CONFIG_DEBUG_PAGEALLOC both unset and set, to see whether this BUG exists in 2.6.38?

I hope this report is useful, especially to Konrad and Jeremy... ... ;)

Thanks.

Kindest regards,
Giam Teck Choon

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
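
For anyone who wants to confirm how their own kernel config compares with the options discussed in this thread, the following is a minimal sketch of one way to check. The function name check_debug_config is hypothetical and is not part of testcrash.sh; the only assumption about the config file is the usual .config format, where an enabled boolean option appears as "CONFIG_FOO=y" and a disabled one as "# CONFIG_FOO is not set" or not at all.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report which of the debug
# options discussed above are enabled in a given kernel config file.
# Usage: check_debug_config /path/to/.config
check_debug_config() {
    config_file="$1"
    for opt in CONFIG_DEBUG_PAGEALLOC CONFIG_DEBUG_LIST CONFIG_DEBUG_KMEMLEAK \
               CONFIG_JBD_DEBUG CONFIG_JBD2_DEBUG \
               CONFIG_SLUB_DEBUG CONFIG_SLUB_DEBUG_ON; do
        # Match the whole line so CONFIG_SLUB_DEBUG does not also match
        # CONFIG_SLUB_DEBUG_ON.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "${opt}=y"
        else
            echo "${opt} not set"
        fi
    done
}
```

It could be called on the .config in the kernel build directory, or on a copy of the running kernel's config if the distribution installs one (the exact location varies and is an assumption on any given system). A line reading "CONFIG_DEBUG_PAGEALLOC not set" would match the configuration in which the BUG was reproduced above.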