RE: [Xen-devel] NMI with SMP domain causing machine to reboot
Keir,

Thanks for your reply.

I don't think the problem is caused by not properly resetting CPU1's perf counter. I can see that the number of NMIs being generated is similar for CPU0 and CPU1, and both CPUs' perf counters are being programmed in exactly the same way. (The command "xenpmc -s" lets me see the number of NMIs generated.) Moreover, when we have multiple non-SMP domains running on both CPUs, this problem does not happen.

Sharing of MSRs between hyperthreads should not be the problem either, since my machine has 2 physical CPUs and hyperthreading is disabled in the BIOS (i.e. CPU0 and CPU1 are distinct physical CPUs).

It seems that there is something wrong, or some race condition, introduced by SMP domains. Any idea of what is different in Xen (maybe interrupt handling) when you have SMP domains? Any chance you could try reproducing this behavior on one of your machines? Can you think of any situation that would cause the machine to reboot without printing any error message on the serial console?

Any help is deeply appreciated, since I am losing hope that I will be able to nail this down by myself. It is always possible that I am doing something wrong, but at this point the code left is not doing much and I am starting to suspect the problem lies somewhere else in Xen. In that case I would desperately need someone else's help.

Thanks
Renato

>> -----Original Message-----
>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>> Sent: Friday, September 09, 2005 1:57 AM
>> To: Santos, Jose Renato G
>> Cc: Turner, Yoshio; xen-devel@xxxxxxxxxxxxxxxxxxx; G John Janakiraman
>> Subject: Re: [Xen-devel] NMI with SMP domain causing machine to reboot
>>
>> On 8 Sep 2005, at 20:33, Santos, Jose Renato G wrote:
>>
>> > I have spent most of the last weeks trying to nail down a nasty bug
>> > that is preventing me from releasing xenoprof for SMP domains.
>> > The bug is non-deterministic and when it happens the machine just
>> > reboots with no message or warning on the serial console.
>> > This made the debugging process painful and slow.
>>
>> Hard to say from the code, but maybe it's something to do with
>> hyperthreading? The performance counter MSRs are shared in a weird way
>> between hyperthreads. Maybe you're not properly resetting CPU1's perf
>> counter and ending up with an NMI storm?
>>
>> -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel