
Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)



On Tuesday, 12 March 2013, 16:54:11, Boris Ostrovsky wrote:
> On 03/12/2013 04:31 PM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 12, 2013 at 02:50:59PM -0400, Boris Ostrovsky wrote:
> >> On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
> >>> This issue I am encountering seems to only happen on multi-socket
> >>> machines.
> >> I believe I was able to reproduce this (once) on my laptop.
> >>
> >>> It also does not help that the only multi-socket box I have is
> >>> a Romley-EP (two-socket SandyBridge CPUs). The other
> >>> SandyBridge boxes I have (one socket) are not showing this. Granted,
> >>> they are also a different model (42).
> >>>
> >>> The problem is that when I run 'perf top' within an SMP PVHVM
> >>> guest, the guest hangs after a couple of seconds or minutes.
> >>> The hypervisor ends up stuck in a loop too, and then dom0 ends
> >>> up hanging as well.
> >>>
> >>> Dumping the CPU registers (Ctrl-A x3, then 'd')
> >>> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> >>>
> >>> (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> >> And in my case this address is the second instruction after STI, i.e. we
> >> are right at the point where interrupts got enabled.
> >>
> >> So I am wondering whether this has something to do with the counter
> >> overflow interrupt (which I believe is an NMI).
> > Interestingly enough, if I run the PVHVM guest with 'nowatchdog'
> > it runs fine!
> 
> I think by default perf top runs off the timer interrupt, so it does not
> use the HW counters. But the watchdog is implemented on top of the
> counters, so perhaps it fires its interrupt at a bad time and messes
> something up.
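To make the watchdog theory concrete: a performance counter counts upward and raises a performance-monitoring interrupt (PMI) when it wraps, and on x86 Linux that PMI is typically programmed for NMI delivery. A watchdog built on the counters therefore preloads a counter so that it wraps after a chosen number of events. The C sketch below shows only that arithmetic, as an illustration rather than the watchdog's actual code: the function names are hypothetical, the 48-bit width in the usage note is an assumption (real code reads the width from CPUID leaf 0xA), and MSR write-width details are ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not Xen or Linux APIs): arithmetic for arming an
 * up-counting PMU counter so that it overflows after `period` events.
 * `width` is the counter width in bits; real code should read it from
 * CPUID leaf 0xA rather than assume a value.
 */
static uint64_t counter_mask(unsigned width)
{
    assert(width > 0 && width < 64);
    return (UINT64_C(1) << width) - 1;
}

/* Value to preload so that the counter wraps on the period-th event. */
static uint64_t counter_preload(uint64_t period, unsigned width)
{
    uint64_t mask = counter_mask(width);

    assert(period > 0 && period <= mask);
    /* -period in two's complement, truncated to the counter width. */
    return (UINT64_C(0) - period) & mask;
}

/* Events remaining before a counter currently holding `value` overflows. */
static uint64_t events_until_overflow(uint64_t value, unsigned width)
{
    uint64_t mask = counter_mask(width);

    assert(value <= mask);
    return (mask - value) + 1;
}
```

With an assumed 48-bit counter and a period of 1000, the preload is 2^48 - 1000, so the 1000th counted event overflows the counter and raises the interrupt, regardless of what the CPU (guest or hypervisor) happens to be executing at that instant.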

This looks like the strange behavior we had on Nehalem CPUs; see
http://lists.xen.org/archives/html/xen-devel/2010-11/msg01157.html
For this I added a quirk; see check_pmc_quirk() in vpmu_core2.c.
Model 42 is in the quirk list and seems to work, but Romley-EP is
model 45, which is not in the list.
Maybe you should add this model and give it a try.
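A minimal sketch of what such a model check could look like, assuming the quirk is keyed on the CPU family and model decoded from CPUID leaf 1 (EAX). This is not the actual check_pmc_quirk() code: the struct and function names are hypothetical, the table holds only model 42 (the one mentioned above) plus 45 (the proposed Romley-EP addition), and the real list in vpmu_core2.c may contain other models that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: family/model/stepping decoded from CPUID leaf 1, EAX. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

static struct cpu_sig decode_cpuid_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;

    s.stepping = eax & 0xf;

    /* Extended family only applies when the base family is 0xF. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += (eax >> 20) & 0xff;

    /* Extended model applies for families 0x6 and 0xF. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model += ((eax >> 16) & 0xf) << 4;

    return s;
}

/*
 * Hypothetical quirk table: 42 is mentioned in the thread, 45 (Romley-EP)
 * is the proposed addition. Not a copy of the list in vpmu_core2.c.
 */
static const unsigned pmc_quirk_models[] = { 42, 45 };

/* Assumption: all affected parts are Intel family 6. */
static bool is_pmc_quirk_cpu(struct cpu_sig s)
{
    if (s.family != 6)
        return false;
    for (size_t i = 0; i < sizeof(pmc_quirk_models) / sizeof(pmc_quirk_models[0]); i++)
        if (s.model == pmc_quirk_models[i])
            return true;
    return false;
}
```

For example, a CPUID signature of 0x206D2 decodes to family 6, model 45 (0x2D), stepping 2, matching the subject line, and is only caught by the table once 45 has been added.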


Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
