
Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)



On 03/12/2013 04:31 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Mar 12, 2013 at 02:50:59PM -0400, Boris Ostrovsky wrote:
On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
This issue I am encountering seems to only happen on multi-socket
machines.
I believe I was able to reproduce this (once) on my laptop.

It also does not help that the only multi-socket box I have is
a Romley-EP (so two-socket SandyBridge CPUs). The other
SandyBridge boxes I have (one socket) are not showing this. Granted,
they are also a different model (42).

The problem is that when I run 'perf top' within an SMP PVHVM
guest, after a couple of seconds or minutes the guest hangs.
The hypervisor ends up stuck looping too, and then dom0 ends
up hanging as well.

Dumping the CPU registers (Ctrl-A x3, then 'd')
shows that the guest is pretty firmly stuck in vmx_vmexit_handler:

(XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
And in my case this address is the second instruction after STI, i.e. we
are right at the point where interrupts got enabled.
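
To check which instruction a trace address like the one above lands on,
it can help to split the line into address, symbol and offset, so that the
function start can be computed and disassembled from the hypervisor
symbols. A minimal sketch, assuming the "(XEN) [<addr>] sym+0xoff/0xsize"
layout seen in the trace; parse_xen_frame is a hypothetical helper name,
not part of any Xen tooling:

```python
import re

# Assumed layout, taken from the trace line quoted in this thread:
#   (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_xen_frame(line):
    """Split one stack-trace line into its parts, or return None if it
    does not match the assumed layout."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,
        "size": int(m.group("size"), 16),
        # Start of the function: the address to begin disassembling from.
        "function_start": addr - off,
    }
```

The "function_start" value plus "offset" gives the exact instruction to
look for next to the STI in the disassembly.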

So I am wondering whether this has something to do with the counter
overflow interrupt (which I believe is an NMI).
Interestingly enough, if I run the PVHVM guest with 'nowatchdog'
it runs fine!

I think by default perf top runs off the timer interrupt, so it does not
use HW counters. But the watchdog is implemented on top of the counters,
so perhaps it fires the interrupt at a bad time, messing something up.
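
One way to test this theory inside the guest would be to see whether the
NMI watchdog is on and whether the per-CPU NMI counts keep climbing while
'perf top' runs. A rough sketch only: it assumes the usual x86 Linux
/proc/sys/kernel/nmi_watchdog file and the "NMI:" row of /proc/interrupts,
and the helper names are made up for illustration:

```python
def watchdog_enabled(sysctl_text):
    """True if the text read from /proc/sys/kernel/nmi_watchdog is '1'."""
    return sysctl_text.strip() == "1"


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text.

    Returns an empty list if no 'NMI:' row is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    return []


def nmi_delta(before_text, after_text):
    """Per-CPU increase in NMI count between two /proc/interrupts samples."""
    before = nmi_counts(before_text)
    after = nmi_counts(after_text)
    return [b - a for a, b in zip(before, after)]


def read_file(path):
    """Small helper to read a procfs file as text."""
    with open(path) as f:
        return f.read()
```

Calling read_file("/proc/interrupts") a few seconds apart and passing both
samples to nmi_delta() shows how many NMIs each vCPU took in between.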

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

