Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)

On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
This issue I am encountering seems to only happen on multi-socket machines.

I believe I was able to reproduce this (once) on my laptop.

It also does not help that the only multi-socket box I have is
a Romley-EP (so two-socket SandyBridge CPUs). The other
SandyBridge boxes I have (one socket) are not showing this. Granted,
they are also a different model (42).

The problem is that when I run 'perf top' within an SMP PVHVM
guest, after a couple of seconds or minutes the guest hangs.
The hypervisor ends up stuck too, looping, and then dom0 ends
up hanging as well.

Dumping the CPU registers (Ctrl-A x3, then 'd')
shows that the guest is pretty firmly stuck in vmx_vmexit_handler:

(XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174

And in my case this address is the second instruction after STI, i.e. we
are right at the point where interrupts got enabled.
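
For anyone wanting to line such a trace entry up against a disassembly,
here is a rough sketch of how the line above could be split into its
parts. It assumes the usual "[<address>] symbol+offset/size" layout; the
regular expression and the helper name parse_trace_line are only
illustrative and are not part of Xen. The field after the slash is taken
verbatim from the log.

```python
import re

# Assumed layout of a Xen backtrace line:
#   (XEN)    [<address>] symbol+0xoffset/0xsize
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_trace_line(line):
    """Hypothetical helper: return the fields of one trace line, or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,
        "size": int(m.group("size"), 16),
        # Start of the function = reported address minus offset into it.
        "symbol_start": addr - off,
    }


info = parse_trace_line(
    "(XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174"
)
print(info["symbol"], hex(info["symbol_start"]))
```

Subtracting the offset from the reported address gives the start of
vmx_vmexit_handler in that particular build, which is where one would
begin disassembling to find the STI and the instructions that follow it.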

So I am wondering whether this has something to do with the counter
overflow interrupt (which I believe is an NMI).


and if I let this stay for some time, dom0 detects that some
of its VCPUs are hung and it resorts to sending an NMI. NMI
is not implemented in pv-ops, and then dom0 wedges. In some
cases it also wedges itself when doing 'xl list' or any up-calls
to the hypervisor.

Anyhow, following that with 'Ctrl-A x3', then 'v', tells me:

(XEN) Virtual processor ID = 0x0c02
.. snip..
(XEN) Virtual processor ID = 0x0fc4
(XEN)   VCPU 3

and stays stuck there. Doing the 'Ctrl-A x3' and 'd' to
see where it is stuck tells me:
