
Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)



>>> On 12.03.13 at 18:30, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> This issue I am encountering seems to only happen on multi-socket
> machines.
> 
> It also does not help that the only multi-socket box I have is
> a Romley-EP (so two-socket SandyBridge CPUs). The other
> SandyBridge boxes I have (one socket) are not showing this. Granted,
> they are also a different model (42).
> 
> The problem is that when I run 'perf top' within an SMP PVHVM
> guest, after a couple of seconds or minutes the guest hangs.
> The hypervisor ends up stuck looping too, and then dom0 ends
> up hanging as well.
> 
> Dumping the CPU registers (Ctrl-A x3, then 'd')
> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> 
> (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> 
> and if I let this stay for some time, dom0 detects that some
> of its VCPUs are hung and it resorts to sending an NMI. NMI
> is not implemented in pv-ops and then dom0 wedges. In some
> cases it also wedges itself when doing 'xl list' or any up-calls
> to the hypervisor.

Did you try running Xen with its watchdog (and perhaps Dom0
without)?

> Anyhow, following with 'Ctrl-A' x3, then 'v', tells me:
> 
> (XEN) Virtual processor ID = 0x0c02
> .. snip..
> (XEN) Virtual processor ID = 0x0fc4
> (XEN)   VCPU 3
> 
> and stays stuck there. Doing the 'Ctrl-A x3' and 'd' to
> see where it is stuck tells me:

Perhaps sending 'd' without first sending 'v' might better show where
the original hang is?

Jan

