
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped



On Mon, Aug 29, 2011 at 10:21:23PM +0200, Marek Marczykowski wrote:
> On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> > On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
> >> Hey,
> >>
> >> I'm experiencing a strange problem: a non-deterministic PV domain hang,
> >> only on some machines (ones with a fast SSD drive). I've tried xen-4.1.0
> >> and xen-4.1.1 with many different kernels:
> >> VM:
> >>  - 2.6.38.3 xenlinux based on SUSE package
> >>  - vanilla 3.0.3
> >>  - vanilla 3.1 rc2
> >> dom0:
> >>  - 2.6.38.3 xenlinux based on SUSE package
> >>  - vanilla 3.1 rc2
> >>
> >> The result is always the same: sometimes the VM hangs at startup, SysRq-T
> >> shows modprobe waiting in "wait_for_devices" (specifically in
> >> schedule_timeout), and the jiffies counter does not increase between
> >> task-state dumps.
> >>
> >> The only thing I've found that is (probably) connected with this problem
> >> is these domU kernel messages:
> >> CE: xen increased min_delta_ns to 150000 nsec
> >> (...)
> >> CE: xen increased min_delta_ns to 4000000 nsec
> >> CE: Reprogramming failure. Giving up
> >>
> >> These messages don't appear during a successful boot.
> >>
> >> I've also tried some options for Xen and the domU kernel (in all
> >> combinations), but without success:
> > 
> > BTW, your 'xencons=..' and 'swiotlb=force' options are obsolete. Use
> > 'console=hvc0' and 'iommu=soft' instead; 'swiotlb=force' kills performance.
> > 
> >> xen: tsc=unstable, cpufreq=none
> >> domU: nohz=off, clocksource=tsc
> >>
> >> Some combinations of the above options lowered the frequency of the
> >> problem (e.g. tsc=unstable + nohz=off), but it still happens quite
> >> often: about 1 in 15 boots fails.
> >>
> >> Do you have any idea what the cause is and what could help?
> > 
> > The problem looks like xenwatch is stuck. So the problem is in Dom0, right?
> 
> This "R" state of xenwatch looks like a side effect of the SysRq itself,
> which is handled in xenwatch while it dumps the data:
> 
> [  118.679707]  [<ffffffff812a8081>] handle_sysrq+0x21/0x30
> [  118.679707]  [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
> [  118.679707]  [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170
> 
> And the problem occurs at DomU boot; Dom0 works without any problems.
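
The "CE:" messages in the original report match the back-off that the Linux tick code of this era applies when a clockevent cannot be programmed. After repeated programming failures, it raises min_delta_ns by 50% per step and clamps it at NSEC_PER_SEC / HZ. It gives up once that limit has already been reached, and the next event is then left unprogrammed, which would be consistent with the stalled jiffies counter. Below is a simplified Python model of that loop, not the kernel's code. It assumes HZ=250, so the limit is the 4000000 nsec seen in the last message. It also assumes a starting min_delta_ns of 100000 ns, which is consistent with the first logged value of 150000. The function name min_delta_backoff is hypothetical.

```python
from __future__ import annotations

NSEC_PER_SEC = 1_000_000_000


def min_delta_backoff(start_ns: int, hz: int) -> list[int]:
    """Return the min_delta_ns values that would be logged if every
    reprogramming attempt keeps failing.

    Hypothetical helper: a simplified model that assumes each round of
    failures triggers exactly one increase.
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    limit = NSEC_PER_SEC // hz  # corresponds to NSEC_PER_SEC / HZ
    delta = start_ns
    logged: list[int] = []
    while delta < limit:
        if delta < 5000:
            delta = 5000  # tiny values jump to a 5 us floor
        else:
            delta += delta >> 1  # otherwise grow by 50%
        delta = min(delta, limit)  # never exceed the limit
        logged.append(delta)  # "CE: xen increased min_delta_ns to ..."
    # With delta at the limit, the next failure ends in "Giving up".
    return logged


if __name__ == "__main__":
    # Assumed values: HZ=250 and a 100000 ns starting point.
    for value in min_delta_backoff(100_000, 250):
        print(f"CE: xen increased min_delta_ns to {value} nsec")
    print("CE: Reprogramming failure. Giving up")
```

Under these assumptions the model produces ten "increased" lines, from 150000 up to 4000000 nsec, before giving up. Different HZ or starting values would change the intermediate numbers.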

Ok, but I am still unsure where it is hanging in DomU. Can you run with
'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
of what is stuck in the guest? You might also have better luck using
'xenctx' to get a stack trace of what is hanging in the guest.
(You will need the System.map file from the guest's kernel, but that should
be fairly easy to extract.)
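
xenctx needs the guest's System.map so that it can turn the raw instruction pointers it reads from the stalled guest into symbol names. The Python sketch below illustrates that kind of lookup: sort the map by address, then binary-search for the nearest text symbol at or below a given address. It is not how xenctx itself is implemented. The SymbolMap class and the three-line map excerpt are hypothetical. Their addresses are back-computed by subtracting the offsets in the stack trace quoted above.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical System.map excerpt; addresses are the trace's
# "address - offset" values, for illustration only.
SAMPLE_MAP = """\
ffffffff8128da90 T sysrq_handler
ffffffff8128fea0 T xenwatch_thread
ffffffff812a8060 T handle_sysrq
"""


class SymbolMap:
    """Resolve kernel text addresses to "symbol+0xoffset" strings."""

    def __init__(self, system_map: str) -> None:
        entries: List[Tuple[int, str]] = []
        for line in system_map.splitlines():
            parts = line.split()
            # Each line is "<hex address> <type> <name>"; keep text symbols.
            if len(parts) >= 3 and parts[1] in ("T", "t"):
                entries.append((int(parts[0], 16), parts[2]))
        entries.sort()
        self._addrs = [addr for addr, _ in entries]
        self._names = [name for _, name in entries]

    def resolve(self, addr: int) -> Optional[str]:
        """Return the closest symbol at or below addr, plus the offset."""
        i = bisect.bisect_right(self._addrs, addr) - 1
        if i < 0:
            return None  # below the first known symbol
        return f"{self._names[i]}+{addr - self._addrs[i]:#x}"


if __name__ == "__main__":
    symbols = SymbolMap(SAMPLE_MAP)
    for rip in (0xFFFFFFFF812A8081, 0xFFFFFFFF8128DB49, 0xFFFFFFFF8128FF50):
        print(f"{rip:#x} -> {symbols.resolve(rip)}")
```

A nearest-symbol lookup like this has no notion of symbol size. An address well past the end of a function will still resolve to that function with a large offset, so results from a sparse or mismatched map should be sanity-checked.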

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

