
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped



On 30.08.2011 19:18, Marek Marczykowski wrote:
> On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
>> Ok, but I am still unsure where it is hanging in DomU. Can you run with
>> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
>> of what is stuck in the guest? 
> 
> With the "initcall_debug" parameter the problem does not appear (at least
> for 200 domU starts)... It looks like a race condition which doesn't happen
> on a kernel slowed down by printing lots of debug info. This also
> explains why this bug appears only on fast hardware.
> 
>> You might also have better luck using
>> 'xenctx' to get a stack trace of what is hanging in the guest.
>> (You will need the System.map file from the guest's kernel, but that should
>> be fairly easy to extract.)
> 
> xenctx didn't provide any useful data :/ It always shows the following
> trace for a hung domU:
> -----------------
> rip: ffffffff810013aa hypercall_page+0x3aa
> flags: 00001246 i z p
> rsp: ffffffff81801ee0
> rax: 0000000000000000 rcx: ffffffff810013aa   rdx: 0000000000000000
> rbx: ffffffff81800010 rsi: 00000000deadbeef   rdi: 00000000deadbeef
> rbp: ffffffff81801ef8  r8: 0000000000000000    r9: 0000000000000000
> r10: 0000000000000000 r11: 0000000000000246   r12: 0000000000000000
> r13: 0000000000000000 r14: ffffffffffffffff   r15: 0000000000000000
>  cs: e033      ss: e02b        ds: 0000        es: 0000
>  fs: 0000 @ 0000000000000000
>  gs: 0000 @ ffff880018ee7000/0000000000000000
> Code (instr addr ffffffff810013aa)
> cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b
> 59 c3 cc cc cc cc cc cc cc
> 
> 
> Stack:
>  0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
>  ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
>  ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
>  ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88
> 
> Call Trace:
>   [<ffffffff810013aa>] hypercall_page+0x3aa  <--
>   [<ffffffff810072a0>] xen_safe_halt+0x10
>   [<ffffffff81012528>] default_idle+0x58
>   [<ffffffff81009faf>] cpu_idle+0x5f
>   [<ffffffff813fb388>] rest_init+0x68
>   [<ffffffff81875c79>] start_kernel+0x36f
>   [<ffffffff81875346>] x86_64_start_reservations+0x131
>   [<ffffffff81878245>] xen_start_kernel+0x5f1
> ------------------
> 
> I've collected a few more messages from successful and failed domU starts.
> The only differences are the place where "Switched to NOHz mode on CPU #0"
> appears and the presence of the "CE: xen increased min_delta_ns to ..." and
> "CE: Reprogramming failure. Giving up" messages.
> 
> I think it can be related to:
> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
> (this was on HVM not PV, but looks similar)
> 
> I've also tried 'xenpm set-max-cstate 0' and tsc_mode=1 in the domU config,
> but neither helps. Pinning the vcpu doesn't help either (these domUs have
> only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
> max_cstate=0?

Looks like tsc_mode=2 solves the problem.
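
For reference, a minimal sketch of where this setting goes, assuming an
xm-style domU config file (which uses Python syntax). The name and memory
values are placeholders for illustration, not the actual settings from the
affected domUs; only the tsc_mode line is the change discussed here. The
meaning of the tsc_mode values in the comments is as I understand it from
the Xen TSC mode documentation.

```python
# Hypothetical PV domU config fragment (xm-style, Python syntax).
# 'name' and 'memory' are placeholder values, not taken from this thread.
name = "example-domU"
memory = 512

# The affected domUs in this thread have a single vcpu.
vcpus = 1

# tsc_mode values (Xen 4.x):
#   0 = default (native TSC when safe, emulated otherwise)
#   1 = always emulate
#   2 = never emulate (native TSC)
# tsc_mode=1 did not help here; tsc_mode=2 appears to avoid the hang.
tsc_mode = 2
```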

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl


