
Re: [Xen-ia64-devel] Weekly benchmark results [ww24]



==================
STATUS
==================
I ran a heavy-load test that creates and destroys domains in a loop.
The CREDIT scheduler (csched_schedule) checks BUG_ON(!vcpu_running) at the
end of its code, and this check fires.
The reason is that atomic_inc(&v->pausecnt) in vcpu_pause() is called
without holding the scheduler lock
(spin_lock(&schedule_data[cpu].schedule_lock)).
Because "pausecnt" is updated without the lock, the vcpu_running state can
change while spin_lock_irq(&schedule_data[cpu].schedule_lock) is held in
__enter_scheduler().
csched_schedule runs entirely inside this lock.
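
To make the interleaving concrete, here is a small standalone C model of it.
This is only a sketch under assumptions, not Xen code: struct vcpu_model,
lock_model(), vcpu_runnable_model(), vcpu_pause_model() and
race_hits_bug_check() are hypothetical names, and the "other CPU" is
simulated by an ordinary function call at the point where the real race
would happen.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VCPU state; not Xen's struct vcpu. */
struct vcpu_model {
    atomic_int pausecnt;        /* > 0 means someone has paused the VCPU */
};

/* Stand-in for schedule_data[cpu].schedule_lock, modelled as a spinlock. */
static atomic_flag schedule_lock_model = ATOMIC_FLAG_INIT;

void lock_model(void)
{
    while (atomic_flag_test_and_set(&schedule_lock_model)) {
        /* spin until the flag is released */
    }
}

void unlock_model(void)
{
    atomic_flag_clear(&schedule_lock_model);
}

/* A VCPU counts as runnable only while nobody has paused it. */
bool vcpu_runnable_model(struct vcpu_model *v)
{
    return atomic_load(&v->pausecnt) == 0;
}

/* Lock-less pause, as described in the report: the counter is bumped
 * atomically, but the scheduler lock is never taken. */
void vcpu_pause_model(struct vcpu_model *v)
{
    atomic_fetch_add(&v->pausecnt, 1);
}

/* Replays the problematic interleaving on a single thread.
 * Returns true if the final "must still be runnable" check would fire. */
bool race_hits_bug_check(void)
{
    struct vcpu_model v;
    atomic_init(&v.pausecnt, 0);

    lock_model();                                   /* scheduler enters   */
    bool runnable_at_pick = vcpu_runnable_model(&v);

    vcpu_pause_model(&v);       /* "other CPU": the lock does not stop it */

    bool runnable_at_check = vcpu_runnable_model(&v);
    unlock_model();                                 /* scheduler leaves   */

    return runnable_at_pick && !runnable_at_check;
}
```

In this model race_hits_bug_check() returns true: the state changed between
the pick and the final check even though the scheduler held its lock the
whole time.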


===================
REPRODUCE THE ERROR
===================
This problem occurred while running a heavy
create/destroy loop.

============
Discussion
============
The Credit scheduler does a very strict check at the end of its code.

I see two possible solutions (to be discussed).

1) Remove BUG_ON(!vcpu_running)
 I propose a patch to fix the problem.
 It simply removes BUG_ON(!vcpu_running).
 I chose this method in view of the following circumstances.
 a) The SEDF and BVT schedulers do not perform this check
  (the CREDIT scheduler should follow the same policy).
 b) The vcpu state may change while the vcpu is on the runq
  (vcpu_running => !vcpu_running).
 c) This function is the main path of the scheduler,
  so we want to avoid taking additional locks.
  This proposal assumes that the pausecnt policy is loosely locked.

2) Implement a strict lock policy in the scheduler
  Update v->pausecnt only while holding
  schedule_data[cpu].schedule_lock.
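
For comparison, option 2 could look roughly like the standalone C sketch
below. Again this is a model under assumptions, not a patch:
struct sched_cpu_model, sched_trylock(), sched_unlock(),
vcpu_try_pause_locked() and strict_policy_keeps_state_stable() are
hypothetical names. A try-lock is used only so that the example can run on
one thread; a real spin_lock would make the pausing CPU wait, which is the
cost mentioned in 1)c).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU scheduler state; not Xen's schedule_data. */
struct sched_cpu_model {
    atomic_flag schedule_lock;  /* stand-in for schedule_lock            */
    int pausecnt;               /* only touched while the lock is held   */
};

/* Returns true if the lock was free and is now held by the caller. */
bool sched_trylock(struct sched_cpu_model *c)
{
    return !atomic_flag_test_and_set(&c->schedule_lock);
}

void sched_unlock(struct sched_cpu_model *c)
{
    atomic_flag_clear(&c->schedule_lock);
}

/* Strict policy: pausecnt changes only under the scheduler lock.
 * Returns false (and changes nothing) if the scheduler holds the lock. */
bool vcpu_try_pause_locked(struct sched_cpu_model *c)
{
    if (!sched_trylock(c))
        return false;
    c->pausecnt++;
    sched_unlock(c);
    return true;
}

/* Replays the same interleaving as in the STATUS section.
 * Returns true if the runnable state stayed stable inside the lock. */
bool strict_policy_keeps_state_stable(void)
{
    struct sched_cpu_model c = { ATOMIC_FLAG_INIT, 0 };

    if (!sched_trylock(&c))                     /* scheduler enters      */
        return false;
    bool runnable_at_pick = (c.pausecnt == 0);

    bool pause_got_in = vcpu_try_pause_locked(&c);  /* "other CPU"       */

    bool runnable_at_check = (c.pausecnt == 0);
    sched_unlock(&c);                           /* scheduler leaves      */

    bool pause_after = vcpu_try_pause_locked(&c);   /* now it succeeds   */

    return runnable_at_pick && !pause_got_in && runnable_at_check
        && pause_after && c.pausecnt == 1;
}
```

Here the pause is refused while the scheduler holds the lock and goes
through afterwards, so a check at the end of the scheduler would see the
same state it saw when it picked the vcpu.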






>I got a trace log.
>
>(XEN) BUG at sched_credit.c:1075
>(XEN) die_if_kernel: bug check 0
>(XEN) d 0xf0000000041d00c8 domid 7
>(XEN) vcpu 0xf0000000041c0000 vcpu 0
>(XEN) 
>(XEN) CPU 1
>(XEN) psr : 0000101008222018 ifs : 8000000000000a98 ip  : [<f0000000040375a0>]
>(XEN) ip is at csched_schedule+0x970/0xf70
>(XEN) unat: 0000000000000000 pfs : 0000000000000a98 rsc : 0000000000000003
>(XEN) rnat: 0000121008226018 bsps: f00000000405a6c0 pr  : 000000000001aaa9
>(XEN) ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
>(XEN) csd : 0000000000000000 ssd : 0000000000000000
>(XEN) b0  : f0000000040375a0 b6  : f000000004049c80 b7  : e000000000100800
>(XEN) f6  : 0fffbccccccccc8c00000 f7  : 0ffd9a200000000000000
>(XEN) f8  : 0ffff8000000000000000 f9  : 10002a000000000000000
>(XEN) f10 : 0fffbccccccccc8c00000 f11 : 1003e0000000000000000
>(XEN) r1  : f000000004302c70 r2  : 0000000000005ba9 r3  : f0000000041c7fe8
>(XEN) r8  : 0000000000000000 r9  : 0000000000000000 r10 : 0000000000000000
>(XEN) r11 : 0009804c0270033f r12 : f0000000041c78e0 r13 : f0000000041c0000
>(XEN) r14 : 0000000000000000 r15 : f0000000041119b8 r16 : 0000000000004001
>(XEN) r17 : f000000004105214 r18 : 0000000000001ba9 r19 : f000000004105210
>(XEN) r20 : a00000010095be10 r21 : 0000000000000000 r22 : 0000000000000001
>(XEN) r23 : 0000000000000000 r24 : f0000000041c7e20 r25 : f0000000041c7e28
>(XEN) r26 : 0000000000000000 r27 : 0000000000000000 r28 : 000000000000001d
>(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : f000000004114098
>(XEN) 
>(XEN) Call Trace:
>(XEN)  [<f000000004094820>] show_stack+0x80/0xa0
>(XEN)                                 sp=f0000000041c7500 bsp=f0000000041c1018
>(XEN)  [<f000000004075c00>] die_if_kernel+0x80/0xd0
>(XEN)                                 sp=f0000000041c76d0 bsp=f0000000041c0fe0
>(XEN)  [<f00000000406b7a0>] ia64_handle_break+0x1d0/0x290
>(XEN)                                 sp=f0000000041c76d0 bsp=f0000000041c0fa0
>(XEN)  [<f0000000040934c0>] ia64_leave_kernel+0x0/0x310
>(XEN)                                 sp=f0000000041c76e0 bsp=f0000000041c0fa0
>(XEN)  [<f0000000040375a0>] csched_schedule+0x970/0xf70
>(XEN)                                 sp=f0000000041c78e0 bsp=f0000000041c0ee0
>(XEN)  [<f00000000403f0b0>] __enter_scheduler+0x150/0x6b0
>(XEN)                                 sp=f0000000041c78f0 bsp=f0000000041c0e60
>(XEN)  [<f00000000403f6a0>] do_yield+0x90/0xb0
>(XEN)                                 sp=f0000000041c7910 bsp=f0000000041c0e48
>(XEN)  [<f00000000403f970>] do_sched_op_compat+0x120/0x170
>(XEN)                                 sp=f0000000041c7910 bsp=f0000000041c0e18
>(XEN)  [<f00000000405a6e0>] ia64_hypercall+0xe50/0xe90
>(XEN)                                 sp=f0000000041c7910 bsp=f0000000041c0db0
>(XEN)  [<f00000000406b7f0>] ia64_handle_break+0x220/0x290
>(XEN)                                 sp=f0000000041c7df0 bsp=f0000000041c0d70
>(XEN)  [<f0000000040934c0>] ia64_leave_kernel+0x0/0x310
>(XEN)                                 sp=f0000000041c7e00 bsp=f0000000041c0d70
>(XEN)  [<e000000000100810>] ???
>(XEN)                                 sp=f0000000041c8000 bsp=f0000000041c0d20
>(XEN)  [<a000000100067170>] ???
>(XEN)                                 sp=f0000000041c8000 bsp=f0000000041c0d00
>(XEN) domain_crash_sync called from xenmisc.c:109
>(XEN) Domain 7 (vcpu#0) crashed on cpu#1:
>(XEN) d 0xf0000000041d00c8 domid 7
>(XEN) vcpu 0xf0000000041c0000 vcpu 0
>(XEN) 
>(XEN) CPU 1
>(XEN) psr : 00001012083a6010 ifs : 800000000000050a ip  : [<e000000000100810>]
>(XEN) ip is at ???
>(XEN) unat: 0000000000000000 pfs : 8000000000000209 rsc : 0000000000000008
>(XEN) rnat: 0000000000000000 bsps: a000000100955028 pr  : 000000000001aa85
>(XEN) ldrs: 0000000000700000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
>(XEN) csd : 0000000000000000 ssd : 0000000000000000
>(XEN) b0  : a000000100067170 b6  : a000000100148100 b7  : e000000000100800
>(XEN) f6  : 000000000000000000000 f7  : 1003e28f5c28f5c28f5c3
>(XEN) f8  : 000000000000000000000 f9  : 100068000000000000000
>(XEN) f10 : 1003e0000000000000000 f11 : 1003e0000000000000000
>(XEN) r1  : a000000100d071b0 r2  : 0000000000001000 r3  : 8000000000000209
>(XEN) r8  : a000000100067170 r9  : 0000000000000100 r10 : 0000000000000000
>(XEN) r11 : 0000000000010ac5 r12 : a00000010095bd80 r13 : a000000100954000
>(XEN) r14 : 0000000000000001 r15 : 0000000000000000 r16 : f100000000004c18
>(XEN) r17 : a00000010095bdb0 r18 : a00000010095bdb1 r19 : a00000010095be90
>(XEN) r20 : a00000010095be10 r21 : 0000000000000000 r22 : 0000000000000001
>(XEN) r23 : 0000000000000000 r24 : a000000100b22ae0 r25 : a000000100954f10
>(XEN) r26 : 0000000000000000 r27 : a0000001009550f0 r28 : 000000000000001d
>(XEN) r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000000
>(XEN) r32 : a00000010095bbc0 r33 : 0000000000000000 r34 : 0000000000000004
>(XEN) r35 : 0000000000000000 r36 : 0000000000000c58 r37 : a0000001009540d0
>(XEN) r38 : ffffffffffff49c0 r39 : 0000000000000000 r40 : a00000010001a930
>(XEN) r41 : 8000000000000307
>
>Thanks,
>Fujita
>> -----Original Message-----
>> From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of yo.fujita
>> Sent: Tuesday, June 20, 2006 9:03 AM
>> To: 'You, Yongkang'; 'Alex Williamson'
>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> 
>> Hi Yongkang, Alex,
>> 
>> I tried the latest cset 10419 too.
>> And the same problem was reproduced.
>> I think this may be caused by the "credit" scheduler.
>> Now our developers are researching this problem.
>> 
>> Setting
>> server      :tiger4
>> dom0mem     :512M
>> domUmem     :512M
>> domU cpus   :2
>> sched       :credit
>> 
>> Test details
>> create 3/10 hung with dom0.
>> destroy 4/7 hung with dom0.
>> 
>> Thanks,
>> Fujita
>> > -----Original Message-----
>> > From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> > [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of yo.fujita
>> > Sent: Monday, June 19, 2006 4:21 PM
>> > To: 'You, Yongkang'; 'Alex Williamson'
>> > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> >
>> > Yongkang,
>> >
>> > I have a request.
>> > I'm using a scheduler "credit" for test.
>> > If you're using other scheduler, can you try the "credit"?
>> >
>> > Thanks,
>> > Fujita
>> > > -----Original Message-----
>> > > From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> > > [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of yo.fujita
>> > > Sent: Monday, June 19, 2006 4:03 PM
>> > > To: 'You, Yongkang'; 'Alex Williamson'
>> > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > >
>> > > Hi Yongkang,
>> > >
>> > > Thanks for your information!
>> > > We also must try the latest changeset.
>> > > If it happens again, the cause is in our environment.
>> > > I'll inform you a result. Please wait for a while.
>> > >
>> > > Thanks,
>> > > Fujita
>> > > > -----Original Message-----
>> > > > From: You, Yongkang [mailto:yongkang.you@xxxxxxxxx]
>> > > > Sent: Monday, June 19, 2006 3:37 PM
>> > > > To: yo.fujita; Alex Williamson
>> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > > > Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > >
>> > > > Hi Fujita,
>> > > >
>> > > > I tried the latest Changeset 10419. But I couldn't reproduce this
>> > > > problem in my box. Is it fixed? I create and destroy SMP XenU for
>> > > > more than 100 times. And try to make kernels in xenU 2 times. I
>> > > > didn't meet the xen0 hang.
>> > > >
>> > > > My box has 3072M memory. Xen0 has 512M, xenU has 512M. XenU has been
>> > > > assigned with 2 CPUs.
>> > > >
>> > > > Best Regards,
>> > > > Yongkang (Kangkang) 永康
>> > > >
>> > > > >-----Original Message-----
>> > > > >From: yo.fujita [mailto:yo.fujita@xxxxxxxxxxxxxxxx]
>> > > > >Sent: June 19, 2006 10:27
>> > > > >To: You, Yongkang; 'Alex Williamson'
>> > > > >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > > > >Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > > >
>> > > > >> Hi Fujita,
>> > > > >>
>> > > > >> Maybe it is not guest image issue. In our nightly testing for
>> > > > >> XenU, we still focus on the UP stability and basic
>> > > > >> booting/destroying testing. Automatic SMP XenU isn't fully added
>> > > > >> into nightly testing (will add it this week). I noticed your
>> > > > >> report mentioned that the stress testing and keeping
>> > > > >> creating/destroying SMP XenU will catch this issue. I can do some
>> > > > >> trying to see if I can reproduce.
>> > > > >Hi Yongkang,
>> > > > >
>> > > > >Thanks for your comments.
>> > > > >As you said, I meant the problem is not in the image itself but the
>> > > > >stress of booting domU. In other words, I thought larger size image
>> > > > >takes more stress on Xen.
>> > > > >So I guess a customized image for testing (necessity minimum size)
>> > > > >doesn't cause the problem.
>> > > > >We appreciate it if you can try to reproduce these issues.
>> > > > >
>> > > > >Thanks,
>> > > > >Fujita
>> > > > >
>> > > > >
>> > > > >>
>> > > > >> Best Regards,
>> > > > >> Yongkang (Kangkang) 永康
>> > > > >>
>> > > > >> >-----Original Message-----
>> > > > >> >From: yo.fujita [mailto:yo.fujita@xxxxxxxxxxxxxxxx]
>> > > > >> >Sent: June 19, 2006 8:53
>> > > > >> >To: 'Alex Williamson'; You, Yongkang
>> > > > >> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > > > >> >Subject: RE: [Xen-ia64-devel] Weekly benchmark results [ww24]
>> > > > >> >
>> > > > >> >>    Thanks Fujita.  Is anyone else seeing these hangs booting
>> > > > >> >> domU? The Intel status report on the same changeset has no
>> > > > >> >> indication of dom0 hangs on their tests.
>> > > > >> >Hi Alex, Yongkang,
>> > > > >> >
>> > > > >> > In Fujitsu, it's only me who saw this problem because few
>> > > > >> >developers have started any tests relating to SMP.
>> > > > >> >I think the reason why Intel's results was not match to ours is
>> > > > >> >the difference of stress on booting SMP domU because the our
>> > > > >> >guest image is just only a copy of root file system of native
>> > > > >> >Linux, which any functions was not edited.
>> > > > >> >
>> > > > >> >My domU environment.
>> > > > >> >Disk size :5G
>> > > > >> >OS        :RHEL4U2
>> > > > >> >Memory    :512M
>> > > > >> >CPUs      :2
>> > > > >> >Yongkang, do you have any comments?
>> > > > >> >
>> > > > >> >> Has anything else changed in the test environment?  Thanks,
>> > > > >> >No. We switched this test to SMP from UP two weeks ago.
>> > > > >> >
>> > > > >> >Thanks,
>> > > > >> >Fujita
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > Xen-ia64-devel mailing list
>> > > Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> > > http://lists.xensource.com/xen-ia64-devel


------------------------------------------------------------
Fujitsu Limited, Platform Technology Development Unit, Virtual Systems Development Division
酒井 敦    Email   sakaia@xxxxxxxxxxxxxx
                TEL     7124-4167 (from April 7)




_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

