Re: [Xen-devel] Race condition with scheduler runqueues
>>> On 18.02.13 at 19:11, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Hello,
>
> Our testing has discovered a crash (pagefault at 0x0000000000000008)
> which I have tracked down to bad __runq_remove() in csched_vcpu_sleep()
> in sched_credit.c (because a static function of the same name also
> exists in sched_credit2.c, which confused matters to start with)
>
> The test case was a loop of localhost migrate of a 1vcpu HVM win8
> domain. The test case itself has passed many times in the past on the
> same Xen codebase (Xen-4.1.3), indicating that it is very rare. There
> does not appear to be any relevant changes between the version of Xen in
> the test and xen-4.1-testing.
>
> The failure itself is because of a XEN_DOMCTL_scheduler_op (trace below)
> from dom0, targeting the VCPU of the migrating domain.
>
> (XEN) Xen call trace:
> (XEN) [<ffff82c480116a14>] csched_vcpu_sleep+0x44/0x70
> (XEN)    [<ffff82c480120117>] vcpu_sleep_nosync+0xe7/0x3b0
> (XEN)    [<ffff82c4801203e9>] vcpu_sleep_sync+0x9/0x50
> (XEN)    [<ffff82c48011fd4c>] sched_adjust+0xac/0x230
> (XEN)    [<ffff82c480102bc1>] do_domctl+0x731/0x1130
> (XEN)    [<ffff82c4802013c4>] compat_hypercall+0x74/0x80
>
> The relevant part of csched_vcpu_sleep() is
>
>     else if ( __vcpu_on_runq(svc) )
>         __runq_remove(svc);
>
> which disassembles to
>
> ffff82c480116a01:  49 8b 10           mov    (%r8),%rdx
> ffff82c480116a04:  4c 39 c2           cmp    %r8,%rdx
> ffff82c480116a07:  75 07              jne    ffff82c480116a10 <csched_vcpu_sleep+0x40>
> ffff82c480116a09:  f3 c3              repz retq
> ffff82c480116a0b:  0f 1f 44 00 00     nopl   0x0(%rax,%rax,1)
> ffff82c480116a10:  49 8b 40 08        mov    0x8(%r8),%rax
> ffff82c480116a14:  48 89 42 08        mov    %rax,0x8(%rdx)    # <- Pagefault here
> ffff82c480116a18:  48 89 10           mov    %rdx,(%rax)
> ffff82c480116a1b:  4d 89 40 08        mov    %r8,0x8(%r8)
> ffff82c480116a1f:  4d 89 00           mov    %r8,(%r8)
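
(For context, the two helpers involved look roughly like this in
sched_credit.c; this is paraphrased from memory of the 4.1 tree rather
than quoted from it, so details may differ slightly:)

    /* Paraphrased from xen/common/sched_credit.c; illustration only. */
    static inline int
    __vcpu_on_runq(struct csched_vcpu *svc)
    {
        /* The runq_elem node points at itself when the vCPU is off the runqueue. */
        return !list_empty(&svc->runq_elem);
    }

    static inline void
    __runq_remove(struct csched_vcpu *svc)
    {
        BUG_ON( !__vcpu_on_runq(svc) );
        /* Unlink and re-initialise: the list_del() plus the self-pointing
         * re-init seen in the last two instructions of the disassembly above. */
        list_del_init(&svc->runq_elem);
    }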
>
> The relevant crash registers from the pagefault are:
> rax: 0000000000000000
> rdx: 0000000000000000
> r8: ffff83080c89ed90
>
> If I am reading the code correctly, this means that svc->runq_elem.next
> is NULL, so the list_empty() check fails and we erroneously pass
> __vcpu_on_runq(). We then fault in the list removal, writing
> runq_elem.prev (also NULL) through the NULL next pointer, i.e. to
> address NULL + 8, which matches the faulting address.
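
(To spell out why the faulting address is 0x0000000000000008: this is
the open-coded list_del() with the crash registers substituted; the
variable names below are mine, illustration only.)

    struct list_head *elem = &svc->runq_elem;   /* r8  = ffff83080c89ed90 */
    struct list_head *next = elem->next;        /* rdx = 0 (NULL)         */
    struct list_head *prev = elem->prev;        /* rax = 0 (NULL)         */
    next->prev = prev;    /* write to NULL + 8 == 0x8  <- the pagefault   */
    prev->next = next;    /* never reached                                */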
>
> The only place I can spot in the code where the runq->{next,prev} could
> conceivably be NULL is in csched_alloc_vdata() between the memset() and
> INIT_LIST_HEAD(). This is logically sensible in combination with the
> localhost migrate loop, and I can't immediately see anything to prevent
> this race from happening.
But that doesn't make sense: csched_alloc_vdata() doesn't store
svc into vc->sched_priv; that's being done by the generic
scheduler code once the actor returns.
So I'd rather suspect a stale pointer being used, which is easily
possible when racing with sched_move_domain() (as opposed to
schedule_cpu_switch(), where the new pointer gets stored
_before_ de-allocating the old one).
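Roughly the ordering difference I'm referring to (simplified sketch,
not the literal source; arguments abbreviated):

    /* sched_move_domain(): the old per-vCPU data is freed before the new
     * pointer is stored, so a racing csched_vcpu_sleep() can still follow
     * the stale v->sched_priv into freed memory. */
    new_priv = SCHED_OP(new_ops, alloc_vdata, v, domdata);
    SCHED_OP(old_ops, free_vdata, v->sched_priv);  /* <- stale-pointer window */
    v->sched_priv = new_priv;

    /* schedule_cpu_switch(): the new pointer is stored first, so a racing
     * reader sees either the old or the new structure, both still valid. */
    new_priv = SCHED_OP(new_ops, alloc_vdata, v, domdata);
    old_priv = v->sched_priv;
    v->sched_priv = new_priv;
    SCHED_OP(old_ops, free_vdata, old_priv);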
However, sched_move_domain() (as well as schedule_cpu_switch())
is called only from the CPU pools code, and I would guess CPU pools
aren't involved here, nor do you soft offline/online pCPUs in
parallel (as I'm sure you would otherwise have mentioned it).
But wait - libxl__domain_make() _unconditionally_ calls
xc_cpupool_movedomain(), as does XendDomainInfo's
_constructDomain(). The reason for this escapes me - Jürgen? Yet
I'd expect the pool ID matching check to short-circuit the resulting
sysctl, i.e. sched_move_domain() ought not to be reached anyway
(worth verifying, of course).
The race there nevertheless ought to be fixed.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel