
Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying



On Fri, May 9, 2014 at 6:01 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 08.05.2014 17:10, George Dunlap wrote:
>>
>> On Wed, May 7, 2014 at 2:23 PM, Juergen Gross
>> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>
>>> On 07.05.2014 15:10, George Dunlap wrote:
>>>>
>>>>
>>>> On Wed, May 7, 2014 at 8:52 AM, Juergen Gross
>>>> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> When a cpupool is destroyed just after the last domain has been stopped
>>>>> the
>>>>> domain might already be removed from the cpupool without having
>>>>> decremented
>>>>> the domain count of the cpupool. This will result in rejection of the
>>>>> cpupool-destroy operation.
>>>>
>>>>
>>>>
>>>> I'm a bit confused.  What's the sched_move_domain() for, then?  If
>>>> we're going to handle "dying domains" by doing a retry, could we just
>>>> get rid of it?
>>>
>>>
>>>
>>> The sched_move_domain() is still needed for cases where a domain stays
>>> dying for a longer time, e.g. when a dom0 process is still referencing
>>> some of its memory pages. This may be a rare situation, but being unable
>>> to use a physical cpu for another cpupool just because of this case is
>>> worse than this little piece of code, IMO.
>>
>>
>> And I take it there are times when the move fails for whatever reason?
>
>
> ENOMEM for example.
>
>
>> Could you add a comment explaining this above the for() loop then, for
>> posterity?
>
>
> Could you define 'this', please? The reason for the sched_move_domain()
> is mentioned in the head comment of the function (zombie domains). The
> possibility of sched_move_domain() failing is obvious from the return
> value checking.

Oh, sorry -- I misunderstood the patch.  I thought you were adding
code to handle the case when sched_move_domain() failed -- but
actually, you're handling the case where you go through the
for_each_domain_in_cpupool() loop, successfully call
sched_move_domain() on each such domain (and decrement n_dom each
time), but still somehow at the end have a positive n_dom.
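
To make sure I'm describing the same flow, here's the shape I have in
mind, as a simplified, self-contained sketch (the struct layout and
helper names below are invented for illustration; this is not the
actual xen/common/cpupool.c code):

/* Simplified model of the destroy path under discussion: walk the
 * domain list, move every domain still assigned to the dying pool,
 * then expect the pool's domain count to have reached zero. */
#include <stdio.h>

struct cpupool {
    unsigned int n_dom;        /* number of domains counted in this pool */
};

struct domain {
    struct cpupool *cpupool;   /* pool the domain currently belongs to */
    struct domain *next;       /* global domain list link */
};

static struct domain *domain_list;  /* stand-in for the global domain list */

/* Stand-in for sched_move_domain(): move d to pool0, fixing both counts. */
static int move_domain(struct domain *d, struct cpupool *pool0)
{
    d->cpupool->n_dom--;
    d->cpupool = pool0;
    pool0->n_dom++;
    return 0;                  /* the real call can fail, e.g. -ENOMEM */
}

/* Returns 0 when the pool is empty and can go away, -1 when the caller
 * should retry (the case the patch adds, as I understand it). */
static int destroy_pool(struct cpupool *c, struct cpupool *pool0)
{
    struct domain *d;

    for ( d = domain_list; d != NULL; d = d->next )
        if ( d->cpupool == c && move_domain(d, pool0) )
            return -1;

    return (c->n_dom == 0) ? 0 : -1;
}

int main(void)
{
    struct cpupool pool0 = { .n_dom = 0 }, c = { .n_dom = 1 };
    struct domain d = { .cpupool = &c, .next = NULL };

    domain_list = &d;
    printf("destroy: %d (expect 0)\n", destroy_pool(&c, &pool0));
    return 0;
}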

Do I have it right now, or am I still confused?

So there are times when a domain might not come up in the
for_each_domain_in_cpupool() loop, but for some reason still be in the
n_dom reference count?

That doesn't seem like it should be allowed to happen; and at the
moment I'm failing to see how it could happen, unless you have a
race somewhere.

Sorry if I'm just being really dense here, but from what I can tell:
* for_each_domain_in_cpupool() iterates through the domain list,
looking for domains such that d->cpupool == c
* d->cpupool is only modified when the cpupool_lock is held
* Whenever d->cpupool is modified, the n_dom counts of the affected
cpupools are also updated
* In this function, you hold cpupool_lock

So how is it that you have a situation where d->cpupool != c, but
c->n_dom is counting that domain?
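
To be concrete about the invariant I'm assuming, here it is as a
sketch (invented names again, with a pthread mutex standing in for
cpupool_lock; not the real code):

#include <pthread.h>

struct cpupool { unsigned int n_dom; };
struct domain  { struct cpupool *cpupool; };

/* Stand-in for cpupool_lock. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* d->cpupool and both pools' n_dom counts change together, inside one
 * critical section, so anyone holding the lock should never see a pool
 * counting a domain whose d->cpupool no longer points back at it. */
static void move_domain_locked(struct domain *d, struct cpupool *to)
{
    pthread_mutex_lock(&pool_lock);
    d->cpupool->n_dom--;
    d->cpupool = to;
    to->n_dom++;
    pthread_mutex_unlock(&pool_lock);
}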

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

