[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool



On Thu, Jul 2, 2015 at 5:01 PM, Dario Faggioli
<dario.faggioli@xxxxxxxxxx> wrote:
> On Thu, 2015-07-02 at 16:24 +0100, George Dunlap wrote:
>> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
>> <dario.faggioli@xxxxxxxxxx> wrote:
>> > Ideally, the pCPUs that are 'free', i.e., not assigned
>> > to any cpupool, should not be considred by the scheduler
>> > for load balancing or anything. In Credit1, we fail at
>> > this, because of how we use cpupool_scheduler_cpumask().
>> > In fact, for a free pCPU, cpupool_scheduler_cpumask()
>> > returns a pointer to cpupool_free_cpus, and hence, near
>> > the top of csched_load_balance():
>> >
>> >  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
>> >      goto out;
>> >
>> > is false (the pCPU _is_ free!), and we therefore do not
>> > jump to the end right away, as we should. This, causes
>> > the following splat when resuming from ACPI S3 with
>> > pCPUs not assigned to any pool:
>>
>> What I haven't quite twigged yet with regard to this specific bug is
>> why csched_load_balance() is being run on this cpu at all if it hasn't
>> been added to the cpupool yet.
>>
> Because the cpu is online already. That happens in start_secondary(),
> right after having notified the CPU_STARTING phase of CPU bringup.
>
> OTOH, adding a cpu to a pool only happens during the CPU_ONLINE phase,
> which is after that.
>
>> AFAICT it's only called from
>> csched_schedule() -- why is that being called on a cpu that isn't
>> online yet?
>>
> Because, sadly, it is online. :-/
>
>> If we can't fix that before the code freeze (or if we
>> wouldn't want to avoid it anyway), would it make more sense to check
>> for that condition in csched_schedule() and just return the idle
>> domain?
>>
> I tried to move things around (i.e., moving the assignment to a cpupool
> ahead in the bringup process), but that introduces irq-safe vs. non
> irq-safe locking issues (ISTR that was on the spin locks protecting the
> memory being allocated from inside schedule_cpu_switch(), which is
> called by the cpu to cpupool assignment routine).
>
> So, yeah, it's something I also don't like much, but it's hard to fix
> properly, even without considering 4.6's freeze. :-/
>
>> (Or schedule() even?)
>>
> Perhaps, but:
>  - this is (I think) pretty orthogonal wrt this patch,
>  - I tried something like that as well, but then I did not like having a
>    check for "is this cpu outside of any cpupool?" in such code, and
>    going through it all the time (and this is an hot path!), for the
>    sake of a corner case (as I consider having cpus outside of any
>    cpupool a transient situation, as it is system start).
>
> The approach implemented seemed the best one in term of both how the
> code looks and performs for the vast majority of use cases and of the
> time.

Fair enough:

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.