[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool



On Thu, 2015-07-02 at 16:24 +0100, George Dunlap wrote:
> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
> > Ideally, the pCPUs that are 'free', i.e., not assigned
> > to any cpupool, should not be considred by the scheduler
> > for load balancing or anything. In Credit1, we fail at
> > this, because of how we use cpupool_scheduler_cpumask().
> > In fact, for a free pCPU, cpupool_scheduler_cpumask()
> > returns a pointer to cpupool_free_cpus, and hence, near
> > the top of csched_load_balance():
> >
> >  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
> >      goto out;
> >
> > is false (the pCPU _is_ free!), and we therefore do not
> > jump to the end right away, as we should. This, causes
> > the following splat when resuming from ACPI S3 with
> > pCPUs not assigned to any pool:
> 
> What I haven't quite twigged yet with regard to this specific bug is
> why csched_load_balance() is being run on this cpu at all if it hasn't
> been added to the cpupool yet.  
>
Because the cpu is online already. That happens in start_secondary(),
right after having notified the CPU_STARTING phase of CPU bringup.

OTOH, adding a cpu to a pool only happens during the CPU_ONLINE phase,
which is after that.

> AFAICT it's only called from
> csched_schedule() -- why is that being called on a cpu that isn't
> online yet?  
>
Because, sadly, it is online. :-/

> If we can't fix that before the code freeze (or if we
> wouldn't want to avoid it anyway), would it make more sense to check
> for that condition in csched_schedule() and just return the idle
> domain?
>
I tried to move things around (i.e., moving the assignment to a cpupool
ahead in the bringup process), but that introduces irq-safe vs. non
irq-safe locking issues (ISTR that was on the spin locks protecting the
memory being allocated from inside schedule_cpu_switch(), which is
called by the cpu to cpupool assignment routine).

So, yeah, it's something I also don't like much, but it's hard to fix
properly, even without considering 4.6's freeze. :-/

> (Or schedule() even?)
> 
Perhaps, but:
 - this is (I think) pretty orthogonal wrt this patch,
 - I tried something like that as well, but then I did not like having a
   check for "is this cpu outside of any cpupool?" in such code, and
   going through it all the time (and this is an hot path!), for the
   sake of a corner case (as I consider having cpus outside of any
   cpupool a transient situation, as it is system start).

The approach implemented seemed the best one in term of both how the
code looks and performs for the vast majority of use cases and of the
time.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.