
Re: [PATCH v2 1/2] credit: Limit load balancing to once per millisecond



On Fri, Sep 22, 2023 at 2:49 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 22.09.2023 14:20, George Dunlap wrote:
> > On Thu, Sep 21, 2023 at 2:12 PM Juergen Gross <jgross@xxxxxxxx> wrote:
> >>
> >> On 21.09.23 14:23, George Dunlap wrote:
> >>> The credit scheduler tries as hard as it can to ensure that it always
> >>> runs scheduling units with positive credit (PRI_TS_UNDER) before
> >>> running those with negative credit (PRI_TS_OVER).  If the next
> >>> runnable scheduling unit is of priority OVER, it will always run the
> >>> load balancer, which will scour the system looking for another
> >>> scheduling unit of the UNDER priority.
> >>>
> >>> Unfortunately, as the number of cores on a system has grown, the cost
> >>> of the work-stealing algorithm has dramatically increased; a recent
> >>> trace on a system with 128 cores showed this taking over 50
> >>> microseconds.
> >>>
> >>> Add a parameter, load_balance_ratelimit, to limit the frequency of
> >>> load balance operations on a given pcpu.  Default this to 1
> >>> millisecond.
> >>>
> >>> Invert the load balancing conditional to make it more clear, and line
> >>> up more closely with the comment above it.
> >>>
> >>> Overall it might be cleaner to have the last_load_balance checking
> >>> happen inside csched_load_balance(), but that would require either
> >>> passing both now and spc into the function, or looking them up again;
> >>> both of which seemed to be worse than simply checking and setting the
> >>> values before calling it.
> >>>
> >>> On a system with a vcpu:pcpu ratio of 2:1, running Windows guests
> >>> (which will end up calling YIELD during spinlock contention), this
> >>> patch increased performance significantly.
> >>>
> >>> Signed-off-by: George Dunlap <george.dunlap@xxxxxxxxx>
> >>> ---
> >>> Changes since v1:
> >>> - Fix editing mistake in commit message
> >>> - Improve documentation
> >>> - global var is __ro_after_init
> >>> - Remove sysctl, as it's not used.  Define max value in credit.c.
> >>> - Fix some style issues
> >>> - Move comment tweak to the right patch
> >>> - In the event that the commandline-parameter value is too high, clip
> >>>    to the maximum value rather than setting to the default.
> >>>
> >>> CC: Dario Faggioli <dfaggioli@xxxxxxxx>
> >>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>> CC: George Dunlap <george.dunlap@xxxxxxxxxx>
> >>> CC: Jan Beulich <jbeulich@xxxxxxxx>
> >>> CC: Julien Grall <julien@xxxxxxx>
> >>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> >>> CC: Wei Liu <wl@xxxxxxx>
> >>> ---
> >>>   docs/misc/xen-command-line.pandoc |  8 ++++++
> >>>   xen/common/sched/credit.c         | 47 +++++++++++++++++++++++++------
> >>>   2 files changed, 46 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> >>> index f88e6a70ae..9c3c72a7f9 100644
> >>> --- a/docs/misc/xen-command-line.pandoc
> >>> +++ b/docs/misc/xen-command-line.pandoc
> >>> @@ -1884,6 +1884,14 @@ By default, Xen will use the INVPCID instruction for TLB management if
> >>>   it is available.  This option can be used to cause Xen to fall back to
> >>>   older mechanisms, which are generally slower.
> >>>
> >>> +### load-balance-ratelimit
> >>> +> `= <integer>`
> >>> +
> >>> +The minimum interval between load balancing events on a given pcpu, in
> >>> +microseconds.  A value of '0' will disable rate limiting.  Maximum
> >>> +value 1 second. At the moment only credit honors this parameter.
> >>> +Default 1ms.
> >>> +
> >>>   ### noirqbalance (x86)
> >>>   > `= <boolean>`
> >>>
> >>> diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
> >>> index f2cd3d9da3..5c06f596d2 100644
> >>> --- a/xen/common/sched/credit.c
> >>> +++ b/xen/common/sched/credit.c
> >>> @@ -50,6 +50,10 @@
> >>>   #define CSCHED_TICKS_PER_TSLICE     3
> >>>   /* Default timeslice: 30ms */
> >>>   #define CSCHED_DEFAULT_TSLICE_MS    30
> >>> +/* Default load balancing ratelimit: 1ms */
> >>> +#define CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US 1000
> >>> +/* Max load balancing ratelimit: 1s */
> >>> +#define CSCHED_MAX_LOAD_BALANCE_RATELIMIT_US     1000000
> >>>   #define CSCHED_CREDITS_PER_MSEC     10
> >>>   /* Never set a timer shorter than this value. */
> >>>   #define CSCHED_MIN_TIMER            XEN_SYSCTL_SCHED_RATELIMIT_MIN
> >>> @@ -153,6 +157,7 @@ struct csched_pcpu {
> >>>
> >>>       unsigned int idle_bias;
> >>>       unsigned int nr_runnable;
> >>> +    s_time_t last_load_balance;
> >>>
> >>>       unsigned int tick;
> >>>       struct timer ticker;
> >>> @@ -218,7 +223,7 @@ struct csched_private {
> >>>
> >>>       /* Period of master and tick in milliseconds */
> >>>       unsigned int tick_period_us, ticks_per_tslice;
> >>> -    s_time_t ratelimit, tslice, unit_migr_delay;
> >>> +    s_time_t ratelimit, tslice, unit_migr_delay, load_balance_ratelimit;
> >>>
> >>>       struct list_head active_sdom;
> >>>       uint32_t weight;
> >>> @@ -612,6 +617,8 @@ init_pdata(struct csched_private *prv, struct csched_pcpu *spc, int cpu)
> >>>       BUG_ON(!is_idle_unit(curr_on_cpu(cpu)));
> >>>       cpumask_set_cpu(cpu, prv->idlers);
> >>>       spc->nr_runnable = 0;
> >>> +
> >>> +    spc->last_load_balance = NOW();
> >>>   }
> >>>
> >>>   static void cf_check
> >>> @@ -1676,9 +1683,17 @@ csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step)
> >>>       return NULL;
> >>>   }
> >>>
> >>> +/*
> >>> + * Minimum delay, in microseconds, between load balance operations.
> >>> + * This prevents spending too much time doing load balancing, particularly
> >>> + * when the system has a high number of YIELDs due to spinlock priority inversion.
> >>> + */
> >>> +static unsigned int __ro_after_init load_balance_ratelimit_us = CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US;
> >>> +integer_param("load-balance-ratelimit", load_balance_ratelimit_us);
> >>> +
> >>>   static struct csched_unit *
> >>>   csched_load_balance(struct csched_private *prv, int cpu,
> >>> -    struct csched_unit *snext, bool *stolen)
> >>> +                    struct csched_unit *snext, bool *stolen)
> >>>   {
> >>>       const struct cpupool *c = get_sched_res(cpu)->cpupool;
> >>>       struct csched_unit *speer;
> >>> @@ -1958,15 +1973,19 @@ static void cf_check csched_schedule(
> >>>           /*
> >>>            * SMP Load balance:
> >>>            *
> >>> -         * If the next highest priority local runnable UNIT has already eaten
> >>> -         * through its credits, look on other PCPUs to see if we have more
> >>> -         * urgent work... If not, csched_load_balance() will return snext, but
> >>> -         * already removed from the runq.
> >>> +         * If the next highest priority local runnable UNIT has
> >>> +         * already eaten through its credits (and we're below the
> >>> +         * balancing ratelimit), look on other PCPUs to see if we have
> >>> +         * more urgent work... If we don't, csched_load_balance() will
> >>> +         * return snext, but already removed from the runq.
> >>>            */
> >>> -        if ( snext->pri > CSCHED_PRI_TS_OVER )
> >>> -            __runq_remove(snext);
> >>> -        else
> >>> +        if ( snext->pri <= CSCHED_PRI_TS_OVER
> >>> +             && now - spc->last_load_balance > prv->load_balance_ratelimit) {
> >>
> >> ^ Just found a style issue (after Andrew pointing out the ones in patch 2).
> >
> > Just checking, you mean the space before the closing `)`?
>
> And the placement of the opening {.

Gah, sorry I missed that. :-( (Golang won't compile if the `{` isn't
on the same line.)

I guess I'll send a follow-up patch.
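
The follow-up would presumably just reformat the conditional to match
the usual style, with a space before the closing parenthesis and the
opening brace on its own line; a rough, untested sketch (the body of
the block is elided here):

        if ( snext->pri <= CSCHED_PRI_TS_OVER
             && now - spc->last_load_balance > prv->load_balance_ratelimit )
        {
            ...
        }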

 -George
