Re: [Xen-devel] [PATCH v2 1/5] tasklet: Introduce per-cpu tasklet for softirq.
>>> On 25.08.16 at 21:23, <konrad.wilk@xxxxxxxxxx> wrote:
> This implements a lockless per-cpu tasklet mechanism.
How does this correlate with the title? IOW, how is this new form of
tasklets "for" softirq? Perhaps part of the sentence above would be
a better fit for the title?
> The existing tasklet mechanism has a single global
> spinlock that is taken every time the global list
> is touched. And we use this lock quite a lot - when
> we call do_tasklet_work which is called via a softirq
> and from the idle loop. We take the lock on any
> operation on the tasklet_list.
>
> The problem we are facing is that there are quite a lot of
> tasklets scheduled. The most common one that is invoked is
> the one injecting the VIRQ_TIMER in the guest. Guests
> are not insane and don't set the one-shot or periodic
> clocks to be in sub 1ms intervals (causing said tasklet
> to be scheduled for such small intervals).
I don't follow how this "guests are not insane" relates to the issue
described here. Plus - what if there was an insane guest?
> This is especially a problem with guests that have a
> large amount of VCPUs.
>
> With this patch the problem disappears.
>
> As such this patch introduces the code to setup
> softirq per-cpu tasklets and only modifies the PCI
> passthrough cases instead of doing it wholesale. This
> is done because:
> - We want to easily bisect it if things break.
> - We modify the code one section at a time to
> make it easier to review this core code.
To me this looks to be contrary to the very first change I see:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1511,7 +1511,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
> if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) /* teardown: hvm_funcs.vcpu_destroy */
> goto fail3;
>
> - softirq_tasklet_init(
> + percpu_tasklet_init(
> &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
> (void(*)(unsigned long))hvm_assert_evtchn_irq,
> (unsigned long)v);
Isn't this unrelated to pass-through?
> --- a/xen/common/tasklet.c
> +++ b/xen/common/tasklet.c
> @@ -31,10 +31,30 @@ static DEFINE_PER_CPU(struct list_head, softirq_tasklet_list);
> /* Protects all lists and tasklet structures. */
> static DEFINE_SPINLOCK(tasklet_lock);
>
> +static DEFINE_PER_CPU(struct list_head, softirq_list);
Please put this next to the other two. Which then makes more
obvious that some naming changes might be helpful: We already
have softirq_tasklet_list. Maybe percpu_list or - since this already
is a per-CPU variable - local_list? Otherwise at least a comment
should be added to clarify the different purposes of the three lists.
> static void tasklet_enqueue(struct tasklet *t)
> {
> unsigned int cpu = t->scheduled_on;
>
> + if ( t->is_percpu )
> + {
> + unsigned long flags;
> + struct list_head *list;
> +
> + INIT_LIST_HEAD(&t->list);
Why? You list_add_tail() below unconditionally.
> + BUG_ON( !t->is_softirq );
If this is a requirement, wouldn't the two boolean flags better be
folded into a tristate enum, with the if() sequence here converted
to a switch()?
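As a rough sketch of the tristate idea (all identifiers below are hypothetical and do not come from the patch), the two flags could collapse into one enum, with the enqueue path selecting its list via switch():

```c
/* Hypothetical names: none of these identifiers exist in the patch. */
enum tasklet_kind {
    TASKLET_KIND_DEFAULT,        /* run from the idle loop */
    TASKLET_KIND_SOFTIRQ,        /* run from the locked softirq list */
    TASKLET_KIND_SOFTIRQ_PERCPU  /* run from the lockless per-CPU list */
};

/* Stand-ins for the three per-CPU lists the enqueue path chooses from. */
enum queue_id { QUEUE_TASKLET, QUEUE_SOFTIRQ, QUEUE_PERCPU };

/* Pick the destination list for a tasklet of the given kind. */
static enum queue_id pick_queue(enum tasklet_kind kind)
{
    switch ( kind )
    {
    case TASKLET_KIND_SOFTIRQ_PERCPU:
        return QUEUE_PERCPU;

    case TASKLET_KIND_SOFTIRQ:
        return QUEUE_SOFTIRQ;

    case TASKLET_KIND_DEFAULT:
        break;
    }

    return QUEUE_TASKLET;
}
```

With a single enum the "per-CPU implies softirq" requirement holds by construction, so the BUG_ON() would no longer be needed.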
> + BUG_ON( cpu != smp_processor_id() ); /* Not implemented yet. */
> +
> + local_irq_save(flags);
Considering that the pre-existing two cases in this function don't do
any locking themselves, is the asymmetry to do the "locking" here a
good idea?
> + list = &__get_cpu_var(softirq_list);
> + list_add_tail(&t->list, list);
> + raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> +
> + local_irq_restore(flags);
> + return;
> + }
> if ( t->is_softirq )
Instead of return, please use "else if" here (unless converting to
switch() anyway).
> @@ -56,16 +76,25 @@ void tasklet_schedule_on_cpu(struct tasklet *t, unsigned int cpu)
> {
> unsigned long flags;
>
> - spin_lock_irqsave(&tasklet_lock, flags);
> + if ( !tasklets_initialised || t->is_dead )
> + return;
>
> - if ( tasklets_initialised && !t->is_dead )
> + if ( t->is_percpu )
> {
> - t->scheduled_on = cpu;
> - if ( !t->is_running )
> + if ( !test_and_set_bit(TASKLET_STATE_SCHED, &t->state) )
> {
> - list_del(&t->list);
> + t->scheduled_on = cpu;
> tasklet_enqueue(t);
> }
> + return;
> + }
> + spin_lock_irqsave(&tasklet_lock, flags);
Blank line above this one please.
> @@ -104,6 +133,66 @@ static void do_tasklet_work(unsigned int cpu, struct list_head *list)
> }
> }
>
> +void do_tasklet_work_percpu(void)
static
> +{
> + struct tasklet *t = NULL;
> + struct list_head *head;
> + bool_t poke = 0;
bool / false
> + local_irq_disable();
> + head = &__get_cpu_var(softirq_list);
> +
> + if ( !list_empty(head) )
> + {
> + t = list_entry(head->next, struct tasklet, list);
> +
> + if ( head->next == head->prev ) /* One singular item. Re-init head. */
Do we have no list_*() function for this?
> + INIT_LIST_HEAD(&__get_cpu_var(softirq_list));
> + else
> + {
> + /* Multiple items, update head to skip 't'. */
> + struct list_head *list;
> +
> + /* One item past 't'. */
> + list = head->next->next;
> +
> + BUG_ON(list == NULL);
> +
> + /* And update head to skip 't'. Note that t->list.prev still
> + * points to head, but we don't care as we only process one tasklet
> + * and once done the tasklet list is re-init one way or another.
> + */
> + head->next = list;
Why can't you use list_del() for all of the above, including the
INIT_LIST_HEAD() in the if branch?
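To illustrate why a plain list_del() covers both branches, here is a minimal self-contained sketch. The list helpers are simplified stand-ins written for this example (not the Xen ones), and pop_first() is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the usual circular doubly linked list helpers. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink an entry by joining its two neighbours. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/*
 * Hypothetical helper: detach the first entry and report whether more
 * entries remain. When the entry was the only one, both of its neighbours
 * are the head, so list_del() leaves the head pointing at itself, which is
 * exactly the state INIT_LIST_HEAD() would produce.
 */
static struct list_head *pop_first(struct list_head *head, bool *more)
{
    struct list_head *first;

    if ( list_empty(head) )
    {
        *more = false;
        return NULL;
    }

    first = head->next;
    list_del(first);
    *more = !list_empty(head);

    return first;
}
```

The "more" result plays the role of the "poke" variable in the patch, and head->prev stays consistent in the multi-item case as well.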
> + poke = 1;
> + }
> + }
> + local_irq_enable();
> +
> + if ( !t )
> + return; /* Never saw it happen, but we might have a spurious case? */
> +
> + if ( tasklet_trylock(t) )
> + {
> + if ( !test_and_clear_bit(TASKLET_STATE_SCHED, &t->state) )
> + BUG();
Indentation.
> + sync_local_execstate();
> + t->func(t->data);
> + tasklet_unlock(t);
> + if ( poke )
> + raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> + /* We could reinit the t->list but tasklet_enqueue does it for us. */
I can't see why re-initing would be needed here or there, nor why
do_tasklet_work() does so.
> + return;
> + }
> +
> + local_irq_disable();
> +
> + INIT_LIST_HEAD(&t->list);
> + list_add_tail(&t->list, &__get_cpu_var(softirq_list));
> + smp_wmb();
> + raise_softirq(TASKLET_SOFTIRQ_PERCPU);
> + local_irq_enable();
tasklet_enqueue()? The only difference appears to be the barrier
you have here but not there (and I don't think you need it).
> @@ -147,10 +236,29 @@ static void tasklet_softirq_action(void)
> spin_unlock_irq(&tasklet_lock);
> }
>
> +/* Per CPU softirq context work. */
> +static void tasklet_softirq_percpu_action(void)
> +{
> + do_tasklet_work_percpu();
> +}
Considering this is the only caller of the function - why do you need
two functions here?
> void tasklet_kill(struct tasklet *t)
> {
> unsigned long flags;
>
> + if ( t->is_percpu )
> + {
> + while ( test_and_set_bit(TASKLET_STATE_SCHED, &t->state) )
> + {
> + do {
> + process_pending_softirqs();
> + } while ( test_bit(TASKLET_STATE_SCHED, &t->state) );
> + }
> + tasklet_unlock_wait(t);
> + clear_bit(TASKLET_STATE_SCHED, &t->state);
> + t->is_dead = 1;
true
> --- a/xen/include/xen/tasklet.h
> +++ b/xen/include/xen/tasklet.h
> @@ -17,21 +17,24 @@
> struct tasklet
> {
> struct list_head list;
> + unsigned long state;
> int scheduled_on;
> bool_t is_softirq;
> bool_t is_running;
> bool_t is_dead;
> + bool_t is_percpu;
> void (*func)(unsigned long);
> unsigned long data;
> };
>
> -#define _DECLARE_TASKLET(name, func, data, softirq) \
> +#define _DECLARE_TASKLET(name, func, data, softirq, percpu) \
> struct tasklet name = { \
> - LIST_HEAD_INIT(name.list), -1, softirq, 0, 0, func, data }
> + LIST_HEAD_INIT(name.list), 0, -1, softirq, 0, 0, percpu, \
> + func, data }
May I ask that you switch to designated member initializers if you
need to touch this already anyway?
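For illustration, a designated-initializer form might look like the sketch below. The structure layout is copied from the hunk above (with bool in place of bool_t, and a minimal list_head stand-in); the macro parameters are renamed to fn and arg, which is an assumption of this sketch, because parameters named func and data would also be substituted into the .func and .data designators:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the list head type and its static initializer. */
struct list_head { struct list_head *next, *prev; };
#define LIST_HEAD_INIT(name) { &(name), &(name) }

struct tasklet
{
    struct list_head list;
    unsigned long state;
    int scheduled_on;
    bool is_softirq;
    bool is_running;
    bool is_dead;
    bool is_percpu;
    void (*func)(unsigned long);
    unsigned long data;
};

/*
 * Fields not named here (state, is_running, is_dead) are zero-initialized,
 * so adding or reordering members no longer requires touching the macro.
 */
#define _DECLARE_TASKLET(name, fn, arg, softirq, percpu) \
    struct tasklet name = {                               \
        .list = LIST_HEAD_INIT(name.list),                \
        .scheduled_on = -1,                               \
        .is_softirq = softirq,                            \
        .is_percpu = percpu,                              \
        .func = fn,                                       \
        .data = arg,                                      \
    }
```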
> @@ -40,6 +43,54 @@ DECLARE_PER_CPU(unsigned long, tasklet_work_to_do);
> #define TASKLET_enqueued (1ul << _TASKLET_enqueued)
> #define TASKLET_scheduled (1ul << _TASKLET_scheduled)
>
> +/* These fancy bit manipulations (bit 0 and bit 1) along with using a lock
Please avoid referring to "locked" operations outside of x86 code.
ITYM "atomic".
> + * operation allow us to have four stages in tasklet life-time.
> + * a) 0x0: Completely empty (not scheduled nor running).
> + * b) 0x1: Scheduled but not running. Used to guard in 'tasklet_schedule'
> + * such that we will only schedule one. If it is scheduled and had never
> + * run (hence never clearing STATE_SCHED bit), tasklet_kill will spin
> + * forever on said tasklet. However 'tasklet_schedule' raises the
> + * softirq associated with the per-cpu - so it will run, albeit there might
> + * be a race (tasklet_kill spinning until the softirq handler runs).
And is there a guarantee the handler can actually run while
tasklet_kill() spins?
> + * c) 0x2: it is running (only on one CPU) and can be scheduled on any
> + * CPU. The bit 0 - scheduled is cleared at this stage allowing
> + * 'tasklet_schedule' to successfully schedule.
> + * d) 0x3: scheduled and running - only possible if the running tasklet calls
> + * tasklet_schedule (on same CPU) or the tasklet is scheduled from another
> + * CPU while the tasklet is running on another CPU.
> + *
> + * The two bits play a vital role in assuring that the tasklet is scheduled
> + * once and runs only once. The steps are:
> + *
> + * 1) tasklet_schedule: STATE_SCHED bit set (0x1), added on the per cpu list.
> + * 2) tasklet_softirq_percpu_action picks one tasklet from the list. Schedules
> + * itself later if there are more tasklets on it. Tries to set STATE_RUN bit
> + * (0x3) - if it fails adds tasklet back to the per-cpu list. If it succeeds
> + * clears the STATE_SCHED bit (0x2).
Why are these two steps? Can't you transition 1 -> 2 in one go
(using cmpxchg())?
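A minimal sketch of such a single-step transition, written with C11 atomics purely so it is self-contained (Xen has its own cmpxchg() primitive; claim_for_run() and the STATE_* masks are hypothetical names):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit masks corresponding to bit 0 (scheduled) and bit 1 (running). */
#define STATE_SCHED (1ul << 0)
#define STATE_RUN   (1ul << 1)

/*
 * Hypothetical helper: move the state from "scheduled" (0x1) straight to
 * "running" (0x2) with one compare-and-exchange. It fails, leaving the
 * state untouched, if the tasklet is not in exactly the 0x1 state, for
 * example when it is already running on another CPU (0x3).
 */
static bool claim_for_run(atomic_ulong *state)
{
    unsigned long expected = STATE_SCHED;

    return atomic_compare_exchange_strong(state, &expected, STATE_RUN);
}
```

On failure the caller would put the tasklet back on the per-CPU list, as the existing fallback path does.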
> Once tasklet completed, unsets STATE_RUN
> + * (0x0 or 0x1 if tasklet called tasklet_schedule).
> + */
> +enum {
> + TASKLET_STATE_SCHED, /* Bit 0 */
> + TASKLET_STATE_RUN
> +};
> +
> +static inline int tasklet_trylock(struct tasklet *t)
bool and const
> +{
> + return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
> +}
> +
> +static inline void tasklet_unlock(struct tasklet *t)
> +{
> + barrier();
> + clear_bit(TASKLET_STATE_RUN, &(t)->state);
> +}
> +static inline void tasklet_unlock_wait(struct tasklet *t)
const
> +{
> + while (test_bit(TASKLET_STATE_RUN, &(t)->state))
Missing blanks.
> + {
> + barrier();
> + }
Unnecessary braces.
Jan