Re: [Xen-devel] [PATCH v4 3/4] VT-d PI: restrict the vcpu number on a given pcpu
On Fri, Jul 7, 2017 at 7:48 AM, Chao Gao <chao.gao@xxxxxxxxx> wrote:
> Currently, a blocked vCPU is put on its pCPU's PI blocking list. If
> too many vCPUs are blocked on a given pCPU, the list can grow very
> long. A simple analysis: with 32k domains and 128 vCPUs per domain,
> about 4M vCPUs may end up blocked on one pCPU's PI blocking list.
> When a wakeup interrupt arrives, the list is traversed to find the
> specific vCPUs to wake up, and in that case the traversal would
> consume much time.
>
> To mitigate this issue, this patch limits the number of vCPUs tracked on a
> given pCPU's blocking list, taking factors such as the performance of the
> common case, the current HVM vCPU count, and the current pCPU count into
> consideration. With this method the common case stays fast, and in the
> extreme cases the list length is kept under control.
>
> With this patch, when a vCPU is about to block, we check whether the PI
> blocking list of the pCPU the vCPU is running on exceeds the limit, which
> is the average number of vCPUs per pCPU plus a constant. If not, the vCPU
> is added to this pCPU's PI blocking list. Otherwise, another online pCPU
> is chosen to accept the vCPU.
>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> ---
> v4:
> - use a new lock to avoid adding a blocked vcpu to an offline pcpu's
> blocking list.
>
> ---
> xen/arch/x86/hvm/vmx/vmx.c | 136 +++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 114 insertions(+), 22 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ecd6485..04e9aa6 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -95,22 +95,91 @@ static DEFINE_PER_CPU(struct vmx_pi_blocking_vcpu, vmx_pi_blocking);
> uint8_t __read_mostly posted_intr_vector;
> static uint8_t __read_mostly pi_wakeup_vector;
>
> +/*
> + * Protect critical sections to avoid adding a blocked vcpu to a destroyed
> + * blocking list.
> + */
> +static DEFINE_SPINLOCK(remote_pbl_operation);
> +
> +#define remote_pbl_operation_begin(flags) \
> +({ \
> + spin_lock_irqsave(&remote_pbl_operation, flags); \
> +})
> +
> +#define remote_pbl_operation_done(flags) \
> +({ \
> + spin_unlock_irqrestore(&remote_pbl_operation, flags); \
> +})
> +
> void vmx_pi_per_cpu_init(unsigned int cpu)
> {
> INIT_LIST_HEAD(&per_cpu(vmx_pi_blocking, cpu).list);
> spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
> }
>
> +/*
> + * By default, the local pcpu (i.e. the one the vcpu is currently running
> + * on) is chosen as the destination of the wakeup interrupt. But if the
> + * vcpu count of that pcpu exceeds a limit, other pcpus are tried until we
> + * find a suitable one.
> + *
> + * Currently, choose (v_tot/p_tot) + K as the limit of the vcpu count, where
> + * v_tot is the total number of hvm vcpus on the system, p_tot is the total
> + * number of pcpus in the system, and K is a fixed number. An experiment on
> + * a skylake server which has 112 cpus and 64G memory shows the maximum time
> + * to wake up a vcpu from a 128-entry blocking list is about 22us, which is
> + * tolerable. So choose 128 as the fixed number K.
> + *
> + * This policy makes sure:
> + * 1) for common cases, the limit won't be reached and the local pcpu is
> + * used, which is beneficial to performance (at least, it avoids an IPI
> + * when unblocking a vcpu).
> + * 2) for the worst case, the blocking list length scales with the vcpu count
> + * divided by the pcpu count.
> + */
> +#define PI_LIST_FIXED_NUM 128
> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
> + PI_LIST_FIXED_NUM)
> +static inline bool pi_over_limit(int cpu)
> +{
> + return per_cpu(vmx_pi_blocking, cpu).counter > PI_LIST_LIMIT;
Is there any reason to hide this calculation behind a #define, when
it's only used once anyway?
Also -- the vast majority of the time, .counter will be <
PI_LIST_FIXED_NUM; there's no reason to do an atomic read and an
integer division in that case. I would do this:
    if ( likely(per_cpu(vmx_pi_blocking, cpu).counter <= PI_LIST_FIXED_NUM) )
        return 0;

    return per_cpu(vmx_pi_blocking, cpu).counter >
           PI_LIST_FIXED_NUM + (atomic_read(&num_hvm_vcpus) / num_online_cpus());
Also, I personally think it would make the code more readable to say,
"pi_under_limit()" instead; that way...
> +}
> +
> static void vmx_vcpu_block(struct vcpu *v)
> {
> - unsigned long flags;
> - unsigned int dest;
> + unsigned long flags[2];
> + unsigned int dest, pi_cpu;
> spinlock_t *old_lock;
> - spinlock_t *pi_blocking_list_lock =
> - &per_cpu(vmx_pi_blocking, v->processor).lock;
> struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> + spinlock_t *pi_blocking_list_lock;
> + bool in_remote_operation = false;
> +
> + pi_cpu = v->processor;
> +
> + if ( unlikely(pi_over_limit(pi_cpu)) )
> + {
...here you put the most common thing first, and the exceptional case
second (which I think makes the code easier to understand).
In fact, you might consider putting this whole thing in a function;
something like:
unsigned int pi_get_blocking_cpu(unsigned int pi_cpu, unsigned long *flags)
{
    if ( pi_under_limit(pi_cpu) )
    {
        spin_lock_irqsave(&per_cpu(vmx_pi_blocking, pi_cpu).lock, *flags);
        return pi_cpu;
    }

    /* Loop looking for pi's in other places */
}
Probably also worth mentioning briefly in a comment why this loop is
guaranteed to terminate.
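For what it's worth, the fleshed-out version could look something like the
below -- purely a sketch, assuming the pi_under_limit() rename above; the
cpumask_cycle() walk, the locking, and the termination comment are my
guesses rather than anything the patch commits to:

/*
 * Sketch: pick a pcpu whose PI blocking list is under the limit, and
 * return with that pcpu's blocking-list lock held.  (The real patch
 * would presumably bracket the remote search with
 * remote_pbl_operation_begin()/done() so the chosen pcpu can't go
 * offline under our feet; that's elided here.)
 */
static unsigned int pi_get_blocking_cpu(unsigned int pi_cpu,
                                        unsigned long *flags)
{
    /* Common case: the local pcpu's list is still short enough. */
    if ( pi_under_limit(pi_cpu) )
    {
        spin_lock_irqsave(&per_cpu(vmx_pi_blocking, pi_cpu).lock, *flags);
        return pi_cpu;
    }

    for ( ; ; )
    {
        /*
         * Why this terminates: the limit is
         * num_hvm_vcpus / num_online_cpus() + PI_LIST_FIXED_NUM, and the
         * per-pcpu counters sum to at most num_hvm_vcpus, so not every
         * online pcpu can be over the limit at once.  Barring concurrent
         * churn, the walk must therefore hit an under-limit pcpu.
         */
        pi_cpu = cpumask_cycle(pi_cpu, &cpu_online_map);
        if ( pi_under_limit(pi_cpu) )
        {
            spin_lock_irqsave(&per_cpu(vmx_pi_blocking, pi_cpu).lock,
                              *flags);
            return pi_cpu;
        }
    }
}

vmx_vcpu_block() would then just open with
pi_cpu = pi_get_blocking_cpu(v->processor, &flags); and the rest of the
function wouldn't need to care which list the vcpu ended up on.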
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel