Re: [Xen-devel] [PATCH v2 3/5] VT-d PI: restrict the vcpu number on a given pcpu
> From: Gao, Chao
> Sent: Thursday, May 11, 2017 2:04 PM
>
> Currently, a blocked vCPU is put in its pCPU's PI blocking list. If
> too many vCPUs are blocked on a given pCPU, the list can grow too
> long. A simple analysis shows that with 32k domains and 128 vCPUs
> per domain, about 4M vCPUs may be blocked in one pCPU's PI blocking
> list. When a wakeup interrupt arrives, the list is traversed to find
> the specific vCPUs to wake up, and in that case the traversal would
> consume considerable time.
>
> To mitigate this issue, this patch limits the number of vCPUs on a
> given pCPU, taking factors such as the performance of the common case,
> the current HVM vCPU count and the current pCPU count into
> consideration. With this method, the common case stays fast, and in
> the extreme cases the list length is kept under control.
>
> The change in vmx_pi_unblock_vcpu() is for the following case:
> the vCPU is running -> it tries to block (this patch may change NDST
> to another pCPU), but the notification comes in time, so the vCPU
> goes back to the running state -> VM-entry (we should set NDST again,
> reverting the change we made to NDST in vmx_vcpu_block()).
>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 78 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 71 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index efff6cd..c0d0b58 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -100,16 +100,70 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>      spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>  }
>
> +/*
> + * Choose an appropriate pcpu to receive the wakeup interrupt.
> + * By default, the local pcpu is chosen as the destination. But if the
> + * vcpu number of the local pcpu exceeds a limit, another pcpu is chosen.
> + *
> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpus, where
> + * v_tot is the total number of vcpus on the system, p_tot is the total
> + * number of pcpus in the system, and K is a fixed number. Experiments
> + * show that the maximal time to wake up a vcpu from a 128-entry blocking
> + * list is acceptable. So choose 128 as the fixed number K.

Better to provide your experimental data here, so that others get a gut
feeling for why it's acceptable...

> + *
> + * This policy makes sure:
> + * 1) for common cases, the limit won't be reached and the local pcpu
> + * is used, which is beneficial to performance (at least, it avoids an
> + * IPI when unblocking a vcpu).
> + * 2) for the worst case, the blocking list length scales with the vcpu
> + * count divided by the pcpu count.
> + */
> +#define PI_LIST_FIXED_NUM 128
> +#define PI_LIST_LIMIT     (atomic_read(&num_hvm_vcpus) / num_online_cpus() + \
> +                           PI_LIST_FIXED_NUM)
> +
> +static unsigned int vmx_pi_choose_dest_cpu(struct vcpu *v)
> +{
> +    int count, limit = PI_LIST_LIMIT;
> +    unsigned int dest = v->processor;
> +
> +    count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +    while ( unlikely(count >= limit) )
> +    {
> +        dest = cpumask_cycle(dest, &cpu_online_map);
> +        count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +    }

Is it possible to hit an infinite loop here?
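For example, nothing guarantees that some pCPU is under the limit at the
moment we look: the per-cpu counters (and num_hvm_vcpus behind
PI_LIST_LIMIT) keep changing under our feet, so the scan can keep cycling
until some other pCPU happens to drain its list. A variant bounded to a
single pass over cpu_online_map might be safer -- just a sketch to
illustrate the idea, untested, with nr_tried being a made-up local:

    static unsigned int vmx_pi_choose_dest_cpu(struct vcpu *v)
    {
        int limit = PI_LIST_LIMIT;
        unsigned int dest = v->processor;
        unsigned int nr_tried;

        /* Check each online pcpu at most once, starting with the local one. */
        for ( nr_tried = 0; nr_tried < num_online_cpus(); nr_tried++ )
        {
            if ( atomic_read(&per_cpu(vmx_pi_blocking, dest).counter) < limit )
                return dest;
            dest = cpumask_cycle(dest, &cpu_online_map);
        }

        /* Every list is (transiently) over the limit: tolerate the local one. */
        return v->processor;
    }

Of course the re-check under the lock in vmx_vcpu_block() below would then
also have to accept such a fallback, otherwise its goto retry just
reintroduces the same spin one level up.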
> +    return dest;
> +}
> +
>  static void vmx_vcpu_block(struct vcpu *v)
>  {
>      unsigned long flags;
> -    unsigned int dest;
> +    unsigned int dest, dest_cpu;
>      spinlock_t *old_lock;
> -    spinlock_t *pi_blocking_list_lock =
> -        &per_cpu(vmx_pi_blocking, v->processor).lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    spinlock_t *pi_blocking_list_lock;
> +
> +    /*
> +     * After a pCPU goes down, the per-cpu PI blocking list is cleared.
> +     * To make sure the parameter vCPU is added to the chosen pCPU's
> +     * PI blocking list before the list is cleared, just retry when
> +     * finding the pCPU has gone down. Also retry to choose another
> +     * pCPU when finding the list length reaches the limit.
> +     */
> + retry:
> +    dest_cpu = vmx_pi_choose_dest_cpu(v);
> +    pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, dest_cpu).lock;
>
>      spin_lock_irqsave(pi_blocking_list_lock, flags);
> +    if ( unlikely((!cpu_online(dest_cpu)) ||
> +                  (atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter) >=
> +                   PI_LIST_LIMIT)) )
> +    {
> +        spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> +        goto retry;
> +    }
> +
>      old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
>                         pi_blocking_list_lock);
>
> @@ -120,11 +174,11 @@ static void vmx_vcpu_block(struct vcpu *v)
>       */
>      ASSERT(old_lock == NULL);
>
> -    atomic_inc(&per_cpu(vmx_pi_blocking, v->processor).counter);
> -    HVMTRACE_4D(PI_LIST_ADD, v->domain->domain_id, v->vcpu_id, v->processor,
> -                atomic_read(&per_cpu(vmx_pi_blocking, v->processor).counter));
> +    atomic_inc(&per_cpu(vmx_pi_blocking, dest_cpu).counter);
> +    HVMTRACE_4D(PI_LIST_ADD, v->domain->domain_id, v->vcpu_id, dest_cpu,
> +                atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter));
>      list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
> -                  &per_cpu(vmx_pi_blocking, v->processor).list);
> +                  &per_cpu(vmx_pi_blocking, dest_cpu).list);
>      spin_unlock_irqrestore(pi_blocking_list_lock, flags);
>
>      ASSERT(!pi_test_sn(pi_desc));
> @@ -134,6 +188,13 @@ static void vmx_vcpu_block(struct vcpu *v)
>      ASSERT(pi_desc->ndst ==
>             (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
>
> +    if ( unlikely(dest_cpu != v->processor) )
> +    {
> +        dest = cpu_physical_id(dest_cpu);
> +        write_atomic(&pi_desc->ndst,
> +                     (x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK)));
> +    }
> +
>      write_atomic(&pi_desc->nv, pi_wakeup_vector);
>  }
>
> @@ -163,6 +224,7 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
>      unsigned long flags;
>      spinlock_t *pi_blocking_list_lock;
>      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned int dest = cpu_physical_id(v->processor);
>
>      /*
>       * Set 'NV' field back to posted_intr_vector, so the
> @@ -170,6 +232,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
>       * it is running in non-root mode.
>       */
>      write_atomic(&pi_desc->nv, posted_intr_vector);
> +    write_atomic(&pi_desc->ndst,
> +                 x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
>
>      pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
>
> --
> 1.8.3.1
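btw, to give a concrete feel for the proposed limit (numbers purely
illustrative, not from any measurement): a host with 288 online pCPUs
running 4096 HVM vCPUs gets PI_LIST_LIMIT = 4096/288 + 128 = 142, so each
per-pCPU blocking list is capped at 142 entries no matter how the blocked
vCPUs cluster. A small host with 8 pCPUs and 64 vCPUs gets 64/8 + 128 = 136,
which can never be reached with only 64 vCPUs in total, so the local pCPU
is always used there.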