Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
Ping...
> -----Original Message-----
> From: Wu, Feng
> Sent: Thursday, December 3, 2015 4:36 PM
> To: xen-devel@xxxxxxxxxxxxx
> Cc: Wu, Feng <feng.wu@xxxxxxxxx>; Keir Fraser <keir@xxxxxxx>; Jan Beulich
> <jbeulich@xxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Tian,
> Kevin <kevin.tian@xxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxxxxx>;
> Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Subject: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update the posted-interrupt descriptor in the
> following scenarios:
> - vCPU is preempted
> - vCPU is put to sleep
> - vCPU is blocked
>
> When a vCPU is preempted or put to sleep, we update the posted-interrupt
> descriptor during scheduling by introducing two new architectural scheduler
> hooks: vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked,
> we introduce a new architectural hook, arch_vcpu_block(), to update the
> posted-interrupt descriptor.
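>
> A rough sketch of how these hooks are wired up (condensed from the diff
> below; all names are the ones this patch introduces):
>
>     /* Generic layer: called from vcpu_block()/do_poll() in schedule.c. */
>     void arch_vcpu_block(struct vcpu *v)
>     {
>         if ( v->arch.vcpu_block )      /* set to vmx_vcpu_block() when */
>             v->arch.vcpu_block(v);     /* iommu_intpost is enabled     */
>     }
>
>     /* The VMX context-switch paths gain the two scheduler hooks. */
>     static void vmx_ctxt_switch_from(struct vcpu *v)
>     {
>         /* ... existing save logic ... */
>         vmx_pi_switch_from(v);
>     }
>
>     static void vmx_ctxt_switch_to(struct vcpu *v)
>     {
>         /* ... existing restore logic ... */
>         vmx_pi_switch_to(v);
>     }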
>
> Besides that, before VM-entry, we make sure the 'NV' field is set to
> 'posted_intr_vector' and the vCPU is not on any blocking list, which is
> what is needed while the vCPU is running in non-root mode. The reason we
> do this check is that we change the posted-interrupt descriptor in
> vcpu_block(); however, we don't change it back in vcpu_unblock() or when
> vcpu_block() returns directly due to event delivery (in fact, we don't
> need to do it in those two places, which is why we do it before VM-entry).
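>
> The fix-up itself is a single call on the VM-entry path (condensed from
> the diff below):
>
>     void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>     {
>         struct vcpu *curr = current;
>         /* ... */
>         /* Restore 'NV' and leave any blocking list before entering the guest. */
>         vmx_pi_state_to_normal(curr);
>         /* ... existing VPID handling ... */
>     }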
>
> We don't change the 'SN' bit in the posted-interrupt descriptor when we
> handle the lazy context switch for the following two scenarios:
> - Preempted by a tasklet, which runs in the idle vCPU's context.
> - The previous vCPU is offline and there is no runnable vCPU in the
>   run queue.
> This may incur spurious PI notification events, but since a PI
> notification event is only sent when 'ON' is clear, and 'ON' is set by
> hardware once the notification has been sent, no further notification
> events arrive until 'ON' is cleared again. Besides that, spurious PI
> notification events happen from time to time in the Xen hypervisor
> anyway; for example, when a guest traps to Xen and a PI notification
> event arrives, there is nothing Xen actually needs to do about it, as
> the interrupts will be delivered to the guest at the next VM entry.
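>
> For reference, the fields mentioned above ('ON', 'SN', 'NV' and 'NDST')
> live in the posted-interrupt descriptor. A rough sketch of its layout,
> following the VT-d specification (the exact declaration in the Xen
> headers may differ slightly):
>
>     struct pi_desc {
>         u32 pir[8];          /* Posted Interrupt Requests, one bit per vector */
>         union {
>             struct {
>                 u16 on : 1,  /* bit 256 - Outstanding Notification */
>                     sn : 1,  /* bit 257 - Suppress Notification    */
>                     rsvd_1 : 14;
>                 u8  nv;      /* Notification Vector      */
>                 u8  rsvd_2;
>                 u32 ndst;    /* Notification Destination */
>             };
>             u64 control;
>         };
>         u32 rsvd[6];
>     } __attribute__ ((aligned (64)));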
>
> CC: Keir Fraser <keir@xxxxxxx>
> CC: Jan Beulich <jbeulich@xxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> CC: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Suggested-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> Suggested-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Suggested-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx>
> ---
> v10:
> - Check iommu_intpost first
> - Remove pointless checking of has_hvm_container_vcpu(v)
> - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
> don't need to use another list to save the vCPUs with 'ON' set; just
> call vcpu_unblock(v) directly.
>
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
>
> v8:
> - Remove the lazy context switch handling for PI state transition
> - Change PI state in vcpu_block() and do_poll() when the vCPU
> is going to be blocked
>
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
> into this patch, so it is self-contained and more convenient
> for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
> already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() in __context_switch() instead of directly
> calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
>
> One remaining item from v7:
> Jan has a concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi();
> we need Dario's or George's input on this.
>
> v6:
> - Add two static inline functions for pi context switch
> - Fix typos
>
> v5:
> - Rename arch_vcpu_wake to arch_vcpu_wake_prepare
> - Make arch_vcpu_wake_prepare() inline for ARM
> - Merge the ARM dummy hooks together
> - Changes to some code comments
> - Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
> PI is disabled or the vCPU is not in HVM
> - Coding style
>
> v4:
> - Newly added
>
> Changelog for "vmx: posted-interrupt handling when vCPU is blocked"
> v6:
> - Fix some typos
> - Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()
>
> v4:
> - Use local variables in pi_wakeup_interrupt()
> - Remove the vcpu from the blocked list when pi_desc.on==1; this
> avoids kicking the vcpu multiple times.
> - Remove tasklet
>
> v3:
> - This patch is generated by merging the following three patches in v2:
> [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> [RFC v2 10/15] vmx: Define two per-cpu variables
> [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
>
> xen/arch/x86/hvm/hvm.c | 6 ++
> xen/arch/x86/hvm/vmx/vmcs.c | 2 +
> xen/arch/x86/hvm/vmx/vmx.c | 172 +++++++++++++++++++++++++++++++++++++
> xen/common/schedule.c | 4 +
> xen/include/asm-arm/domain.h | 2 +
> xen/include/asm-x86/domain.h | 2 +
> xen/include/asm-x86/hvm/hvm.h | 2 +
> xen/include/asm-x86/hvm/vmx/vmcs.h | 9 ++
> xen/include/asm-x86/hvm/vmx/vmx.h | 4 +
> 9 files changed, 203 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6c2b512..3368cf2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
> hvm_destroy_all_ioreq_servers(d);
> }
>
> +void arch_vcpu_block(struct vcpu *v)
> +{
> + if ( v->arch.vcpu_block )
> + v->arch.vcpu_block(v);
> +}
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 000d06e..0f23fce 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -676,6 +676,8 @@ int vmx_cpu_up(void)
> if ( cpu_has_vmx_vpid )
> vpid_sync_all();
>
> + vmx_pi_per_cpu_init(cpu);
> +
> return 0;
> }
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 39dc500..0d9462e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr,
> uint64_t msr_content);
> static void vmx_invlpg_intercept(unsigned long vaddr);
> static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
> + * we can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
> uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> + INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> + spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> + unsigned long flags;
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !has_arch_pdevs(v->domain) )
> + return;
> +
> + ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> + /*
> + * The vCPU is blocking; we need to add it to one of the per-pCPU lists.
> + * We save v->processor to v->arch.hvm_vmx.pi_block_cpu, use it to select
> + * the per-CPU list, and also store it in the posted-interrupt descriptor
> + * as the destination of the wake-up notification event.
> + */
> + v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> + v->arch.hvm_vmx.pi_block_cpu), flags);
> + list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> + &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> + v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> + ASSERT(!pi_test_sn(pi_desc));
> +
> + /*
> + * We don't need to set the 'NDST' field, since it should point to
> + * the same pCPU as v->processor.
> + */
> +
> + write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> + test_bit(_VPF_blocked, &v->pause_flags) )
> + return;
> +
> + /*
> + * The vCPU has been preempted or has gone to sleep. We don't need to
> + * send a notification event to a non-running vCPU; the interrupt
> + * information will be delivered to it before VM-entry when the vCPU
> + * is scheduled to run next time.
> + */
> + pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> + return;
> +
> + if ( x2apic_enabled )
> + write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> + else
> + write_atomic(&pi_desc->ndst,
> + MASK_INSR(cpu_physical_id(v->processor),
> + PI_xAPIC_NDST_MASK));
> +
> + pi_clear_sn(pi_desc);
> +}
> +
> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> + unsigned long flags;
> + unsigned int pi_block_cpu;
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> + return;
> +
> + ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> + /*
> + * Set 'NV' field back to posted_intr_vector, so the
> + * Posted-Interrupts can be delivered to the vCPU when
> + * it is running in non-root mode.
> + */
> + if ( pi_desc->nv != posted_intr_vector )
> + write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> + /* If 'pi_block_cpu' is NR_CPUS, the vCPU is not on any blocking list. */
> + pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> + if ( pi_block_cpu == NR_CPUS )
> + return;
> +
> + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> + /*
> + * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> + * removed from the blocking list while we were acquiring the lock.
> + */
> + if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> + {
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> + return;
> + }
> +
> + list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +}
>
> static int vmx_domain_initialise(struct domain *d)
> {
> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
> spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>
> + INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
> v->arch.schedule_tail = vmx_do_resume;
> v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> v->arch.ctxt_switch_to = vmx_ctxt_switch_to;
>
> + if ( iommu_intpost )
> + v->arch.vcpu_block = vmx_vcpu_block;
> +
> if ( (rc = vmx_create_vmcs(v)) != 0 )
> {
> dprintk(XENLOG_WARNING,
> @@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
> vmx_save_guest_msrs(v);
> vmx_restore_host_msrs();
> vmx_save_dr(v);
> + vmx_pi_switch_from(v);
> }
>
> static void vmx_ctxt_switch_to(struct vcpu *v)
> @@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>
> vmx_restore_guest_msrs(v);
> vmx_restore_dr(v);
> + vmx_pi_switch_to(v);
> }
>
>
> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata vmx_function_table = {
> .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> };
>
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> + struct arch_vmx_struct *vmx, *tmp;
> + struct vcpu *v;
> + spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> + struct list_head *blocked_vcpus =
> + &per_cpu(pi_blocked_vcpu, smp_processor_id());
> + LIST_HEAD(list);
> +
> + ack_APIC_irq();
> + this_cpu(irq_count)++;
> +
> + spin_lock(lock);
> +
> + /*
> + * XXX: The length of the list depends on how many vCPUs are currently
> + * blocked on this specific pCPU. This may hurt the interrupt latency
> + * if the list grows too long.
> + */
> + list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> + {
> + if ( pi_test_on(&vmx->pi_desc) )
> + {
> + list_del(&vmx->pi_blocked_vcpu_list);
> + vmx->pi_block_cpu = NR_CPUS;
> + v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> + vcpu_unblock(v);
> + }
> + }
> +
> + spin_unlock(lock);
> +}
> +
> /* Handle VT-d posted-interrupt when VCPU is running. */
> static void pi_notification_interrupt(struct cpu_user_regs *regs)
> {
> @@ -2100,7 +2267,10 @@ const struct hvm_function_table * __init start_vmx(void)
> if ( cpu_has_vmx_posted_intr_processing )
> {
> if ( iommu_intpost )
> + {
> alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
> + alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> + }
> else
> alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> }
> @@ -3543,6 +3713,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
> struct hvm_vcpu_asid *p_asid;
> bool_t need_flush;
>
> + vmx_pi_state_to_normal(curr);
> +
> if ( !cpu_has_vmx_vpid )
> goto out;
> if ( nestedhvm_vcpu_in_guestmode(curr) )
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index c195129..fc18035 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
>
> set_bit(_VPF_blocked, &v->pause_flags);
>
> + arch_vcpu_block(v);
> +
> /* Check for events /after/ blocking: avoids wakeup waiting race. */
> if ( local_events_need_delivery() )
> {
> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
> v->poll_evtchn = -1;
> set_bit(v->vcpu_id, d->poll_mask);
>
> + arch_vcpu_block(v);
> +
> #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
> /* Check for events /after/ setting flags: avoids wakeup waiting race. */
> smp_mb();
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index e7e40da..ada146c 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -310,6 +310,8 @@ static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
> xfree(vgc);
> }
>
> +static inline void arch_vcpu_block(struct vcpu *v) {}
> +
> #endif /* __ASM_DOMAIN_H__ */
>
> /*
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index c825975..135f7f9 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -494,6 +494,8 @@ struct arch_vcpu
> void (*ctxt_switch_from) (struct vcpu *);
> void (*ctxt_switch_to) (struct vcpu *);
>
> + void (*vcpu_block) (struct vcpu *);
> +
> struct vpmu_struct vpmu;
>
> /* Virtual Machine Extensions */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index f80e143..336fa62 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -560,6 +560,8 @@ void altp2m_vcpu_update_vmfunc_ve(struct vcpu *v);
> /* emulates #VE */
> bool_t altp2m_vcpu_emulate_ve(struct vcpu *v);
>
> +void arch_vcpu_block(struct vcpu *v);
> +
> #endif /* __ASM_X86_HVM_HVM_H__ */
>
> /*
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index b3b0946..7b3a931 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
> struct page_info *vmwrite_bitmap;
>
> struct page_info *pml_pg;
> +
> + struct list_head pi_blocked_vcpu_list;
> +
> + /*
> + * Before the vCPU is blocked, it is added to the per-cpu blocking list
> + * of 'pi_block_cpu', so that the VT-d engine can send a wakeup
> + * notification event to 'pi_block_cpu', which then wakes up the vCPU.
> + */
> + unsigned int pi_block_cpu;
> };
>
> int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 1719965..8129bff 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,8 @@
> #include <asm/hvm/trace.h>
> #include <asm/hvm/vmx/vmcs.h>
>
> +extern uint8_t pi_wakeup_vector;
> +
> typedef union {
> struct {
> u64 r : 1, /* bit 0 - Read permission */
> @@ -563,6 +565,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
> void free_p2m_hap_data(struct p2m_domain *p2m);
> void p2m_init_hap_data(struct p2m_domain *p2m);
>
> +void vmx_pi_per_cpu_init(unsigned int cpu);
> +
> /* EPT violation qualifications definitions */
> #define _EPT_READ_VIOLATION 0
> #define EPT_READ_VIOLATION (1UL<<_EPT_READ_VIOLATION)
> --
> 2.1.0
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel