Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
Ping...
> -----Original Message-----
> From: Wu, Feng
> Sent: Thursday, December 3, 2015 4:36 PM
> To: xen-devel@xxxxxxxxxxxxx
> Cc: Wu, Feng <feng.wu@xxxxxxxxx>; Keir Fraser <keir@xxxxxxx>; Jan Beulich
> <jbeulich@xxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Tian,
> Kevin <kevin.tian@xxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxxxxx>;
> Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Subject: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update the posted-interrupt descriptor in the
> following scenarios:
> - vCPU is preempted
> - vCPU is put to sleep
> - vCPU is blocked
>
> When a vCPU is preempted or put to sleep, we update the posted-interrupt
> descriptor during scheduling by introducing two new architectural scheduler
> hooks: vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked,
> we introduce a new architectural hook, arch_vcpu_block(), to update the
> posted-interrupt descriptor.
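>
> A rough sketch of how these hooks are wired up (condensed from the diff
> below; all names are the ones this patch introduces):
>
>     /* Generic layer: called from vcpu_block()/do_poll() in schedule.c. */
>     void arch_vcpu_block(struct vcpu *v)
>     {
>         if ( v->arch.vcpu_block )      /* set to vmx_vcpu_block() when */
>             v->arch.vcpu_block(v);     /* iommu_intpost is enabled     */
>     }
>
>     /* The VMX context-switch paths gain the two scheduler hooks. */
>     static void vmx_ctxt_switch_from(struct vcpu *v)
>     {
>         /* ... existing save logic ... */
>         vmx_pi_switch_from(v);
>     }
>
>     static void vmx_ctxt_switch_to(struct vcpu *v)
>     {
>         /* ... existing restore logic ... */
>         vmx_pi_switch_to(v);
>     }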
>
> Besides that, before VM-entry, we make sure the 'NV' field is set to
> 'posted_intr_vector' and the vCPU is not on any blocking list, which is
> what is needed while the vCPU is running in non-root mode. The reason we
> do this check is that we change the posted-interrupt descriptor in
> vcpu_block(); however, we don't change it back in vcpu_unblock() or when
> vcpu_block() returns directly due to event delivery (in fact, we don't
> need to do it in those two places, which is why we do it before VM-entry).
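>
> The fix-up itself is a single call on the VM-entry path (condensed from
> the diff below):
>
>     void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>     {
>         struct vcpu *curr = current;
>         /* ... */
>         /* Restore 'NV' and leave any blocking list before entering the guest. */
>         vmx_pi_state_to_normal(curr);
>         /* ... existing VPID handling ... */
>     }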
>
> We don't change the 'SN' bit in the posted-interrupt descriptor when we
> handle the lazy context switch for the following two scenarios:
> - Preempted by a tasklet, which runs in the idle vCPU's context.
> - The previous vCPU is offline and there is no runnable vCPU in the
>   run queue.
> This may incur spurious PI notification events, but since a PI
> notification event is only sent when 'ON' is clear, and 'ON' is set by
> hardware once the notification has been sent, no further notification
> events arrive until 'ON' is cleared again. Besides that, spurious PI
> notification events happen from time to time in the Xen hypervisor
> anyway; for example, when a guest traps to Xen and a PI notification
> event arrives, there is nothing Xen actually needs to do about it, as
> the interrupts will be delivered to the guest at the next VM entry.
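>
> For reference, the fields mentioned above ('ON', 'SN', 'NV' and 'NDST')
> live in the posted-interrupt descriptor. A rough sketch of its layout,
> following the VT-d specification (the exact declaration in the Xen
> headers may differ slightly):
>
>     struct pi_desc {
>         u32 pir[8];          /* Posted Interrupt Requests, one bit per vector */
>         union {
>             struct {
>                 u16 on : 1,  /* bit 256 - Outstanding Notification */
>                     sn : 1,  /* bit 257 - Suppress Notification    */
>                     rsvd_1 : 14;
>                 u8  nv;      /* Notification Vector      */
>                 u8  rsvd_2;
>                 u32 ndst;    /* Notification Destination */
>             };
>             u64 control;
>         };
>         u32 rsvd[6];
>     } __attribute__ ((aligned (64)));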
>
> CC: Keir Fraser <keir@xxxxxxx>
> CC: Jan Beulich <jbeulich@xxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> CC: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Suggested-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> Suggested-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
> Suggested-by: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx>
> ---
> v10:
> - Check iommu_intpost first
> - Remove pointless checking of has_hvm_container_vcpu(v)
> - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
> don't need to use another list to save the vCPUs with 'ON' set; just
> call vcpu_unblock(v) directly.
>
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
>
> v8:
> - Remove the lazy context switch handling for PI state transition
> - Change PI state in vcpu_block() and do_poll() when the vCPU
> is going to be blocked
>
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
> into this patch, so it is self-contained and more convenient
> for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
> already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() in __context_switch() instead of directly
> calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
>
> One remaining item from v7:
> Jan has a concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi();
> we need Dario's or George's input on this.
>
> v6:
> - Add two static inline functions for pi context switch
> - Fix typos
>
> v5:
> - Rename arch_vcpu_wake to arch_vcpu_wake_prepare
> - Make arch_vcpu_wake_prepare() inline for ARM
> - Merge the ARM dummy hooks together
> - Changes to some code comments
> - Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
> PI is disabled or the vCPU is not in HVM
> - Coding style
>
> v4:
> - Newly added
>
> Changelog for "vmx: posted-interrupt handling when vCPU is blocked"
> v6:
> - Fix some typos
> - Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()
>
> v4:
> - Use local variables in pi_wakeup_interrupt()
> - Remove the vcpu from the blocked list when pi_desc.on==1; this
> avoids kicking the vcpu multiple times.
> - Remove tasklet
>
> v3:
> - This patch is generated by merging the following three patches in v2:
> [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
> [RFC v2 10/15] vmx: Define two per-cpu variables
> [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
>
> xen/arch/x86/hvm/hvm.c | 6 ++
> xen/arch/x86/hvm/vmx/vmcs.c | 2 +
> xen/arch/x86/hvm/vmx/vmx.c | 172 +++++++++++++++++++++++++++++++++++++
> xen/common/schedule.c | 4 +
> xen/include/asm-arm/domain.h | 2 +
> xen/include/asm-x86/domain.h | 2 +
> xen/include/asm-x86/hvm/hvm.h | 2 +
> xen/include/asm-x86/hvm/vmx/vmcs.h | 9 ++
> xen/include/asm-x86/hvm/vmx/vmx.h | 4 +
> 9 files changed, 203 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6c2b512..3368cf2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
> hvm_destroy_all_ioreq_servers(d);
> }
>
> +void arch_vcpu_block(struct vcpu *v)
> +{
> + if ( v->arch.vcpu_block )
> + v->arch.vcpu_block(v);
> +}
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 000d06e..0f23fce 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -676,6 +676,8 @@ int vmx_cpu_up(void)
> if ( cpu_has_vmx_vpid )
> vpid_sync_all();
>
> + vmx_pi_per_cpu_init(cpu);
> +
> return 0;
> }
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 39dc500..0d9462e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr,
> uint64_t msr_content);
> static void vmx_invlpg_intercept(unsigned long vaddr);
> static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
> + * we can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
> uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> + INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> + spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> + unsigned long flags;
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !has_arch_pdevs(v->domain) )
> + return;
> +
> + ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> + /*
> + * The vCPU is blocking; we need to add it to one of the per-pCPU lists.
> + * We save v->processor to v->arch.hvm_vmx.pi_block_cpu, use it to select
> + * the per-CPU list, and also store it in the posted-interrupt descriptor
> + * as the destination of the wake-up notification event.
> + */
> + v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> + v->arch.hvm_vmx.pi_block_cpu), flags);
> + list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> + &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> + v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> + ASSERT(!pi_test_sn(pi_desc));
> +
> + /*
> + * We don't need to set the 'NDST' field, since it should point to
> + * the same pCPU as v->processor.
> + */
> +
> + write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> + test_bit(_VPF_blocked, &v->pause_flags) )
> + return;
> +
> + /*
> + * The vCPU has been preempted or has gone to sleep. We don't need to
> + * send a notification event to a non-running vCPU; the interrupt
> + * information will be delivered to it before VM-entry when the vCPU
> + * is scheduled to run next time.
> + */
> + pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> + return;
> +
> + if ( x2apic_enabled )
> + write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> + else
> + write_atomic(&pi_desc->ndst,
> + MASK_INSR(cpu_physical_id(v->processor),
> + PI_xAPIC_NDST_MASK));
> +
> + pi_clear_sn(pi_desc);
> +}
> +
> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> + unsigned long flags;
> + unsigned int pi_block_cpu;
> + struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> + if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> + return;
> +
> + ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> + /*
> + * Set 'NV' field back to posted_intr_vector, so the
> + * Posted-Interrupts can be delivered to the vCPU when
> + * it is running in non-root mode.
> + */
> + if ( pi_desc->nv != posted_intr_vector )
> + write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> + /* If 'pi_block_cpu' is NR_CPUS, the vCPU is not on any blocking list. */
> + pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> + if ( pi_block_cpu == NR_CPUS )
> + return;
> +
> + spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> + /*
> + * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> + * removed from the blocking list while we were acquiring the lock.
> + */
> + if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> + {
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> + return;
> + }
> +
> + list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> + spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +}
>
> static int vmx_domain_initialise(struct domain *d)
> {
> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
> spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>
> + INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> + v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
> v->arch.schedule_tail = vmx_do_resume;
> v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> v->arch.ctxt_switch_to = vmx_ctxt_switch_to;
>
> + if ( iommu_intpost )
> + v->arch.vcpu_block = vmx_vcpu_block;
> +
> if ( (rc = vmx_create_vmcs(v)) != 0 )
> {
> dprintk(XENLOG_WARNING,
> @@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
> vmx_save_guest_msrs(v);
> vmx_restore_host_msrs();
> vmx_save_dr(v);
> + vmx_pi_switch_from(v);
> }
>
> static void vmx_ctxt_switch_to(struct vcpu *v)
> @@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>
> vmx_restore_guest_msrs(v);
> vmx_restore_dr(v);
> + vmx_pi_switch_to(v);
> }
>
>
> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata vmx_function_table = {
> .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> };
>
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> + struct arch_vmx_struct *vmx, *tmp;
> + struct vcpu *v;
> + spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> + struct list_head *blocked_vcpus =
> + &per_cpu(pi_blocked_vcpu, smp_processor_id());
> + LIST_HEAD(list);
> +
> + ack_APIC_irq();
> + this_cpu(irq_count)++;
> +
> + spin_lock(lock);
> +
> + /*
> + * XXX: The length of the list depends on how many vCPUs are currently
> + * blocked on this specific pCPU. This may hurt the interrupt latency
> + * if the list grows too long.
> + */
> + list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> + {
> + if ( pi_test_on(&vmx->pi_desc) )
> + {
> + list_del(&vmx->pi_blocked_vcpu_list);
> + vmx->pi_block_cpu = NR_CPUS;
> + v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> + vcpu_unblock(v);
> + }
> + }
> +
> + spin_unlock(lock);
> +}
> +
> /* Handle VT-d posted-interrupt when VCPU is running. */
> static void pi_notification_interrupt(struct cpu_user_regs *regs)
> {
> @@ -2100,7 +2267,10 @@ const struct hvm_function_table * __init start_vmx(void)
> if ( cpu_has_vmx_posted_intr_processing )
> {
> if ( iommu_intpost )
> + {
> alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
> + alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> + }
> else
> alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> }
> @@ -3543,6 +3713,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
> struct hvm_vcpu_asid *p_asid;
> bool_t need_flush;
>
> + vmx_pi_state_to_normal(curr);
> +
> if ( !cpu_has_vmx_vpid )
> goto out;
> if ( nestedhvm_vcpu_in_guestmode(curr) )
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index c195129..fc18035 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
>
> set_bit(_VPF_blocked, &v->pause_flags);
>
> + arch_vcpu_block(v);
> +
> /* Check for events /after/ blocking: avoids wakeup waiting race. */
> if ( local_events_need_delivery() )
> {
> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
> v->poll_evtchn = -1;
> set_bit(v->vcpu_id, d->poll_mask);
>
> + arch_vcpu_block(v);
> +
> #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
> /* Check for events /after/ setting flags: avoids wakeup waiting race. */
> smp_mb();
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index e7e40da..ada146c 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -310,6 +310,8 @@ static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
> xfree(vgc);
> }
>
> +static inline void arch_vcpu_block(struct vcpu *v) {}
> +
> #endif /* __ASM_DOMAIN_H__ */
>
> /*
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index c825975..135f7f9 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -494,6 +494,8 @@ struct arch_vcpu
> void (*ctxt_switch_from) (struct vcpu *);
> void (*ctxt_switch_to) (struct vcpu *);
>
> + void (*vcpu_block) (struct vcpu *);
> +
> struct vpmu_struct vpmu;
>
> /* Virtual Machine Extensions */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index f80e143..336fa62 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -560,6 +560,8 @@ void altp2m_vcpu_update_vmfunc_ve(struct vcpu *v);
> /* emulates #VE */
> bool_t altp2m_vcpu_emulate_ve(struct vcpu *v);
>
> +void arch_vcpu_block(struct vcpu *v);
> +
> #endif /* __ASM_X86_HVM_HVM_H__ */
>
> /*
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index b3b0946..7b3a931 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
> struct page_info *vmwrite_bitmap;
>
> struct page_info *pml_pg;
> +
> + struct list_head pi_blocked_vcpu_list;
> +
> + /*
> + * Before the vCPU is blocked, it is added to the per-cpu blocking list
> + * of 'pi_block_cpu', so that the VT-d engine can send a wakeup
> + * notification event to 'pi_block_cpu', which then wakes up the vCPU.
> + */
> + unsigned int pi_block_cpu;
> };
>
> int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 1719965..8129bff 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,8 @@
> #include <asm/hvm/trace.h>
> #include <asm/hvm/vmx/vmcs.h>
>
> +extern uint8_t pi_wakeup_vector;
> +
> typedef union {
> struct {
> u64 r : 1, /* bit 0 - Read permission */
> @@ -563,6 +565,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
> void free_p2m_hap_data(struct p2m_domain *p2m);
> void p2m_init_hap_data(struct p2m_domain *p2m);
>
> +void vmx_pi_per_cpu_init(unsigned int cpu);
> +
> /* EPT violation qualifications definitions */
> #define _EPT_READ_VIOLATION 0
> #define EPT_READ_VIOLATION (1UL<<_EPT_READ_VIOLATION)
> --
> 2.1.0
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel