Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
>      hvm_destroy_all_ioreq_servers(d);
>  }
>
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}

Perhaps better as an inline function? In no case would it (and its
declaration) belong in an HVM-specific file if the hook isn't
HVM-specific. Or if the hook was, then it ought to be put in struct
hvm_vcpu.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr,
>                                     uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;

I'm pretty sure I complained about this before - this is used only in
this file, and hence should be static.

> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;

Is there no locking needed for this check to be safe? Same goes for
vmx_pi_switch_{from,to}() then obviously...

> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);

Stray blank line between comment and corresponding code. Also the
ASSERT() is rather more disconnected from the write than seems
desirable: Wouldn't it be better to cmpxchg() the whole 32-bit value,
validating that SN is clear in the result?

> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));

Considering these are atomic writes, I'm not sure the compiler will
manage to fold the two. Could I therefore talk you into morphing this
into

    write_atomic(&pi_desc->ndst,
                 x2apic_enabled ? ... : ... );

?
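To be concrete, the folded form I have in mind would look roughly like
this (an illustrative sketch only, merely combining the two expressions
already present in the quoted hunk):

    write_atomic(&pi_desc->ndst,
                 x2apic_enabled ? cpu_physical_id(v->processor)
                                : MASK_INSR(cpu_physical_id(v->processor),
                                            PI_xAPIC_NDST_MASK));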
Indentation will need to be fixed in any case.

> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */

Comment style.

> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;

Are this condition and the one in the immediately preceding if()
connected in any way? I.e. could the latter perhaps become an ASSERT()
by connecting the two blocks suitably? I'm simply trying to limit the
number of different cases to consider mentally when looking at this
code ...

> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )

With you wanting to deal with changes behind your back here, isn't
there some kind of barrier needed between reading and using
pi_block_cpu, such that the compiler won't convert the local variable
accesses into multiple reads of v->arch.hvm_vmx.pi_block_cpu (which
iiuc it is allowed to do)?

> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> +                               flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;

Perhaps (again for readability's sake) better to invert the if()
condition and move these two lines into its body, ...

> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> +                           flags);
> +}

... while this would then be common.
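I.e. (just a sketch to illustrate, re-using the patch's own code
unchanged) the tail of the function would become

    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);

    /*
     * v->arch.hvm_vmx.pi_block_cpu != NR_CPUS here means the vCPU is
     * still on the blocking list; otherwise it got removed while we
     * were acquiring the lock.
     */
    if ( v->arch.hvm_vmx.pi_block_cpu != NR_CPUS )
    {
        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
    }

    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
                           flags);

leaving just the single, common unlock path.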
> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to = vmx_ctxt_switch_to;
>
> +    if ( iommu_intpost )
> +        v->arch.vcpu_block = vmx_vcpu_block;

Considering the locking question above - couldn't this be done only
when a device gets added, and the pointer zapped upon removal of the
last device, at once saving the conditional inside the hook?

> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +        &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);

Unused local variable?

> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is current
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +            vcpu_unblock(v);

It not being used elsewhere, I don't see the need for the local
variable v. If you really want to keep it, move its declaration into
this most narrow scope please.

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
>
>      set_bit(_VPF_blocked, &v->pause_flags);
>
> +    arch_vcpu_block(v);
> +
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>      if ( local_events_need_delivery() )
>      {

ISTR some previous discussion on this placement - wasn't it determined
that putting it in the else part here would be sufficient for your
needs and better in terms of overhead? Or is there some race possible
if moved past the invocation of local_events_need_delivery()?

> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
>      v->poll_evtchn = -1;
>      set_bit(v->vcpu_id, d->poll_mask);
>
> +    arch_vcpu_block(v);
> +
>  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
>      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
>      smp_mb();

Same question here then regarding moving this further down.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
>      struct page_info *vmwrite_bitmap;
>
>      struct page_info *pml_pg;
> +
> +    struct list_head pi_blocked_vcpu_list;
> +
> +    /*
> +     * Before vCPU is blocked, it is added to the global per-cpu list

global or per-cpu?

> +     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
> +     * event to 'pi_block_cpu' and wakeup the related vCPU.
> +     */

Perhaps to help understanding the comment the part after the second
comma could be a separate sentence?

> +    unsigned int pi_block_cpu;

Looking at all the use sites I wonder whether it wouldn't make for
easier to read code if you instead stored the lock to acquire. This
would then perhaps also prepare a road for balancing lists growing too
large (i.e. rather than having one list per CPU, you could then have a
variable number of lists, depending on current needs, and [kind of]
freely select one of them).

Jan