[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling



>>> On 20.01.16 at 12:20, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Wednesday, January 20, 2016 4:35 PM
>> >>> On 20.01.16 at 08:49, <feng.wu@xxxxxxxxx> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> Sent: Monday, January 18, 2016 11:14 PM
>> >> >>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
>> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
>> >> msr, uint64_t msr_content);
>> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
>> >> > +
>> >> > +    /*
>> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU
>> lists.
>> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use
>> it
>> >> for
>> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor
>> and
>> >> > +     * make it as the destination of the wake-up notification event.
>> >> > +     */
>> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
>> >> > +
>> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
>> >> > +                  &per_cpu(pi_blocked_vcpu, v-
>> >arch.hvm_vmx.pi_block_cpu));
>> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> > +
>> >> > +    ASSERT(!pi_test_sn(pi_desc));
>> >> > +
>> >> > +    /*
>> >> > +     * We don't need to set the 'NDST' field, since it should point to
>> >> > +     * the same pCPU as v->processor.
>> >> > +     */
>> >> > +
>> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>> >>
>> >> Stray blank line between comment and corresponding code.
>> >>
>> >> Also the ASSERT() is rather more disconnected from the write
>> >> than seems desirable: Wouldn't it be better to cmpxchg() the
>> >> whole 32-bit value, validating that SN is clear in the result?
>> >
>> > Not quite understand this. The control field is 64 bits, do you
>> > mean cmpxchg() the whole 64-bit value like follows:
>> >
>> > +        do {
>> > +            old.control = new.control = pi_desc->control;
>> > +            new.nv = pi_wakeup_vector;
>> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
>> > +                  != old.control );
>> >
>> > This a somewhat like the implementation in the earlier versions,
>> > why do you want to change it back?
>> 
>> Did you read what I've said? I'm worried about the disconnect of
>> assertion and update: You really care about SN being clear
>> _after_ the update aiui. 
> 
> No, the ASSERT has no connection with the update. the SN bit should
> be clear before updating pi_desc->nv, adding ASSERT here is just kind
> of protection code.

And SN can't get set behind your back between the ASSERT() and
the update? If the answer is "yes", then the code is indeed fine as is.

>> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >> > +    if ( pi_block_cpu == NR_CPUS )
>> >> > +        return;
>> >>
>> >> Are this condition and the one in the immediately preceding if()
>> >> connected in any way?
>> >
>> > I am not sure I understand this correctly. Which one is
>> > " the one in the immediately preceding if()"?
>> 
>> If you hadn't ripped out too much of the context, it would be a
>> matter of pointing you back up. Now I have to re-quote the full
>> code:
>> 
>> +    if ( pi_desc->nv != posted_intr_vector )
>> +        write_atomic(&pi_desc->nv, posted_intr_vector);
>> +
>> +    /* the vCPU is not on any blocking list. */
>> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> +    if ( pi_block_cpu == NR_CPUS )
>> +        return;
>> 
>> These are the two if()-s the question is about.
>> 
>> >> I.e. could the latter perhaps become an
>> >> ASSERT() by connecting the two blocks suitably? I'm simply
>> >> trying to limit the number of different cases to consider mentally
>> >> when looking at this code ...
>> >
>> > If we get a true from above check, it means the vcpu was removed by
>> > pi_wakeup_interrupt(), then we just need to return. If we get a false,
>> > we will acquire the spinlock as the following code does, there are two
>> > scenarios:
>> > - pi_wakeup_interrupt() acquires the spinlock before us, then when
>> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
>> > set to NR_CPUS by pi_wakeup_interrupt(), we need do nothing.
>> > - We get the spinlock before pi_wakeup_interrupt(), this time we need
>> > to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
>> > to NR_CPUS.
>> 
>> This is all only about the second condition, but the question really is
>> whether one being true or false may imply the result of other. In
>> which case this would better be ASSERT()ed than handled by two
>> conditionals.
> 
> Thanks for the explanation. I don't think there are any connections
> Between " if ( pi_desc->nv != posted_intr_vector )" and
> " if ( pi_block_cpu == NR_CPUS )". 

Well, it would have seemed to me that some of the four possible
combinations can't validly occur, e.g. due to NV always having a
particular value when pi_block_cpu != NR_CPUS.

>> >> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >> >
>> >> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>> >> >
>> >> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >> > +
>> >> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
>> >> > +
>> >> >      v->arch.schedule_tail    = vmx_do_resume;
>> >> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >> >
>> >> > +    if ( iommu_intpost )
>> >> > +        v->arch.vcpu_block = vmx_vcpu_block;
>> >>
>> >> Considering the locking question above - couldn't this be done only
>> >> when a device gets added, and the pointer zapped upon removal
>> >> of the last device, at once saving the conditional inside the hook?
>> >
>> > That is another solution, but I think it make simple thing difficult,
>> > I feel the current logic is simple and clear. Btw, this hook is not
>> > a critical path, another conditional in it should not a big deal.
>> 
>> Then you didn't understand: The question isn't this path, but the
>> path where the hook gets called if non-NULL (and hence the
>> possibility to avoid such needless calls).
> 
> I understand you mean the overhead happens when the hooks
> is called. My point is the hook is not called in a critical path, so I doubt
> whether it worth doing so to make the logic complex.

Are you sure scheduling code is not a critical path?

>> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> >>
>> >> > +    unsigned int         pi_block_cpu;
>> >>
>> >> Looking at all the use sites I wonder whether it wouldn't make for
>> >> easier to read code if you instead stored the lock to acquire. This
>> >> would then perhaps also prepare a road for balancing lists growing
>> >> too large (i.e. rather than having one list per CPU, you could
>> >> then have a variable number of lists, depending on current needs,
>> >> and [kind of] freely select one of them).
>> >
>> > Since this patch is still experimental, can we put the "length of the list"
>> > issue in the next stage when we turn it to default on?
>> 
>> I in no way meant to say that the list length issue needs to be
>> dealt with right away. All I was saying was that storing a lock
>> pointer here instead of a CPU number may make that easier (and
>> might therefore be a good idea to do right away).
> 
> Do you mean storing a pointer to ' pi_blocked_vcpu_lock ' here?

Whatever you name it, but yes.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.