
Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, January 20, 2016 4:35 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario Faggioli
> <dario.faggioli@xxxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxxxxx>;
> Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx; Keir Fraser
> <keir@xxxxxxx>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> >>> On 20.01.16 at 08:49, <feng.wu@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Monday, January 18, 2016 11:14 PM
> >> >>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> >> > +
> >> > +    /*
> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> >> > +     * make it as the destination of the wake-up notification event.
> >> > +     */
> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> >> > +
> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> >> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +
> >> > +    ASSERT(!pi_test_sn(pi_desc));
> >> > +
> >> > +    /*
> >> > +     * We don't need to set the 'NDST' field, since it should point to
> >> > +     * the same pCPU as v->processor.
> >> > +     */
> >> > +
> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> >>
> >> Stray blank line between comment and corresponding code.
> >>
> >> Also the ASSERT() is rather more disconnected from the write
> >> than seems desirable: Wouldn't it be better to cmpxchg() the
> >> whole 32-bit value, validating that SN is clear in the result?
> >
> > I don't quite understand this. The control field is 64 bits wide; do you
> > mean cmpxchg() of the whole 64-bit value, as follows:
> >
> > +        do {
> > +            old.control = new.control = pi_desc->control;
> > +            new.nv = pi_wakeup_vector;
> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                  != old.control );
> >
> > This is somewhat like the implementation in the earlier versions;
> > why do you want to change it back?
> 
> Did you read what I've said? I'm worried about the disconnect of
> assertion and update: You really care about SN being clear
> _after_ the update aiui. 

No, the ASSERT has no connection with the update. The SN bit should
already be clear before pi_desc->nv is updated; the ASSERT here is just
defensive code to catch violations of that assumption.

> And since NDST doesn't change, a 32-bit
> CMPXCHG would seem to suffice.
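
My understanding of the 32-bit variant being suggested is roughly the
following (just a sketch: the bit positions are assumed from the pi_desc
layout introduced earlier in this series, with ON/SN and NV in the low
32 bits and NDST in the high 32 bits, and POSTED_INTR_SN stands in for
whatever bit definition pi_test_sn() is built on):

    uint32_t old, new;
    uint32_t *ctrl = (uint32_t *)&pi_desc->control;   /* low half only */

    do {
        old = *ctrl;
        /* SN is expected to be clear in the value we re-install. */
        ASSERT(!(old & (1u << POSTED_INTR_SN)));
        /* Replace the NV byte (bits 23:16) with pi_wakeup_vector. */
        new = (old & ~(0xffu << 16)) | ((uint32_t)pi_wakeup_vector << 16);
    } while ( cmpxchg(ctrl, old, new) != old );
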
> 
> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> > +    if ( pi_block_cpu == NR_CPUS )
> >> > +        return;
> >>
> >> Are this condition and the one in the immediately preceding if()
> >> connected in any way?
> >
> > I am not sure I understand this correctly. Which one is
> > " the one in the immediately preceding if()"?
> 
> If you hadn't ripped out too much of the context, it would be a
> matter of pointing you back up. Now I have to re-quote the full
> code:
> 
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> 
> These are the two if()-s the question is about.
> 
> >> I.e. could the latter perhaps become an
> >> ASSERT() by connecting the two blocks suitably? I'm simply
> >> trying to limit the number of different cases to consider mentally
> >> when looking at this code ...
> >
> > If the above check is true, the vCPU was already removed by
> > pi_wakeup_interrupt(), so we just need to return. If it is false,
> > we acquire the spinlock as the following code does; there are two
> > scenarios:
> > - pi_wakeup_interrupt() acquires the spinlock before us; by the time
> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
> > set to NR_CPUS by pi_wakeup_interrupt(), so we need to do nothing.
> > - We get the spinlock before pi_wakeup_interrupt(); in this case we need
> > to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
> > to NR_CPUS.
> 
> This is all only about the second condition, but the question really is
> whether one being true or false may imply the result of other. In
> which case this would better be ASSERT()ed than handled by two
> conditionals.

Thanks for the explanation. I don't think there is any connection
between "if ( pi_desc->nv != posted_intr_vector )" and
"if ( pi_block_cpu == NR_CPUS )".

> 
> >> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >> >
> >> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >> >
> >> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> > +
> >> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> >> > +
> >> >      v->arch.schedule_tail    = vmx_do_resume;
> >> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >> >
> >> > +    if ( iommu_intpost )
> >> > +        v->arch.vcpu_block = vmx_vcpu_block;
> >>
> >> Considering the locking question above - couldn't this be done only
> >> when a device gets added, and the pointer zapped upon removal
> >> of the last device, at once saving the conditional inside the hook?
> >
> > That is another solution, but I think it makes a simple thing difficult;
> > I feel the current logic is simple and clear. Btw, this hook is not
> > on a critical path, so another conditional in it should not be a big deal.
> 
> Then you didn't understand: The question isn't this path, but the
> path where the hook gets called if non-NULL (and hence the
> possibility to avoid such needless calls).

I understand: you mean the overhead occurs when the hook is called.
My point is that the hook is not called on a critical path, so I doubt
it is worth complicating the logic to avoid it.
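
Just so we are talking about the same thing, my understanding of the
suggestion is roughly the sketch below (the helper names
vmx_pi_hooks_assign/deassign are purely illustrative and not part of
this patch):

    /* Common-code call site: no indirect call at all when no hook is set. */
    static inline void arch_vcpu_block(struct vcpu *v)
    {
        if ( v->arch.vcpu_block )
            v->arch.vcpu_block(v);
    }

    /* Install the hook only when the first device gets assigned ... */
    static void vmx_pi_hooks_assign(struct domain *d)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
            v->arch.vcpu_block = vmx_vcpu_block;
    }

    /* ... and zap it again when the last device is removed. */
    static void vmx_pi_hooks_deassign(struct domain *d)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
            v->arch.vcpu_block = NULL;
    }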

> 
> >> > --- a/xen/common/schedule.c
> >> > +++ b/xen/common/schedule.c
> >> > @@ -802,6 +802,8 @@ void vcpu_block(void)
> >> >
> >> >      set_bit(_VPF_blocked, &v->pause_flags);
> >> >
> >> > +    arch_vcpu_block(v);
> >> > +
> >> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >> >      if ( local_events_need_delivery() )
> >> >      {
> >>
> >> ISTR some previous discussion on this placement - wasn't it
> >> determined that putting it in the else part here would be
> >> sufficient for your needs and better in terms of overhead?
> >> Or is there some race possible if moved past the invocation
> >> of local_events_need_delivery()?
> >
> > The discussion happened several months ago; it is really hard to recall
> > every detail, so I have to dig it out of my mail archive. It is really
> > unfortunate that the review has dragged on for so long.
> 
> I understand the frustration on your end, but I can only justify
> it by the frustration at my end resulting from everyone dumping
> out their patch sets with only very few people helping to deal
> with that flood. Plus, after previous iterations, I expected the
> review to be more involved than it turned out in the end (i.e.
> you've done a good job in simplifying things after the feedback
> you had received - thanks for that).
> 
> > We need to call arch_vcpu_block() before local_events_need_delivery(),
> > since the VT-d hardware can issue a notification event while we are in
> > arch_vcpu_block(); if that happens, the 'ON' bit is set and we then catch
> > it in local_events_need_delivery(). If we did arch_vcpu_block() in the
> > else part, we could not handle this. This is the one reason I can recall;
> > I am not sure whether there were other concerns, since it has been quite
> > a long time. The current implementation was fully discussed with the
> > scheduler maintainers.
> 
> Okay, this all makes sense. It's just that I don't see how the ON
> bit would get checked by local_events_need_delivery(). But that
> may in turn be because, with the long turnaround, I've lost some
> of the details of the rest of this series.

Here is how ON is checked:
local_events_need_delivery() -->
        hvm_local_events_need_delivery() -->
                hvm_vcpu_has_pending_irq() -->
                        vlapic_has_pending_irq() -->
                                vlapic_find_highest_irr() -->
                                        vmx_sync_pir_to_irr() -->
                                                pi_test_and_clear_on()
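
In other words, the placement matters because of this window (just the
hunk quoted above, annotated to spell out the race being closed):

    set_bit(_VPF_blocked, &v->pause_flags);

    arch_vcpu_block(v);    /* NV switched to pi_wakeup_vector, vCPU queued */

    /*
     * A notification arriving anywhere up to this point sets ON in the
     * posted-interrupt descriptor, and the check below then observes it
     * via the vmx_sync_pir_to_irr() path shown above.
     */
    if ( local_events_need_delivery() )
        clear_bit(_VPF_blocked, &v->pause_flags);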

> 
> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >>
> >> > +    unsigned int         pi_block_cpu;
> >>
> >> Looking at all the use sites I wonder whether it wouldn't make for
> >> easier to read code if you instead stored the lock to acquire. This
> >> would then perhaps also prepare a road for balancing lists growing
> >> too large (i.e. rather than having one list per CPU, you could
> >> then have a variable number of lists, depending on current needs,
> >> and [kind of] freely select one of them).
> >
> > Since this patch is still experimental, can we defer the "length of the
> > list" issue to the next stage, when we turn the feature on by default?
> 
> I in no way meant to say that the list length issue needs to be
> dealt with right away. All I was saying was that storing a lock
> pointer here instead of a CPU number may make that easier (and
> might therefore be a good idea to do right away).

Do you mean storing a pointer to 'pi_blocked_vcpu_lock' here?
Do you feel strongly about this approach? I will think about it a bit
more, to see how much it affects the current implementation.
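
If that is what you mean, the change would look roughly like this (only
a sketch; 'pi_block_list_lock' is just an illustrative name):

    /* In struct arch_vmx_struct, replacing the CPU number: */
    spinlock_t          *pi_block_list_lock;   /* NULL <=> not on any list */

    /* Blocking path: */
    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, v->processor);

    spin_lock_irqsave(lock, flags);
    v->arch.hvm_vmx.pi_block_list_lock = lock;
    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
                  &per_cpu(pi_blocked_vcpu, v->processor));
    spin_unlock_irqrestore(lock, flags);

    /* Cancellation path: */
    lock = v->arch.hvm_vmx.pi_block_list_lock;
    if ( lock == NULL )
        return;

    spin_lock_irqsave(lock, flags);
    if ( v->arch.hvm_vmx.pi_block_list_lock == lock )
    {
        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
        v->arch.hvm_vmx.pi_block_list_lock = NULL;
    }
    spin_unlock_irqrestore(lock, flags);

This would indeed keep the use sites independent of how many lists there
are, so a variable number of lists could be introduced later without
touching them.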

Thanks,
Feng

> 
> Jan
