
Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling




> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> Sent: Wednesday, July 15, 2015 12:03 AM
> To: Wu, Feng
> Cc: Jan Beulich; Tian, Kevin; keir@xxxxxxx; George Dunlap;
> andrew.cooper3@xxxxxxxxxx; xen-devel; Zhang, Yang Z
> Subject: Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On Tue, 2015-07-14 at 14:08 +0000, Wu, Feng wrote:
> >
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> 
> > >  - do you need to perform an action upon context switch (on prev and/or
> > >    next vcpu)? If yes, there's an arch specific path in there already;
> > >  - do you need to perform an action when a vcpu wakes-up? If yes, we
> > >    need an arch hook in vcpu_wake();
> > >  - do you need to perform an action when a vcpu goes to sleep? If yes,
> > >    we need an arch hook in vcpu_sleep_nosync();
> > >
> > > I think this makes a more than fair solution. I happen to like it even
> > > better than the centralized approach, actually! That is for personal
> > > taste, but also because I think it may be useful for others too, in
> > > future, to be able to execute arch specific code, e.g., upon wakes-up,
> > > in which case we will be able to use the hook that we're introducing
> > > here for PI.
> > >
> > > Thanks and Regards,
> > > Dario
> >
> > Hi Dario,
> >
> Hi,
> 
> > Thanks for the suggestion! I made a draft patch for this idea,
> >
> Great!
> 
> > It may have
> > some issues since it is just a draft version, a kind of prototype. I am posting
> > it here just to find out whether it meets your expectations; if it does, I
> > can continue in this direction, and this may speed up the upstreaming
> > process.
> >
> Yes, I think this is a good approach, and the proper way for this
> feature to interact with the scheduler.
> 
> I appreciate it is a draft, so I'm not performing a thorough review, but
> I'll try to at least give some comments, in the hope that it helps.

Thanks, any comments will help with my next post!

> 
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 6eebc1a..7e678c8 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -740,6 +740,81 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
> >      vmx_save_guest_msrs(v);
> >      vmx_restore_host_msrs();
> >      vmx_save_dr(v);
> > +
> > +    if ( iommu_intpost )
> > +    {
> >
> I'd put a helper together ( vmx_<something>_pi() ) and put the body of
> this if in it.
> 
> Then, either just call it unconditionally from here and have, in the
> helper, something like this:
> 
>  if ( !iommu_intpost )
>    return;
> 
> Or just have this in here:
> 
>  if ( iommu_intpost )
>   vmx_<something>_pi();
> 
> > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +        struct pi_desc old, new;
> > +        unsigned long flags;
> > +
> > +        if ( vcpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )
> > +        {
> >
> Aha! So, AFAICT, this means we can deal with preemptions, sleeps and
> blockings (as can be seen below) here in _ctxt_switch_from,

Yes, we can handle the above cases here. From the runstate point of
view, this hook covers the following transitions:

running -> runnable
running -> blocked
running -> offline
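
For reference, the decision that vmx_ctxt_switch_from() makes for these
three transitions can be modelled roughly as below. This is only a
simplified sketch, not the patch code: struct mock_vcpu, enum pi_action
and pi_action_on_switch_from() are hypothetical names, and the two
booleans stand in for vcpu_runnable(v) and the _VPF_blocked test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the check done in the context-switch-from hook. */
enum pi_action {
    PI_ACTION_SET_SN,      /* preempted or going offline: suppress notifications */
    PI_ACTION_ADD_BLOCKED  /* blocking: queue the vCPU on a per-pCPU blocked list */
};

/* Hypothetical stand-in for the two pieces of vCPU state the hook looks at. */
struct mock_vcpu {
    bool runnable;      /* models vcpu_runnable(v) */
    bool blocked_flag;  /* models test_bit(_VPF_blocked, &v->pause_flags) */
};

static enum pi_action pi_action_on_switch_from(const struct mock_vcpu *v)
{
    /* running -> runnable and running -> offline both land here. */
    if ( v->runnable || !v->blocked_flag )
        return PI_ACTION_SET_SN;

    /* Only running -> blocked remains: not runnable and the flag is set. */
    return PI_ACTION_ADD_BLOCKED;
}
```

Note that once the first condition is false, the blocked flag is
necessarily set, so the second test in the draft is redundant.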

> i.e., we
> don't have to call in this code from vcpu_sleep_nosync(), like we were,
> when tying this to vcpu_runstate_change())... nice! :-D

Yes, in vcpu_sleep_nosync(), there are mainly the following cases:
runnable -> offline: we don't need to do anything for PI
running -> offline: covered here

So, I think we don't need to add an arch hook in vcpu_sleep_nosync().

> 
> > +            /*
> > +             * The vCPU is preempted or sleeped.
> >
> "has been preempted or went to sleep" ?
> 
> > We don't need to send
> > +             * notification event to a non-running vcpu, the interrupt
> > +             * information will be delivered to it before VM-ENTRY when
> > +             * the vcpu is scheduled to run next time.
> > +             */
> > +            pi_set_sn(pi_desc);
> > +
> > +        }
> > +        else if ( test_bit(_VPF_blocked, &v->pause_flags) )
> > +        {
> > +            /* The vCPU is blocked */
> >
> This comment does not add much, I'd kill it.
> 
> > +            ASSERT(v->arch.hvm_vmx.pi_block_cpu == -1);
> > +
> > +            /*
> > +             * The vCPU is blocked on the block list.
> >
> What about "The vCPU is blocking, we need to add it to one of the per
> pCPU lists."
> 
> > Add the blocked
> > +             * vCPU on the list of the v->arch.hvm_vmx.pi_block_cpu,
> >
> What you're doing seems more "Add the vCPU to the blocked list of
> v->processor, which will be the target of the wake-up notification".

Yes, but v->arch.hvm_vmx.pi_block_cpu gets the value of v->processor here.
So maybe we can improve the description here.

> 
> > +             * which is the destination of the wake-up notification event.
> > +             */
> > +            v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                              v->arch.hvm_vmx.pi_block_cpu), flags);
> > +            list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                          &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                               v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +            do {
> > +                old.control = new.control = pi_desc->control;
> > +
> > +                /*
> > +                 * We should not block the vCPU if
> > +                 * an interrupt was posted for it.
> > +                 */
> > +
> > +                if ( old.on )
> > +                {
> > +                    /*
> > +                     * The vCPU will be removed from the block list
> > +                     * during its state transferring from RUNSTATE_blocked
> > +                     * to RUNSTATE_runnable after the following tasklet
> > +                     * is executed.
> >
> We can avoid referencing RUNSTATEs at all, can't we? Just say something
> about the vCPU leaving the blocked vCPUs list on the wake-up path.

Sure, I just copied this code from the original version here.

> 
> > +                     */
> > +                    tasklet_schedule(&v->arch.hvm_vmx.pi_vcpu_wakeup_tasklet);
> > +                    return;
> > +                }
> > +
> > +                /*
> > +                 * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
> > +                 * so when external interrupts from assigned devices happen,
> > +                 * wakeup notification event will go to
> > +                 * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
> > +                 * we can find the vCPU in the right list to wake up.
> > +                 */
> > +                if ( x2apic_enabled )
> > +                    new.ndst = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
> > +                else
> > +                    new.ndst = MASK_INSR(cpu_physical_id(
> > +                                     v->arch.hvm_vmx.pi_block_cpu),
> > +                                     PI_xAPIC_NDST_MASK);
> > +                new.sn = 0;
> > +                new.nv = pi_wakeup_vector;
> > +            } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                      != old.control );
> > +        }
> > +    }
> ISTR, Jan had some comments on this code (variable names, etc.). It
> probably goes without saying that those still apply.

Absolutely, I will address Jan's comments in the next version.
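
To make the intent of the descriptor update loop quoted above easier to
follow, here is a stand-alone sketch of the same retry pattern written
with C11 atomics instead of Xen's cmpxchg(). Everything in it is
illustrative: the PI_* masks are an assumed layout of the 64-bit control
word (ON in bit 0, SN in bit 1, NV in bits 16-23, NDST in bits 32-63),
pi_retarget_for_block() is a hypothetical name, and 'ndst' is expected
to be already encoded for xAPIC or x2APIC mode by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the control word; for illustration only. */
#define PI_ON          (UINT64_C(1) << 0)
#define PI_SN          (UINT64_C(1) << 1)
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/*
 * Retarget the descriptor for a vCPU that is about to block.
 *
 * Returns false and leaves the descriptor untouched if ON is already
 * set, i.e. an interrupt was posted and the vCPU should be woken up
 * instead of blocking. Otherwise clears SN, installs the wake-up
 * vector and the new destination, and returns true.
 */
static bool pi_retarget_for_block(_Atomic uint64_t *control,
                                  uint32_t ndst, uint8_t wakeup_vector)
{
    uint64_t old = atomic_load(control);
    uint64_t new;

    do {
        if ( old & PI_ON )
            return false;

        new = old & ~(PI_SN | PI_NV_MASK | PI_NDST_MASK);
        new |= (uint64_t)wakeup_vector << PI_NV_SHIFT;
        new |= (uint64_t)ndst << PI_NDST_SHIFT;

        /* On failure 'old' is refreshed with the current value and we retry. */
    } while ( !atomic_compare_exchange_weak(control, &old, new) );

    return true;
}
```

The point of doing this with compare-and-exchange is that the hardware
may set ON concurrently; a plain read-modify-write could lose that
update, whereas the loop re-checks ON on every retry.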

> 
> >  static void vmx_ctxt_switch_to(struct vcpu *v)
> > @@ -764,6 +839,22 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
> >
> >      vmx_restore_guest_msrs(v);
> >      vmx_restore_dr(v);
> > +
> > +    if ( iommu_intpost )
> > +    {
> >
> You may consider having a helper for this too, for symmetry with the
> above case, but this is less of an issue, IMO.
> 
> > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +        ASSERT( pi_desc->sn == 1 );
>                   ^space
> 
> Above you wrote:
> 
>   ASSERT(v->arch.hvm_vmx.pi_block_cpu == -1);
>          ^no space
> 
> Please, pick up one format (ideally, following suit from other
> occurrences in the file, if any), and be consistent.
> 
> > +
> > +        if ( x2apic_enabled )
> > +            write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> > +        else
> > +            write_atomic(&pi_desc->ndst,
> > +                         MASK_INSR(cpu_physical_id(v->processor),
> > +                         PI_xAPIC_NDST_MASK));
> > +
> > +        pi_clear_sn(pi_desc);
> > +    }
> >  }
> 
> > +void arch_vcpu_wake(struct vcpu *v)
> > +{
> > +    if ( !iommu_intpost || (v->runstate.state != RUNSTATE_blocked) )
> > +        return;
> > +
> > +    if ( likely(vcpu_runnable(v)) ||
> > +         !test_bit(_VPF_blocked, &v->pause_flags) )
> > +    {
> Invert this and bail if true? Well, a matter of taste, I guess... but it
> will save one level of indentation.
> 
> > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +        unsigned long flags;
> > +
> > +        /*
> > +         * blocked -> runnable/offline
> > +         * If the state is transferred from RUNSTATE_blocked,
> > +         * we should set 'NV' field back to posted_intr_vector,
> > +         * so the Posted-Interrupts can be delivered to the vCPU
> > +         * by VT-d HW after it is scheduled to run.
> > +         */
> >
> Again, make the comment describe things in a RUNSTATE independent way
> (e.g., in terms of 'generic states', like "it's preempted", "it's
> blocked", "it's runnable"; or in terms of flags; or both).

Thanks for your comments and suggestion, Dario!
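
As a rough illustration of the "invert and bail" suggestion for
arch_vcpu_wake(), the control flow could look like the sketch below. It
is a mock-up under stated assumptions, not the next version of the
patch: struct wake_vcpu, mock_arch_vcpu_wake() and the two MOCK_*
vector values are hypothetical, and only the "set NV back to the
posted-interrupt vector" step described in the comment is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector numbers, chosen only so the two are distinguishable. */
#define MOCK_WAKEUP_VECTOR       0xf1
#define MOCK_POSTED_INTR_VECTOR  0xf2

/* Hypothetical stand-in for the vCPU state the wake-up hook inspects. */
struct wake_vcpu {
    bool was_blocked;   /* models v->runstate.state == RUNSTATE_blocked */
    bool runnable;      /* models vcpu_runnable(v) */
    bool blocked_flag;  /* models test_bit(_VPF_blocked, &v->pause_flags) */
    uint8_t nv;         /* models the 'NV' field of the PI descriptor */
};

static void mock_arch_vcpu_wake(struct wake_vcpu *v, bool intpost_enabled)
{
    if ( !intpost_enabled || !v->was_blocked )
        return;

    /* Inverted test: still blocked and not runnable, so nothing to undo. */
    if ( !v->runnable && v->blocked_flag )
        return;

    /* The vCPU is leaving the blocked state: restore the normal vector. */
    v->nv = MOCK_POSTED_INTR_VECTOR;
}
```

With the early return, the main body sits at function level, which
saves one level of indentation as suggested.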

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

