[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC v2 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, June 10, 2015 2:44 PM
> To: Wu, Feng
> Cc: andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxxxxx; Tian, Kevin;
> Zhang, Yang Z; xen-devel@xxxxxxxxxxxxx; keir@xxxxxxx
> Subject: Re: [RFC v2 13/15] Update Posted-Interrupts Descriptor during vCPU
> scheduling
> 
> >>> On 08.05.15 at 11:07, <feng.wu@xxxxxxxxx> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1711,6 +1711,131 @@ static void vmx_handle_eoi(u8 vector)
> >      __vmwrite(GUEST_INTR_STATUS, status);
> >  }
> >
> > +static void vmx_pi_desc_update(struct vcpu *v, int old_state)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    struct pi_desc old, new;
> > +    unsigned long flags;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> 
> Considering how the adjustment to start_vmx(), this at best should
> be an ASSERT() (if a check is needed at all).
> 
> > +    switch ( v->runstate.state )
> > +    {
> > +    case RUNSTATE_runnable:
> > +    case RUNSTATE_offline:
> > +        /*
> > +         * We don't need to send notification event to a non-running
> > +         * vcpu, the interrupt information will be delivered to it before
> > +         * VM-ENTRY when the vcpu is scheduled to run next time.
> > +         */
> > +        pi_desc->sn = 1;
> > +
> > +        /*
> > +         * If the state is transferred from RUNSTATE_blocked,
> > +         * we should set 'NV' feild back to posted_intr_vector,
> > +         * so the Posted-Interrupts can be delivered to the vCPU
> > +         * by VT-d HW after it is scheduled to run.
> > +         */
> > +        if ( old_state == RUNSTATE_blocked )
> > +        {
> > +            do
> > +            {
> > +                old.control = new.control = pi_desc->control;
> > +                new.nv = posted_intr_vector;
> > +            }
> > +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                    != old.control );
> 
> So why is it okay to do the SN update non-atomically when the
> vector update is done atomically?

After thinking this a bit more. It should be atomic to update the 'SN' filed, 
since
the hardware will perform a coherent atomic read-modify-write operation of
the posted-interrupt descriptor when posting interrupts.

> 
> Also the curly braces do _not_ go on separate lines for do/while
> constructs.
> 
> And then do you really need to atomically update the whole 64 bits
> here, rather than just the nv field? By not making it a bit field
> you could perhaps just write_atomic() it?

I think only atomically updating the nv field is fine.

> 
> > +
> > +           /*
> > +            * Delete the vCPU from the related block list
> > +            * if we are resuming from blocked state
> > +            */
> > +           spin_lock_irqsave(&per_cpu(blocked_vcpu_lock,
> > +                             v->pre_pcpu), flags);
> > +           list_del(&v->blocked_vcpu_list);
> > +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_lock,
> > +                                  v->pre_pcpu), flags);
> > +        }
> > +        break;
> > +
> > +    case RUNSTATE_blocked:
> > +        /*
> > +         * The vCPU is blocked on the block list.
> > +         * Add the blocked vCPU on the list of the
> > +         * vcpu->pre_pcpu, which is the destination
> > +         * of the wake-up notification event.
> > +         */
> > +        v->pre_pcpu = v->processor;
> 
> Is latching this upon runstate change really enough? I.e. what about
> the v->processor changes that sched_move_domain() or individual
> schedulers do? Or if it really just matters on which CPU's blocked list
> the vCPU is (while its ->processor changing subsequently doesn't
> matter) I'd like to see the field named more after its purpose (e.g.
> pi_block_cpu; list and lock should btw also have a connection to PI
> in their names).

Yes, It doesn't matter if vCPU changes. The key point is that we put
the vCPU on a pCPU list and we change the NDST field to this pCPU,
then the wakeup notification event will get there. You are right, I
need to rename them to reflect the real purpose of it.

> 
> In the end, if the placement on a list followed v->processor, you
> would likely get away without the extra new field. Are there
> synchronization constraints speaking against such a model?

I don't understand this quit well. Do you mean using 'v->processor'
as the pCPU list for the blocked vCPUs? Then what about 'v->processor'
changes, seems we cannot handle this case.

> 
> > +        spin_lock_irqsave(&per_cpu(blocked_vcpu_lock,
> > +                          v->pre_pcpu), flags);
> > +        list_add_tail(&v->blocked_vcpu_list,
> > +                      &per_cpu(blocked_vcpu, v->pre_pcpu));
> > +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_lock,
> > +                               v->pre_pcpu), flags);
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +
> > +            /*
> > +             * We should not block the vCPU if
> > +             * an interrupt was posted for it.
> > +             */
> > +
> > +            if ( old.on == 1 )
> 
> I think I said so elsewhere, but just in case I didn't: Please don't
> compare boolean fields with 1 - e.g. in the case here "if ( old.on )"
> suffices.
> 
> > +            {
> > +                /*
> > +                 * The vCPU will be removed from the block list
> > +                 * during its state transferring from RUNSTATE_blocked
> > +                 * to RUNSTATE_runnable after the following tasklet
> > +                 * is scheduled to run.
> > +                 */
> 
> Precise comments please - I don't think the scheduling of the
> tasklet makes any difference; I suppose it's the execution of it
> that does.
> 
> > +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > +                return;
> > +            }
> > +
> > +            /*
> > +             * Change the 'NDST' field to v->pre_pcpu, so when
> > +             * external interrupts from assigned deivces happen,
> > +             * wakeup notifiction event will go to v->pre_pcpu,
> > +             * then in pi_wakeup_interrupt() we can find the
> > +             * vCPU in the right list to wake up.
> > +             */
> > +            if ( x2apic_enabled )
> > +                new.ndst = cpu_physical_id(v->pre_pcpu);
> > +            else
> > +                new.ndst = MASK_INSR(cpu_physical_id(v->pre_pcpu),
> > +                                     PI_xAPIC_NDST_MASK);
> > +            new.sn = 0;
> > +            new.nv = pi_wakeup_vector;
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> > +        break;
> > +
> > +    case RUNSTATE_running:
> > +        ASSERT( pi_desc->sn == 1 );
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +            if ( x2apic_enabled )
> > +                new.ndst = cpu_physical_id(v->processor);
> > +            else
> > +                new.ndst = (cpu_physical_id(v->processor) << 8) &
> 0xFF00;
> 
> Why are you not using MASK_INSR() here like you do above?
> 
> > +
> > +            new.sn = 0;
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> 
> Same here - can't this be a write_atomic() of ndst and a clear_bit()
> of sn instead of a loop over cmpxchg()?

Your suggestion is good!

> 
> > +        break;
> > +
> > +    default:
> > +        break;
> 
> This seems dangerous: I think you want at least an
> ASSERT_UNREACHABLE() here.
> 
> > @@ -1842,7 +1967,12 @@ const struct hvm_function_table * __init
> start_vmx(void)
> >          alloc_direct_apic_vector(&posted_intr_vector,
> pi_notification_interrupt);
> >
> >          if ( iommu_intpost )
> > +        {
> >              alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> > +            vmx_function_table.pi_desc_update = vmx_pi_desc_update;
> > +        }
> > +        else
> > +            vmx_function_table.pi_desc_update = NULL;
> 
> Isn't that field NULL anyway?
> 
> > @@ -157,7 +158,11 @@ static inline void vcpu_runstate_change(
> >          v->runstate.state_entry_time = new_entry_time;
> >      }
> >
> > +    old_state = v->runstate.state;
> >      v->runstate.state = new_state;
> > +
> > +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> > +        hvm_funcs.pi_desc_update(v, old_state);
> 
> I don't see how this would build on ARM.

So what about adding " #ifdef CONFIG_X86 ..." here?

Thanks,
Feng

> 
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -142,6 +142,8 @@ struct vcpu
> >
> >      int              processor;
> >
> > +    int              pre_pcpu;
> 
> This again doesn't belong into the common structure (and should
> be accompanied by a comment, and should be unsigned int).
> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.