[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts



>>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Monday, September 07, 2015 8:55 PM
>> >>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote:
>> > @@ -1605,9 +1621,12 @@ void context_switch(struct vcpu *prev, struct
>> vcpu *next)
>> >
>> >      set_current(next);
>> >
>> > +    pi_ctxt_switch_from(prev);
>> > +
>> >      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>> >           (is_idle_domain(nextd) && cpu_online(cpu)) )
>> >      {
>> > +        pi_ctxt_switch_to(next);
>> >          local_irq_enable();
>> 
>> This placement, if really intended that way, needs explanation (in a
>> comment) and perhaps even renaming of the involved symbols, as
> 
> I think maybe removing the static wrapper function can make thing
> clearer. What is your opinion?

Considering that there's going to be a second use of the
switch-to variant, I think the helpers are okay to be kept (but
I wouldn't insist on that), just that they need to be named in
a away making clear what their purpose is.

>> > @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>> >
>> > +    v->arch.hvm_vmx.pi_block_cpu = -1;
>> > +
>> > +    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
>> > +
>> >      v->arch.schedule_tail    = vmx_do_resume;
>> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >
>> > +    if ( iommu_intpost && is_hvm_vcpu(v) )
>> > +    {
>> > +        v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
>> > +        v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
>> > +    }
>> 
>> Why conditional upon is_hvm_vcpu()?
> 
> I only think we need to add these hooks for PV guests, right?

"... we don't need to ..."?

> Or you mean I should use has_hvm_container_vcpu() instead?

Exactly.

>> > +
>> > +        /*
>> > +         * Delete the vCPU from the related block list
>> > +         * if we are resuming from blocked state.
>> > +         */
>> > +        if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
>> > +        {
>> > +            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> > +                              v->arch.hvm_vmx.pi_block_cpu),
>> flags);
>> > +            list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> 
>> Shouldn't you set .pi_block_cpu back to -1 along with removing
>> the vCPU from the list? Which then ought to allow using just
>> list_del() here?
> 
> Here is the story about using list_del_init() instead of list_del(), (the
> same reason for using list_del_init() in pi_wakeup_interrupt() ).
> If we use list_del(), that means we need to set .pi_block_cpu to
> -1 after removing the vCPU from the block list, there are two
> places where the vCPU is removed from the list:
> - arch_vcpu_wake_prepare()
> - pi_wakeup_interrupt()
> 
> That means we also need to set .pi_block_cpu to -1 in
> pi_wakeup_interrupt(), which seems problematic to me, since
> .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> to change it in pi_wakeup_interrupt(), maybe someone is using
> it at the same time.
> 
> And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> is only used when .pi_block_cpu is set to -1 at the initial time.
> 
> If you have any good suggestions about this, I will be all ears.

Latching pi_block_cpu into a local variable would take care of that
part of the problem. Of course you then may also need to check
that while waiting to acquire the lock pi_block_cpu didn't change.
And if that absolutely can't be made work correctly, these
apparently strange uses of list_del_init() would each need a
justifying comment.

>> > +        do {
>> > +            old.control = new.control = pi_desc->control;
>> > +
>> > +            /* Should not block the vCPU if an interrupt was posted for 
>> > it.
>> */
>> > +            if ( pi_test_on(&old) )
>> > +            {
>> > +                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock,
>> gflags);
>> > +                vcpu_unblock(v);
>> 
>> Calling vcpu_unblock() in the middle of context_switch()? Why? And
>> is this safe? 
> 
> I cannot see anything unsafe so far, can some scheduler maintainer
> help to confirm it? Dario? George?

But first and foremost you ought to answer the "why".

>> And if really needed to cover something not dealt with
>> elsewhere, wouldn't this need to happen _after_ having switched
>> the notification mechanism below?
> 
> If ON is set, we don't need to block the vCPU hence no need to change the
> notification vector to the wakeup one, which is used when vCPU is blocked.
> 
>> 
>> > +                return;

Oh, right, I somehow managed to ignore the "return" here.

>> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
>> > +{
>> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> > +
>> > +    if ( !has_arch_pdevs(v->domain) )
>> > +        return;
>> > +
>> > +    if ( x2apic_enabled )
>> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
>> > +    else
>> > +        write_atomic(&pi_desc->ndst,
>> > +                     MASK_INSR(cpu_physical_id(v->processor),
>> > +                     PI_xAPIC_NDST_MASK));
>> > +
>> > +    pi_clear_sn(pi_desc);
>> > +}
>> 
>> So you alter where notifications go, but not via which vector? How
>> is the vCPU going to get removed from the blocked list then?
> 
> vmx_post_ctx_switch_pi() is called when the vCPU is about to run,
> in that case, the vector has been set properly and it has been removed
> from the block list if it was blocked before.

So why do you two updates then (first [elsewhere] to set the vector
and then here to set the destination)?

>> > @@ -756,6 +902,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>> >
>> >      vmx_restore_guest_msrs(v);
>> >      vmx_restore_dr(v);
>> > +    vmx_post_ctx_switch_pi(v);
>> >  }
>> 
>> Ah, here is the other invocation! But no, this shouldn't be done this
>> way. This really belongs into vendor independent code, even if the
>> indirect call overhead is slightly higher.
> 
> Why do you say this is vendor independent? What is your suggestion?

This needs to become a second invocation of pi_ctxt_switch_to()
from the context switch code.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.