
Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling



On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
> 
>>> -----Original Message-----
>>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> 
>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>> in debug builds?
>>
>> Does the following information in Config.mk mean it is a debug build?
>>
>> # A debug build of Xen and tools?
>> debug ?= y
>> debug_symbols ?= $(debug)
>>
> I think so. But as I said in my other email, I was wrong, and this is
> probably not an issue.
> 
>>> And in either case (just tossing out ideas) would it be
>>> possible to deal with the "interrupt already raised when blocking" case:
>>
>> Thanks for the suggestions below!
>>
> :-)
> 
>>>  - later in the context switching function ?
>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi() instead
>> of calling vcpu_unblock() directly; then, when we return to context_switch(),
>> we can check the flag and avoid really blocking the vCPU.
>>
> Yeah, and that would still be rather hard to understand and maintain,
> IMO.
> 
>> But I don't have a clear
>> picture of how to achieve this; here are some questions from me:
>> - When we are in context_switch(), we have done the following changes to
>> vcpu's state:
>>      * sd->curr is set to next
>>      * vCPU's running state (both prev and next) is changed by
>>        vcpu_runstate_change()
>>      * next->is_running is set to 1
>>      * periodic timer for prev is stopped
>>      * periodic timer for next is setup
>>      ......
>>
>> So at what point should we perform the action to _unblock_ the vCPU? We
>> need to roll back the above changes to the vCPU's state, right?
>>
> Mmm... not really. Not blocking prev does not mean that prev would be
> kept running on the pCPU, and that's true for your current solution as
> well! As you say yourself, you're already in the process of switching
> between prev and next, at a point where it's already decided that next
> will be the vCPU that runs. Not blocking means that prev is
> reinserted into the runqueue, and a new invocation of the scheduler is
> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> __runq_tickle()), but it's only when that new scheduling pass happens
> that prev will (potentially) be selected to run again.
> 
> So, no, unless I'm completely missing your point, there wouldn't be any
> rollback required. However, I'd still like the other solution (doing
> things in vcpu_block()) better (see below).
> 
>>>  - with another hook, perhaps in vcpu_block() ?
>>
>> We could check this in vcpu_block(); however, the logic here is that before
>> the vCPU is blocked, we need to change the posted-interrupt descriptor,
>> and if, while changing it, the 'ON' bit is set (meaning the VT-d hardware
>> has issued a notification event because an interrupt from the assigned
>> device is coming), we don't need to block the vCPU, and hence there is no
>> need to update the PI descriptor in this case.
>>
> Yep, I saw that. But would it be possible to do *everything* related to
> blocking, including the update of the descriptor, in vcpu_block(), if no
> interrupt has been raised yet at that time? I mean, if you updated the
> descriptor in there, would you still get the event that allows you to
> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> the blocking, no matter whether it had already resulted in an actual
> context switch or not?
> 
> I appreciate that this narrows the window for such an event to happen by
> quite a bit, making the logic itself a little less useful (it makes
> things more similar to "regular" blocking vs. event delivery, though,
> AFAICT), but if it's correct, and if it allows us to avoid the ugly
> invocation of vcpu_unblock() from context switch context, I'd give it a
> try.
> 
> After all, this PI thing requires actions to be taken when a vCPU is
> scheduled or descheduled because of blocking, unblocking and
> preemptions, and it would seem natural to me to:
>  - deal with blockings in vcpu_block()
>  - deal with unblockings in vcpu_wake()
>  - deal with preemptions in context_switch()
> 
> This does not mean that being able to consolidate some of the cases
> (like blockings and preemptions, in the current version of the code)
> wouldn't be a nice thing... But we don't want it at all costs. :-)

So just to clarify the situation...

If a vcpu is configured for the "running" state (i.e., NV set to
posted_intr_vector, notifications enabled), and an interrupt arrives
while we're in the hypervisor -- what happens?

Is it the case that the interrupt is not actually delivered to the
processor, but that the pending bit is set in the PI descriptor, so that
the interrupt will be delivered the next time the hypervisor returns
into the guest?

(I am assuming that is the case, because if the hypervisor *does* get an
interrupt, then it can just unblock the vcpu there.)

This sort of race condition -- where we get an interrupt to wake up a
vcpu as we're blocking -- is already handled for "old-style" interrupts
in vcpu_block:

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {
        clear_bit(_VPF_blocked, &v->pause_flags);
    }
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}

That is, we set _VPF_blocked, so that any interrupt which would wake it
up actually wakes it up, and then we check local_events_need_delivery()
to see if there were any that came in after we decided to block but
before we made sure that an interrupt would wake us up.

I think it would be best if we could keep all the logic that does the
same thing in the same place.  That would mean, in vcpu_block(): after
calling set_bit(_VPF_blocked), change the NV to pi_wakeup_vector, and
then extend local_events_need_delivery() to also look for pending PI
events.
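
Something like the sketch below.  (Just a sketch: arch_vcpu_block() is a
name I'm making up here for the hook that would do the PI descriptor
update; the comments describe what I imagine it doing.)

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /*
     * Hypothetical hook: change NV to pi_wakeup_vector and add v to
     * the blocked list, *before* the event check below.
     */
    arch_vcpu_block(v);

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {
        /*
         * A wakeup event sneaked in.  With local_events_need_delivery()
         * extended to also check the ON bit of the PI descriptor, this
         * now covers posted interrupts too, so just undo the blocking.
         */
        clear_bit(_VPF_blocked, &v->pause_flags);
    }
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}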

Looking a bit more at your states, I think the actions that need to be
taken on all the transitions are these (collapsing 'runnable' and
'offline' into the same state; a rough code sketch follows the list):

blocked -> runnable (vcpu_wake)
 - NV = posted_intr_vector
 - Take vcpu off blocked list
 - SN = 1
runnable -> running (context_switch)
 - SN = 0
running -> runnable (context_switch)
 - SN = 1
running -> blocked (vcpu_block)
 - NV = pi_wakeup_vector
 - Add vcpu to blocked list
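
Or, in (very) rough code.  Here pi_desc is the vcpu's posted-interrupt
descriptor, and pi_set_sn()/pi_clear_sn() and the blocked-list names
stand for whatever helpers the series ends up with; they're
illustrative, not actual interfaces:

/* blocked -> runnable (vcpu_wake) */
write_atomic(&pi_desc->nv, posted_intr_vector);  /* post normally again */
list_del(&v->arch.hvm_vmx.pi_blocked_list);      /* off the blocked list */
pi_set_sn(pi_desc);                              /* runnable: suppress */

/* runnable -> running (context_switch) */
pi_clear_sn(pi_desc);                            /* notifications back on */

/* running -> runnable (context_switch) */
pi_set_sn(pi_desc);                              /* suppress while preempted */

/* running -> blocked (vcpu_block) */
write_atomic(&pi_desc->nv, pi_wakeup_vector);    /* wake instead of post */
list_add(&v->arch.hvm_vmx.pi_blocked_list,
         &per_cpu(pi_blocked_vcpu, v->processor));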

This actually has a few pretty nice properties:
1. You have a nice pair of complementary actions -- block / wake, run /
preempt
2. The potentially slow list manipulations happen in vcpu_wake and
vcpu_block, not on the context switch path

And at this point, we don't have the "lazy context switch" issue
anymore, do we?  Because we've handled the "blocking" case in
vcpu_block(), we don't need to do anything in the main context_switch()
(which may do the lazy context switch into idle).  We only need to do
something in the actual __context_switch().

And at that point, could we actually get rid of the PI-specific context
switch hooks altogether, and just put the SN state changes required for
running->runnable and runnable->running in vmx_ctxt_switch_from() and
vmx_ctxt_switch_to()?
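
That is, something along these lines (again only a sketch; I'm assuming
a pi_desc field hanging off arch.hvm_vmx and pi_set_sn()/pi_clear_sn()
style helpers, which may not be what the series actually calls them):

static void vmx_ctxt_switch_from(struct vcpu *v)
{
    /* ... existing state saving ... */

    /* running -> runnable: suppress notifications while descheduled. */
    pi_set_sn(&v->arch.hvm_vmx.pi_desc);
}

static void vmx_ctxt_switch_to(struct vcpu *v)
{
    /* ... existing state restoring ... */

    /* runnable -> running: accept notifications again. */
    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
}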

If so, then the only hooks we need to add are vcpu_block and vcpu_wake.
To keep these consistent with other scheduling-related functions, I
would put them in arch_vcpu, next to ctxt_switch_from() and
ctxt_switch_to().
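
Something like this (sketch only; exact names and placement to be
decided):

struct arch_vcpu {
    /* ... */
    void (*ctxt_switch_from)(struct vcpu *);
    void (*ctxt_switch_to)(struct vcpu *);

    /*
     * New hooks, called from vcpu_block() and vcpu_wake(); may be
     * NULL when posted interrupts aren't in use.
     */
    void (*vcpu_block)(struct vcpu *);
    void (*vcpu_wake)(struct vcpu *);
    /* ... */
};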

Thoughts?

 -George
