
Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling



On 09/21/2015 06:08 AM, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
>> Sent: Thursday, September 17, 2015 5:38 PM
>> To: Dario Faggioli; Wu, Feng
>> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
>> logic handling
>>
>> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
>>>
>>>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>>>> in debug builds?
>>>>
>>>> Does the following information in Config.mk mean it is a debug build?
>>>>
>>>> # A debug build of Xen and tools?
>>>> debug ?= y
>>>> debug_symbols ?= $(debug)
>>>>
>>> I think so. But as I said in my other email, I was wrong, and this is
>>> probably not an issue.
>>>
>>>>> And in either case (just tossing out ideas) would it be
>>>>> possible to deal with the "interrupt already raised when blocking" case:
>>>>
>>>> Thanks for the suggestions below!
>>>>
>>> :-)
>>>
>>>>>  - later in the context switching function ?
>>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi()
>>>> instead of calling vcpu_unblock() directly; then, when control returns
>>>> to context_switch(), we can check the flag and avoid actually blocking
>>>> the vCPU.
>>>>
>>>>
>>> Yeah, and that would still be rather hard to understand and maintain,
>>> IMO.
>>>
>>>> But I don't have a clear
>>>> picture of how to achieve this; here are some questions from me:
>>>> - When we are in context_switch(), we have done the following changes to
>>>> vcpu's state:
>>>>    * sd->curr is set to next
>>>>    * vCPU's running state (both prev and next) is changed by
>>>>      vcpu_runstate_change()
>>>>    * next->is_running is set to 1
>>>>    * periodic timer for prev is stopped
>>>>    * periodic timer for next is setup
>>>>    ......
>>>>
>>>> So at what point should we perform the action to _unblock_ the vCPU?
>>>> We'd need to roll back the above changes to the vCPU's state, right?
>>>>
>>> Mmm... not really. Not blocking prev does not mean that prev would be
>>> kept running on the pCPU, and that's true for your current solution as
>>> well! As you say yourself, you're already in the process of switching
>>> between prev and next, at a point where it's already decided that next
>>> will be the vCPU that runs. Not blocking means that prev is
>>> reinserted into the runqueue, and a new invocation of the scheduler is
>>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>>> __runq_tickle()), but it's only when that new scheduling decision
>>> happens that prev will (potentially) be selected to run again.
>>>
>>> So, no, unless I'm completely missing your point, there wouldn't be any
>>> rollback required. However, I'd still like the other solution (doing
>>> stuff in vcpu_block()) better (see below).
>>>
>>>>>  - with another hook, perhaps in vcpu_block() ?
>>>>
>>>> We could check this in vcpu_block(); however, the logic here is that
>>>> before the vCPU is blocked, we need to change the posted-interrupt
>>>> descriptor, and if during that change the 'ON' bit is set -- meaning
>>>> the VT-d hardware has issued a notification event because an interrupt
>>>> from the assigned device is coming -- we don't need to block the vCPU,
>>>> and hence there is no need to update the PI descriptor in this case.
>>>>
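(Aside, to make the descriptor discussion concrete: below is a minimal
sketch of the VT-d posted-interrupt descriptor as described in the VT-d
spec, plus the "abort if ON got set" update Feng describes. Helper and
field names here, such as pi_desc_prepare_block(), are illustrative, not
necessarily the patch series' actual ones; pi_wakeup_vector is the
dedicated wakeup vector the series introduces.)

/* Sketch of the VT-d posted-interrupt descriptor (64 bytes). */
struct pi_desc {
    u32 pir[8];              /* posted-interrupt requests, one bit per vector */
    union {
        struct {
            u16 on     : 1,  /* ON: outstanding notification */
                sn     : 1,  /* SN: suppress notification */
                rsvd_1 : 14;
            u8  nv;          /* NV: notification vector */
            u8  rsvd_2;
            u32 ndst;        /* NDST: notification destination (APIC ID) */
        };
        u64 control;
    };
    u32 rsvd[6];
};

/*
 * Illustrative version of the check Feng describes: atomically redirect
 * NV to the wakeup vector, but abort if the hardware set ON in the
 * meantime (an interrupt was posted), in which case the vCPU must not
 * block.  Returns false if blocking should be aborted.
 */
static bool pi_desc_prepare_block(struct pi_desc *pi_desc)
{
    struct pi_desc old, new;

    do {
        old.control = new.control = read_atomic(&pi_desc->control);

        if ( old.on )                   /* notification already pending */
            return false;

        new.nv = pi_wakeup_vector;      /* wakeups now hit the wakeup handler */
    } while ( cmpxchg(&pi_desc->control, old.control, new.control)
              != old.control );

    return true;
}
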
>>> Yep, I saw that. But could it be possible to do *everything* related to
>>> blocking, including the update of the descriptor, in vcpu_block(), if no
>>> interrupt has been raised yet at that time? I mean, if you updated the
>>> descriptor in there, would you still get the event that allows you to
>>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>>> the blocking, no matter whether that resulted in an actual context
>>> switch already or not?
>>>
>>> I appreciate that this narrows the window for such an event to happen by
>>> quite a bit, making the logic itself a little less useful (it makes
>>> things more similar to "regular" blocking vs. event delivery, though,
>>> AFAICT), but if it's correct, and if it allows us to avoid the ugly
>>> invocation of vcpu_unblock() from context switch context, I'd give it a
>>> try.
>>>
>>> After all, this PI thing requires actions to be taken when a vCPU is
>>> scheduled or descheduled because of blocking, unblocking and
>>> preemptions, and it would seem natural to me to:
>>>  - deal with blockings in vcpu_block()
>>>  - deal with unblockings in vcpu_wake()
>>>  - deal with preemptions in context_switch()
>>>
>>> This does not mean that being able to consolidate some of the cases
>>> (like blocking and preemption, in the current version of the code)
>>> wouldn't be a nice thing... but we don't want it at all costs. :-)
>>
>> So just to clarify the situation...
>>
>> If a vcpu is configured for the "running" state (i.e., NV set to
>> "posted_intr_vector", notifications enabled), and an interrupt arrives
>> while we're in the hypervisor -- what happens?
>>
>> Is it the case that the interrupt is not actually delivered to the
>> processor, but that the pending bit will be set in the pi field, so that
>> the interrupt will be delivered the next time the hypervisor returns
>> into the guest?
>>
>> (I am assuming that is the case, because if the hypervisor *does* get an
>> interrupt, then it can just unblock the vcpu there.)
>>
>> This sort of race condition -- where we get an interrupt to wake up a
>> vcpu as we're blocking -- is already handled for "old-style" interrupts
>> in vcpu_block:
>>
>> void vcpu_block(void)
>> {
>>     struct vcpu *v = current;
>>
>>     set_bit(_VPF_blocked, &v->pause_flags);
>>
>>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>>     if ( local_events_need_delivery() )
>>     {
>>         clear_bit(_VPF_blocked, &v->pause_flags);
>>     }
>>     else
>>     {
>>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>>         raise_softirq(SCHEDULE_SOFTIRQ);
>>     }
>> }
>>
>> That is, we set _VPF_blocked, so that any interrupt which would wake it
>> up actually wakes it up, and then we check local_events_need_delivery()
>> to see if there were any that came in after we decided to block but
>> before we made sure that an interrupt would wake us up.
>>
>> I think it would be best if we could keep all the logic that does the
>> same thing in the same place.  Which would mean in vcpu_block(), after
>> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
>> then extending local_events_need_delivery() to also look for pending PI
>> events.
>>
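(Concretely, the proposal above would give vcpu_block() roughly the shape
below. arch_vcpu_block() is a hypothetical hook name for the PI descriptor
update, and the extension of local_events_need_delivery() is described in
a comment rather than implemented.)

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /*
     * Hypothetical arch hook: switch NV to pi_wakeup_vector, point NDST
     * at the pCPU owning the blocked-vCPU list, and add the vCPU to
     * that list (a no-op for vCPUs without posted interrupts).
     */
    arch_vcpu_block(v);

    /*
     * Check for events /after/ blocking: avoids wakeup waiting race.
     * local_events_need_delivery() would be extended to also report a
     * posted interrupt pending in the PI descriptor (ON set / PIR
     * non-empty), undoing the block exactly as for "old-style" events.
     */
    if ( local_events_need_delivery() )
        clear_bit(_VPF_blocked, &v->pause_flags);
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}
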
>> Looking a bit more at your states, I think the actions that need to be
>> taken on all the transitions are these (collapsing 'runnable' and
>> 'offline' into the same state):
>>
>> blocked -> runnable (vcpu_wake)
>>  - NV = posted_intr_vector
>>  - Take vcpu off blocked list
>>  - SN = 1
>> runnable -> running (context_switch)
>>  - SN = 0
> 
> We need to set the 'NDST' field to the right destination pCPU for the
> vCPU as well.
> 
>> running -> runnable (context_switch)
>>  - SN = 1
>> running -> blocked (vcpu_block)
>>  - NV = pi_wakeup_vector
>>  - Add vcpu to blocked list
> 
> We need to set the 'NDST' field to the pCPU which owns the blocking list,
> so that we can wake up the vCPU from the right blocking list in the wakeup
> event handler.
> 
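(Putting George's transition table and Feng's NDST additions together, the
two "slow" transitions might look roughly like this. All field and helper
names -- pi_blocked_list, pi_block_cpu, pi_set_sn(), etc. -- are
illustrative, and the NV/SN writes are shown as plain stores for brevity;
a real implementation needs the cmpxchg-style update sketched earlier.)

/* Hypothetical per-pCPU list of blocked PI-capable vCPUs, plus its lock. */
static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);

/* running -> blocked, from vcpu_block(): reroute notifications and queue. */
static void vmx_pi_block(struct vcpu *v)
{
    unsigned int cpu = v->processor;
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

    /* Per Feng: NDST must point at the pCPU owning the blocking list... */
    write_atomic(&pi_desc->ndst, cpu_physical_id(cpu));
    pi_desc->nv = pi_wakeup_vector;

    /* ...so that the wakeup handler scans the right list. */
    v->arch.hvm_vmx.pi_block_cpu = cpu;          /* hypothetical field */
    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
    list_add_tail(&v->arch.hvm_vmx.pi_blocked_list,
                  &per_cpu(pi_blocked_vcpu, cpu));
    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}

/* blocked -> runnable, from vcpu_wake(): normal delivery, dequeue, SN=1. */
static void vmx_pi_unblock(struct vcpu *v)
{
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
    unsigned int cpu = v->arch.hvm_vmx.pi_block_cpu;

    pi_desc->nv = posted_intr_vector;
    pi_set_sn(pi_desc);                 /* runnable, but not running yet */

    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
    list_del(&v->arch.hvm_vmx.pi_blocked_list);
    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}
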
>>
>> This actually has a few pretty nice properties:
>> 1. You have a nice pair of complementary actions -- block / wake, run /
>> preempt
>> 2. The potentially long actions with lists happen in vcpu_wake and
>> vcpu_block, not on the context switch path
>>
>> And at this point, you don't have the "lazy context switch" issue
>> anymore, do you?  Because we've handled the "blocking" case in
>> vcpu_block(), we don't need to do anything in the main context_switch()
>> (which may do the lazy context switch into idle).  We only need to do
>> something in the actual __context_switch().
> 
> I think the handling for lazy context switch is not only for the blocking
> case; we still need to do something on a lazy context switch even if we
> handle the blocking case in vcpu_block(), such as:
> 1. For non-idle -> idle
> - set 'SN'

If we set SN in vcpu_block(), then we don't need to set it on context
switch -- right?

> 2. For idle -> non-idle
> - clear 'SN'
> - set the 'NDST' field to the right cpu the vCPU is going to run on. (Maybe
> this one doesn't belong to the lazy context switch path: if the cpu of the
> non-idle vCPU was changed, then per_cpu(curr_vcpu, cpu) != next in
> context_switch(), hence it will go to __context_switch() directly, right?)

If we clear SN in vcpu_wake(), then we don't need to clear it on a
context switch.  And the only way we can transition from "idle lazy
context switch" to "runnable" is if the vcpu was the last vcpu running
on this pcpu -- in which case, NDST should already be set to this pcpu,
right?
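
(On that model the context-switch path shrinks to the run/preempt pair,
something like the sketch below; again the helper names are illustrative.)

/* running -> runnable: preempted, so suppress notifications while off-CPU. */
static void vmx_pi_switch_from(struct vcpu *v)
{
    pi_set_sn(&v->arch.hvm_vmx.pi_desc);
}

/*
 * runnable -> running: re-aim NDST at the new pCPU, then allow
 * notifications again.  After a lazy switch through idle on the same
 * pCPU, NDST is already correct, so only SN actually changes.
 */
static void vmx_pi_switch_to(struct vcpu *v)
{
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

    write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
    pi_clear_sn(pi_desc);
}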

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

