[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling




> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Monday, September 21, 2015 5:19 PM
> To: Wu, Feng; Dario Faggioli
> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core 
> logic
> handling
> 
> On 09/21/2015 06:08 AM, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> >> Sent: Thursday, September 17, 2015 5:38 PM
> >> To: Dario Faggioli; Wu, Feng
> >> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap;
> Andrew
> >> Cooper; Jan Beulich
> >> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
> logic
> >> handling
> >>
> >> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
> >>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
> >>>
> >>>>> So, I guess, first of all, can you confirm whether or not it's exploding
> >>>>> in debug builds?
> >>>>
> >>>> Does the following information in Config.mk mean it is a debug build?
> >>>>
> >>>> # A debug build of Xen and tools?
> >>>> debug ?= y
> >>>> debug_symbols ?= $(debug)
> >>>>
> >>> I think so. But as I said in my other email, I was wrong, and this is
> >>> probably not an issue.
> >>>
> >>>>> And in either case (just tossing out ideas) would it be
> >>>>> possible to deal with the "interrupt already raised when blocking" case:
> >>>>
> >>>> Thanks for the suggestions below!
> >>>>
> >>> :-)
> >>>
> >>>>>  - later in the context switching function ?
> >>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi()
> instead
> >>>> of calling vcpu_unblock() directly, then when it returns to
> context_switch(),
> >>>> we can check the flag and don't really block the vCPU.
> >>>>
> >>> Yeah, and that would still be rather hard to understand and maintain,
> >>> IMO.
> >>>
> >>>> But I don't have a clear
> >>>> picture about how to archive this, here are some questions from me:
> >>>> - When we are in context_switch(), we have done the following changes to
> >>>> vcpu's state:
> >>>>  * sd->curr is set to next
> >>>>  * vCPU's running state (both prev and next ) is changed by
> >>>>    vcpu_runstate_change()
> >>>>  * next->is_running is set to 1
> >>>>  * periodic timer for prev is stopped
> >>>>  * periodic timer for next is setup
> >>>>  ......
> >>>>
> >>>> So what point should we perform the action to _unblock_ the vCPU? We
> >>>> Need to roll back the formal changes to the vCPU's state, right?
> >>>>
> >>> Mmm... not really. Not blocking prev does not mean that prev would be
> >>> kept running on the pCPU, and that's true for your current solution as
> >>> well! As you say yourself, you're already in the process of switching
> >>> between prev and next, at a point where it's already a thing that next
> >>> will be the vCPU that will run. Not blocking means that prev is
> >>> reinserted to the runqueue, and a new invocation to the scheduler is
> >>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
> >>> __runq_tickle()), but it's only when such new scheduling happens that
> >>> prev will (potentially) be selected to run again.
> >>>
> >>> So, no, unless I'm fully missing your point, there wouldn't be no
> >>> rollback required. However, I still would like the other solution (doing
> >>> stuff in vcpu_block()) better (see below).
> >>>
> >>>>>  - with another hook, perhaps in vcpu_block() ?
> >>>>
> >>>> We could check this in vcpu_block(), however, the logic here is that 
> >>>> before
> >>>> vCPU is blocked, we need to change the posted-interrupt descriptor,
> >>>> and during changing it, if 'ON' bit is set, which means VT-d hardware
> >>>> issues a notification event because interrupts from the assigned devices
> >>>> is coming, we don't need to block the vCPU and hence no need to update
> >>>> the PI descriptor in this case.
> >>>>
> >>> Yep, I saw that. But could it be possible to do *everything* related to
> >>> blocking, including the update of the descriptor, in vcpu_block(), if no
> >>> interrupt have been raised yet at that time? I mean, would you, if
> >>> updating the descriptor in there, still get the event that allows you to
> >>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
> >>> the blocking, no matter whether that resulted in an actual context
> >>> switch already or not?
> >>>
> >>> I appreciate that this narrows the window for such an event to happen by
> >>> quite a bit, making the logic itself a little less useful (it makes
> >>> things more similar to "regular" blocking vs. event delivery, though,
> >>> AFAICT), but if it's correct, ad if it allows us to save the ugly
> >>> invocation of vcpu_unblock from context switch context, I'd give it a
> >>> try.
> >>>
> >>> After all, this PI thing requires actions to be taken when a vCPU is
> >>> scheduled or descheduled because of blocking, unblocking and
> >>> preemptions, and it would seem natural to me to:
> >>>  - deal with blockings in vcpu_block()
> >>>  - deal with unblockings in vcpu_wake()
> >>>  - deal with preemptions in context_switch()
> >>>
> >>> This does not mean being able to consolidate some of the cases (like
> >>> blockings and preemptions, in the current version of the code) were not
> >>> a nice thing... But we don't want it at all costs . :-)
> >>
> >> So just to clarify the situation...
> >>
> >> If a vcpu configured for the "running" state (i.e., NV set to
> >> "posted_intr_vector", notifications enabled), and an interrupt happens
> >> in the hypervisor -- what happens?
> >>
> >> Is it the case that the interrupt is not actually delivered to the
> >> processor, but that the pending bit will be set in the pi field, so that
> >> the interrupt will be delivered the next time the hypervisor returns
> >> into the guest?
> >>
> >> (I am assuming that is the case, because if the hypervisor *does* get an
> >> interrupt, then it can just unblock it there.)
> >>
> >> This sort of race condition -- where we get an interrupt to wake up a
> >> vcpu as we're blocking -- is already handled for "old-style" interrupts
> >> in vcpu_block:
> >>
> >> void vcpu_block(void)
> >> {
> >>     struct vcpu *v = current;
> >>
> >>     set_bit(_VPF_blocked, &v->pause_flags);
> >>
> >>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >>     if ( local_events_need_delivery() )
> >>     {
> >>         clear_bit(_VPF_blocked, &v->pause_flags);
> >>     }
> >>     else
> >>     {
> >>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id,
> v->vcpu_id);
> >>         raise_softirq(SCHEDULE_SOFTIRQ);
> >>     }
> >> }
> >>
> >> That is, we set _VPF_blocked, so that any interrupt which would wake it
> >> up actually wakes it up, and then we check local_events_need_delivery()
> >> to see if there were any that came in after we decided to block but
> >> before we made sure that an interrupt would wake us up.
> >>
> >> I think it would be best if we could keep all the logic that does the
> >> same thing in the same place.  Which would mean in vcpu_block(), after
> >> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
> >> then extending local_events_need_delivery() to also look for pending PI
> >> events.
> >>
> >> Looking a bit more at your states, I think the actions that need to be
> >> taken on all the transitions are these (collapsing 'runnable' and
> >> 'offline' into the same state):
> >>
> >> blocked -> runnable (vcpu_wake)
> >>  - NV = posted_intr_vector
> >>  - Take vcpu off blocked list
> >>  - SN = 1
> >> runnable -> running (context_switch)
> >>  - SN = 0
> >
> > Need set the 'NDST' field to the right dest vCPU as well.
> >
> >> running -> runnable (context_switch)
> >>  - SN = 1
> >> running -> blocked (vcpu_block)
> >>  - NV = pi_wakeup_vector
> >>  - Add vcpu to blocked list
> >
> > Need set the 'NDST' field to the pCPU which owns the blocking list,
> > So we can wake up the vCPU from the right blocking list in the wakeup
> > event handler.
> >
> >>
> >> This actually has a few pretty nice properties:
> >> 1. You have a nice pair of complementary actions -- block / wake, run /
> >> preempt
> >> 2. The potentially long actions with lists happen in vcpu_wake and
> >> vcpu_block, not on the context switch path
> >>
> >> And at this point, you don't have the "lazy context switch" issue
> >> anymore, do we?  Because we've handled the "blocking" case in
> >> vcpu_block(), we don't need to do anything in the main context_switch()
> >> (which may do the lazy context switching into idle).  We only need to do
> >> something in the actual __context_switch().
> >
> > I think the handling for lazy context switch is not only for the blocking 
> > case,
> > we still need to do something for lazy context switch even we handled the
> > blocking case in vcpu_block(), such as,
> > 1. For non-idle -> idle
> > - set 'SN'
> 
> If we set SN in vcpu_block(), then we don't need to set it on context
> switch -- æäæï

For preemption case (not blocking case) , we still need to clear/set SN, and
this has no business with vcpu_block()/vcpu_wake(), right? Do I miss something
here? BTW, you Chinese is good! :)

> 
> > 2. For idle -> non-idle
> > - clear 'SN'
> > - set 'NDST' filed to the right cpu the vCPU is going to running on. (Maybe
> > this one doesn't belong to lazy context switch, if the cpu of the non-idle
> > vCPU was changed, then per_cpu(curr_vcpu, cpu) != next in context_switch(),
> > hence it will go to __context_switch() directly, right?)
> 
> If we clear SN* in vcpu_wake(), then we don't need to clear it on a
> context switch. 

The same as above.

> And the only way we can transition from "idle lazy
> context switch" to "runnable" is if the vcpu was the last vcpu running
> on this pcpu -- in which case, NDST should already be set to this pcpu,
> right?

Yes, like I mentioned above, in this case, the NDST filed is not changed, so
we don't need to update it for lazy idle vcpu to "runnable" transition.

Thanks,
Feng

> 
>  -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.