
Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling



On 09/21/2015 06:08 AM, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
>> Sent: Thursday, September 17, 2015 5:38 PM
>> To: Dario Faggioli; Wu, Feng
>> Cc: xen-devel@xxxxxxxxxxxxx; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
>> Cooper; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core
>> logic handling
>>
>> On 09/17/2015 09:48 AM, Dario Faggioli wrote:
>>> On Thu, 2015-09-17 at 08:00 +0000, Wu, Feng wrote:
>>>
>>>>> -----Original Message-----
>>>>> From: Dario Faggioli [mailto:dario.faggioli@xxxxxxxxxx]
>>>
>>>>> So, I guess, first of all, can you confirm whether or not it's exploding
>>>>> in debug builds?
>>>>
>>>> Does the following information in Config.mk mean it is a debug build?
>>>>
>>>> # A debug build of Xen and tools?
>>>> debug ?= y
>>>> debug_symbols ?= $(debug)
>>>>
>>> I think so. But as I said in my other email, I was wrong, and this is
>>> probably not an issue.
>>>
>>>>> And in either case (just tossing out ideas) would it be
>>>>> possible to deal with the "interrupt already raised when blocking" case:
>>>>
>>>> Thanks for the suggestions below!
>>>>
>>> :-)
>>>
>>>>>  - later in the context switching function ?
>>>> In this case, we might need to set a flag in vmx_pre_ctx_switch_pi()
>>>> instead of calling vcpu_unblock() directly; then, when control returns
>>>> to context_switch(), we can check the flag and avoid actually blocking
>>>> the vCPU.
>>>>
>>>>
>>> Yeah, and that would still be rather hard to understand and maintain,
>>> IMO.
>>>
>>>> But I don't have a clear
>>>> picture of how to achieve this; here are some questions from me:
>>>> - When we are in context_switch(), we have done the following changes to
>>>> vcpu's state:
>>>>    * sd->curr is set to next
>>>>    * vCPU's running state (both prev and next) is changed by
>>>>      vcpu_runstate_change()
>>>>    * next->is_running is set to 1
>>>>    * periodic timer for prev is stopped
>>>>    * periodic timer for next is setup
>>>>    ......
>>>>
>>>> So at what point should we perform the action to _unblock_ the vCPU?
>>>> We'd need to roll back the above changes to the vCPU's state, right?
>>>>
>>> Mmm... not really. Not blocking prev does not mean that prev would be
>>> kept running on the pCPU, and that's true for your current solution as
>>> well! As you say yourself, you're already in the process of switching
>>> between prev and next, at a point where it's already decided that next
>>> will be the vCPU that runs. Not blocking means that prev is
>>> reinserted into the runqueue, and a new invocation of the scheduler is
>>> (potentially) queued as well (via raising SCHEDULE_SOFTIRQ, in
>>> __runq_tickle()), but it's only when that new scheduling decision
>>> happens that prev will (potentially) be selected to run again.
>>>
>>> So, no, unless I'm completely missing your point, there wouldn't be any
>>> rollback required. However, I'd still like the other solution (doing
>>> stuff in vcpu_block()) better (see below).
>>>
>>>>>  - with another hook, perhaps in vcpu_block() ?
>>>>
>>>> We could check this in vcpu_block(); however, the logic here is that
>>>> before the vCPU is blocked, we need to change the posted-interrupt
>>>> descriptor, and if during that change the 'ON' bit is set -- meaning
>>>> the VT-d hardware has issued a notification event because an interrupt
>>>> from the assigned device is coming -- we don't need to block the vCPU,
>>>> and hence there is no need to update the PI descriptor in this case.
>>>>
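(Aside, to make the descriptor discussion concrete: below is a minimal
sketch of the VT-d posted-interrupt descriptor as described in the VT-d
spec, plus the "abort if ON got set" update Feng describes. Helper and
field names here, such as pi_desc_prepare_block(), are illustrative, not
necessarily the patch series' actual ones; pi_wakeup_vector is the
dedicated wakeup vector the series introduces.)

/* Sketch of the VT-d posted-interrupt descriptor (64 bytes). */
struct pi_desc {
    u32 pir[8];              /* posted-interrupt requests, one bit per vector */
    union {
        struct {
            u16 on     : 1,  /* ON: outstanding notification */
                sn     : 1,  /* SN: suppress notification */
                rsvd_1 : 14;
            u8  nv;          /* NV: notification vector */
            u8  rsvd_2;
            u32 ndst;        /* NDST: notification destination (APIC ID) */
        };
        u64 control;
    };
    u32 rsvd[6];
};

/*
 * Illustrative version of the check Feng describes: atomically redirect
 * NV to the wakeup vector, but abort if the hardware set ON in the
 * meantime (an interrupt was posted), in which case the vCPU must not
 * block.  Returns false if blocking should be aborted.
 */
static bool pi_desc_prepare_block(struct pi_desc *pi_desc)
{
    struct pi_desc old, new;

    do {
        old.control = new.control = read_atomic(&pi_desc->control);

        if ( old.on )                   /* notification already pending */
            return false;

        new.nv = pi_wakeup_vector;      /* wakeups now hit the wakeup handler */
    } while ( cmpxchg(&pi_desc->control, old.control, new.control)
              != old.control );

    return true;
}
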
>>> Yep, I saw that. But could it be possible to do *everything* related to
>>> blocking, including the update of the descriptor, in vcpu_block(), if no
>>> interrupt has been raised yet at that time? I mean, if you updated the
>>> descriptor in there, would you still get the event that allows you to
>>> call vcpu_wake(), and hence vmx_vcpu_wake_prepare(), which would undo
>>> the blocking, no matter whether that resulted in an actual context
>>> switch already or not?
>>>
>>> I appreciate that this narrows the window for such an event to happen by
>>> quite a bit, making the logic itself a little less useful (it makes
>>> things more similar to "regular" blocking vs. event delivery, though,
>>> AFAICT), but if it's correct, and if it allows us to avoid the ugly
>>> invocation of vcpu_unblock() from context switch context, I'd give it a
>>> try.
>>>
>>> After all, this PI thing requires actions to be taken when a vCPU is
>>> scheduled or descheduled because of blocking, unblocking and
>>> preemptions, and it would seem natural to me to:
>>>  - deal with blockings in vcpu_block()
>>>  - deal with unblockings in vcpu_wake()
>>>  - deal with preemptions in context_switch()
>>>
>>> This does not mean that being able to consolidate some of the cases
>>> (like blocking and preemption, in the current version of the code)
>>> wouldn't be a nice thing... but we don't want it at all costs. :-)
>>
>> So just to clarify the situation...
>>
>> If a vcpu is configured for the "running" state (i.e., NV set to
>> "posted_intr_vector", notifications enabled), and an interrupt arrives
>> while we're in the hypervisor -- what happens?
>>
>> Is it the case that the interrupt is not actually delivered to the
>> processor, but that the pending bit will be set in the pi field, so that
>> the interrupt will be delivered the next time the hypervisor returns
>> into the guest?
>>
>> (I am assuming that is the case, because if the hypervisor *does* get an
>> interrupt, then it can just unblock the vcpu there.)
>>
>> This sort of race condition -- where we get an interrupt to wake up a
>> vcpu as we're blocking -- is already handled for "old-style" interrupts
>> in vcpu_block:
>>
>> void vcpu_block(void)
>> {
>>     struct vcpu *v = current;
>>
>>     set_bit(_VPF_blocked, &v->pause_flags);
>>
>>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>>     if ( local_events_need_delivery() )
>>     {
>>         clear_bit(_VPF_blocked, &v->pause_flags);
>>     }
>>     else
>>     {
>>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>>         raise_softirq(SCHEDULE_SOFTIRQ);
>>     }
>> }
>>
>> That is, we set _VPF_blocked, so that any interrupt which would wake it
>> up actually wakes it up, and then we check local_events_need_delivery()
>> to see if there were any that came in after we decided to block but
>> before we made sure that an interrupt would wake us up.
>>
>> I think it would be best if we could keep all the logic that does the
>> same thing in the same place.  Which would mean in vcpu_block(), after
>> calling set_bit(_VPF_blocked), changing the NV to pi_wakeup_vector, and
>> then extending local_events_need_delivery() to also look for pending PI
>> events.
>>
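(Concretely, the proposal above would give vcpu_block() roughly the shape
below. arch_vcpu_block() is a hypothetical hook name for the PI descriptor
update, and the extension of local_events_need_delivery() is described in
a comment rather than implemented.)

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /*
     * Hypothetical arch hook: switch NV to pi_wakeup_vector, point NDST
     * at the pCPU owning the blocked-vCPU list, and add the vCPU to
     * that list (a no-op for vCPUs without posted interrupts).
     */
    arch_vcpu_block(v);

    /*
     * Check for events /after/ blocking: avoids wakeup waiting race.
     * local_events_need_delivery() would be extended to also report a
     * posted interrupt pending in the PI descriptor (ON set / PIR
     * non-empty), undoing the block exactly as for "old-style" events.
     */
    if ( local_events_need_delivery() )
        clear_bit(_VPF_blocked, &v->pause_flags);
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}
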
>> Looking a bit more at your states, I think the actions that need to be
>> taken on all the transitions are these (collapsing 'runnable' and
>> 'offline' into the same state):
>>
>> blocked -> runnable (vcpu_wake)
>>  - NV = posted_intr_vector
>>  - Take vcpu off blocked list
>>  - SN = 1
>> runnable -> running (context_switch)
>>  - SN = 0
> 
> We need to set the 'NDST' field to the right destination pCPU for the
> vCPU as well.
> 
>> running -> runnable (context_switch)
>>  - SN = 1
>> running -> blocked (vcpu_block)
>>  - NV = pi_wakeup_vector
>>  - Add vcpu to blocked list
> 
> We need to set the 'NDST' field to the pCPU which owns the blocking list,
> so that we can wake up the vCPU from the right blocking list in the wakeup
> event handler.
> 
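(Putting George's transition table and Feng's NDST additions together, the
two "slow" transitions might look roughly like this. All field and helper
names -- pi_blocked_list, pi_block_cpu, pi_set_sn(), etc. -- are
illustrative, and the NV/SN writes are shown as plain stores for brevity;
a real implementation needs the cmpxchg-style update sketched earlier.)

/* Hypothetical per-pCPU list of blocked PI-capable vCPUs, plus its lock. */
static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);

/* running -> blocked, from vcpu_block(): reroute notifications and queue. */
static void vmx_pi_block(struct vcpu *v)
{
    unsigned int cpu = v->processor;
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

    /* Per Feng: NDST must point at the pCPU owning the blocking list... */
    write_atomic(&pi_desc->ndst, cpu_physical_id(cpu));
    pi_desc->nv = pi_wakeup_vector;

    /* ...so that the wakeup handler scans the right list. */
    v->arch.hvm_vmx.pi_block_cpu = cpu;          /* hypothetical field */
    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
    list_add_tail(&v->arch.hvm_vmx.pi_blocked_list,
                  &per_cpu(pi_blocked_vcpu, cpu));
    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}

/* blocked -> runnable, from vcpu_wake(): normal delivery, dequeue, SN=1. */
static void vmx_pi_unblock(struct vcpu *v)
{
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
    unsigned int cpu = v->arch.hvm_vmx.pi_block_cpu;

    pi_desc->nv = posted_intr_vector;
    pi_set_sn(pi_desc);                 /* runnable, but not running yet */

    spin_lock(&per_cpu(pi_blocked_vcpu_lock, cpu));
    list_del(&v->arch.hvm_vmx.pi_blocked_list);
    spin_unlock(&per_cpu(pi_blocked_vcpu_lock, cpu));
}
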
>>
>> This actually has a few pretty nice properties:
>> 1. You have a nice pair of complementary actions -- block / wake, run /
>> preempt
>> 2. The potentially long actions with lists happen in vcpu_wake and
>> vcpu_block, not on the context switch path
>>
>> And at this point, you don't have the "lazy context switch" issue
>> anymore, do you?  Because we've handled the "blocking" case in
>> vcpu_block(), we don't need to do anything in the main context_switch()
>> (which may do the lazy context switch into idle).  We only need to do
>> something in the actual __context_switch().
> 
> I think the handling for lazy context switch is not only for the blocking
> case; we still need to do something on a lazy context switch even if we
> handle the blocking case in vcpu_block(), such as:
> 1. For non-idle -> idle
> - set 'SN'

If we set SN in vcpu_block(), then we don't need to set it on context
switch -- right?

> 2. For idle -> non-idle
> - clear 'SN'
> - set the 'NDST' field to the right cpu the vCPU is going to run on. (Maybe
> this one doesn't belong to the lazy context switch path: if the cpu of the
> non-idle vCPU was changed, then per_cpu(curr_vcpu, cpu) != next in
> context_switch(), hence it will go to __context_switch() directly, right?)

If we clear SN in vcpu_wake(), then we don't need to clear it on a
context switch.  And the only way we can transition from "idle lazy
context switch" to "runnable" is if the vcpu was the last vcpu running
on this pcpu -- in which case, NDST should already be set to this pcpu,
right?
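
(On that model the context-switch path shrinks to the run/preempt pair,
something like the sketch below; again the helper names are illustrative.)

/* running -> runnable: preempted, so suppress notifications while off-CPU. */
static void vmx_pi_switch_from(struct vcpu *v)
{
    pi_set_sn(&v->arch.hvm_vmx.pi_desc);
}

/*
 * runnable -> running: re-aim NDST at the new pCPU, then allow
 * notifications again.  After a lazy switch through idle on the same
 * pCPU, NDST is already correct, so only SN actually changes.
 */
static void vmx_pi_switch_to(struct vcpu *v)
{
    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

    write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
    pi_clear_sn(pi_desc);
}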

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

