Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, January 20, 2016 4:35 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario Faggioli
> <dario.faggioli@xxxxxxxxxx>; George Dunlap <george.dunlap@xxxxxxxxxxxxx>;
> Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-devel@xxxxxxxxxxxxx; Keir Fraser
> <keir@xxxxxxx>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>
> >>> On 20.01.16 at 08:49, <feng.wu@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Monday, January 18, 2016 11:14 PM
> >> >>> On 03.12.15 at 09:35, <feng.wu@xxxxxxxxx> wrote:
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> >> > +
> >> > +    /*
> >> > +     * The vCPU is blocking, we need to add it to one of the per-pCPU
> >> > +     * lists. We save v->processor to v->arch.hvm_vmx.pi_block_cpu and
> >> > +     * use it for the per-CPU list; we also save it to the
> >> > +     * posted-interrupt descriptor and make it the destination of the
> >> > +     * wake-up notification event.
> >> > +     */
> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> >> > +
> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> >> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +
> >> > +    ASSERT(!pi_test_sn(pi_desc));
> >> > +
> >> > +    /*
> >> > +     * We don't need to set the 'NDST' field, since it should point to
> >> > +     * the same pCPU as v->processor.
> >> > +     */
> >> > +
> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> >>
> >> Stray blank line between comment and corresponding code.
> >>
> >> Also the ASSERT() is rather more disconnected from the write
> >> than seems desirable: Wouldn't it be better to cmpxchg() the
> >> whole 32-bit value, validating that SN is clear in the result?
> >
> > I don't quite understand this. The control field is 64 bits wide; do
> > you mean cmpxchg() the whole 64-bit value, like this:
> >
> > +    do {
> > +        old.control = new.control = pi_desc->control;
> > +        new.nv = pi_wakeup_vector;
> > +    } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +              != old.control );
> >
> > This is somewhat like the implementation in the earlier versions;
> > why do you want to change it back?
>
> Did you read what I've said? I'm worried about the disconnect of
> assertion and update: You really care about SN being clear
> _after_ the update, aiui.

No, the ASSERT has no connection with the update. The SN bit should be
clear before pi_desc->nv is updated; adding the ASSERT here is just a
kind of protective check.

> And since NDST doesn't change, a 32-bit
> CMPXCHG would seem to suffice.
>
> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> > +    if ( pi_block_cpu == NR_CPUS )
> >> > +        return;
> >>
> >> Are this condition and the one in the immediately preceding if()
> >> connected in any way?
> >
> > I am not sure I understand this correctly. Which one is
> > "the one in the immediately preceding if()"?
>
> If you hadn't ripped out too much of the context, it would be a
> matter of pointing you back up. Now I have to re-quote the full
> code:
>
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
>
> These are the two if()-s the question is about.
>
> >> I.e.
> >> could the latter perhaps become an
> >> ASSERT() by connecting the two blocks suitably? I'm simply
> >> trying to limit the number of different cases to consider mentally
> >> when looking at this code ...
> >
> > If we get true from the above check, it means the vCPU was removed by
> > pi_wakeup_interrupt(), so we just need to return. If we get false,
> > we will acquire the spinlock as the following code does; there are two
> > scenarios:
> > - pi_wakeup_interrupt() acquires the spinlock before us; then by the
> >   time we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
> >   set to NR_CPUS by pi_wakeup_interrupt(), and we need do nothing.
> > - We get the spinlock before pi_wakeup_interrupt(); in this case we
> >   need to remove the vCPU from the list and set
> >   'v->arch.hvm_vmx.pi_block_cpu' to NR_CPUS.
>
> This is all only about the second condition, but the question really is
> whether one being true or false may imply the result of the other. In
> which case this would better be ASSERT()ed than handled by two
> conditionals.

Thanks for the explanation. I don't think there is any connection
between "if ( pi_desc->nv != posted_intr_vector )" and
"if ( pi_block_cpu == NR_CPUS )".

> >> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >> >
> >> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >> >
> >> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> > +
> >> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> >> > +
> >> >      v->arch.schedule_tail    = vmx_do_resume;
> >> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >> >
> >> > +    if ( iommu_intpost )
> >> > +        v->arch.vcpu_block = vmx_vcpu_block;
> >>
> >> Considering the locking question above - couldn't this be done only
> >> when a device gets added, and the pointer zapped upon removal
> >> of the last device, at once saving the conditional inside the hook?
>
> > That is another solution, but I think it makes a simple thing
> > difficult. I feel the current logic is simple and clear. Besides,
> > this hook is not on a critical path, so another conditional in it
> > should not be a big deal.
>
> Then you didn't understand: The question isn't this path, but the
> path where the hook gets called if non-NULL (and hence the
> possibility to avoid such needless calls).

I understand: you mean the overhead occurs where the hook is called.
My point is that the hook is not called on a critical path, so I doubt
it is worth complicating the logic for this.

> >> > --- a/xen/common/schedule.c
> >> > +++ b/xen/common/schedule.c
> >> > @@ -802,6 +802,8 @@ void vcpu_block(void)
> >> >
> >> >      set_bit(_VPF_blocked, &v->pause_flags);
> >> >
> >> > +    arch_vcpu_block(v);
> >> > +
> >> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >> >      if ( local_events_need_delivery() )
> >> >      {
> >>
> >> ISTR some previous discussion on this placement - wasn't it
> >> determined that putting it in the else part here would be
> >> sufficient for your needs and better in terms of overhead?
> >> Or is there some race possible if moved past the invocation
> >> of local_events_need_delivery()?
> >
> > The discussion happened several months ago; it is really hard to
> > recall every detail, so I have to look it up in my archive. It is
> > really bad that the review has lasted this long.
>
> I understand the frustration on your end, but I can only justify
> it by the frustration at my end resulting from everyone dumping
> out their patch sets with only very few people helping to deal
> with that flood. Plus, after previous iterations, I expected the
> review to be more involved than it turned out in the end (i.e.
> you've done a good job in simplifying things after the feedback
> you had received - thanks for that).
>
> > We need to call arch_vcpu_block() before local_events_need_delivery(),
> > since the VT-d hardware can issue the notification event while we are
> > in arch_vcpu_block(); if that happens, the 'ON' bit is set, and we then
> > check it in local_events_need_delivery(). If we did arch_vcpu_block()
> > in the else part, we could not handle this case. This is one reason I
> > can recall; I am not sure whether there were other concerns, since it
> > has been a really long time. The current placement was fully discussed
> > with the scheduler maintainers.
>
> Okay, this all makes sense. It's just that I don't see how the ON
> bit would get checked by local_events_need_delivery(). But that
> may in turn be because, with the long turnaround, I've lost some
> of the details of the rest of this series.

Here is how ON is checked:

local_events_need_delivery()
    --> hvm_local_events_need_delivery()
        --> hvm_vcpu_has_pending_irq()
            --> vlapic_has_pending_irq()
                --> vlapic_find_highest_irr()
                    --> vmx_sync_pir_to_irr()
                        --> pi_test_and_clear_on()

> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> >
> >> > +    unsigned int         pi_block_cpu;
> >>
> >> Looking at all the use sites I wonder whether it wouldn't make for
> >> easier to read code if you instead stored the lock to acquire. This
> >> would then perhaps also prepare a road for balancing lists growing
> >> too large (i.e. rather than having one list per CPU, you could
> >> then have a variable number of lists, depending on current needs,
> >> and [kind of] freely select one of them).
> >
> > Since this patch is still experimental, can we put the "length of the
> > list" issue off to the next stage, when we turn it on by default?
>
> I in no way meant to say that the list length issue needs to be
> dealt with right away. All I was saying was that storing a lock
> pointer here instead of a CPU number may make that easier (and
> might therefore be a good idea to do right away).
Do you mean storing a pointer to 'pi_blocked_vcpu_lock' here? Are you
set on this method? I will think about it a bit more to see whether it
affects the current implementation much.

Thanks,
Feng

>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel