
Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling



On 09/03/16 05:22, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
>> Sent: Wednesday, March 9, 2016 1:06 AM
>> To: Jan Beulich <JBeulich@xxxxxxxx>; George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx>; Wu, Feng <feng.wu@xxxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario Faggioli
>> <dario.faggioli@xxxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
>> devel@xxxxxxxxxxxxx; Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Keir
>> Fraser <keir@xxxxxxx>
>> Subject: Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt
>> core logic handling
>>
>> On 08/03/16 15:42, Jan Beulich wrote:
>>>>>> On 08.03.16 at 15:42, <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>> On Tue, Mar 8, 2016 at 1:10 PM, Wu, Feng <feng.wu@xxxxxxxxx> wrote:
>>>>>> -----Original Message-----
>>>>>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
>>>> [snip]
>>>>>> It seems like there are a couple of ways we could approach this:
>>>>>>
>>>>>> 1. Try to optimize the reverse look-up code so that it's not a linear
>>>>>> linked list (getting rid of the theoretical fear)
>>>>>
>>>>> Good point.
>>>>>
>>>>>>
>>>>>> 2. Try to test engineered situations where we expect this to be a
>>>>>> problem, to see how big of a problem it is (proving the theory to be
>>>>>> accurate or inaccurate in this case)
>>>>>
>>>>> Maybe we can run an SMP guest with all the vcpus pinned to a dedicated
>>>>> pCPU, run some benchmark in the guest with VT-d PI and without VT-d PI,
>>>>> and then see the performance difference between these two scenarios.
>>>>
>>>> This would give us an idea of what the worst-case scenario would be.
>>>
>>> How would a single VM ever give us an idea about the worst
>>> case? Something getting close to worst case is a ton of single
>>> vCPU guests all temporarily pinned to one and the same pCPU
>>> (could be multi-vCPU ones, but the more vCPU-s the more
>>> artificial this pinning would become) right before they go into
>>> blocked state (i.e. through one of the two callers of
>>> arch_vcpu_block()), the pinning removed while blocked, and
>>> then all getting woken at once.
>>
>> Why would removing the pinning be important?
>>
>> And I guess it's actually the case that we don't need all VMs to
>> actually be *receiving* interrupts; they just need to be *capable* of
>> receiving interrupts for there to be a long chain all blocked on the
>> same physical cpu.
>>
>>>
>>>>  But
>>>> pinning all vcpus to a single pcpu isn't really a sensible use case we
>>>> want to support -- if you have to do something stupid to get a
>>>> performance regression, then as far as I'm concerned it's not a
>>>> problem.
>>>>
>>>> Or to put it a different way: If we pin 10 vcpus to a single pcpu and
>>>> then pound them all with posted interrupts, and there is *no*
>>>> significant performance regression, then that will conclusively prove
>>>> that the theoretical performance regression is of no concern, and we
>>>> can enable PI by default.
>>>
>>> The point isn't the pinning. The point is what pCPU they're on when
>>> going to sleep. And that could involve quite a few more than just
>>> 10 vCPU-s, provided they all sleep long enough.
>>>
>>> And the "theoretical performance regression is of no concern" is
>>> also not a proper way of looking at it, I would say: Even if such
>>> a situation would happen extremely rarely, if it can happen at all,
>>> it would still be a security issue.
>>
>> What I'm trying to get at is -- exactly what situation?  What actually
>> constitutes a problematic interrupt latency / interrupt processing
>> workload, how many vcpus must be sleeping on the same pcpu to actually
>> risk triggering that latency / workload, and how feasible is it that
>> such a situation would arise in a reasonable scenario?
>>
>> If 200us is too long, and it only takes 3 sleeping vcpus to get there,
>> then yes, there is a genuine problem we need to try to address before we
>> turn it on by default.  If we say that up to 500us is tolerable, and it
>> takes 100 sleeping vcpus to reach that latency, then this is something I
>> don't really think we need to worry about.
>>
>> "I think something bad may happen" is a really difficult to work with.
>> "I want to make sure that even a high number of blocked cpus won't cause
>> the interrupt latency to exceed 500us; and I want it to be basically
>> impossible for the interrupt latency to exceed 5ms under any
>> circumstances" is a concrete target someone can either demonstrate that
>> they meet, or aim for when trying to improve the situation.
>>
>> Feng: It should be pretty easy for you to:
> 
> George, thanks a lot for pointing out a possible way to move forward.
> 
>> * Implement a modified version of Xen where
>>  - *All* vcpus get put on the waitqueue
> 
> So this means all the vcpus are blocked, and hence waiting on the
> blocking list, right?

No.

For testing purposes, we need a lot of vcpus on the list, but we only
need one vcpu to actually be woken up to see how long it takes to
traverse the list.

At the moment, a vcpu will only be put on the list if it has the
arch_block callback defined; and it will have the arch_block callback
defined only if the domain it's a part of has a device assigned to it.
But it would be easy enough to make it so that *all* VMs have the
arch_block callback defined; then all vcpus would end up on the
pi_blocked list when they're blocked, even if they don't have a device
assigned.

That way you could have a really long pi_blocked list while only needing
a single device to pass through to the guest.
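
To be concrete, a minimal (untested) sketch of what I mean is below:
install the PI hooks for every VMX domain at initialisation time when
posted interrupts are available, rather than only on device assignment.
The helper, flag, and function names (vmx_pi_hooks_assign(),
iommu_intpost, vmx_domain_initialise()) are the ones I believe this
series uses, but treat them as assumptions and adjust to whatever your
tree actually calls them.  Something like the following, added at the
end of vmx_domain_initialise():

    /*
     * Testing-only hack: give *every* domain the blocking hook, so that
     * all of its vCPUs land on the per-pCPU blocking list when they
     * block, even with no device assigned.  Names are assumptions from
     * this series; adjust as needed.
     */
    if ( iommu_intpost )
        vmx_pi_hooks_assign(d);   /* normally only done on device assignment */

(The matching vmx_pi_hooks_deassign() on domain destruction would need
the same treatment, of course.)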

>>  - Measure how long it took to run the loop in pi_wakeup_interrupt
>> * Have one VM receiving posted interrupts on a regular basis.
>> * Slowly increase the number of vcpus blocked on a single cpu (e.g., by
>> creating more guests), stopping when you either reach 500us or 500
>> vcpus. :-)
> 
> This may depend on the environment. I was using a 10G NIC to do the
> test; if we increase the number of guests, I will need more NICs to
> assign to the guests. I will see if I can get them.

...which is why I suggested setting the arch_block() callback for all
domains, even those which don't have devices assigned, so that you could
get away with a single passed-through device. :-)
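
As for the measurement itself, a rough sketch of the kind of
instrumentation I have in mind is below: timestamp the list walk in
pi_wakeup_interrupt() with NOW() and report the worst case seen on each
pCPU.  The overall shape of the handler follows this series, but the
per-CPU variable and field names (vmx_pi_blocking, pi_blocking.list,
pi_desc, and the added pi_wakeup_max_ns) are assumptions on my part, so
they will almost certainly need adjusting to match your tree:

    static DEFINE_PER_CPU(s_time_t, pi_wakeup_max_ns);

    static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
    {
        struct arch_vmx_struct *vmx, *tmp;
        unsigned int cpu = smp_processor_id();
        spinlock_t *lock = &per_cpu(vmx_pi_blocking, cpu).lock;
        struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
        s_time_t start, elapsed;

        ack_APIC_irq();

        spin_lock(lock);

        start = NOW();    /* ns-resolution system time */

        /*
         * This walk is what the latency target is about: its cost grows
         * with the number of vCPUs currently blocked on this pCPU.
         */
        list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
        {
            if ( pi_test_on(&vmx->pi_desc) )
            {
                list_del(&vmx->pi_blocking.list);
                vmx->pi_blocking.lock = NULL;
                vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
            }
        }

        elapsed = NOW() - start;
        if ( elapsed > this_cpu(pi_wakeup_max_ns) )
        {
            this_cpu(pi_wakeup_max_ns) = elapsed;
            printk("pi_wakeup_interrupt: cpu%u worst-case list walk %"PRI_stime" ns\n",
                   cpu, elapsed);
        }

        spin_unlock(lock);
    }

That per-CPU maximum (or a histogram, if you want more detail) should
make it easy to correlate the measured walk time with the number of
blocked vcpus as you add guests.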

 -George



 

