Re: [Xen-devel] long tail latency caused by rate-limit in Xen credit2
On Tue, 2017-06-13 at 14:59 -0500, T S wrote:
> Hi all,
>
Hi,

Nice to hear from you again... You guys always have interesting things
to say/report/etc., about scheduling... I truly appreciate what you
do! :-)

> When I was executing the latency-sensitive applications in VMs on the
> latest Xen,
> I found the rate limit will cause the long tail latency for VMs
> sharing CPU with other VMs.
>
Yeah, I can totally see how this can be the case.

Personally, I'm not a fan of context switch rate limiting. Not at all.
But it has proven to be useful in some workloads, so it's good for it
to be there.

I think the scenario you describe is one of those cases where rate
limiting is better disabled. :-)

> (1) Problem description
>
> [snip]
>
> (2) Problem analysis
>
> ------------Analysis----------------
> I read the source code in Xen credit2 scheduler. The vCPU priority
> used in credit1 such as OVER, UNDER, BOOST, is all removed and all
> the
> vCPUs are just ordered by their credit. I traced vCPU credit and the
> I/O-VM vCPU credit is always larger than the CPU-VM credit. So the
> order of I/O-VM vCPU is always ahead of the CPU-VM vCPU.
>
> Next, I traced the time gap between vCPU wake and vCPU scheduler
> function. I found that if the I/O-VM run alone, the time gap is about
> 3,000ns; however, if the I/O-VM co-run with CPU-VM on the same core,
> the time gap enlarged to 1,000,000ns and that happened in every vCPU
> scheduling. That reminded me the ratelimit in the Xen credit
> scheduler. The default ratelimit in Xen is 1000us.
>
> As I modified the ratelimit to 100us in the terminal:
> $ sudo /usr/local/sbin/xl sched-credit2 -s -r 100
>
> The average latency is reduced from 300+us to 200+us and the tail
> latency is also reduced.
>
Ok, good to hear that things are behaving as expected. :-D

> [another snip]
>
> However, the minimum value of ratelimit is 100us which means there
> still exists the gap between the mix running VMs case and the running
> alone VM case. (P.S. the valid range of ratelimit is from 100 to
> 500000us). To mitigate the latency, the users have to run the I/O VMs
> on a dedicated core but that will waste lots of CPU resources on the
> other hand.
>
> As an experiment test, I modified the Xen source code to allow the
> ratelimit could be set as 0. As below, here is the result when I set
> the ratelimit to 0. Both average latency and tail latency when
> co-running with CPU-VMs is at the same magnitude and range of that in
> I/O-VM running alone.
>
Wait... it is already possible to disable ratelimiting. I mean, you're
right, you can't set it to 50us, because, if it's not 0, then it has
to be at least 100us. But you can do:

  $ sudo /usr/local/sbin/xl sched-credit2 -s -r 0

and it will be disabled.

That was possible last time I tried. If it no longer is, then you've
found a bug (I'll double check this tomorrow morning).

> sockperf: ====> avg-lat= 71.766 (std-dev=1.618)
> sockperf: # dropped messages = 0; # duplicated messages = 0; #
> out-of-order messages = 0
> sockperf: Summary: Latency is 71.766 usec
> sockperf: Total 1999 observations; each percentile contains 19.99
> observations
> sockperf: ---> <MAX> observation = 99.257
> sockperf: ---> percentile 99.999 = 99.257
> sockperf: ---> percentile 99.990 = 99.257
> sockperf: ---> percentile 99.900 = 84.155
> sockperf: ---> percentile 99.000 = 78.873
> sockperf: ---> percentile 90.000 = 73.920
> sockperf: ---> percentile 75.000 = 72.546
> sockperf: ---> percentile 50.000 = 71.458
> sockperf: ---> percentile 25.000 = 70.518
> sockperf: ---> <MIN> observation = 63.150
>
Well, not too bad, considering it's running concurrently with another
VM. It means the scheduler is doing a good job "prioritizing" the I/O
bound workload.

> Similar problem could also be found in credit1 scheduler.
>
And again, it should be possible to disable ratelimiting while on
Credit1 as well, in a similar manner (I'll check this too).
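
For reference, the rule discussed above (0 disables rate limiting; any
other value must fall within the range quoted in this thread) can be
sketched in Python. This is only an illustration: the constant and
function names are made up here, the bounds are the ones quoted in this
thread rather than read from the Xen source, and the delay helper is a
simplified model which assumes that a waking vCPU may have to wait up
to one full ratelimit interval before it can preempt.

```python
# Bounds as quoted in this thread (assumption: not read from the Xen source).
RATELIMIT_MIN_US = 100
RATELIMIT_MAX_US = 500_000


def ratelimit_is_valid(ratelimit_us: int) -> bool:
    """Return True if the value fits the rule described in this thread."""
    if ratelimit_us == 0:
        # 0 is the special value that disables rate limiting altogether.
        return True
    return RATELIMIT_MIN_US <= ratelimit_us <= RATELIMIT_MAX_US


def worst_case_wake_delay_ns(ratelimit_us: int) -> int:
    """Simplified upper bound (in ns) on the delay added by rate limiting.

    Assumes the currently running vCPU cannot be preempted until it has
    run for one full ratelimit interval.
    """
    return ratelimit_us * 1000


if __name__ == "__main__":
    for value in (0, 50, 100, 1000, 500_000, 500_001):
        print(value, ratelimit_is_valid(value), worst_case_wake_delay_ns(value))
```

Under this model the default of 1000us corresponds to the 1,000,000ns
wake-to-schedule gap reported above, and a value of 50 is rejected
while 0 is accepted.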

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel