Re: [Xen-devel] long tail latency caused by rate-limit in Xen credit2
On Tue, Jun 13, 2017 at 3:51 PM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
> On Tue, 2017-06-13 at 14:59 -0500, T S wrote:
>> Hi all,
>>
> Hi,
>
> Nice to hear from you again... You guys always have interesting things
> to say/report/etc., about scheduling... I truly appreciate what you
> do! :-)
>

Thank you for your reply, Dario. : )

>> When I was executing latency-sensitive applications in VMs on the
>> latest Xen, I found that the rate limit causes long tail latency for
>> VMs sharing a CPU with other VMs.
>>
> Yeah, I totally can see how this can be the case.
>
> Personally, I'm not a fan of context-switch rate limiting. Not at all.
> But it has proven to be useful in some workloads, so it's good for it
> to be there.
>
> I think the scenario you describe is one of those cases where rate
> limiting is better disabled. :-)
>
>> (1) Problem description
>>
>> [snip]
>>
>> (2) Problem analysis
>>
>> ------------Analysis----------------
>> I read the source code of the Xen credit2 scheduler. The vCPU
>> priorities used in credit1, such as OVER, UNDER, and BOOST, are all
>> removed, and all the vCPUs are simply ordered by their credit. I
>> traced the vCPU credits, and the I/O-VM vCPU's credit is always
>> larger than the CPU-VM's, so the I/O-VM vCPU is always ordered ahead
>> of the CPU-VM vCPU.
>>
>> Next, I traced the time gap between the vCPU wake and the vCPU
>> schedule function. I found that if the I/O-VM runs alone, the gap is
>> about 3,000ns; however, if the I/O-VM co-runs with a CPU-VM on the
>> same core, the gap grows to 1,000,000ns, and that happens on every
>> vCPU scheduling decision. That reminded me of the ratelimit in the
>> Xen credit scheduler. The default ratelimit in Xen is 1000us.
>>
>> When I lowered the ratelimit to 100us in the terminal:
>> $ sudo /usr/local/sbin/xl sched-credit2 -s -r 100
>> the average latency was reduced from 300+us to 200+us, and the tail
>> latency was also reduced.
>>
> Ok, good to hear that things are behaving as expected. :-D
>
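What the trace above shows is the ratelimit working as designed: a
just-woken vCPU, even with higher credit, waits behind the currently
running vCPU until the latter has run for at least ratelimit_us. Below
is a minimal sketch in C of that kind of check in a scheduler's
pick-next path; the structures and names are illustrative, not the
actual Xen source:

    #include <stdbool.h>
    #include <stdint.h>

    #define MICROSECS(us) ((uint64_t)(us) * 1000ULL)  /* us -> ns */

    struct vcpu {
        bool     runnable;       /* can this vCPU still run? */
        uint64_t run_start_ns;   /* when it last started running */
    };

    struct sched_priv {
        unsigned int ratelimit_us;  /* 0 means ratelimiting is off */
    };

    /*
     * Decide whether the running vCPU must be kept on the pCPU even
     * though a higher-credit vCPU (e.g. a just-woken I/O vCPU) is
     * waiting on the runqueue.
     */
    static bool ratelimit_keeps_current(const struct sched_priv *prv,
                                        const struct vcpu *curr,
                                        uint64_t now_ns)
    {
        if (prv->ratelimit_us == 0)   /* ratelimiting disabled */
            return false;
        if (!curr->runnable)          /* curr is blocking anyway */
            return false;
        /* Keep curr until it has run for at least ratelimit_us. */
        return (now_ns - curr->run_start_ns)
               < MICROSECS(prv->ratelimit_us);
    }

With the default ratelimit of 1000us, a woken I/O vCPU can therefore
sit on the runqueue for up to 1,000,000ns behind a CPU-bound vCPU,
which matches the gap in the trace.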
>> [another snip]
>>
>> However, the minimum value of the ratelimit is 100us, which means
>> there is still a gap between the mixed-running case and the
>> running-alone case (P.S. the valid range of the ratelimit is from
>> 100 to 500000us). To mitigate the latency, users have to run the I/O
>> VMs on a dedicated core, but that wastes a lot of CPU resources.
>>
>> As an experiment, I modified the Xen source code to allow the
>> ratelimit to be set to 0. Below is the result when I set the
>> ratelimit to 0: both the average latency and the tail latency when
>> co-running with CPU-VMs are of the same magnitude and range as when
>> the I/O-VM runs alone.
>>
> Wait... it is already possible to disable ratelimiting. I mean,
> you're right, you can't set it to 50us, because, if it's not 0, then
> it has to be >= 100us.
>
> But you can do:
>
> $ sudo /usr/local/sbin/xl sched-credit2 -s -r 0
>
> and it will be disabled.
>
> That was possible last time I tried. If it's not right now, then
> you've found a bug (I'll double check this tomorrow morning).
>

Yes, you are right. I just double-checked on my clean Xen 4.8.1: I can
disable the ratelimit with
$ sudo /usr/local/sbin/xl sched-credit2 -s -r 0

>> sockperf: ====> avg-lat= 71.766 (std-dev=1.618)
>> sockperf: # dropped messages = 0; # duplicated messages = 0;
>> # out-of-order messages = 0
>> sockperf: Summary: Latency is 71.766 usec
>> sockperf: Total 1999 observations; each percentile contains 19.99
>> observations
>> sockperf: ---> <MAX> observation = 99.257
>> sockperf: ---> percentile 99.999 = 99.257
>> sockperf: ---> percentile 99.990 = 99.257
>> sockperf: ---> percentile 99.900 = 84.155
>> sockperf: ---> percentile 99.000 = 78.873
>> sockperf: ---> percentile 90.000 = 73.920
>> sockperf: ---> percentile 75.000 = 72.546
>> sockperf: ---> percentile 50.000 = 71.458
>> sockperf: ---> percentile 25.000 = 70.518
>> sockperf: ---> <MIN> observation = 63.150
>>
> Well, not too bad, considering it's running concurrently with another
> VM. It means the scheduler is doing a good job "prioritizing" the
> I/O-bound workload.
>
>> A similar problem can also be found in the credit1 scheduler.
>>
> And again, it should be possible to disable ratelimiting on Credit1
> as well, in a similar manner (I'll check this too).
>

I just checked credit1 on Xen 4.8.1. I can disable its ratelimit with
$ sudo /usr/local/sbin/xl sched-credit -s -r 0. Thanks.
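To put the thread's conclusion in one place: a ratelimit of 0 is
special-cased to mean "disabled", while any nonzero value must lie
within [100, 500000]us, so "-r 0" works but "-r 50" is rejected. A
minimal sketch of that validation rule (illustrative C, not the actual
xl/Xen code):

    #include <stdbool.h>

    /* Valid nonzero ratelimit range, in microseconds. */
    #define RATELIMIT_MIN_US 100
    #define RATELIMIT_MAX_US 500000

    /*
     * 0 is special-cased to mean "ratelimiting disabled"; any other
     * value must fall within [RATELIMIT_MIN_US, RATELIMIT_MAX_US].
     */
    static bool ratelimit_value_is_valid(unsigned int us)
    {
        return us == 0 ||
               (us >= RATELIMIT_MIN_US && us <= RATELIMIT_MAX_US);
    }

This is why setting 50us fails while 0 succeeds, on both the credit
and credit2 schedulers.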
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--
**********************************
Tony Suo
Computer Science, University of Texas at Arlington
**********************************

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel