Re: [Xen-devel] [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+paul.durrant=citrix.com@xxxxxxxxxx] On Behalf Of Paul Durrant
> Sent: 21 June 2017 10:36
> To: Roger Pau Monne <roger.pau@xxxxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Kevin Wolf <kwolf@xxxxxxxxxx>; qemu-block@xxxxxxxxxx; qemu-devel@xxxxxxxxxx; Max Reitz <mreitz@xxxxxxxxxx>; Anthony Perard <anthony.perard@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [Qemu-devel] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
>
> > -----Original Message-----
> > From: Roger Pau Monne
> > Sent: 21 June 2017 10:18
> > To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> > Cc: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx; qemu-devel@xxxxxxxxxx; qemu-block@xxxxxxxxxx; Anthony Perard <anthony.perard@xxxxxxxxxx>; Kevin Wolf <kwolf@xxxxxxxxxx>; Max Reitz <mreitz@xxxxxxxxxx>
> > Subject: Re: [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available
> >
> > On Tue, Jun 20, 2017 at 03:19:33PM -0700, Stefano Stabellini wrote:
> > > On Tue, 20 Jun 2017, Paul Durrant wrote:
> > > > If grant copy is available then it will always be used in preference to
> > > > persistent maps. In this case feature-persistent should not be advertized
> > > > to the frontend, otherwise it may needlessly copy data into persistently
> > > > granted buffers.
> > > >
> > > > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > >
> > > CC'ing Roger.
> > >
> > > It is true that using feature-persistent together with grant copies is
> > > a very bad idea.
> > >
> > > But this change establishes an explicit preference of
> > > feature_grant_copy over feature-persistent in the xen_disk backend. It
> > > is not obvious to me that it should be the case.
> > >
> > > Why is feature_grant_copy (without feature-persistent) better than
> > > feature-persistent (without feature_grant_copy)? Shouldn't we simply
> > > avoid grant copies to copy data to persistent grants?
> >
> > When using persistent grants the frontend must always copy data from
> > the buffer to the persistent grant; there's no way to avoid this.
> >
> > Using grant_copy we move the copy from the frontend to the backend,
> > which means the CPU time of the copy is accounted to the backend. This
> > is not ideal, but IMHO it's better than persistent grants because it
> > avoids keeping a pool of mapped grants that consume memory and make
> > the code more complex.
> >
> > Do you have some performance data showing the difference between
> > persistent grants vs grant copy?
>
> No, but I can get some :-)
>
> For a little background... I've been trying to push the throughput of fio
> running in a debian stretch guest on my skull canyon NUC. When I started
> out, I was getting ~100Mbps. When I finished, with this patch, the
> IOThreads one, the multi-page ring one and a bit of hackery to turn off
> all the aio flushes that seem to occur even if the image is opened with
> O_DIRECT, I was getting ~960Mbps... which is about line rate for the SSD
> in the NUC.
>
> So, I'll force use of persistent grants on and see what sort of
> throughput I get.
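The change under discussion boils down to advertising feature-persistent only when the backend cannot use grant copy. A minimal sketch of that advertisement logic, assuming the feature_grant_copy/feature_persistent fields and the xenstore_write_be_int() helper that hw/block/xen_disk.c uses elsewhere; the actual patch may structure this differently:

/*
 * Sketch only, not the actual patch: advertise feature-persistent to the
 * frontend only when the backend cannot use grant copy, so a grant-copy
 * capable backend never invites blkfront to copy data into persistently
 * granted buffers that the backend will never map.
 */
static void blk_advertise_persistent(struct XenBlkDev *blkdev)
{
    /*
     * feature_grant_copy is assumed to have been probed already, e.g. by
     * attempting a trivial grant copy at connect time.
     */
    blkdev->feature_persistent = !blkdev->feature_grant_copy;

    xenstore_write_be_int(&blkdev->xendev, "feature-persistent",
                          blkdev->feature_persistent ? 1 : 0);
}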
A quick test with grant copy forced off (causing persistent grants to be used)...

My VM is debian stretch using a 16-page shared ring from blkfront. The image backing xvdb is a fully inflated 10G qcow2.

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/539.4MB/0KB /s] [0/1078/0 iops] [eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=633: Wed Jun 21 06:26:06 2017
  write: io=6146.6MB, bw=795905KB/s, iops=1546, runt= 7908msec
  cpu          : usr=2.07%, sys=34.00%, ctx=4490, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=166.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=12230/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=6146.6MB, aggrb=795904KB/s, minb=795904KB/s, maxb=795904KB/s, mint=7908msec, maxt=7908msec

Disk stats (read/write):
  xvdb: ios=54/228860, merge=0/2230616, ticks=16/5403048, in_queue=5409068, util=98.26%

The dom0 CPU usage for the relevant IOThread was ~60%.

The same test with grant copy...

root@dhcp-237-70:~# fio --randrepeat=1 --ioengine=libaio --direct=0 --gtod_reduce=1 --name=test --filename=/dev/xvdb --bs=512k --iodepth=64 --size=10G --readwrite=randwrite --ramp_time=4
test: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [70.6% done] [0KB/607.7MB/0KB /s] [0/1215/0 iops] [eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=483: Wed Jun 21 06:35:14 2017
  write: io=6232.0MB, bw=810976KB/s, iops=1575, runt= 7869msec
  cpu          : usr=2.44%, sys=37.42%, ctx=3570, majf=0, minf=1
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=164.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=12401/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=6232.0MB, aggrb=810975KB/s, minb=810975KB/s, maxb=810975KB/s, mint=7869msec, maxt=7869msec

Disk stats (read/write):
  xvdb: ios=54/229583, merge=0/2235879, ticks=16/5409500, in_queue=5415080, util=98.27%

So, higher throughput and iops with grant copy. The dom0 CPU usage was running at ~70%, so there is definitely more dom0 overhead when using grant copy. The usage of grant copy could probably be improved, though, since the current code issues a copy ioctl per ioreq. With some batching I suspect some, if not all, of the extra overhead could be recovered.
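To illustrate the batching idea: rather than one xengnttab_grant_copy() call, and hence one ioctl, per ioreq, the segments of several pending ioreqs could be gathered into a single array and submitted together. A rough sketch against the public libxengnttab interface; this is not the current xen_disk code, and the segment setup is left to a hypothetical caller:

#include <stdint.h>
#include <xengnttab.h>
#include <xen/grant_table.h>    /* GNTST_okay */

/*
 * Sketch only: submit the grant-copy segments gathered from several
 * pending ioreqs in one go, so the ioctl/hypercall cost is paid once per
 * batch rather than once per request. The caller is assumed to have
 * filled in each segment (grant ref and offset on the guest side, local
 * buffer on the backend side, length and direction flags).
 */
static int submit_grant_copy_batch(xengnttab_handle *xgt,
                                   xengnttab_grant_copy_segment_t *segs,
                                   uint32_t nr_segs)
{
    uint32_t i;

    if (nr_segs == 0) {
        return 0;
    }

    /* One call (one ioctl) covers the whole batch... */
    if (xengnttab_grant_copy(xgt, nr_segs, segs) != 0) {
        return -1;
    }

    /* ...but each segment still reports its own status. */
    for (i = 0; i < nr_segs; i++) {
        if (segs[i].status != GNTST_okay) {
            return -1;
        }
    }

    return 0;
}

Completion handling would of course have to track which ioreq each segment belongs to, and the batch would need to be flushed before responses are pushed onto the ring.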
Cheers,

Paul

>
> Cheers,
>
> Paul
>
> > Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel