Re: [Xen-devel] [PATCH] blkback: Fix block I/O latency issue
Thanks for running the tests. Very useful data.
Re: Experiment to show latency improvement
I never ran anything on ramdisk.
You should be able to see the latency benefit with the 'orion' tool, but I
am sure other tools can be used as well. For a volume backed by a single
disk drive, keep the number of small random I/Os outstanding at 2 (I think
the "num_small" parameter in orion should do the job) with a 50-50 mix of
writes and reads. Measure the latency reported by the guest and by Dom-0
and compare them. For LVM volumes that present multiple drives as a single
LUN (inside the guest), the latency improvement will be highest when the
number of outstanding I/Os is 2X the number of spindles. This is the
'moderate I/O' scenario I was describing, and you should see a significant
improvement in latencies.
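In case it helps, a minimal orion invocation along these lines should
exercise that scenario - I am going from memory on the parameter names,
so please double-check them against the orion documentation ('mytest.lun'
is a text file listing the volume under test, e.g. /dev/xvdb):

./orion -run advanced -testname mytest -num_disks 1 \
        -type rand -matrix point -num_small 2 -num_large 0 \
        -write 50 -duration 60

'-num_small 2' keeps two small random I/Os outstanding and '-write 50'
gives the 50-50 mix; for the LVM multi-spindle case, raise '-num_small'
to 2X the number of spindles.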
If you let the page cache perform sequential I/O, using dd or another
sequential, non-direct I/O generation tool, you should find that the
interrupt rate doesn't go up under high I/O load. Thinking about this, I
suspect the burstiness of I/O submission as seen by the driver is also a
key player, particularly in the absence of the I/O coalescing waits
introduced by the I/O scheduler. Page cache draining is notoriously
bursty.
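For example, something like

dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=4096

(deliberately without oflag=direct, so the writes land in the page cache
first) should show the bursty drain I am talking about.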
>>queue depth of 256.
What 'queue depth' is this ? If I am not wrong, blkfront-blkback is
restricted to ~32 max pending I/Os due to the limit of one page being used
for mailbox entries - no ?
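For reference, the arithmetic I have in mind (struct sizes from memory,
so treat this as a sketch and check it against your tree):

/* The shared ring is one 4K page: a 64-byte header (req_prod/req_event/
 * rsp_prod/rsp_event plus padding) followed by the request/response
 * union entries, roughly 112 bytes each for blkif:
 *
 *     (4096 - 64) / 112 = 36 entries fit,
 *
 * which __RING_SIZE() rounds down to a power of two -> 32 slots.
 */
#define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE)

Hence the ~32 pending I/O cap per ring that I meant by 'mailbox entries'.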
>>But to my surprise the case where the I/O latency is high, the interrupt
>>generation was quite small
If this patch results in an extra interrupt, it will very likely result in
a reduction of latency for the next I/O. If the increase in interrupt
generation is not high, then the number of I/Os whose latencies this patch
has improved is low. Looks like your workload belonged to this category.
Perhaps that's why you didn't see much of an improvement in overall
performance ? I think this is close to the high I/O workload scenario I
was describing.
>>But where the I/O latency was very very small (4 microseconds) the
>>interrupt generation was on average about 20K/s.
This is not a scenario I tested but the results aren't surprising. This
isn't the high I/O load I was describing though (I didn't test ramdisk).
SSD is probably the closest real world workload.
An increase of 20K/sec means this patch very likely improved the latency
of 20K I/Os per sec, although the absolute value of the latency
improvement would be smaller in this case. A 20K/sec interrupt rate (50
usec between interrupts) is something I would be comfortable with if the
interrupts directly translate to latency improvements for the users. The
graphs seem to indicate a 5% increase in throughput for this case - am I
reading the graphs right ?
Overall, very useful tests indeed, and I haven't seen anything too
concerning or unexpected, except that I don't think you have seen the 50+%
latency benefit that the patch got me in my moderate I/O benchmark :-)
Feel free to ping me offline if you aren't able to see the latency impact
using the 'moderate I/O' methodology described above.
About IRQ coalescing: Stepping back a bit, there are a few different use
cases that an irq coalescing mechanism would be useful for:
1. Latency sensitive workload: Wait time of 10s of usecs. Particularly
useful for SSDs.
2. Interrupt rate conscious workload/environment: Wait time of 200+ usecs,
which will essentially cap the theoretical interrupt rate at 5K/sec
(1s / 200 usec = 5000).
3. Excessive CPU consumption mitigation: This is similar to (2) but
includes the case of malicious guests. Perhaps not a big concern unless
you have lots of drives attached to each guest.
I suspect the implementations for (1) and (2) would be different (spin vs
sleep, perhaps). (3) can't be implemented by manipulating 'req_event',
since a guest has the ability to abuse the irq channel independent of
whatever 'blkback' tries to tell 'blkfront' via 'req_event' manipulation.
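To make the sleep flavour concrete, here is roughly the shape I have in
mind on the blkback response path - purely a sketch, not against any
particular tree: 'coalesce_usecs' and 'notify_timer' are hypothetical
per-blkif fields that don't exist in blkback today.

static void push_responses_and_maybe_notify(blkif_t *blkif)
{
        int notify;

        /* Publish queued responses and see whether blkfront asked for
         * an event (the usual PUSH_RESPONSES_AND_CHECK_NOTIFY dance). */
        RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blkif->blk_rings.common,
                                             notify);
        if (!notify)
                return;

        if (blkif->coalesce_usecs == 0) {
                /* Today's behaviour: notify immediately. */
                notify_remote_via_irq(blkif->irq);
                return;
        }

        /* Defer the notification: a pending one-shot timer (whose
         * handler calls notify_remote_via_irq()) batches completions
         * that arrive close together, capping the interrupt rate at
         * roughly 1s / coalesce_usecs per ring - i.e. case (2). */
        if (!timer_pending(&blkif->notify_timer))
                mod_timer(&blkif->notify_timer,
                          jiffies + usecs_to_jiffies(blkif->coalesce_usecs));
}

(jiffies granularity is obviously far too coarse for waits of 10s of
usecs, so the case (1) flavour would need hrtimers or a short spin
instead.)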
(3) could be implemented in the hypervisor as a generic irq throttler that
could be leveraged for all irqs heading to Dom-0 from DomUs including
blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
as well. Thoughts ?
One crude way to address (3) for the 'many disk drive' scenario is to pin
all/most blkback interrupts for an instance to the same CPU core in Dom-0
and throttle down the thread wake-ups (wake_up(&blkif->wq) in
blkif_notify_work) that usually result in IPIs. Not an elegant solution,
but it might be a good crutch.
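Concretely, something like this is what I mean - again just a sketch:
'last_wakeup' and 'wakeup_interval' are made-up fields, and it assumes
the blkback thread sleeps with a timeout
(wait_event_interruptible_timeout) so that a swallowed wakeup still gets
picked up on the thread's next pass:

static void blkif_notify_work(blkif_t *blkif)
{
        blkif->waiting_reqs = 1;

        /* Swallow wakeups (and the cross-CPU IPIs they trigger) that
         * land within wakeup_interval jiffies of the previous one; the
         * waiting_reqs flag is already set, so the work isn't lost. */
        if (time_before(jiffies,
                        blkif->last_wakeup + blkif->wakeup_interval))
                return;

        blkif->last_wakeup = jiffies;
        wake_up(&blkif->wq);
}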
Another angle to (1) and (2) is whether these irq coalescing settings
should be controllable by the guest, perhaps within limits set by the
host (Dom-0).
Thoughts ? Suggestions ?
Konrad, I'd love to help out if you are already working on something
around irq coalescing. Or, when I have irq coalescing functionality that
can be consumed by the community, I will certainly submit it.
Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
just because there is no irq coalescing functionality, particularly since
the downside is very minimal on the blkfront-blkback stack. My 2 cents..
Thanks much Konrad,
- Pradeep Vincent
On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@xxxxxxxxxx> wrote:
>On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
>> > >>what were the numbers when it came to high bandwidth numbers
>> > Under high I/O workload, where the blkfront would fill up the queue as
>> > blkback works the queue, the I/O latency problem in question doesn't
>> > manifest itself and as a result this patch doesn't make much of a
>> > difference in terms of interrupt rate. My benchmarks didn't show any
>> > significant effect.
>> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
>> writing as much as they can to an iSCSI disk), the IRQ rate for each
>> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
>> job on capturing the submission latency to see if the I/Os get the
>> response back as fast (or the same) as without your patch.
>> And the iSCSI disk on the target side was a RAMdisk, so latency
>> was quite small which is not fair to your problem.
>> Do you have a program to measure the latency for the workload you
>> had encountered? I would like to run those numbers myself.
>Ran some more benchmarks over this week. This time I tried to run it on:
> - iSCSI target (1GB, and on the "other side" it wakes up every 1msec, so
> latency is set to 1msec).
> - scsi_debug delay=0 (no delay and as fast as possible. Comes out to be
>   4 microseconds completion with queue depth of one with 32K I/Os).
> - local SATAI 80GB ST3808110AS. Still running as it is quite slow.
>With only one PV guest doing a round (three times) of two threads randomly
>writing I/Os with a queue depth of 256. Then a different round of four
>threads writing/reading (80/20) 512 bytes up to 64K randomly over the disk.
>I used the attached patch against #master
>to gauge how well we are doing (and what the interrupt generation rate was).
>These workloads I think would be considered 'high I/O' and I was expecting
>your patch to not have any influence on the numbers.
>But to my surprise the case where the I/O latency is high, the interrupt
>generation was quite small. But where the I/O latency was very very small (4
>microseconds) the interrupt generation was on average about 20K/s. And this
>is with a queue depth of 256 with four threads. I was expecting the
>opposite. Hence quite interested to see your use case.
>What do you consider middle I/O and low I/O cases? Do you use 'fio' for
>that?
>With the high I/O load, the numbers came out to give us about 1% benefit
>with your patch. However, I am worried (maybe unnecessarily?) about the
>20K/s interrupt rate when the iometer tests kicked in (this was only when
>using the scsi_debug device).
>The picture of this using iSCSI target:
>And when done on top of local RAMdisk: