Re: [Xen-devel] [RFC PATCH 00/10] Multi-queue support for xen-block driver

> -----Original Message-----
> From: Bob Liu [mailto:bob.liu@xxxxxxxxxx]
> Sent: 15 February 2015 08:19
> To: xen-devel@xxxxxxxxxxxxx
> Cc: David Vrabel; linux-kernel@xxxxxxxxxxxxxxx; Roger Pau Monne;
> konrad.wilk@xxxxxxxxxx; Felipe Franciosi; axboe@xxxxxx; hch@xxxxxxxxxxxxx;
> avanzini.arianna@xxxxxxxxx; Bob Liu
> Subject: [RFC PATCH 00/10] Multi-queue support for xen-block driver
> This patchset converts the Xen PV block driver to the multi-queue block layer
> by sharing and using multiple I/O rings between the frontend and backend.
> History:
> It's based on the result of Arianna's internship for GNOME's Outreach Program
> for Women, in which she was mentored by Konrad Rzeszutek Wilk. I also
> worked on this patchset with her at that time, and have now fully taken over
> this task.
> I've got her authorization to "change authorship or SoB to the patches as you
> like."
> A few words on block multi-queue layer:
> The multi-queue block layer improves block scalability a lot by splitting the
> single request queue into per-processor software queues and hardware dispatch
> queues. The Linux blk-mq API handles the software queues, while each block
> driver must deal with the hardware queues.

IIUC, the main motivation behind the blk-mq work was lock contention on a
block device's request queue when it is accessed concurrently from different
NUMA nodes. I believe we are not stressing enough the main benefit of taking
this approach on Xen.

Many modern storage systems (e.g. NVMe devices) respond much better
(especially when it comes to IOPS) to a high number of outstanding requests.
That can be achieved by having a single thread sustaining a high IO depth
_and/or_ several different threads issuing requests at the same time. The
former approach is often limited by CPU capacity: the (v)CPU that the single
thread runs on can only handle so many interrupts (easily observed with 'top'
showing the thread pegged at 100%). The latter approach is more flexible,
given that many threads can run over several different (v)CPUs. I have a lot
of data on this topic and am happy to share it if people are interested.

We can therefore use the multi-queue block layer in a guest to have more than 
one request queue associated with block front. These can be mapped over several 
rings to the backend, making it very easy for us to run multiple threads on the 
backend for a single virtual disk. I believe this is why Bob is seeing massive 
improvements when running 'fio' in a guest with an increased number of jobs.
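
As a rough illustration of the mapping described above (not code from the
patchset), a guest with more vCPUs than rings could spread its submitting CPUs
over the rings as sketched below. The function name and the plain modulo
policy are assumptions made for this example; blk-mq applies its own
CPU-to-queue mapping.

```c
/*
 * Hypothetical helper: pick the ring (hardware queue) that serves a vCPU.
 * A plain modulo spread is assumed here for illustration only.
 * Returns 0 when nr_rings is 0, i.e. fall back to a single ring.
 */
static unsigned int example_ring_for_cpu(unsigned int cpu,
                                         unsigned int nr_rings)
{
    if (nr_rings == 0)
        return 0;
    return cpu % nr_rings;
}
```

With 4 rings, vCPUs 0-3 each get their own ring and vCPU 5 shares ring 1 with
vCPU 1, so each ring can be served by its own backend thread.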

In my opinion, this should be highlighted as the motivation behind adopting
blk-mq in Xen.


> The xen/block implementation:
> 1) Convert to blk-mq api with only one hardware queue.
> 2) Use more rings to act as multi hardware queues.
> 3) Negotiate the number of hardware queues, the same as the xen-net driver.
> The backend advertises "multi-queue-max-queues" to the frontend, then the
> frontend writes back the final number to "multi-queue-num-queues".
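
A minimal sketch of the negotiation in step 3, assuming the frontend simply
clamps the number of queues it would like (for example one per vCPU) to the
maximum advertised by the backend. The function name is hypothetical and the
xenstore reads and writes are left out; only the two key names come from the
description above.

```c
/*
 * Hypothetical helper, not taken from the patches.
 * backend_max:   value the backend advertised in "multi-queue-max-queues"
 *                (0 is treated here as "key absent", a single-ring backend).
 * frontend_want: number of queues the frontend would like, e.g. one per vCPU.
 * Returns the value the frontend would write to "multi-queue-num-queues".
 */
static unsigned int example_negotiate_queues(unsigned int backend_max,
                                             unsigned int frontend_want)
{
    if (backend_max == 0 || frontend_want == 0)
        return 1;   /* fall back to one ring */
    return frontend_want < backend_max ? frontend_want : backend_max;
}
```

For instance, a 16-vCPU guest talking to a backend that advertises 8 would
end up with 8 queues under this policy.
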
> Test result:
> fio's IOmeter emulation on a 16 cpus domU with a null_blk device, hardware
> queue number was 16.
> nr_fio_jobs   IOPS(before)   IOPS(after)    Diff
>           1            57k           58k      0%
>           4            95k          201k   +210%
>           8            89k          372k   +410%
>          16            68k          284k   +410%
>          32            65k          196k   +300%
>          64            63k          183k   +290%
> More results are coming; there was also a big improvement in both write IOPS
> and latency.
> Any comments or suggestions are welcome.
> Thank you,
> -Bob Liu
> Bob Liu (10):
>   xen/blkfront: convert to blk-mq API
>   xen/blkfront: drop legacy block layer support
>   xen/blkfront: reorg info->io_lock after using blk-mq API
>   xen/blkfront: separate ring information to an new struct
>   xen/blkback: separate ring information out of struct xen_blkif
>   xen/blkfront: pseudo support for multi hardware queues
>   xen/blkback: pseudo support for multi hardware queues
>   xen/blkfront: negotiate hardware queue number with backend
>   xen/blkback: get hardware queue number from blkfront
>   xen/blkfront: use work queue to fast blkif interrupt return
>  drivers/block/xen-blkback/blkback.c | 370 ++++++++-------
>  drivers/block/xen-blkback/common.h  |  54 ++-
>  drivers/block/xen-blkback/xenbus.c  | 415 +++++++++++------
>  drivers/block/xen-blkfront.c        | 894 +++++++++++++++++++++---------------
>  4 files changed, 1018 insertions(+), 715 deletions(-)
> --
