
Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail



On Thu, Jun 13, 2024 at 04:05:08PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> > Right.  AFAICT advertising "feature-barrier" and/or
> > "feature-flush-cache" could be done based on whether blkback
> > understands those commands, not on whether the underlying storage
> > supports the equivalent of them.
> > 
> > Worst case we can print a warning message once about the underlying
> > storage failing to complete flush/barrier requests, and that data
> > integrity might not be guaranteed going forward, and not propagate the
> > error to the upper layer?
> > 
> > What would be the consequence of propagating a flush error to the
> > upper layers?
> 
> If you propagate the error to the upper layer you will generate an
> I/O error there, which usually leads to a file system shutdown.
> 
> > Given the description of the feature in the blkif header, I'm afraid
> > we cannot guarantee that seeing the feature exposed implies barrier or
> > flush support, since the request could fail at any time (or even from
> > the start of the disk attachment) and it would still sadly be a correct
> > implementation given the description of the options.
> 
> Well, then we could do something like the patch below, which keeps
> the existing behavior, but insulates the block layer from it and
> removes the only user of blk_queue_write_cache from interrupt
> context:

LGTM, I'm not sure there's much else we can do.

> ---
> From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@xxxxxx>
> Date: Thu, 30 May 2024 08:58:52 +0200
> Subject: xen-blkfront: don't disable cache flushes when they fail
> 
> blkfront always had a robust negotiation protocol for detecting a write
> cache.  Stop simply disabling cache flushes in the block layer as the
> flags handling is moving to the atomic queue limits API that needs
> user context to freeze the queue for that.  Instead handle the case
> of the feature flags cleared inside of blkfront.  This removes old
> debug code to check for such a mismatch which was previously impossible
> to hit, including the check for passthrough requests that blkfront
> never used to start with.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  drivers/block/xen-blkfront.c | 44 +++++++++++++++++++-----------------
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..e2c92d5095ff17 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>                        * A barrier request a superset of FUA, so we can
>                        * implement it the same way.  (It's also a FLUSH+FUA,
>                        * since it is guaranteed ordered WRT previous writes.)
> +                      *
> +                      * Note that we can end up here with a FUA write and the
> +                      * flags cleared.  This happens when the flag was
> +                      * run-time disabled and raced with I/O submission in
> +                      * the block layer.  We submit it as a normal write

Since blkfront no longer tells the block layer when FUA stops being
available for the device, getting a request with FUA set is not actually
a race, I think?  It would be the expected case from then on.
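
To make the resulting behaviour concrete, here is a small userspace
model of the operation selection once the final "else" branch is gone.
It is only a sketch: the model_* names are made up for illustration and
are not kernel or blkif identifiers, and it assumes the outer branch is
entered for flush or FUA requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for this model only; not the blkif or kernel definitions. */
enum model_op {
	MODEL_OP_WRITE,
	MODEL_OP_WRITE_BARRIER,
	MODEL_OP_FLUSH_DISKCACHE,
};

struct model_info {
	bool feature_flush;
	bool feature_fua;
};

/*
 * Pick the ring operation for a write-direction request, following the
 * if / else-if chain that remains after the patch.
 */
static enum model_op model_pick_op(const struct model_info *info,
				   bool is_flush, bool has_fua)
{
	enum model_op op = MODEL_OP_WRITE;

	assert(info != NULL);

	if (is_flush || has_fua) {
		if (info->feature_flush && info->feature_fua)
			op = MODEL_OP_WRITE_BARRIER;
		else if (info->feature_flush)
			op = MODEL_OP_FLUSH_DISKCACHE;
		/* Both flags cleared: keep the plain write chosen above. */
	}
	return op;
}
```

With both flags cleared, every FUA write takes the plain-write path in
this model, regardless of when the flags were cleared, which is why
"race" looks like the wrong word to me.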

> +                      * here.  A pure flush should never end up here with
> +                      * the flags cleared as they are completed earlier for
> +                      * the !feature_flush case.
>                        */
>                       if (info->feature_flush && info->feature_fua)
>                               ring_req->operation =
> @@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>                       else if (info->feature_flush)
>                               ring_req->operation =
>                                       BLKIF_OP_FLUSH_DISKCACHE;
> -                     else
> -                             ring_req->operation = 0;
>               }
>               ring_req->u.rw.nr_segments = num_grant;
>               if (unlikely(require_extra_req)) {
> @@ -887,16 +893,6 @@ static inline void flush_requests(struct blkfront_ring_info *rinfo)
>               notify_remote_via_irq(rinfo->irq);
>  }
>  
> -static inline bool blkif_request_flush_invalid(struct request *req,
> -                                            struct blkfront_info *info)
> -{
> -     return (blk_rq_is_passthrough(req) ||
> -             ((req_op(req) == REQ_OP_FLUSH) &&
> -              !info->feature_flush) ||
> -             ((req->cmd_flags & REQ_FUA) &&
> -              !info->feature_fua));
> -}
> -
>  static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>                         const struct blk_mq_queue_data *qd)
>  {
> @@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
>       rinfo = get_rinfo(info, qid);
>       blk_mq_start_request(qd->rq);
>       spin_lock_irqsave(&rinfo->ring_lock, flags);
> -     if (RING_FULL(&rinfo->ring))
> -             goto out_busy;
>  
> -     if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
> -             goto out_err;
> +     /*
> +      * Check if the backend actually supports flushes.
> +      *
> +      * While the block layer won't send us flushes if we don't claim to
> +      * support them, the Xen protocol allows the backend to revoke support
> +      * at any time.  That is of course a really bad idea and dangerous, but
> +      * has been allowed for 10+ years.  In that case we simply clear the
> +      * flags, and directly return here for an empty flush and ignore the
> +      * FUA flag later on.
> +      */
> +     if (unlikely(req_op(qd->rq) == REQ_OP_FLUSH && !info->feature_flush))
> +             goto out;

Don't you need to complete the request here?
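
Roughly what I have in mind, as a userspace model rather than a patch.
The model_* names are hypothetical stand-ins, and I'm assuming the "out"
label only drops the lock and returns success; I haven't checked the
rest of the hunk.  In the driver I'd expect the completion to be a
blk_mq_end_request(qd->rq, BLK_STS_OK) call after the ring lock is
released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this model; none of these are kernel types. */
enum model_status { MODEL_STS_OK, MODEL_STS_IOERR };

struct model_request {
	bool is_flush;
	bool completed;		/* set once the request has been ended */
	bool on_ring;		/* set once it was handed to the backend */
	enum model_status status;
};

/* Stand-in for ending a request towards the block layer. */
static void model_end_request(struct model_request *rq, enum model_status sts)
{
	assert(rq != NULL && !rq->completed);
	rq->completed = true;
	rq->status = sts;
}

/*
 * Model of the queueing path: an empty flush that the backend cannot
 * handle is never placed on the ring, so nothing will complete it later.
 * It has to be ended here, with success, before returning.
 */
static enum model_status model_queue_rq(struct model_request *rq,
					bool feature_flush)
{
	assert(rq != NULL);

	if (rq->is_flush && !feature_flush) {
		model_end_request(rq, MODEL_STS_OK);
		return MODEL_STS_OK;
	}

	/* Everything else goes to the backend and completes from its response. */
	rq->on_ring = true;
	return MODEL_STS_OK;
}
```

Without the explicit completion, the short-circuited flush in this model
would be neither on the ring nor completed, and the submitter would wait
on it forever.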

Thanks, Roger.



 

