[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH -next RFC 01/14] block: add some bdev apis

To: Yu Kuai <yukuai1@xxxxxxxxxxxxxxx>
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue, 5 Dec 2023 22:14:00 -0800
Cc: axboe@xxxxxxxxx, roger.pau@xxxxxxxxxx, colyli@xxxxxxx, kent.overstreet@xxxxxxxxx, joern@xxxxxxxxxxxxxxx, miquel.raynal@xxxxxxxxxxx, richard@xxxxxx, vigneshr@xxxxxx, sth@xxxxxxxxxxxxx, hoeppner@xxxxxxxxxxxxx, hca@xxxxxxxxxxxxx, gor@xxxxxxxxxxxxx, agordeev@xxxxxxxxxxxxx, jejb@xxxxxxxxxxxxx, martin.petersen@xxxxxxxxxx, clm@xxxxxx, josef@xxxxxxxxxxxxxx, dsterba@xxxxxxxx, nico@xxxxxxxxxxx, xiang@xxxxxxxxxx, chao@xxxxxxxxxx, tytso@xxxxxxx, adilger.kernel@xxxxxxxxx, agruenba@xxxxxxxxxx, jack@xxxxxxxx, konishi.ryusuke@xxxxxxxxx, willy@xxxxxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, hare@xxxxxxx, p.raghav@xxxxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-bcache@xxxxxxxxxxxxxxx, linux-mtd@xxxxxxxxxxxxxxxxxxx, linux-s390@xxxxxxxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx, linux-bcachefs@xxxxxxxxxxxxxxx, linux-btrfs@xxxxxxxxxxxxxxx, linux-erofs@xxxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, gfs2@xxxxxxxxxxxxxxx, linux-nilfs@xxxxxxxxxxxxxxx, yukuai3@xxxxxxxxxx, yi.zhang@xxxxxxxxxx, yangerkun@xxxxxxxxxx
Delivery-date: Wed, 06 Dec 2023 06:14:27 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

> +void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
> +                        pgoff_t end)
> +{
> +     invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
> +}
> +EXPORT_SYMBOL_GPL(invalidate_bdev_range);

All these could probably use kerneldoc comments.

For this one I really don't like it existing at all, but we'll have to
discuss that in the btrfs patch.

> +loff_t bdev_size(struct block_device *bdev)
> +{
> +     loff_t size;
> +
> +     spin_lock(&bdev->bd_size_lock);
> +     size = i_size_read(bdev->bd_inode);
> +     spin_unlock(&bdev->bd_size_lock);
> +
> +     return size;
> +}
> +EXPORT_SYMBOL_GPL(bdev_size);

No need for this one.  The callers can simply use bdev_nr_bytes.

> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
> +{
> +     return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio);
> +
> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
> +                               gfp_t gfp)
> +{
> +     return mapping_read_folio_gfp(bdev->bd_inode->i_mapping, index, gfp);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio_gfp);

I think we can just drop bdev_read_folio_gfp. Half of the callers simply
pass GPK_KERNEL, and the other half passes GFP_NOFS and could just use
memalloc_nofs_save().

> +void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
> +{
> +     return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
> +}
> +EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);

Hmm, this is just used for block2mtd, and feels a little too low-level
to me, as block2mtd really should be using the normal fileread/write
APIs.  I guess we'll have to live with it for now if we want to expedite
killing off bd_inode.

> +void bdev_correlate_mapping(struct block_device *bdev,
> +                         struct address_space *mapping)
> +{
> +     mapping->host = bdev->bd_inode;
> +}
> +EXPORT_SYMBOL_GPL(bdev_correlate_mapping);

Maybe associated insted of correlate?  Either way this basically
fully exposes the bdev inode again :(

> +gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp)
> +{
> +     return mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
> +}
> +EXPORT_SYMBOL_GPL(bdev_gfp_constraint);

The right fix here is to:

 - use memalloc_nofs_save in extet instead of using
   mapping_gfp_constraint to clear it from the mapping flags
 - remove __ext4_sb_bread_gfp and just have buffer.c helper that does
   the right thing (either by changing the calling conventions of an
   existing one, or adding a new one).

> +/*
> + * The del_gendisk() function uninitializes the disk-specific data
> + * structures, including the bdi structure, without telling anyone
> + * else.  Once this happens, any attempt to call mark_buffer_dirty()
> + * (for example, by ext4_commit_super), will cause a kernel OOPS.
> + * This is a kludge to prevent these oops until we can put in a proper
> + * hook in del_gendisk() to inform the VFS and file system layers.
> + */
> +int bdev_ejected(struct block_device *bdev)
> +{
> +     struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
> +
> +     return bdi->dev == NULL;
> +}
> +EXPORT_SYMBOL_GPL(bdev_ejected);

And this code in ext4 should just go away entirely.  The bdi should
always be valid for a live bdev for years.

> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1119,6 +1119,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio 
> *folio, size_t len,
>       WARN_ON_ONCE(off > UINT_MAX);
>       __bio_add_page(bio, &folio->page, len, off);
>  }
> +EXPORT_SYMBOL_GPL(bio_add_folio_nofail);

How is this realted?  The export is fine, but really should be a
separate, well-documented commit.

>  
> +static inline u8 block_bits(struct block_device *bdev)
> +{
> +     return bdev->bd_inode->i_blkbits;
> +}

Not sure we should need this.  i_blkbits comes from the blocksize
the fs set, so it should have other ways to get at it.

Follow-Ups:
- Re: [PATCH -next RFC 01/14] block: add some bdev apis
  - From: Theodore Ts'o
- Re: [PATCH -next RFC 01/14] block: add some bdev apis
  - From: Yu Kuai

References:
- [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules
  - From: Yu Kuai
- [PATCH -next RFC 01/14] block: add some bdev apis
  - From: Yu Kuai

Prev by Date: [xen-unstable test] 183996: tolerable FAIL - PUSHED
Next by Date: Re: [RFC KERNEL PATCH v2 1/3] xen/pci: Add xen_reset_device_state function
Previous by thread: Re: [PATCH -next RFC 01/14] block: add some bdev apis
Next by thread: Re: [PATCH -next RFC 01/14] block: add some bdev apis
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.