summaryrefslogtreecommitdiff
path: root/drivers/md
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2025-10-02 10:16:56 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2025-10-02 10:16:56 -0700
commite1b1d03ceec343362524318c076b110066ffe305 (patch)
treeca71105f13d893118eab4feb826d20cca066c53d /drivers/md
parent5832d26433f2bd0d28f8b12526e3c2fdb203507f (diff)
parent130e6de62107116eba124647116276266be0f84c (diff)
downloadlinux-e1b1d03ceec343362524318c076b110066ffe305.tar.gz
linux-e1b1d03ceec343362524318c076b110066ffe305.tar.bz2
linux-e1b1d03ceec343362524318c076b110066ffe305.zip
Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...
Diffstat (limited to 'drivers/md')
-rw-r--r--drivers/md/Kconfig29
-rw-r--r--drivers/md/Makefile4
-rw-r--r--drivers/md/bcache/debug.c3
-rw-r--r--drivers/md/bcache/io.c3
-rw-r--r--drivers/md/bcache/journal.c2
-rw-r--r--drivers/md/bcache/movinggc.c8
-rw-r--r--drivers/md/bcache/super.c2
-rw-r--r--drivers/md/bcache/writeback.c8
-rw-r--r--drivers/md/dm-bufio.c2
-rw-r--r--drivers/md/dm-flakey.c2
-rw-r--r--drivers/md/dm-raid.c18
-rw-r--r--drivers/md/dm-vdo/vio.c2
-rw-r--r--drivers/md/dm.c4
-rw-r--r--drivers/md/md-bitmap.c89
-rw-r--r--drivers/md/md-bitmap.h107
-rw-r--r--drivers/md/md-cluster.c2
-rw-r--r--drivers/md/md-linear.c14
-rw-r--r--drivers/md/md-llbitmap.c1626
-rw-r--r--drivers/md/md.c382
-rw-r--r--drivers/md/md.h24
-rw-r--r--drivers/md/raid0.c30
-rw-r--r--drivers/md/raid1-10.c2
-rw-r--r--drivers/md/raid1.c119
-rw-r--r--drivers/md/raid1.h4
-rw-r--r--drivers/md/raid10.c107
-rw-r--r--drivers/md/raid10.h2
-rw-r--r--drivers/md/raid5.c74
27 files changed, 2333 insertions, 336 deletions
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index ddb37f6670de..07c19b2182ca 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -37,6 +37,32 @@ config BLK_DEV_MD
If unsure, say N.
+config MD_BITMAP
+ bool "MD RAID bitmap support"
+ default y
+ depends on BLK_DEV_MD
+ help
+ If you say Y here, support for the write intent bitmap will be
+ enabled. The bitmap can be used to optimize resync speed after power
+ failure or readding a disk, limiting it to recorded dirty sectors in
+ bitmap.
+
+ This feature can be added to existing MD array or MD array can be
+ created with bitmap via mdadm(8).
+
+ If unsure, say Y.
+
+config MD_LLBITMAP
+ bool "MD RAID lockless bitmap support"
+ depends on BLK_DEV_MD
+ help
+ If you say Y here, support for the lockless write intent bitmap will
+ be enabled.
+
+ Note, this is an experimental feature.
+
+ If unsure, say N.
+
config MD_AUTODETECT
bool "Autodetect RAID arrays during kernel boot"
depends on BLK_DEV_MD=y
@@ -54,6 +80,7 @@ config MD_AUTODETECT
config MD_BITMAP_FILE
bool "MD bitmap file support (deprecated)"
default y
+ depends on MD_BITMAP
help
If you say Y here, support for write intent bitmaps in files on an
external file system is enabled. This is an alternative to the internal
@@ -174,6 +201,7 @@ config MD_RAID456
config MD_CLUSTER
tristate "Cluster Support for MD"
+ select MD_BITMAP
depends on BLK_DEV_MD
depends on DLM
default n
@@ -393,6 +421,7 @@ config DM_RAID
select MD_RAID1
select MD_RAID10
select MD_RAID456
+ select MD_BITMAP
select BLK_DEV_MD
help
A dm target that supports RAID1, RAID10, RAID4, RAID5 and RAID6 mappings
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 87bdfc9fe14c..5a51b3408b70 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -27,7 +27,9 @@ dm-clone-y += dm-clone-target.o dm-clone-metadata.o
dm-verity-y += dm-verity-target.o
dm-zoned-y += dm-zoned-target.o dm-zoned-metadata.o dm-zoned-reclaim.o
-md-mod-y += md.o md-bitmap.o
+md-mod-y += md.o
+md-mod-$(CONFIG_MD_BITMAP) += md-bitmap.o
+md-mod-$(CONFIG_MD_LLBITMAP) += md-llbitmap.o
raid456-y += raid5.o raid5-cache.o raid5-ppl.o
linear-y += md-linear.o
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 7510d1c983a5..f327456fc4e0 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -115,8 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_kmalloc(nr_segs, GFP_NOIO);
if (!check)
return;
- bio_init(check, bio->bi_bdev, check->bi_inline_vecs, nr_segs,
- REQ_OP_READ);
+ bio_init_inline(check, bio->bi_bdev, nr_segs, REQ_OP_READ);
check->bi_iter.bi_sector = bio->bi_iter.bi_sector;
check->bi_iter.bi_size = bio->bi_iter.bi_size;
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 020712c5203f..2386d08bf4e4 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -26,8 +26,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
struct bbio *b = mempool_alloc(&c->bio_meta, GFP_NOIO);
struct bio *bio = &b->bio;
- bio_init(bio, NULL, bio->bi_inline_vecs,
- meta_bucket_pages(&c->cache->sb), 0);
+ bio_init_inline(bio, NULL, meta_bucket_pages(&c->cache->sb), 0);
return bio;
}
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 7ff14bd2feb8..d50eb82ccb4f 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -615,7 +615,7 @@ static void do_journal_discard(struct cache *ca)
atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT);
- bio_init(bio, ca->bdev, bio->bi_inline_vecs, 1, REQ_OP_DISCARD);
+ bio_init_inline(bio, ca->bdev, 1, REQ_OP_DISCARD);
bio->bi_iter.bi_sector = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_iter.bi_size = bucket_bytes(ca);
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index 26a6a535ec32..73918e55bf04 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -79,7 +79,7 @@ static void moving_init(struct moving_io *io)
{
struct bio *bio = &io->bio.bio;
- bio_init(bio, NULL, bio->bi_inline_vecs,
+ bio_init_inline(bio, NULL,
DIV_ROUND_UP(KEY_SIZE(&io->w->key), PAGE_SECTORS), 0);
bio_get(bio);
bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0);
@@ -145,9 +145,9 @@ static void read_moving(struct cache_set *c)
continue;
}
- io = kzalloc(struct_size(io, bio.bio.bi_inline_vecs,
- DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS)),
- GFP_KERNEL);
+ io = kzalloc(sizeof(*io) + sizeof(struct bio_vec) *
+ DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
+ GFP_KERNEL);
if (!io)
goto err;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1492c8552255..6d250e366412 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2236,7 +2236,7 @@ static int cache_alloc(struct cache *ca)
__module_get(THIS_MODULE);
kobject_init(&ca->kobj, &bch_cache_ktype);
- bio_init(&ca->journal.bio, NULL, ca->journal.bio.bi_inline_vecs, 8, 0);
+ bio_init_inline(&ca->journal.bio, NULL, 8, 0);
/*
* When the cache disk is first registered, ca->sb.njournal_buckets
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 302e75f1fc4b..6ba73dc1a3df 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -331,7 +331,7 @@ static void dirty_init(struct keybuf_key *w)
struct dirty_io *io = w->private;
struct bio *bio = &io->bio;
- bio_init(bio, NULL, bio->bi_inline_vecs,
+ bio_init_inline(bio, NULL,
DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), 0);
if (!io->dc->writeback_percent)
bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0);
@@ -536,9 +536,9 @@ static void read_dirty(struct cached_dev *dc)
for (i = 0; i < nk; i++) {
w = keys[i];
- io = kzalloc(struct_size(io, bio.bi_inline_vecs,
- DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS)),
- GFP_KERNEL);
+ io = kzalloc(sizeof(*io) + sizeof(struct bio_vec) *
+ DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
+ GFP_KERNEL);
if (!io)
goto err;
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index ff7595caf440..8f3a23f4b168 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1342,7 +1342,7 @@ static void use_bio(struct dm_buffer *b, enum req_op op, sector_t sector,
use_dmio(b, op, sector, n_sectors, offset, ioprio);
return;
}
- bio_init(bio, b->c->bdev, bio->bi_inline_vecs, 1, op);
+ bio_init_inline(bio, b->c->bdev, 1, op);
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_complete;
bio->bi_private = b;
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index cf17fd46e255..08925aca838c 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -441,7 +441,7 @@ static struct bio *clone_bio(struct dm_target *ti, struct flakey_c *fc, struct b
if (!clone)
return NULL;
- bio_init(clone, fc->dev->bdev, clone->bi_inline_vecs, nr_iovecs, bio->bi_opf);
+ bio_init_inline(clone, fc->dev->bdev, nr_iovecs, bio->bi_opf);
clone->bi_iter.bi_sector = flakey_map_sector(ti, bio->bi_iter.bi_sector);
clone->bi_private = bio;
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index f4b904e24328..0a1788fed68c 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3955,9 +3955,11 @@ static int __load_dirty_region_bitmap(struct raid_set *rs)
!test_and_set_bit(RT_FLAG_RS_BITMAP_LOADED, &rs->runtime_flags)) {
struct mddev *mddev = &rs->md;
- r = mddev->bitmap_ops->load(mddev);
- if (r)
- DMERR("Failed to load bitmap");
+ if (md_bitmap_enabled(mddev, false)) {
+ r = mddev->bitmap_ops->load(mddev);
+ if (r)
+ DMERR("Failed to load bitmap");
+ }
}
return r;
@@ -4072,10 +4074,12 @@ static int raid_preresume(struct dm_target *ti)
mddev->bitmap_info.chunksize != to_bytes(rs->requested_bitmap_chunk_sectors)))) {
int chunksize = to_bytes(rs->requested_bitmap_chunk_sectors) ?: mddev->bitmap_info.chunksize;
- r = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors,
- chunksize, false);
- if (r)
- DMERR("Failed to resize bitmap");
+ if (md_bitmap_enabled(mddev, false)) {
+ r = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors,
+ chunksize);
+ if (r)
+ DMERR("Failed to resize bitmap");
+ }
}
/* Check for any resize/reshape on @rs and adjust/initiate */
diff --git a/drivers/md/dm-vdo/vio.c b/drivers/md/dm-vdo/vio.c
index e7f4153e55e3..8fc22fb14196 100644
--- a/drivers/md/dm-vdo/vio.c
+++ b/drivers/md/dm-vdo/vio.c
@@ -212,7 +212,7 @@ int vio_reset_bio_with_size(struct vio *vio, char *data, int size, bio_end_io_t
return VDO_SUCCESS;
bio->bi_ioprio = 0;
- bio->bi_io_vec = bio->bi_inline_vecs;
+ bio->bi_io_vec = bio_inline_vecs(bio);
bio->bi_max_vecs = vio->block_count + 1;
if (VDO_ASSERT(size <= vio_size, "specified size %d is not greater than allocated %d",
size, vio_size) != VDO_SUCCESS)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a44e8c2dccee..7bd6fa05b00a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -403,9 +403,9 @@ static void do_deferred_remove(struct work_struct *w)
dm_deferred_remove();
}
-static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
+static int dm_blk_getgeo(struct gendisk *disk, struct hd_geometry *geo)
{
- struct mapped_device *md = bdev->bd_disk->private_data;
+ struct mapped_device *md = disk->private_data;
return dm_get_geometry(md, geo);
}
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 334b71404930..84b7e2af6dba 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -34,15 +34,6 @@
#include "md-bitmap.h"
#include "md-cluster.h"
-#define BITMAP_MAJOR_LO 3
-/* version 4 insists the bitmap is in little-endian order
- * with version 3, it is host-endian which is non-portable
- * Version 5 is currently set only for clustered devices
- */
-#define BITMAP_MAJOR_HI 4
-#define BITMAP_MAJOR_CLUSTERED 5
-#define BITMAP_MAJOR_HOSTENDIAN 3
-
/*
* in-memory bitmap:
*
@@ -224,6 +215,8 @@ struct bitmap {
int cluster_slot;
};
+static struct workqueue_struct *md_bitmap_wq;
+
static int __bitmap_resize(struct bitmap *bitmap, sector_t blocks,
int chunksize, bool init);
@@ -232,20 +225,19 @@ static inline char *bmname(struct bitmap *bitmap)
return bitmap->mddev ? mdname(bitmap->mddev) : "mdX";
}
-static bool __bitmap_enabled(struct bitmap *bitmap)
-{
- return bitmap->storage.filemap &&
- !test_bit(BITMAP_STALE, &bitmap->flags);
-}
-
-static bool bitmap_enabled(struct mddev *mddev)
+static bool bitmap_enabled(void *data, bool flush)
{
- struct bitmap *bitmap = mddev->bitmap;
+ struct bitmap *bitmap = data;
- if (!bitmap)
- return false;
+ if (!flush)
+ return true;
- return __bitmap_enabled(bitmap);
+ /*
+ * If caller want to flush bitmap pages to underlying disks, check if
+ * there are cached pages in filemap.
+ */
+ return !test_bit(BITMAP_STALE, &bitmap->flags) &&
+ bitmap->storage.filemap != NULL;
}
/*
@@ -484,7 +476,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
return -EINVAL;
}
- md_super_write(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit), page);
+ md_write_metadata(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit),
+ page, 0);
return 0;
}
@@ -1244,7 +1237,7 @@ static void __bitmap_unplug(struct bitmap *bitmap)
int dirty, need_write;
int writing = 0;
- if (!__bitmap_enabled(bitmap))
+ if (!bitmap_enabled(bitmap, true))
return;
/* look at each page to see if there are any set bits that need to be
@@ -1788,15 +1781,9 @@ static bool __bitmap_start_sync(struct bitmap *bitmap, sector_t offset,
sector_t *blocks, bool degraded)
{
bitmap_counter_t *bmc;
- bool rv;
+ bool rv = false;
- if (bitmap == NULL) {/* FIXME or bitmap set as 'failed' */
- *blocks = 1024;
- return true; /* always resync if no bitmap */
- }
spin_lock_irq(&bitmap->counts.lock);
-
- rv = false;
bmc = md_bitmap_get_counter(&bitmap->counts, offset, blocks, 0);
if (bmc) {
/* locked */
@@ -1845,10 +1832,6 @@ static void __bitmap_end_sync(struct bitmap *bitmap, sector_t offset,
bitmap_counter_t *bmc;
unsigned long flags;
- if (bitmap == NULL) {
- *blocks = 1024;
- return;
- }
spin_lock_irqsave(&bitmap->counts.lock, flags);
bmc = md_bitmap_get_counter(&bitmap->counts, offset, blocks, 0);
if (bmc == NULL)
@@ -2060,9 +2043,6 @@ static void bitmap_start_behind_write(struct mddev *mddev)
struct bitmap *bitmap = mddev->bitmap;
int bw;
- if (!bitmap)
- return;
-
atomic_inc(&bitmap->behind_writes);
bw = atomic_read(&bitmap->behind_writes);
if (bw > bitmap->behind_writes_used)
@@ -2076,9 +2056,6 @@ static void bitmap_end_behind_write(struct mddev *mddev)
{
struct bitmap *bitmap = mddev->bitmap;
- if (!bitmap)
- return;
-
if (atomic_dec_and_test(&bitmap->behind_writes))
wake_up(&bitmap->behind_wait);
pr_debug("dec write-behind count %d/%lu\n",
@@ -2593,15 +2570,14 @@ err:
return ret;
}
-static int bitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize,
- bool init)
+static int bitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
{
struct bitmap *bitmap = mddev->bitmap;
if (!bitmap)
return 0;
- return __bitmap_resize(bitmap, blocks, chunksize, init);
+ return __bitmap_resize(bitmap, blocks, chunksize, false);
}
static ssize_t
@@ -2990,12 +2966,19 @@ static struct attribute *md_bitmap_attrs[] = {
&max_backlog_used.attr,
NULL
};
-const struct attribute_group md_bitmap_group = {
+
+static struct attribute_group md_bitmap_group = {
.name = "bitmap",
.attrs = md_bitmap_attrs,
};
static struct bitmap_operations bitmap_ops = {
+ .head = {
+ .type = MD_BITMAP,
+ .id = ID_BITMAP,
+ .name = "bitmap",
+ },
+
.enabled = bitmap_enabled,
.create = bitmap_create,
.resize = bitmap_resize,
@@ -3013,6 +2996,9 @@ static struct bitmap_operations bitmap_ops = {
.start_write = bitmap_start_write,
.end_write = bitmap_end_write,
+ .start_discard = bitmap_start_write,
+ .end_discard = bitmap_end_write,
+
.start_sync = bitmap_start_sync,
.end_sync = bitmap_end_sync,
.cond_end_sync = bitmap_cond_end_sync,
@@ -3026,9 +3012,22 @@ static struct bitmap_operations bitmap_ops = {
.copy_from_slot = bitmap_copy_from_slot,
.set_pages = bitmap_set_pages,
.free = md_bitmap_free,
+
+ .group = &md_bitmap_group,
};
-void mddev_set_bitmap_ops(struct mddev *mddev)
+int md_bitmap_init(void)
+{
+ md_bitmap_wq = alloc_workqueue("md_bitmap", WQ_MEM_RECLAIM | WQ_UNBOUND,
+ 0);
+ if (!md_bitmap_wq)
+ return -ENOMEM;
+
+ return register_md_submodule(&bitmap_ops.head);
+}
+
+void md_bitmap_exit(void)
{
- mddev->bitmap_ops = &bitmap_ops;
+ destroy_workqueue(md_bitmap_wq);
+ unregister_md_submodule(&bitmap_ops.head);
}
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 59e9dd45cfde..b42a28fa83a0 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -9,10 +9,26 @@
#define BITMAP_MAGIC 0x6d746962
+/*
+ * version 3 is host-endian order, this is deprecated and not used for new
+ * array
+ */
+#define BITMAP_MAJOR_LO 3
+#define BITMAP_MAJOR_HOSTENDIAN 3
+/* version 4 is little-endian order, the default value */
+#define BITMAP_MAJOR_HI 4
+/* version 5 is only used for cluster */
+#define BITMAP_MAJOR_CLUSTERED 5
+/* version 6 is only used for lockless bitmap */
+#define BITMAP_MAJOR_LOCKLESS 6
+
/* use these for bitmap->flags and bitmap->sb->state bit-fields */
enum bitmap_state {
- BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
+ BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
BITMAP_WRITE_ERROR = 2, /* A write error has occurred */
+ BITMAP_FIRST_USE = 3, /* llbitmap is just created */
+ BITMAP_CLEAN = 4, /* llbitmap is created with assume_clean */
+ BITMAP_DAEMON_BUSY = 5, /* llbitmap daemon is not finished after daemon_sleep */
BITMAP_HOSTENDIAN =15,
};
@@ -61,11 +77,15 @@ struct md_bitmap_stats {
struct file *file;
};
+typedef void (md_bitmap_fn)(struct mddev *mddev, sector_t offset,
+ unsigned long sectors);
+
struct bitmap_operations {
- bool (*enabled)(struct mddev *mddev);
+ struct md_submodule_head head;
+
+ bool (*enabled)(void *data, bool flush);
int (*create)(struct mddev *mddev);
- int (*resize)(struct mddev *mddev, sector_t blocks, int chunksize,
- bool init);
+ int (*resize)(struct mddev *mddev, sector_t blocks, int chunksize);
int (*load)(struct mddev *mddev);
void (*destroy)(struct mddev *mddev);
@@ -80,10 +100,13 @@ struct bitmap_operations {
void (*end_behind_write)(struct mddev *mddev);
void (*wait_behind_writes)(struct mddev *mddev);
- void (*start_write)(struct mddev *mddev, sector_t offset,
- unsigned long sectors);
- void (*end_write)(struct mddev *mddev, sector_t offset,
- unsigned long sectors);
+ md_bitmap_fn *start_write;
+ md_bitmap_fn *end_write;
+ md_bitmap_fn *start_discard;
+ md_bitmap_fn *end_discard;
+
+ sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
+ bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
bool (*start_sync)(struct mddev *mddev, sector_t offset,
sector_t *blocks, bool degraded);
void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
@@ -101,9 +124,75 @@ struct bitmap_operations {
sector_t *hi, bool clear_bits);
void (*set_pages)(void *data, unsigned long pages);
void (*free)(void *data);
+
+ struct attribute_group *group;
};
/* the bitmap API */
-void mddev_set_bitmap_ops(struct mddev *mddev);
+static inline bool md_bitmap_registered(struct mddev *mddev)
+{
+ return mddev->bitmap_ops != NULL;
+}
+
+static inline bool md_bitmap_enabled(struct mddev *mddev, bool flush)
+{
+ /* bitmap_ops must be registered before creating bitmap. */
+ if (!md_bitmap_registered(mddev))
+ return false;
+
+ if (!mddev->bitmap)
+ return false;
+
+ return mddev->bitmap_ops->enabled(mddev->bitmap, flush);
+}
+
+static inline bool md_bitmap_start_sync(struct mddev *mddev, sector_t offset,
+ sector_t *blocks, bool degraded)
+{
+ /* always resync if no bitmap */
+ if (!md_bitmap_enabled(mddev, false)) {
+ *blocks = 1024;
+ return true;
+ }
+
+ return mddev->bitmap_ops->start_sync(mddev, offset, blocks, degraded);
+}
+
+static inline void md_bitmap_end_sync(struct mddev *mddev, sector_t offset,
+ sector_t *blocks)
+{
+ if (!md_bitmap_enabled(mddev, false)) {
+ *blocks = 1024;
+ return;
+ }
+
+ mddev->bitmap_ops->end_sync(mddev, offset, blocks);
+}
+
+#ifdef CONFIG_MD_BITMAP
+int md_bitmap_init(void);
+void md_bitmap_exit(void);
+#else
+static inline int md_bitmap_init(void)
+{
+ return 0;
+}
+static inline void md_bitmap_exit(void)
+{
+}
+#endif
+
+#ifdef CONFIG_MD_LLBITMAP
+int md_llbitmap_init(void);
+void md_llbitmap_exit(void);
+#else
+static inline int md_llbitmap_init(void)
+{
+ return 0;
+}
+static inline void md_llbitmap_exit(void)
+{
+}
+#endif
#endif
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 6e9a0045f0ff..11f1e91d387d 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -630,7 +630,7 @@ static int process_recvd_msg(struct mddev *mddev, struct cluster_msg *msg)
if (le64_to_cpu(msg->high) != mddev->pers->size(mddev, 0, 0))
ret = mddev->bitmap_ops->resize(mddev,
le64_to_cpu(msg->high),
- 0, false);
+ 0);
break;
default:
ret = -1;
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 3e1f165c2d20..7033d982d377 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -257,18 +257,10 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
if (unlikely(bio_end_sector(bio) > end_sector)) {
/* This bio crosses a device boundary, so we have to split it */
- struct bio *split = bio_split(bio, end_sector - bio_sector,
- GFP_NOIO, &mddev->bio_set);
-
- if (IS_ERR(split)) {
- bio->bi_status = errno_to_blk_status(PTR_ERR(split));
- bio_endio(bio);
+ bio = bio_submit_split_bioset(bio, end_sector - bio_sector,
+ &mddev->bio_set);
+ if (!bio)
return true;
- }
-
- bio_chain(split, bio);
- submit_bio_noacct(bio);
- bio = split;
}
md_account_bio(mddev, &bio);
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
new file mode 100644
index 000000000000..1eb434306162
--- /dev/null
+++ b/drivers/md/md-llbitmap.c
@@ -0,0 +1,1626 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/timer.h>
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/file.h>
+#include <linux/seq_file.h>
+#include <trace/events/block.h>
+
+#include "md.h"
+#include "md-bitmap.h"
+
+/*
+ * #### Background
+ *
+ * Redundant data is used to enhance data fault tolerance, and the storage
+ * methods for redundant data vary depending on the RAID levels. And it's
+ * important to maintain the consistency of redundant data.
+ *
+ * Bitmap is used to record which data blocks have been synchronized and which
+ * ones need to be resynchronized or recovered. Each bit in the bitmap
+ * represents a segment of data in the array. When a bit is set, it indicates
+ * that the multiple redundant copies of that data segment may not be
+ * consistent. Data synchronization can be performed based on the bitmap after
+ * power failure or readding a disk. If there is no bitmap, a full disk
+ * synchronization is required.
+ *
+ * #### Key Features
+ *
+ * - IO fastpath is lockless, if user issues lots of write IO to the same
+ * bitmap bit in a short time, only the first write has additional overhead
+ * to update bitmap bit, no additional overhead for the following writes;
+ * - support only resync or recover written data, means in the case creating
+ * new array or replacing with a new disk, there is no need to do a full disk
+ * resync/recovery;
+ *
+ * #### Key Concept
+ *
+ * ##### State Machine
+ *
+ * Each bit is one byte, contain 6 different states, see llbitmap_state. And
+ * there are total 8 different actions, see llbitmap_action, can change state:
+ *
+ * llbitmap state m