From bf779fb9afb5c5cc3c45d19a7a1ea7cd77c742f0 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:06 +0900 Subject: zram: introduce ZRAM_PP_SLOT flag Patch series "zram: optimal post-processing target selection", v5. Problem: -------- Both recompression and writeback perform a very simple linear scan of all zram slots in search for post-processing (writeback or recompress) candidate slots. This often means that we pick the worst candidate for pp (post-processing), e.g. a 48 bytes object for writeback, which is nearly useless, because it only releases 48 bytes from zsmalloc pool, but consumes an entire 4K slot in the backing device. Similarly, recompression of an 48 bytes objects is unlikely to save more memory that recompression of a 3000 bytes object. Both recompression and writeback consume constrained resources (CPU time, batter, backing device storage space) and quite often have a (daily) limit on the number of items they post-process, so we should utilize those constrained resources in the most optimal way. Solution: --------- This patch reworks the way we select pp targets. We, quite clearly, want to sort all the candidates and always pick the largest, be it recompression or writeback. Especially for writeback, because the larger object we writeback the more memory we release. This series introduces concept of pp buckets and pp scan/selection. The scan step is a simple iteration over all zram->table entries, just like what we currently do, but we don't post-process a candidate slot immediately. Instead we assign it to a PP (post-processing) bucket. PP bucket is, basically, a list which holds pp candidate slots that belong to the same size class. PP buckets are 64 bytes apart, slots are not strictly sorted within a bucket there is a 64 bytes variance. The select step simply iterates over pp buckets from highest to lowest and picks all candidate slots a particular buckets contains. So this gives us sorted candidates (in linear time) and allows us to select most optimal (largest) candidates for post-processing first. This patch (of 7): This flag indicates that the slot was selected as a candidate slot for post-processing (pp) and was assigned to a pp bucket. It does not necessarily mean that the slot is currently under post-processing, but may mean so. The slot can loose its PP_SLOT flag, while still being in the pp-bucket, if it's accessed or slot_free-ed. Link: https://lkml.kernel.org/r/20240917021020.883356-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20240917021020.883356-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 2 ++ drivers/block/zram/zram_drv.h | 1 + 2 files changed, 3 insertions(+) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index ad9c9bc3ccfc..d61750c1c5b5 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -178,6 +178,7 @@ static inline u32 zram_get_priority(struct zram *zram, u32 index) static void zram_accessed(struct zram *zram, u32 index) { zram_clear_flag(zram, index, ZRAM_IDLE); + zram_clear_flag(zram, index, ZRAM_PP_SLOT); #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME zram->table[index].ac_time = ktime_get_boottime(); #endif @@ -1354,6 +1355,7 @@ static void zram_free_page(struct zram *zram, size_t index) zram_clear_flag(zram, index, ZRAM_INCOMPRESSIBLE); zram_set_priority(zram, index, 0); + zram_clear_flag(zram, index, ZRAM_PP_SLOT); if (zram_test_flag(zram, index, ZRAM_WB)) { zram_clear_flag(zram, index, ZRAM_WB); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index cfc8c059db63..914cb6629969 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -48,6 +48,7 @@ enum zram_pageflags { ZRAM_SAME = ZRAM_FLAG_SHIFT, /* Page consists the same element */ ZRAM_WB, /* page is stored on backing_device */ ZRAM_UNDER_WB, /* page is under writeback */ + ZRAM_PP_SLOT, /* Selected for post-processing */ ZRAM_HUGE, /* Incompressible page */ ZRAM_IDLE, /* not accessed page since last idle marking */ ZRAM_INCOMPRESSIBLE, /* none of the algorithms could compress it */ -- cgit v1.2.3 From 58652f2b6d21f2874c9f060165ec7e03e8b1fc71 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:07 +0900 Subject: zram: permit only one post-processing operation at a time Both recompress and writeback soon will unlock slots during processing, which makes things too complex wrt possible race-conditions. We still want to clear PP_SLOT in slot_free, because this is how we figure out that slot that was selected for post-processing has been released under us and when we start post-processing we check if slot still has PP_SLOT set. At the same time, theoretically, we can have something like this: CPU0 CPU1 recompress scan slots set PP_SLOT unlock slot slot_free clear PP_SLOT allocate PP_SLOT writeback scan slots set PP_SLOT unlock slot select PP-slot test PP_SLOT So recompress will not detect that slot has been re-used and re-selected for concurrent writeback post-processing. Make sure that we only permit on post-processing operation at a time. So now recompress and writeback post-processing don't race against each other, we only need to handle slot re-use (slot_free and write), which is handled individually by each pp operation. Having recompress and writeback competing for the same slots is not exactly good anyway (can't imagine anyone doing that). Link: https://lkml.kernel.org/r/20240917021020.883356-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 16 ++++++++++++++++ drivers/block/zram/zram_drv.h | 1 + 2 files changed, 17 insertions(+) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index d61750c1c5b5..37a284f709ba 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -627,6 +627,12 @@ static ssize_t writeback_store(struct device *dev, goto release_init_lock; } + /* Do not permit concurrent post-processing actions. */ + if (atomic_xchg(&zram->pp_in_progress, 1)) { + up_read(&zram->init_lock); + return -EAGAIN; + } + if (!zram->backing_dev) { ret = -ENODEV; goto release_init_lock; @@ -753,6 +759,7 @@ next: free_block_bdev(zram, blk_idx); __free_page(page); release_init_lock: + atomic_set(&zram->pp_in_progress, 0); up_read(&zram->init_lock); return ret; @@ -1883,6 +1890,12 @@ static ssize_t recompress_store(struct device *dev, goto release_init_lock; } + /* Do not permit concurrent post-processing actions. */ + if (atomic_xchg(&zram->pp_in_progress, 1)) { + up_read(&zram->init_lock); + return -EAGAIN; + } + if (algo) { bool found = false; @@ -1950,6 +1963,7 @@ next: __free_page(page); release_init_lock: + atomic_set(&zram->pp_in_progress, 0); up_read(&zram->init_lock); return ret; } @@ -2146,6 +2160,7 @@ static void zram_reset_device(struct zram *zram) zram->disksize = 0; zram_destroy_comps(zram); memset(&zram->stats, 0, sizeof(zram->stats)); + atomic_set(&zram->pp_in_progress, 0); reset_bdev(zram); comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor); @@ -2383,6 +2398,7 @@ static int zram_add(void) zram->disk->fops = &zram_devops; zram->disk->private_data = zram; snprintf(zram->disk->disk_name, 16, "zram%d", device_id); + atomic_set(&zram->pp_in_progress, 0); /* Actual capacity set using sysfs (/sys/block/zram/disksize */ set_capacity(zram->disk, 0); diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 914cb6629969..73a9d47d76ba 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -140,5 +140,6 @@ struct zram { #ifdef CONFIG_ZRAM_MEMORY_TRACKING struct dentry *debugfs_dir; #endif + atomic_t pp_in_progress; }; #endif -- cgit v1.2.3 From 3f909a60cec19509f6bfa01f90ad878e410cec51 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:08 +0900 Subject: zram: rework recompress target selection strategy Target slot selection for recompression is just a simple iteration over zram->table entries (stored pages) from slot 0 to max slot. Given that zram->table slots are written in random order and are not sorted by size, a simple iteration over slots selects suboptimal targets for recompression. This is not a problem if we recompress every single zram->table slot, but we never do that in reality. In reality we limit the number of slots we can recompress (via max_pages parameter) and hence proper slot selection becomes very important. The strategy is quite simple, suppose we have two candidate slots for recompression, one of size 48 bytes and one of size 2800 bytes, and we can recompress only one, then it certainly makes more sense to pick 2800 entry for recompression. Because even if we manage to compress 48 bytes objects even further the savings are going to be very small. Potential savings after good re-compression of 2800 bytes objects are much higher. This patch reworks slot selection and introduces the strategy described above: among candidate slots always select the biggest ones first. For that the patch introduces zram_pp_ctl (post-processing) structure which holds NUM_PP_BUCKETS pp buckets of slots. Slots are assigned to a particular group based on their sizes - the larger the size of the slot the higher the group index. This, basically, sorts slots by size in liner time (we still perform just one iteration over zram->table slots). When we select slot for recompression we always first lookup in higher pp buckets (those that hold the largest slots). Which achieves the desired behavior. TEST ==== A very simple demonstration: zram is configured with zstd, and zstd with dict as a recompression stream. A limited (max 4096 pages) recompression is performed then, with a log of sizes of slots that were recompressed. You can see that patched zram selects slots for recompression in significantly different manner, which leads to higher memory savings (see column #2 of mm_stat output). BASE ---- *** initial state of zram device /sys/block/zram0/mm_stat 1750994944 504491413 514203648 0 514203648 1 0 34204 34204 *** recompress idle max_pages=4096 /sys/block/zram0/mm_stat 1750994944 504262229 514953216 0 514203648 1 0 34204 34204 Sizes of selected objects for recompression: ... 45 58 24 226 91 40 24 24 24 424 2104 93 2078 2078 2078 959 154 ... PATCHED ------- *** initial state of zram device /sys/block/zram0/mm_stat 1750982656 504492801 514170880 0 514170880 1 0 34204 34204 *** recompress idle max_pages=4096 /sys/block/zram0/mm_stat 1750982656 503716710 517586944 0 514170880 1 0 34204 34204 Sizes of selected objects for recompression: ... 3680 3694 3667 3590 3614 3553 3537 3548 3550 3542 3543 3537 ... Note, pp-slots are not strictly sorted, there is a PP_BUCKET_SIZE_RANGE variation of sizes within particular bucket. [senozhatsky@chromium.org: do not skip the first bucket] Link: https://lkml.kernel.org/r/20241001085634.1948384-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20240917021020.883356-4-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Cc: Dan Carpenter Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 187 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 160 insertions(+), 27 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 37a284f709ba..f57ffb920166 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -184,6 +184,99 @@ static void zram_accessed(struct zram *zram, u32 index) #endif } +#ifdef CONFIG_ZRAM_MULTI_COMP +struct zram_pp_slot { + unsigned long index; + struct list_head entry; +}; + +/* + * A post-processing bucket is, essentially, a size class, this defines + * the range (in bytes) of pp-slots sizes in particular bucket. + */ +#define PP_BUCKET_SIZE_RANGE 64 +#define NUM_PP_BUCKETS ((PAGE_SIZE / PP_BUCKET_SIZE_RANGE) + 1) + +struct zram_pp_ctl { + struct list_head pp_buckets[NUM_PP_BUCKETS]; +}; + +static struct zram_pp_ctl *init_pp_ctl(void) +{ + struct zram_pp_ctl *ctl; + u32 idx; + + ctl = kmalloc(sizeof(*ctl), GFP_KERNEL); + if (!ctl) + return NULL; + + for (idx = 0; idx < NUM_PP_BUCKETS; idx++) + INIT_LIST_HEAD(&ctl->pp_buckets[idx]); + return ctl; +} + +static void release_pp_slot(struct zram *zram, struct zram_pp_slot *pps) +{ + list_del_init(&pps->entry); + + zram_slot_lock(zram, pps->index); + zram_clear_flag(zram, pps->index, ZRAM_PP_SLOT); + zram_slot_unlock(zram, pps->index); + + kfree(pps); +} + +static void release_pp_ctl(struct zram *zram, struct zram_pp_ctl *ctl) +{ + u32 idx; + + if (!ctl) + return; + + for (idx = 0; idx < NUM_PP_BUCKETS; idx++) { + while (!list_empty(&ctl->pp_buckets[idx])) { + struct zram_pp_slot *pps; + + pps = list_first_entry(&ctl->pp_buckets[idx], + struct zram_pp_slot, + entry); + release_pp_slot(zram, pps); + } + } + + kfree(ctl); +} + +static void place_pp_slot(struct zram *zram, struct zram_pp_ctl *ctl, + struct zram_pp_slot *pps) +{ + u32 idx; + + idx = zram_get_obj_size(zram, pps->index) / PP_BUCKET_SIZE_RANGE; + list_add(&pps->entry, &ctl->pp_buckets[idx]); + + zram_set_flag(zram, pps->index, ZRAM_PP_SLOT); +} + +static struct zram_pp_slot *select_pp_slot(struct zram_pp_ctl *ctl) +{ + struct zram_pp_slot *pps = NULL; + s32 idx = NUM_PP_BUCKETS - 1; + + /* The higher the bucket id the more optimal slot post-processing is */ + while (idx >= 0) { + pps = list_first_entry_or_null(&ctl->pp_buckets[idx], + struct zram_pp_slot, + entry); + if (pps) + break; + + idx--; + } + return pps; +} +#endif + static inline void update_used_max(struct zram *zram, const unsigned long pages) { @@ -1657,6 +1750,52 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, } #ifdef CONFIG_ZRAM_MULTI_COMP +#define RECOMPRESS_IDLE (1 << 0) +#define RECOMPRESS_HUGE (1 << 1) + +static int scan_slots_for_recompress(struct zram *zram, u32 mode, + struct zram_pp_ctl *ctl) +{ + unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; + struct zram_pp_slot *pps = NULL; + unsigned long index; + + for (index = 0; index < nr_pages; index++) { + if (!pps) + pps = kmalloc(sizeof(*pps), GFP_KERNEL); + if (!pps) + return -ENOMEM; + + INIT_LIST_HEAD(&pps->entry); + + zram_slot_lock(zram, index); + if (!zram_allocated(zram, index)) + goto next; + + if (mode & RECOMPRESS_IDLE && + !zram_test_flag(zram, index, ZRAM_IDLE)) + goto next; + + if (mode & RECOMPRESS_HUGE && + !zram_test_flag(zram, index, ZRAM_HUGE)) + goto next; + + if (zram_test_flag(zram, index, ZRAM_WB) || + zram_test_flag(zram, index, ZRAM_SAME) || + zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE)) + goto next; + + pps->index = index; + place_pp_slot(zram, ctl, pps); + pps = NULL; +next: + zram_slot_unlock(zram, index); + } + + kfree(pps); + return 0; +} + /* * This function will decompress (unless it's ZRAM_HUGE) the page and then * attempt to compress it using provided compression algorithm priority @@ -1664,7 +1803,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, * * Corresponding ZRAM slot should be locked. */ -static int zram_recompress(struct zram *zram, u32 index, struct page *page, +static int recompress_slot(struct zram *zram, u32 index, struct page *page, u64 *num_recomp_pages, u32 threshold, u32 prio, u32 prio_max) { @@ -1807,20 +1946,17 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page, return 0; } -#define RECOMPRESS_IDLE (1 << 0) -#define RECOMPRESS_HUGE (1 << 1) - static ssize_t recompress_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { u32 prio = ZRAM_SECONDARY_COMP, prio_max = ZRAM_MAX_COMPS; struct zram *zram = dev_to_zram(dev); - unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; char *args, *param, *val, *algo = NULL; u64 num_recomp_pages = ULLONG_MAX; + struct zram_pp_ctl *ctl = NULL; + struct zram_pp_slot *pps; u32 mode = 0, threshold = 0; - unsigned long index; struct page *page; ssize_t ret; @@ -1922,36 +2058,32 @@ static ssize_t recompress_store(struct device *dev, goto release_init_lock; } + ctl = init_pp_ctl(); + if (!ctl) { + ret = -ENOMEM; + goto release_init_lock; + } + + scan_slots_for_recompress(zram, mode, ctl); + ret = len; - for (index = 0; index < nr_pages; index++) { + while ((pps = select_pp_slot(ctl))) { int err = 0; if (!num_recomp_pages) break; - zram_slot_lock(zram, index); - - if (!zram_allocated(zram, index)) - goto next; - - if (mode & RECOMPRESS_IDLE && - !zram_test_flag(zram, index, ZRAM_IDLE)) + zram_slot_lock(zram, pps->index); + if (!zram_test_flag(zram, pps->index, ZRAM_PP_SLOT)) goto next; - if (mode & RECOMPRESS_HUGE && - !zram_test_flag(zram, index, ZRAM_HUGE)) - goto next; - - if (zram_test_flag(zram, index, ZRAM_WB) || - zram_test_flag(zram, index, ZRAM_UNDER_WB) || - zram_test_flag(zram, index, ZRAM_SAME) || - zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE)) - goto next; - - err = zram_recompress(zram, index, page, &num_recomp_pages, - threshold, prio, prio_max); + err = recompress_slot(zram, pps->index, page, + &num_recomp_pages, threshold, + prio, prio_max); next: - zram_slot_unlock(zram, index); + zram_slot_unlock(zram, pps->index); + release_pp_slot(zram, pps); + if (err) { ret = err; break; @@ -1963,6 +2095,7 @@ next: __free_page(page); release_init_lock: + release_pp_ctl(zram, ctl); atomic_set(&zram->pp_in_progress, 0); up_read(&zram->init_lock); return ret; -- cgit v1.2.3 From 330edc2bc059a48b1f61a704521818d4f831767c Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:09 +0900 Subject: zram: rework writeback target selection strategy Writeback suffers from the same problem as recompression did before - target slot selection for writeback is just a simple iteration over zram->table entries (stored pages) which selects suboptimal targets for writeback. This is especially problematic for writeback, because we uncompress objects before writeback so each of them takes 4K out of limited writeback storage. For example, when we take a 48 bytes slot and store it as a 4K object to writeback device we only save 48 bytes of memory (release from zsmalloc pool). We naturally want to pick the largest objects for writeback, because then each writeback will release the largest amount of memory. This patch applies the same solution and strategy as for recompression target selection: pp control (post-process) with 16 buckets of candidate pp slots. Slots are assigned to pp buckets based on sizes - the larger the slot the higher the group index. This gives us sorted by size lists of candidate slots (in linear time), so that among post-processing candidate slots we always select the largest ones first and maximize the memory saving. TEST ==== A very simple demonstration: zram is configured with a writeback device. A limited writeback (wb_limit 2500 pages) is performed then, with a log of sizes of slots that were written back. You can see that patched zram selects slots for recompression in significantly different manner, which leads to higher memory savings (see column #2 of mm_stat output). BASE ---- *** initial state of zram device /sys/block/zram0/mm_stat 1750327296 619765836 631902208 0 631902208 1 0 34278 34278 *** writeback idle wb_limit 2500 /sys/block/zram0/mm_stat 1750327296 617622333 631578624 0 631902208 1 0 34278 34278 Sizes of selected objects for writeback: ... 193 349 46 46 46 46 852 1002 543 162 107 49 34 34 34 ... PATCHED ------- *** initial state of zram device /sys/block/zram0/mm_stat 1750319104 619760957 631992320 0 631992320 1 0 34278 34278 *** writeback idle wb_limit 2500 /sys/block/zram0/mm_stat 1750319104 612672056 626135040 0 631992320 1 0 34278 34278 Sizes of selected objects for writeback: ... 3667 3580 3581 3580 3581 3581 3581 3231 3211 3203 3231 3246 ... Note, pp-slots are not strictly sorted, there is a PP_BUCKET_SIZE_RANGE variation of sizes within particular bucket. Link: https://lkml.kernel.org/r/20240917021020.883356-5-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 83 +++++++++++++++++++++++++++++++++---------- 1 file changed, 64 insertions(+), 19 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index f57ffb920166..42f7195b80cb 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -184,7 +184,7 @@ static void zram_accessed(struct zram *zram, u32 index) #endif } -#ifdef CONFIG_ZRAM_MULTI_COMP +#if defined CONFIG_ZRAM_WRITEBACK || defined CONFIG_ZRAM_MULTI_COMP struct zram_pp_slot { unsigned long index; struct list_head entry; @@ -681,11 +681,57 @@ static void read_from_bdev_async(struct zram *zram, struct page *page, #define IDLE_WRITEBACK (1<<1) #define INCOMPRESSIBLE_WRITEBACK (1<<2) +static int scan_slots_for_writeback(struct zram *zram, u32 mode, + unsigned long nr_pages, + unsigned long index, + struct zram_pp_ctl *ctl) +{ + struct zram_pp_slot *pps = NULL; + + for (; nr_pages != 0; index++, nr_pages--) { + if (!pps) + pps = kmalloc(sizeof(*pps), GFP_KERNEL); + if (!pps) + return -ENOMEM; + + INIT_LIST_HEAD(&pps->entry); + + zram_slot_lock(zram, index); + if (!zram_allocated(zram, index)) + goto next; + + if (zram_test_flag(zram, index, ZRAM_WB) || + zram_test_flag(zram, index, ZRAM_SAME)) + goto next; + + if (mode & IDLE_WRITEBACK && + !zram_test_flag(zram, index, ZRAM_IDLE)) + goto next; + if (mode & HUGE_WRITEBACK && + !zram_test_flag(zram, index, ZRAM_HUGE)) + goto next; + if (mode & INCOMPRESSIBLE_WRITEBACK && + !zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE)) + goto next; + + pps->index = index; + place_pp_slot(zram, ctl, pps); + pps = NULL; +next: + zram_slot_unlock(zram, index); + } + + kfree(pps); + return 0; +} + static ssize_t writeback_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { struct zram *zram = dev_to_zram(dev); unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; + struct zram_pp_ctl *ctl = NULL; + struct zram_pp_slot *pps; unsigned long index = 0; struct bio bio; struct bio_vec bio_vec; @@ -737,7 +783,15 @@ static ssize_t writeback_store(struct device *dev, goto release_init_lock; } - for (; nr_pages != 0; index++, nr_pages--) { + ctl = init_pp_ctl(); + if (!ctl) { + ret = -ENOMEM; + goto release_init_lock; + } + + scan_slots_for_writeback(zram, mode, nr_pages, index, ctl); + + while ((pps = select_pp_slot(ctl))) { spin_lock(&zram->wb_limit_lock); if (zram->wb_limit_enable && !zram->bd_wb_limit) { spin_unlock(&zram->wb_limit_lock); @@ -754,25 +808,10 @@ static ssize_t writeback_store(struct device *dev, } } + index = pps->index; zram_slot_lock(zram, index); - if (!zram_allocated(zram, index)) - goto next; - - if (zram_test_flag(zram, index, ZRAM_WB) || - zram_test_flag(zram, index, ZRAM_SAME) || - zram_test_flag(zram, index, ZRAM_UNDER_WB)) - goto next; - - if (mode & IDLE_WRITEBACK && - !zram_test_flag(zram, index, ZRAM_IDLE)) - goto next; - if (mode & HUGE_WRITEBACK && - !zram_test_flag(zram, index, ZRAM_HUGE)) - goto next; - if (mode & INCOMPRESSIBLE_WRITEBACK && - !zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE)) + if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) goto next; - /* * Clearing ZRAM_UNDER_WB is duty of caller. * IOW, zram_free_page never clear it. @@ -786,6 +825,8 @@ static ssize_t writeback_store(struct device *dev, zram_clear_flag(zram, index, ZRAM_UNDER_WB); zram_clear_flag(zram, index, ZRAM_IDLE); zram_slot_unlock(zram, index); + + release_pp_slot(zram, pps); continue; } @@ -804,6 +845,8 @@ static ssize_t writeback_store(struct device *dev, zram_clear_flag(zram, index, ZRAM_UNDER_WB); zram_clear_flag(zram, index, ZRAM_IDLE); zram_slot_unlock(zram, index); + + release_pp_slot(zram, pps); /* * BIO errors are not fatal, we continue and simply * attempt to writeback the remaining objects (pages). @@ -846,12 +889,14 @@ static ssize_t writeback_store(struct device *dev, spin_unlock(&zram->wb_limit_lock); next: zram_slot_unlock(zram, index); + release_pp_slot(zram, pps); } if (blk_idx) free_block_bdev(zram, blk_idx); __free_page(page); release_init_lock: + release_pp_ctl(zram, ctl); atomic_set(&zram->pp_in_progress, 0); up_read(&zram->init_lock); -- cgit v1.2.3 From b967fa1ba72b5da2b6d9bf95f0b13420a59e0701 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:10 +0900 Subject: zram: do not mark idle slots that cannot be idle ZRAM_SAME slots cannot be post-processed (writeback or recompress) so do not mark them ZRAM_IDLE. Same with ZRAM_WB slots, they cannot be ZRAM_IDLE because they are not in zsmalloc pool anymore. Link: https://lkml.kernel.org/r/20240917021020.883356-6-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 42f7195b80cb..41e408661940 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -392,17 +392,28 @@ static void mark_idle(struct zram *zram, ktime_t cutoff) /* * Do not mark ZRAM_UNDER_WB slot as ZRAM_IDLE to close race. * See the comment in writeback_store. + * + * Also do not mark ZRAM_SAME slots as ZRAM_IDLE, because no + * post-processing (recompress, writeback) happens to the + * ZRAM_SAME slot. + * + * And ZRAM_WB slots simply cannot be ZRAM_IDLE. */ zram_slot_lock(zram, index); - if (zram_allocated(zram, index) && - !zram_test_flag(zram, index, ZRAM_UNDER_WB)) { + if (!zram_allocated(zram, index) || + zram_test_flag(zram, index, ZRAM_WB) || + zram_test_flag(zram, index, ZRAM_UNDER_WB) || + zram_test_flag(zram, index, ZRAM_SAME)) { + zram_slot_unlock(zram, index); + continue; + } + #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME - is_idle = !cutoff || ktime_after(cutoff, - zram->table[index].ac_time); + is_idle = !cutoff || + ktime_after(cutoff, zram->table[index].ac_time); #endif - if (is_idle) - zram_set_flag(zram, index, ZRAM_IDLE); - } + if (is_idle) + zram_set_flag(zram, index, ZRAM_IDLE); zram_slot_unlock(zram, index); } } -- cgit v1.2.3 From 1a1d0f8992d5c6c8059d28cd9cb263180dd98a28 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:11 +0900 Subject: zram: reshuffle zram_free_page() flags operations Drop some redundant zram_test_flag() calls and re-order zram_clear_flag() calls. Plus two small trivial coding style fixes. No functional changes. Link: https://lkml.kernel.org/r/20240917021020.883356-7-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 41e408661940..c59a3e9218a9 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1499,20 +1499,17 @@ static void zram_free_page(struct zram *zram, size_t index) #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME zram->table[index].ac_time = 0; #endif - if (zram_test_flag(zram, index, ZRAM_IDLE)) - zram_clear_flag(zram, index, ZRAM_IDLE); + + zram_clear_flag(zram, index, ZRAM_IDLE); + zram_clear_flag(zram, index, ZRAM_INCOMPRESSIBLE); + zram_clear_flag(zram, index, ZRAM_PP_SLOT); + zram_set_priority(zram, index, 0); if (zram_test_flag(zram, index, ZRAM_HUGE)) { zram_clear_flag(zram, index, ZRAM_HUGE); atomic64_dec(&zram->stats.huge_pages); } - if (zram_test_flag(zram, index, ZRAM_INCOMPRESSIBLE)) - zram_clear_flag(zram, index, ZRAM_INCOMPRESSIBLE); - - zram_set_priority(zram, index, 0); - zram_clear_flag(zram, index, ZRAM_PP_SLOT); - if (zram_test_flag(zram, index, ZRAM_WB)) { zram_clear_flag(zram, index, ZRAM_WB); free_block_bdev(zram, zram_get_element(zram, index)); @@ -1536,13 +1533,12 @@ static void zram_free_page(struct zram *zram, size_t index) zs_free(zram->mem_pool, handle); atomic64_sub(zram_get_obj_size(zram, index), - &zram->stats.compr_data_size); + &zram->stats.compr_data_size); out: atomic64_dec(&zram->stats.pages_stored); zram_set_handle(zram, index, 0); zram_set_obj_size(zram, index, 0); - WARN_ON_ONCE(zram->table[index].flags & - ~(1UL << ZRAM_UNDER_WB)); + WARN_ON_ONCE(zram->table[index].flags & ~(1UL << ZRAM_UNDER_WB)); } /* -- cgit v1.2.3 From 5e99893444a0e0582feb49d618195114b6e35760 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 17 Sep 2024 11:09:12 +0900 Subject: zram: remove UNDER_WB and simplify writeback We now have only one active post-processing at any time, so we don't have same race conditions that we had before. If slot selected for post-processing gets freed or freed and reallocated it loses its PP_SLOT flag and there is no way for such a slot to gain PP_SLOT flag again until current post-processing terminates. Link: https://lkml.kernel.org/r/20240917021020.883356-8-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 53 +++++++++++++------------------------------ drivers/block/zram/zram_drv.h | 1 - 2 files changed, 16 insertions(+), 38 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index c59a3e9218a9..263795c4aef7 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -390,10 +390,7 @@ static void mark_idle(struct zram *zram, ktime_t cutoff) for (index = 0; index < nr_pages; index++) { /* - * Do not mark ZRAM_UNDER_WB slot as ZRAM_IDLE to close race. - * See the comment in writeback_store. - * - * Also do not mark ZRAM_SAME slots as ZRAM_IDLE, because no + * Do not mark ZRAM_SAME slots as ZRAM_IDLE, because no * post-processing (recompress, writeback) happens to the * ZRAM_SAME slot. * @@ -402,7 +399,6 @@ static void mark_idle(struct zram *zram, ktime_t cutoff) zram_slot_lock(zram, index); if (!zram_allocated(zram, index) || zram_test_flag(zram, index, ZRAM_WB) || - zram_test_flag(zram, index, ZRAM_UNDER_WB) || zram_test_flag(zram, index, ZRAM_SAME)) { zram_slot_unlock(zram, index); continue; @@ -821,22 +817,17 @@ static ssize_t writeback_store(struct device *dev, index = pps->index; zram_slot_lock(zram, index); - if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) - goto next; /* - * Clearing ZRAM_UNDER_WB is duty of caller. - * IOW, zram_free_page never clear it. + * scan_slots() sets ZRAM_PP_SLOT and relases slot lock, so + * slots can change in the meantime. If slots are accessed or + * freed they lose ZRAM_PP_SLOT flag and hence we don't + * post-process them. */ - zram_set_flag(zram, index, ZRAM_UNDER_WB); - /* Need for hugepage writeback racing */ - zram_set_flag(zram, index, ZRAM_IDLE); + if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) + goto next; zram_slot_unlock(zram, index); - if (zram_read_page(zram, page, index, NULL)) { - zram_slot_lock(zram, index); - zram_clear_flag(zram, index, ZRAM_UNDER_WB); - zram_clear_flag(zram, index, ZRAM_IDLE); - zram_slot_unlock(zram, index); + if (zram_read_page(zram, page, index, NULL)) { release_pp_slot(zram, pps); continue; } @@ -852,11 +843,6 @@ static ssize_t writeback_store(struct device *dev, */ err = submit_bio_wait(&bio); if (err) { - zram_slot_lock(zram, index); - zram_clear_flag(zram, index, ZRAM_UNDER_WB); - zram_clear_flag(zram, index, ZRAM_IDLE); - zram_slot_unlock(zram, index); - release_pp_slot(zram, pps); /* * BIO errors are not fatal, we continue and simply @@ -871,25 +857,19 @@ static ssize_t writeback_store(struct device *dev, } atomic64_inc(&zram->stats.bd_writes); + zram_slot_lock(zram, index); /* - * We released zram_slot_lock so need to check if the slot was - * changed. If there is freeing for the slot, we can catch it - * easily by zram_allocated. - * A subtle case is the slot is freed/reallocated/marked as - * ZRAM_IDLE again. To close the race, idle_store doesn't - * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB. - * Thus, we could close the race by checking ZRAM_IDLE bit. + * Same as above, we release slot lock during writeback so + * slot can change under us: slot_free() or slot_free() and + * reallocation (zram_write_page()). In both cases slot loses + * ZRAM_PP_SLOT flag. No concurrent post-processing can set + * ZRAM_PP_SLOT on such slots until current post-processing + * finishes. */ - zram_slot_lock(zram, index); - if (!zram_allocated(zram, index) || - !zram_test_flag(zram, index, ZRAM_IDLE)) { - zram_clear_flag(zram, index, ZRAM_UNDER_WB); - zram_clear_flag(zram, index, ZRAM_IDLE); + if (!zram_test_flag(zram, index, ZRAM_PP_SLOT)) goto next; - } zram_free_page(zram, index); - zram_clear_flag(zram, index, ZRAM_UNDER_WB); zram_set_flag(zram, index, ZRAM_WB); zram_set_element(zram, index, blk_idx); blk_idx = 0; @@ -1538,7 +1518,6 @@ out: atomic64_dec(&zram->stats.pages_stored); zram_set_handle(zram, index, 0); zram_set_obj_size(zram, index, 0); - WARN_ON_ONCE(zram->table[index].flags & ~(1UL << ZRAM_UNDER_WB)); } /* diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index 73a9d47d76ba..134be414e210 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -47,7 +47,6 @@ enum zram_pageflags { ZRAM_SAME = ZRAM_FLAG_SHIFT, /* Page consists the same element */ ZRAM_WB, /* page is stored on backing_device */ - ZRAM_UNDER_WB, /* page is under writeback */ ZRAM_PP_SLOT, /* Selected for post-processing */ ZRAM_HUGE, /* Incompressible page */ ZRAM_IDLE, /* not accessed page since last idle marking */ -- cgit v1.2.3 From 01a9097aa3ce4c6aef296779c163169ac403260e Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 9 Oct 2024 13:28:00 +0900 Subject: zram: do not open-code comp priority 0 A cosmetic change: do not open-code compression priority 0, use ZRAM_PRIMARY_COMP instead. Link: https://lkml.kernel.org/r/20241009042908.750260-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 263795c4aef7..e6d12e81241d 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -2285,7 +2285,7 @@ static void zram_destroy_comps(struct zram *zram) { u32 prio; - for (prio = 0; prio < ZRAM_MAX_COMPS; prio++) { + for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) { struct zcomp *comp = zram->comps[prio]; zram->comps[prio] = NULL; @@ -2357,7 +2357,7 @@ static ssize_t disksize_store(struct device *dev, goto out_unlock; } - for (prio = 0; prio < ZRAM_MAX_COMPS; prio++) { + for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) { if (!zram->comp_algs[prio]) continue; -- cgit v1.2.3 From f85219096648b251a81e9fe24a1974590cfc417d Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 29 Oct 2024 00:36:14 +0900 Subject: zram: clear IDLE flag after recompression Patch series "zram: IDLE flag handling fixes", v2. zram can wrongly preserve ZRAM_IDLE flag on its entries which can result in premature post-processing (writeback and recompression) of such entries. This patch (of 2) Recompression should clear ZRAM_IDLE flag on the entries it has accessed, because otherwise some entries, specifically those for which recompression has failed, become immediate candidate entries for another post-processing (e.g. writeback). Consider the following case: - recompression marks entries IDLE every 4 hours and attempts to recompress them - some entries are incompressible, so we keep them intact and hence preserve IDLE flag - writeback marks entries IDLE every 8 hours and writebacks IDLE entries, however we have IDLE entries left from recompression, so writeback prematurely writebacks those entries. The bug was reported by Shin Kawamura. Link: https://lkml.kernel.org/r/20241028153629.1479791-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20241028153629.1479791-2-senozhatsky@chromium.org Fixes: 84b33bf78889 ("zram: introduce recompress sysfs knob") Signed-off-by: Sergey Senozhatsky Reported-by: Shin Kawamura Acked-by: Brian Geffon Cc: Minchan Kim Signed-off-by: Andrew Morton Cc: --- drivers/block/zram/zram_drv.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index e6d12e81241d..a16dbffcdca3 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1864,6 +1864,13 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page, if (ret) return ret; + /* + * We touched this entry so mark it as non-IDLE. This makes sure that + * we don't preserve IDLE flag and don't incorrectly pick this entry + * for different post-processing type (e.g. writeback). + */ + zram_clear_flag(zram, index, ZRAM_IDLE); + class_index_old = zs_lookup_class_index(zram->mem_pool, comp_len_old); /* * Iterate the secondary comp algorithms list (in order of priority) -- cgit v1.2.3 From d37da422edb0664a2037e6d7d42fe6d339aae78a Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Tue, 29 Oct 2024 00:36:15 +0900 Subject: zram: clear IDLE flag in mark_idle() If entry does not fulfill current mark_idle() parameters, e.g. cutoff time, then we should clear its ZRAM_IDLE from previous mark_idle() invocations. Consider the following case: - mark_idle() cutoff time 8h - mark_idle() cutoff time 4h - writeback() idle - will writeback entries with cutoff time 8h, while it should only pick entries with cutoff time 4h The bug was reported by Shin Kawamura. Link: https://lkml.kernel.org/r/20241028153629.1479791-3-senozhatsky@chromium.org Fixes: 755804d16965 ("zram: introduce an aged idle interface") Signed-off-by: Sergey Senozhatsky Reported-by: Shin Kawamura Acked-by: Brian Geffon Cc: Minchan Kim Signed-off-by: Andrew Morton Cc: --- drivers/block/zram/zram_drv.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index a16dbffcdca3..cee49bb0126d 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -410,6 +410,8 @@ static void mark_idle(struct zram *zram, ktime_t cutoff) #endif if (is_idle) zram_set_flag(zram, index, ZRAM_IDLE); + else + zram_clear_flag(zram, index, ZRAM_IDLE); zram_slot_unlock(zram, index); } } -- cgit v1.2.3 From fb56fdf8b9a2f7397f8a83dce50189f3f0cf71af Mon Sep 17 00:00:00 2001 From: Kairui Song Date: Tue, 5 Nov 2024 01:52:56 +0800 Subject: mm/list_lru: split the lock to per-cgroup scope Currently, every list_lru has a per-node lock that protects adding, deletion, isolation, and reparenting of all list_lru_one instances belonging to this list_lru on this node. This lock contention is heavy when multiple cgroups modify the same list_lru. This lock can be split into per-cgroup scope to reduce contention. To achieve this, we need a stable list_lru_one for every cgroup. This commit adds a lock to each list_lru_one and introduced a helper function lock_list_lru_of_memcg, making it possible to pin the list_lru of a memcg. Then reworked the reparenting process. Reparenting will switch the list_lru_one instances one by one. By locking each instance and marking it dead using the nr_items counter, reparenting ensures that all items in the corresponding cgroup (on-list or not, because items have a stable cgroup, see below) will see the list_lru_one switch synchronously. Objcg reparent is also moved after list_lru reparent so items will have a stable mem cgroup until all list_lru_one instances are drained. The only caller that doesn't work the *_obj interfaces are direct calls to list_lru_{add,del}. But it's only used by zswap and that's also based on objcg, so it's fine. This also changes the bahaviour of the isolation function when LRU_RETRY or LRU_REMOVED_RETRY is returned, because now releasing the lock could unblock reparenting and free the list_lru_one, isolation function will have to return withoug re-lock the lru. prepare() { mkdir /tmp/test-fs modprobe brd rd_nr=1 rd_size=33554432 mkfs.xfs -f /dev/ram0 mount -t xfs /dev/ram0 /tmp/test-fs for i in $(seq 1 512); do mkdir "/tmp/test-fs/$i" for j in $(seq 1 10240); do echo TEST-CONTENT > "/tmp/test-fs/$i/$j" done & done; wait } do_test() { read_worker() { sleep 1 tar -cv "$1" &>/dev/null } read_in_all() { cd "/tmp/test-fs" && ls for i in $(seq 1 512); do (exec sh -c 'echo "$PPID"') > "/sys/fs/cgroup/benchmark/$i/cgroup.procs" read_worker "$i" & done; wait } for i in $(seq 1 512); do mkdir -p "/sys/fs/cgroup/benchmark/$i" done echo +memory > /sys/fs/cgroup/benchmark/cgroup.subtree_control echo 512M > /sys/fs/cgroup/benchmark/memory.max echo 3 > /proc/sys/vm/drop_caches time read_in_all } Above script simulates compression of small files in multiple cgroups with memory pressure. Run prepare() then do_test for 6 times: Before: real 0m7.762s user 0m11.340s sys 3m11.224s real 0m8.123s user 0m11.548s sys 3m2.549s real 0m7.736s user 0m11.515s sys 3m11.171s real 0m8.539s user 0m11.508s sys 3m7.618s real 0m7.928s user 0m11.349s sys 3m13.063s real 0m8.105s user 0m11.128s sys 3m14.313s After this commit (about ~15% faster): real 0m6.953s user 0m11.327s sys 2m42.912s real 0m7.453s user 0m11.343s sys 2m51.942s real 0m6.916s user 0m11.269s sys 2m43.957s real 0m6.894s user 0m11.528s sys 2m45.346s real 0m6.911s user 0m11.095s sys 2m43.168s real 0m6.773s user 0m11.518s sys 2m40.774s Link: https://lkml.kernel.org/r/20241104175257.60853-6-ryncsn@gmail.com Signed-off-by: Kairui Song Cc: Chengming Zhou Cc: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Muchun Song Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Waiman Long Signed-off-by: Andrew Morton --- drivers/android/binder_alloc.c | 1 - 1 file changed, 1 deletion(-) (limited to 'drivers') diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index b3acbc4174fb..86bbe40f4bcd 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -1106,7 +1106,6 @@ enum lru_status binder_alloc_free_page(struct list_head *item, mmput_async(mm); __free_page(page_to_free); - spin_lock(lock); return LRU_REMOVED_RETRY; err_invalid_vma: -- cgit v1.2.3 From da0c02516c501b43bd39ad4aca5779c86153d143 Mon Sep 17 00:00:00 2001 From: Kairui Song Date: Tue, 5 Nov 2024 01:52:57 +0800 Subject: mm/list_lru: simplify the list_lru walk callback function Now isolation no longer takes the list_lru global node lock, only use the per-cgroup lock instead. And this lock is inside the list_lru_one being walked, no longer needed to pass the lock explicitly. Link: https://lkml.kernel.org/r/20241104175257.60853-7-ryncsn@gmail.com Signed-off-by: Kairui Song Cc: Chengming Zhou Cc: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Muchun Song Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Waiman Long Signed-off-by: Andrew Morton --- drivers/android/binder_alloc.c | 7 +++---- drivers/android/binder_alloc.h | 2 +- 2 files changed, 4 insertions(+), 5 deletions(-) (limited to 'drivers') diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 86bbe40f4bcd..a738e7745865 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -1047,7 +1047,7 @@ void binder_alloc_vma_close(struct binder_alloc *alloc) /** * binder_alloc_free_page() - shrinker callback to free pages * @item: item to free - * @lock: lock protecting the item + * @lru: list_lru instance of the item * @cb_arg: callback argument * * Called from list_lru_walk() in binder_shrink_scan() to free @@ -1055,9 +1055,8 @@ void binder_alloc_vma_close(struct binder_alloc *alloc) */ enum lru_status binder_alloc_free_page(struct list_head *item, struct list_lru_one *lru, - spinlock_t *lock, void *cb_arg) - __must_hold(lock) + __must_hold(&lru->lock) { struct binder_lru_page *page = container_of(item, typeof(*page), lru); struct binder_alloc *alloc = page->alloc; @@ -1092,7 +1091,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item, list_lru_isolate(lru, item); spin_unlock(&alloc->lock); - spin_unlock(lock); + spin_unlock(&lru->lock); if (vma) { trace_binder_unmap_user_start(alloc, index); diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h index 70387234477e..c02c8ebcb466 100644 --- a/drivers/android/binder_alloc.h +++ b/drivers/android/binder_alloc.h @@ -118,7 +118,7 @@ static inline void binder_selftest_alloc(struct binder_alloc *alloc) {} #endif enum lru_status binder_alloc_free_page(struct list_head *item, struct list_lru_one *lru, - spinlock_t *lock, void *cb_arg); + void *cb_arg); struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc, size_t data_size, size_t offsets_size, -- cgit v1.2.3 From 7591c127f3b17d5879f18819cad7058bf3a2e276 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 4 Nov 2024 11:19:44 +0000 Subject: kmemleak: iommu/iova: fix transient kmemleak false positive The introduction of iova_depot_pop() in 911aa1245da8 ("iommu/iova: Make the rcache depot scale better") confused kmemleak by moving a struct iova_magazine object from a singly linked list to rcache->depot and resetting the 'next' pointer referencing it. Unlike doubly linked lists, the content of the object being referred is never changed on removal from a singly linked list and the kmemleak checksum heuristics do not detect such scenario. This leads to false positives like: unreferenced object 0xffff8881a5301000 (size 1024): comm "softirq", pid 0, jiffies 4306297099 (age 462.991s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 e7 7d 05 00 00 00 00 00 .........}...... 0f b4 05 00 00 00 00 00 b4 96 05 00 00 00 00 00 ................ backtrace: [] __kmem_cache_alloc_node+0x1e8/0x320 [] kmalloc_trace+0x2a/0x60 [] free_iova_fast+0x28e/0x4e0 [] fq_ring_free_locked+0x1b0/0x310 [] fq_flush_timeout+0x19d/0x2e0 [] call_timer_fn+0x19a/0x5c0 [] __run_timers+0x78b/0xb80 [] run_timer_softirq+0x5d/0xd0 [] __do_softirq+0x205/0x8b5 Introduce kmemleak_transient_leak() which resets the object checksum requiring another scan pass before it is reported (if still unreferenced). Call this new API in iova_depot_pop(). Link: https://lkml.kernel.org/r/20241104111944.2207155-1-catalin.marinas@arm.com Link: https://lore.kernel.org/r/ZY1osaGLyT-sdKE8@shredder/ Signed-off-by: Catalin Marinas Reported-by: Ido Schimmel Tested-by: Ido Schimmel Acked-by: Robin Murphy Cc: Joerg Roedel Cc: Will Deacon Signed-off-by: Andrew Morton --- drivers/iommu/iova.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'drivers') diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 16c6adff3eb7..5b5400efb657 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -6,6 +6,7 @@ */ #include +#include #include #include #include @@ -673,6 +674,11 @@ static struct iova_magazine *iova_depot_pop(struct iova_rcache *rcache) { struct iova_magazine *mag = rcache->depot; + /* + * As the mag->next pointer is moved to rcache->depot and reset via + * the mag->size assignment, mark it as a transient false positive. + */ + kmemleak_transient_leak(mag->next); rcache->depot = mag->next; mag->size = IOVA_MAG_SIZE; rcache->depot_size--; -- cgit v1.2.3 From 9f3310ccc71efff041fed3f8be5ad19b0feab30b Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 5 Nov 2024 12:50:35 +0100 Subject: zram: ZRAM_DEF_COMP should depend on ZRAM When Compressed RAM block device support is disabled, the CONFIG_ZRAM_DEF_COMP symbol still ends up in the generated config file: CONFIG_ZRAM_DEF_COMP="unset-value" While this causes no real harm, avoid polluting the config file by adding a dependency on ZRAM. Link: https://lkml.kernel.org/r/64e05bad68a9bd5cc322efd114a04d25de525940.1730807319.git.geert@linux-m68k.org Fixes: 917a59e81c34 ("zram: introduce custom comp backends API") Signed-off-by: Geert Uytterhoeven Reviewed-by: Sergey Senozhatsky Cc: Jens Axboe Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers') diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 6aea609b795c..402b7b175863 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -94,6 +94,7 @@ endchoice config ZRAM_DEF_COMP string + depends on ZRAM default "lzo-rle" if ZRAM_DEF_COMP_LZORLE default "lzo" if ZRAM_DEF_COMP_LZO default "lz4" if ZRAM_DEF_COMP_LZ4 -- cgit v1.2.3 From f364cdeb38938f9d03061682b8ff3779dd1730e5 Mon Sep 17 00:00:00 2001 From: Liu Shixin Date: Fri, 8 Nov 2024 18:01:47 +0800 Subject: zram: fix NULL pointer in comp_algorithm_show() LTP reported a NULL pointer dereference as followed: CPU: 7 UID: 0 PID: 5995 Comm: cat Kdump: loaded Not tainted 6.12.0-rc6+ #3 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __pi_strcmp+0x24/0x140 lr : zcomp_available_show+0x60/0x100 [zram] sp : ffff800088b93b90 x29: ffff800088b93b90 x28: 0000000000000001 x27: 0000000000400cc0 x26: 0000000000000ffe x25: ffff80007b3e2388 x24: 0000000000000000 x23: ffff80007b3e2390 x22: ffff0004041a9000 x21: ffff80007b3e2900 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: ffff80007b3e2900 x9 : ffff80007b3cb280 x8 : 0101010101010101 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000040 x4 : 0000000000000000 x3 : 00656c722d6f7a6c x2 : 0000000000000000 x1 : ffff80007b3e2900 x0 : 0000000000000000 Call trace: __pi_strcmp+0x24/0x140 comp_algorithm_show+0x40/0x70 [zram] dev_attr_show+0x28/0x80 sysfs_kf_seq_show+0x90/0x140 kernfs_seq_show+0x34/0x48 seq_read_iter+0x1d4/0x4e8 kernfs_fop_read_iter+0x40/0x58 new_sync_read+0x9c/0x168 vfs_read+0x1a8/0x1f8 ksys_read+0x74/0x108 __arm64_sys_read+0x24/0x38 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0x138 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190 The zram->comp_algs[ZRAM_PRIMARY_COMP] can be NULL in zram_add() if comp_algorithm_set() has not been called. User can access the zram device by sysfs after device_add_disk(), so there is a time window to trigger the NULL pointer dereference. Move it ahead device_add_disk() to make sure when user can access the zram device, it is ready. comp_algorithm_set() is protected by zram->init_lock in other places and no such problem. Link: https://lkml.kernel.org/r/20241108100147.3776123-1-liushixin2@huawei.com Fixes: 7ac07a26dea7 ("zram: preparation for multi-zcomp support") Signed-off-by: Liu Shixin Reviewed-by: Sergey Senozhatsky Cc: Jens Axboe Cc: Minchan Kim Signed-off-by: Andrew Morton --- drivers/block/zram/zram_drv.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'drivers') diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index cee49bb0126d..3dee026988dc 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -2572,6 +2572,8 @@ static int zram_add(void) zram->disk->private_data = zram; snprintf(zram->disk->disk_name, 16, "zram%d", device_id); atomic_set(&zram->pp_in_progress, 0); + zram_comp_params_reset(zram); + comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor); /* Actual capacity set using sysfs (/sys/block/zram/disksize */ set_capacity(zram->disk, 0); @@ -2579,9 +2581,6 @@ static int zram_add(void) if (ret) goto out_cleanup_disk; - zram_comp_params_reset(zram); - comp_algorithm_set(zram, ZRAM_PRIMARY_COMP, default_compressor); - zram_debugfs_register(zram); pr_info("Added device: %s\n", zram->disk->disk_name); return device_id; -- cgit v1.2.3