summaryrefslogtreecommitdiff
path: root/drivers/block/zram
AgeCommit message (Collapse)AuthorFilesLines
2023-04-18zram: fix synchronous readsChristoph Hellwig1-38/+22
Currently nothing waits for the synchronous reads before accessing the data. Switch them to an on-stack bio and submit_bio_wait to make sure the I/O has actually completed when the work item has been flushed. This also removes the call to page_endio that would unlock a page that has never been locked. Drop the partial_io/sync flag, as chaining only makes sense for the asynchronous reads of the entire page. Link: https://lkml.kernel.org/r/20230411171459.567614-17-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: don't return errors from read_from_bdev_asyncChristoph Hellwig1-12/+4
bio_alloc will never return a NULL bio when it is allowed to sleep, and adding a single page to bio with a single vector also can't fail, so switch to the asserting __bio_add_page variant and drop the error returns. Link: https://lkml.kernel.org/r/20230411171459.567614-16-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: pass a page to read_from_bdevChristoph Hellwig1-27/+16
read_from_bdev always reads a whole page, so pass a page to it instead of the bvec and remove the now pointless zram_bvec_read_from_bdev wrapper. Link: https://lkml.kernel.org/r/20230411171459.567614-15-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: refactor zram_bdev_writeChristoph Hellwig1-19/+19
Split the read/modify/write case into a separate helper. Link: https://lkml.kernel.org/r/20230411171459.567614-14-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: don't pass a bvec to __zram_bvec_writeChristoph Hellwig1-10/+4
__zram_bvec_write only extracts the page from __zram_bvec_write and always expects a full page of input. Pass the page directly instead of the bvec and rename the function to zram_write_page. Link: https://lkml.kernel.org/r/20230411171459.567614-13-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: refactor zram_bdev_readChristoph Hellwig1-20/+20
Split the partial read into a separate helper. Link: https://lkml.kernel.org/r/20230411171459.567614-12-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: directly call zram_read_page in writeback_storeChristoph Hellwig1-10/+4
writeback_store always reads a full page, so just call zram_read_page directly and bypass the boune buffer handling. Link: https://lkml.kernel.org/r/20230411171459.567614-11-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: rename __zram_bvec_read to zram_read_pageChristoph Hellwig1-4/+4
__zram_bvec_read doesn't get passed a bvec, but always read a whole page. Rename it to make the usage more clear. Link: https://lkml.kernel.org/r/20230411171459.567614-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: don't use highmem for the bounce buffer in zram_bvec_{read,write}Christoph Hellwig1-12/+5
There is no point in allocation a highmem page when we instantly need to copy from it. Link: https://lkml.kernel.org/r/20230411171459.567614-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: refactor highlevel read and write handlingChristoph Hellwig1-28/+30
Instead of having an outer loop in __zram_make_request and then branch out for reads vs writes for each loop iteration in zram_bvec_rw, split the main handler into separat zram_bio_read and zram_bio_write handlers that also include the functionality formerly in zram_bvec_rw. Link: https://lkml.kernel.org/r/20230411171459.567614-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: return early on error in zram_bvec_rwChristoph Hellwig1-9/+9
When the low-level access fails, don't clear the idle flag or clear the caches, and just return. Link: https://lkml.kernel.org/r/20230411171459.567614-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: move discard handling to zram_submit_bioChristoph Hellwig1-10/+13
Switch on the bio operation in zram_submit_bio and only call into __zram_make_request for read and write operations. Link: https://lkml.kernel.org/r/20230411171459.567614-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: simplify bvec iteration in __zram_make_requestChristoph Hellwig1-31/+11
bio_for_each_segment synthetize bvecs that never cross page boundaries, so don't duplicate that work in an inner loop. Link: https://lkml.kernel.org/r/20230411171459.567614-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: make zram_bio_discard more self-containedChristoph Hellwig1-9/+7
Derive the index and offset variables inside the function, and complete the bio directly in preparation for cleaning up the I/O path. Link: https://lkml.kernel.org/r/20230411171459.567614-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: remove valid_io_requestChristoph Hellwig2-34/+1
All bios hande to drivers from the block layer are checked against the device size and for logical block alignment already (and have been since long before zram was merged), so don't duplicate those checks. Link: https://lkml.kernel.org/r/20230411171459.567614-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: always compile read_from_bdev_syncChristoph Hellwig1-12/+6
Patch series "zram I/O path cleanups and fixups", v3. This series cleans up the zram I/O path, and fixes the handling of synchronous I/O to the underlying device in the writeback_store function or for > 4K PAGE_SIZE systems. The fixes are at the end, as I could not fully reason about them being safe before untangling the callchain. This patch (of 17): read_from_bdev_sync is currently only compiled for non-4k PAGE_SIZE, which means it won't be built with the most common configurations. Replace the ifdef with a check for the PAGE_SIZE in an if instead. The check uses an extra symbol and IS_ENABLED to allow the compiler to eliminate the dead code, leading to the same generated code size: text data bss dec hex filename 16709 1428 12 18149 46e5 drivers/block/zram/zram_drv.o.old 16709 1428 12 18149 46e5 drivers/block/zram/zram_drv.o.new Link: https://lkml.kernel.org/r/20230411171459.567614-1-hch@lst.de Link: https://lkml.kernel.org/r/20230411171459.567614-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zram: fix up permission for the hot_add sysfs fileGreg Kroah-Hartman1-1/+3
Commit 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant") changed the attribute to use CLASS_ATTR_RO() which changed the permission from 0400 to 0444. But this atribute is "special" in that reading it modifies the system state, so it MUST be set to 0400 so that only root processes can muck around with it. Fix this all up, AND document this so that I don't change it again in 3-4 years when I stumble across it and wonder why it's an open-coded _ATTR() macro. Reported-by: Denis Efremov <efremov@linux.com> Fixes: 75a2d4226b53 ("driver core: class: mark the struct class for sysfs callbacks as constant") Link: https://lore.kernel.org/r/2023041810-angelic-conical-52d8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29driver core: class: mark the struct class for sysfs callbacks as constantGreg Kroah-Hartman1-6/+5
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steve French <sfrench@samba.org> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-cifs@vger.kernel.org Cc: linux-gpio@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: netdev@vger.kernel.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17drivers: remove struct module * setting from struct classGreg Kroah-Hartman1-1/+0
There is no need to manually set the owner of a struct class, as the registering function does it automatically, so remove all of the explicit settings from various drivers that did so as it is unneeded. This allows us to remove this pointer entirely from this structure going forward. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds1-64/+5
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-03zram: use bvec_set_page to initialize bvecsChristoph Hellwig1-11/+4
Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-02block: remove ->rw_pageChristoph Hellwig1-60/+1
The ->rw_page method is a special purpose bypass of the usual bio handling path that is limited to single-page reads and writes and synchronous which causes a lot of extra code in the drivers, callers and the block layer. The only remaining user is the MM swap code. Switch that swap code to simply submit a single-vec on-stack bio an synchronously wait on it based on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that currently implement ->rw_page instead. While this touches one extra cache line and executes extra code, it simplifies the block layer and drivers and ensures that all feastures are properly supported by all drivers, e.g. right now ->rw_page bypassed cgroup writeback entirely. [akpm@linux-foundation.org: fix comment typo, per Dan] Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18zram: correctly handle all next_arg() casesSergey Senozhatsky1-2/+2
When supplied buffer does not have assignment sign next_arg() sets `val` pointer to NULL, so we cannot dereference it. Add a NULL pointer test to handle `param` case, in addition to `*val` test, which handles cases when param has no value assigned to it: `param=`. Link: https://lkml.kernel.org/r/20230103030119.1496358-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18zram: fix typos in commentsJeongHyeon Lee1-2/+2
- The double `range` is duplicated in comment, remove one. - change `syfs` to `sysfs` Link: https://lkml.kernel.org/r/20221223040331.4194-1-jhs2.lee@samsung.com Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: remove unused stats fieldsSergey Senozhatsky2-4/+0
We don't show num_reads and num_writes since we removed corresponding sysfs nodes in 2017. Block layer stats are exposed via /sys/block/zramX/stat file. However, we still increment those atomic vars and store them in zram stats. Remove leftovers. Link: https://lkml.kernel.org/r/20221117141326.1105181-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add incompressible flag to read_block_state()Sergey Senozhatsky1-2/+4
Add a new flag to zram block state that shows if the page is incompressible: that none of the algorithm (including secondary ones) could compress it. Link: https://lkml.kernel.org/r/20221109115047.2921851-14-senozhatsky@chromium.org Suggested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add incompressible writebackSergey Senozhatsky1-6/+12
Add support for incompressible pages writeback: echo incompressible > /sys/block/zramX/writeback Link: https://lkml.kernel.org/r/20221109115047.2921851-13-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add algo parameter support to zram_recompress()Sergey Senozhatsky2-9/+46
Recompression iterates through all the registered secondary compression algorithms in order of their priorities so that we have higher chances of finding the algorithm that compresses a particular page. This, however, may not always be best approach and sometimes we may want to limit recompression to only one particular algorithm. For instance, when a higher priority algorithm uses too much power and device has a relatively low battery level we may want to limit recompression to use only a lower priority algorithm, which uses less power. Introduce algo= parameter support to recompression sysfs knob so that user-sapce can request recompression with particular algorithm only: echo "type=idle algo=zstd" > /sys/block/zramX/recompress Link: https://lkml.kernel.org/r/20221109115047.2921851-11-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: remove redundant checks from zram_recompress()Sergey Senozhatsky1-6/+2
Size class index comparison is powerful enough so we can remove object size comparisons. Link: https://lkml.kernel.org/r/20221109115047.2921851-10-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add size class equals check into recompressionAlexey Romanov1-1/+10
It makes no sense for us to recompress the object if it will be in the same size class. We anyway don't get any memory gain. But, at the same time, we get a CPU time overhead when inserting this object into zspage and decompressing it afterwards. [senozhatsky: rebased and fixed conflicts] Link: https://lkml.kernel.org/r/20221109115047.2921851-9-senozhatsky@chromium.org Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: use IS_ERR_VALUE() to check for zs_malloc() errorsSergey Senozhatsky1-3/+3
Avoid typecasts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Link: https://lkml.kernel.org/r/20221109115047.2921851-8-senozhatsky@chromium.org Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: clarify writeback_store() commentSergey Senozhatsky1-2/+6
Re-phrase writeback BIO error comment. Link: https://lkml.kernel.org/r/20221109115047.2921851-7-senozhatsky@chromium.org Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add recompress flag to read_block_state()Sergey Senozhatsky1-2/+3
Add a new flag to zram block state that shows if the page was recompressed (using alternative compression algorithm). Link: https://lkml.kernel.org/r/20221109115047.2921851-6-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: introduce recompress sysfs knobSergey Senozhatsky3-3/+277
Allow zram to recompress (using secondary compression streams) pages. Re-compression algorithms (we support up to 3 at this stage) are selected via recomp_algorithm: echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm Please read documentation for more details. We support several recompression modes: 1) IDLE pages recompression is activated by `idle` mode echo "type=idle" > /sys/block/zram0/recompress 2) Since there may be many idle pages user-space may pass a size threshold value (in bytes) and we will recompress pages only of equal or greater size: echo "threshold=888" > /sys/block/zram0/recompress 3) HUGE pages recompression is activated by `huge` mode echo "type=huge" > /sys/block/zram0/recompress 4) HUGE_IDLE pages recompression is activated by `huge_idle` mode echo "type=huge_idle" > /sys/block/zram0/recompress [senozhatsky@chromium.org: we should always zero out err variable in recompress loop[ Link: https://lkml.kernel.org/r/20221110143423.3250790-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20221109115047.2921851-5-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: factor out WB and non-WB zram read functionsSergey Senozhatsky1-23/+49
We will use non-WB variant in ZRAM page recompression path. Link: https://lkml.kernel.org/r/20221109115047.2921851-4-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: add recompression algorithm sysfs knobSergey Senozhatsky1-19/+105
Introduce recomp_algorithm sysfs knob that controls secondary algorithm selection used for recompression. We will support up to 3 secondary compression algorithms which are sorted in order of their priority. To select an algorithm user has to provide its name and priority: echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm During recompression zram iterates through the list of registered secondary algorithms in order of their priorities. We also have a short version for cases when there is only one secondary compression algorithm: echo "algo=zstd" > /sys/block/zramX/recomp_algorithm This will register zstd as the secondary algorithm with priority 1. Link: https://lkml.kernel.org/r/20221109115047.2921851-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30zram: preparation for multi-zcomp supportSergey Senozhatsky4-32/+80
Patch series "zram: Support multiple compression streams", v5. This series adds support for multiple compression streams. The main idea is that different compression algorithms have different characteristics and zram may benefit when it uses a combination of algorithms: a default algorithm that is faster but have lower compression rate and a secondary algorithm that can use higher compression rate at a price of slower compression/decompression. There are several use-case for this functionality: - huge pages re-compression: zstd or deflate can successfully compress huge pages (~50% of huge pages on my synthetic ChromeOS tests), IOW pages that lzo was not able to compress. - idle pages re-compression: idle/cold pages sit in the memory and we may reduce zsmalloc memory usage if we recompress those idle pages. Userspace has a number of ways to control the behavior and impact of zram recompression: what type of pages should be recompressed, size watermarks, etc. Please refer to documentation patch. This patch (of 13): The patch turns compression streams and compressor algorithm name struct zram members into arrays, so that we can have multiple compression streams support (in the next patches). The patch uses a rather explicit API for compressor selection: - Get primary (default) compression stream zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]) - Get secondary compression stream zcomp_stream_get(zram->comps[ZRAM_SECONDARY_COMP]) We use similar API for compression streams put(). At this point we always have just one compression stream, since CONFIG_ZRAM_MULTI_COMP is not yet defined. Link: https://lkml.kernel.org/r/20221109115047.2921851-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20221109115047.2921851-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Alexey Romanov <avromanov@sberdevices.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08zram: use try_cmpxchg in update_used_maxUros Bizjak1-8/+5
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in update_used_max. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, reorder code a bit to remove additional compare and conditional jump from the assembly code. Together, hese two changes save 15 bytes from the function when compiled for x86_64. No functional change intended. Link: https://lkml.kernel.org/r/20221018145154.3699-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-14Merge tag 'mm-stable-2022-10-13' of ↵Linus Torvalds1-23/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - fix a race which causes page refcounting errors in ZONE_DEVICE pages (Alistair Popple) - fix userfaultfd test harness instability (Peter Xu) - various other patches in MM, mainly fixes * tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits) highmem: fix kmap_to_page() for kmap_local_page() addresses mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page mm/selftest: uffd: explain the write missing fault check mm/hugetlb: use hugetlb_pte_stable in migration race check mm/hugetlb: fix race condition of uffd missing/minor handling zram: always expose rw_page LoongArch: update local TLB if PTE entry exists mm: use update_mmu_tlb() on the second thread kasan: fix array-bounds warnings in tests hmm-tests: add test for migrate_device_range() nouveau/dmem: evict device private memory during release nouveau/dmem: refactor nouveau_dmem_fault_copy_one() mm/migrate_device.c: add migrate_device_range() mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page() mm/memremap.c: take a pgmap reference on page allocation mm: free device private pages have zero refcount mm/memory.c: fix race when faulting a device private page mm/damon: use damon_sz_region() in appropriate place mm/damon: move sz_damon_region to damon_sz_region lib/test_meminit: add checks for the allocation functions ...
2022-10-12zram: always expose rw_pageBrian Geffon1-23/+3
Currently zram will adjust its fops to a version which does not contain rw_page when a backing device has been assigned. This is done to prevent upper layers from assuming a synchronous operation when a page may have been written back. This forces every operation through bio which has overhead associated with bio_alloc/frees. The code can be simplified to always expose an rw_page method and only in the rare event that a page is written back we instead will return -EOPNOTSUPP forcing the upper layer to fallback to bio. Link: https://lkml.kernel.org/r/20221003144832.2906610-1-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Rom Lemarchand <romlem@google.com> Cc: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds2-22/+31
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-07Merge tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linuxLinus Torvalds1-3/+3
Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - handle number of queue changes in the TCP and RDMA drivers (Daniel Wagner) - allow changing the number of queues in nvmet (Daniel Wagner) - also consider host_iface when checking ip options (Daniel Wagner) - don't map pages which can't come from HIGHMEM (Fabio M. De Francesco) - avoid unnecessary flush bios in nvmet (Guixin Liu) - shrink and better pack the nvme_iod structure (Keith Busch) - add comment for unaligned "fake" nqn (Linjun Bao) - print actual source IP address through sysfs "address" attr (Martin Belanger) - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang) - handle effects after freeing the request (Keith Busch) - copy firmware_rev on each init (Keith Busch) - restrict management ioctls to admin (Keith Busch) - ensure subsystem reset is single threaded (Keith Busch) - report the actual number of tagset maps in nvme-pci (Keith Busch) - small fabrics authentication fixups (Christoph Hellwig) - add common code for tagset allocation and freeing (Christoph Hellwig) - stop using the request_queue in nvmet (Christoph Hellwig) - set min_align_mask before calculating max_hw_sectors (Rishabh Bhatnagar) - send a rediscover uevent when a persistent discovery controller reconnects (Sagi Grimberg) - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi) - MD pull request via Song: - Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan. - Raid10 performance optimization, by Yu Kuai. - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu) - IO scheduler switching quisce fix (Keith) - s390/dasd block driver updates (Stefan) - support for recovery for the ublk driver (ZiyangZhang) - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph) - blk-mq and null_blk map fixes (Bart) - various bcache fixes (Coly, Jilin, Jules) - nbd signal hang fix (Shigeru) - block writeback throttling fix (Yu) - optimize the passthrough mapping handling (me) - prepare block cgroups to being gendisk based (Christoph) - get rid of an old PSI hack in the block layer, moving it to the callers instead where it belongs (Christoph) - blk-throttle fixes and cleanups (Yu) - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj, Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming, Miaohe, Bart, Coly, Gaosheng * tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits) sbitmap: fix lockup while swapping block: add rationale for not using blk_mq_plug() when applicable block: adapt blk_mq_plug() to not plug for writes that require a zone lock s390/dasd: use blk_mq_alloc_disk blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep nvmet: don't l