path: root/Documentation/admin-guide/cgroup-v1
Age | Commit message | Author | Files | Lines
2024-09-01 | memcg: initiate deprecation of pressure_level | Shakeel Butt | 1 | -2/+6
The pressure_level in memcg v1 provides memory pressure notifications to user space. At the moment it provides notifications for three levels of memory pressure, i.e. low, medium and critical, which are defined based on internal memory reclaim implementation details, more specifically the ratio of scanned to reclaimed pages during memory reclaim. However, this is not robust, as there are workloads with mostly unreclaimable user memory or kernel memory. For v2, users can use PSI for the memory pressure status of the system or the cgroup. Let's start the deprecation process for pressure_level and add warnings to gather information on how current users are using this interface and how they can be moved to PSI. Link: https://lkml.kernel.org/r/20240814220021.3208384-5-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
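For readers following the move to PSI, a minimal sketch of parsing a PSI pressure file may help. It assumes the usual PSI text format (`some`/`full` lines with `avg10`, `avg60`, `avg300` and `total` fields); the helper name `parse_psi` and the sample values are illustrative and not part of the patch.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, stats = fields[0], {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            # "total" is a cumulative microsecond counter; avg* are percentages.
            stats[key] = int(value) if key == "total" else float(value)
        result[kind] = stats
    return result


# Hypothetical sample in the PSI format. On a real system the text would be
# read from /proc/pressure/memory or from a cgroup's memory.pressure file.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=42\n"
)

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print(psi["some"]["avg10"], psi["full"]["total"])
```

A monitor built on this would typically compare the `avg10` or `avg60` values against a threshold chosen for the workload, rather than relying on the three fixed levels of pressure_level.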
2024-09-01 | memcg: initiate deprecation of oom_control | Shakeel Butt | 1 | -2/+6
The oom_control provides functionality to disable memcg oom-killer, notifications on oom-kill and reading the stats regarding oom-kills. This interface was mainly introduced to provide functionality for userspace oom-killers. However it is not robust enough and only supports OOM handling in the page fault path. For v2, the users can use the combination of memory.events notifications, memory.high and PSI to provide userspace OOM-killing functionality. Actually LMKD in Android and OOMd in systemd and Meta infrastructure already use PSI in combination with other stats to implement userspace OOM-killing. Let's start the deprecation process for v1 and gather the info on how the current users are using this interface and work on providing a more robust functionality in v2. Link: https://lkml.kernel.org/r/20240814220021.3208384-4-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 | memcg: initiate deprecation of v1 soft limit | Shakeel Butt | 1 | -2/+6
Memcg v1 provides soft limit functionality for best-effort memory sharing between multiple workloads on a system. It is usually triggered through kswapd and at the moment does not reclaim kernel memory. Memcg v2 provides more straightforward best-effort (memory.low) and hard protection (memory.min) functionalities. Let's initiate the deprecation of the soft limit from v1 and find out whether v2 needs something more to move the existing v1 soft limit users to v2. Link: https://lkml.kernel.org/r/20240814220021.3208384-3-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 | memcg: initiate deprecation of v1 tcp accounting | Shakeel Butt | 1 | -0/+8
Patch series "memcg: initiate deprecation of v1 features", v2. Start the deprecation process of the memcg v1 features which we discussed during LSFMMBPF 2024 [1]. For now add the warnings to collect the information on how the current users are using these features. Next we will work on providing better alternatives in v2 (if needed) and fully deprecate these features. Link: https://lwn.net/Articles/974575 [1] This patch (of 4): Memcg v1 provides an opt-in TCP memory accounting feature. However, it is mostly unused due to its performance impact on network traffic. In v2, TCP memory is accounted in the regular memory usage and is transparent to users, but they can observe the TCP memory usage through memcg stats. Let's initiate the deprecation process of memcg v1's tcp accounting functionality and add warnings to find out whether there are any users and, if there are, how they are using it, so that we can plan to provide them a better alternative in v2. Link: https://lkml.kernel.org/r/20240814220021.3208384-1-shakeel.butt@linux.dev Link: https://lkml.kernel.org/r/20240814220021.3208384-2-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-26 | cgroup/pids: Separate semantics of pids.events related to pids.max | Michal Koutný | 1 | -1/+2
Currently, when the pids.max limit is breached in the hierarchy, the event is counted and reported in the cgroup where the forking task resides. This decouples the limit from the notification caused by the limit, making it hard to detect when the actual limit took effect. Redefine pids.events:max as the number of times the limit of the cgroup was hit. (The implementation also differentiates a "forkfail" event, but this is currently not exposed as it would better fit into pids.stat. It also differs from pids.events:max only when pids.max is configured on non-leaf cgroups.) Since this changes the semantics of the original "max" event, introduce the change only in the v2 API of the controller and add a cgroup2 mount option to revert to the legacy behavior. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
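The difference between the two semantics can be illustrated with a toy model. This is only a sketch of the wording in the commit message, not kernel code: the `Cgroup` class, the `try_fork` helper and the `legacy` flag are hypothetical names, and details such as event propagation to ancestors are deliberately left out.

```python
class Cgroup:
    """Toy cgroup node; a model for illustration, not kernel code."""

    def __init__(self, name, pids_max=None, parent=None):
        self.name = name
        self.pids_max = pids_max  # None models "max" (no limit)
        self.parent = parent
        self.current = 0          # tasks charged to this subtree
        self.events_max = 0       # models the pids.events "max" counter

    def self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_fork(cgroup, legacy=False):
    """Charge one task to cgroup and its ancestors; return True on success."""
    for node in cgroup.self_and_ancestors():
        if node.pids_max is not None and node.current + 1 > node.pids_max:
            # Legacy: count the event where the forking task resides.
            # Redefined: count it in the cgroup whose limit was hit.
            (cgroup if legacy else node).events_max += 1
            return False
    for node in cgroup.self_and_ancestors():
        node.current += 1
    return True


if __name__ == "__main__":
    parent = Cgroup("parent", pids_max=1)
    child = Cgroup("child", parent=parent)
    try_fork(child)   # fits within parent's limit
    try_fork(child)   # hits parent's limit
    print(parent.events_max, child.events_max)
```

In this model the two behaviours differ only when the limit sits on a non-leaf cgroup, which matches the remark in the commit message about non-leaf configurations.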
2024-05-19 | Merge tag 'sched-urgent-2024-05-18' of ↵ | Linus Torvalds | 1 | -1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a sched_balance_newidle setting bug - Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst - Fix variable-shadowing build warning - Extend sched-domains debug output - Fix documentation - Fix comments * tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write() sched/fair: Remove stale FREQUENCY_UTIL comment sched/fair: Fix initial util_avg calculation docs: cgroup-v1: Clarify that domain levels are system-specific sched/debug: Dump domains' level sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level arch/topology: Fix variable naming to avoid shadowing
2024-05-19 | Merge tag 'mm-stable-2024-05-17-19-19' of ↵ | Linus Torvalds | 1 | -4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initialization code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". 
- selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. 
- SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests in the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: 
check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-17 | docs: cgroup-v1: Clarify that domain levels are system-specific | Vitalii Bursov | 1 | -1/+6
Add a clarification that domain levels are system-specific and where to check for system details. Signed-off-by: Vitalii Bursov <vitaly@bursov.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/42b177a2e897cdf880caf9c2025f5b609e820334.1714488502.git.vitaly@bursov.com
2024-05-13 | docs: cgroup-v1: Update page cache removal functions | Illia Ostapyshyn | 1 | -1/+1
Commit 452e9e6992fe ("filemap: Add filemap_remove_folio and __filemap_remove_folio") reimplemented __delete_from_page_cache() as __filemap_remove_folio() and delete_from_page_cache() as filemap_remove_folio(). The compatibility wrappers were finally removed in ece62684dcfb ("hugetlbfs: convert hugetlb_delete_from_page_cache() to use folios") and 6ffcd825e7d0 ("mm: Remove __delete_from_page_cache()"). Update the remaining references to dead functions in the memcg implementation memo. Signed-off-by: Illia Ostapyshyn <illia@yshyn.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-05-05 | Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount() | David Hildenbrand | 1 | -2/+2
Let's stop talking about page_mapcount(). Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 | remove references to page->flags in documentation | Matthew Wilcox (Oracle) | 1 | -2/+2
Mostly rewording, but remove entirely the copy of page_fixed_fake_head() in the documentation; we can refer people to the actual source if necessary. Link: https://lkml.kernel.org/r/20240326171045.410737-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-08 | docs: cgroup-v1: Fix description for css_online | I Hsin Cheng | 1 | -1/+1
The original description refers to the comment on cgroup_for_each_descendant_pre() for more details. However, the macro cgroup_for_each_descendant_pre() no longer exists, so replace it with the corresponding macro cgroup_for_each_live_descendant_pre(). Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-02-29 | cgroup/cpuset: Mark memory_spread_slab as obsolete | Xiongwei Song | 1 | -1/+1
We've removed the SLAB allocator, cpuset_do_slab_mem_spread() and SLAB_MEM_SPREAD, so memory_spread_slab is a no-op now. We can mark memory_spread_slab as obsolete in case someone still wants to use it after cpuset_do_slab_mem_spread() was removed. For more details, please check [1]. [1] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#m8e292e21b00f95a4bb8086371fa7387fa4ea8f60 tj: Description and cosmetic updates. Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-01-23 | docs: cgroup-v1: add missing code-block tags | Xinyu Li | 1 | -8/+12
Hugetlb.rst lacks two code-block tags, causing a formatting issue. Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-06 | memcg: expose swapcache stat for memcg v1 | Liu Shixin | 1 | -0/+1
Patch series "Expose swapcache stat for memcg v1", v2. Since commit b6038942480e ("mm: memcg: add swapcache stat for memcg v2") adds swapcache stat for the cgroup v2, it seems there is no reason to hide it in memcg v1. Conversely, with swapcached it is more accurate to evaluate the available memory for memcg. Link: https://lkml.kernel.org/r/20230915105845.3199656-1-liushixin2@huawei.com Link: https://lkml.kernel.org/r/20230915105845.3199656-2-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 | mm, memcg: reconsider kmem.limit_in_bytes deprecation | Michal Hocko | 1 | -0/+7
This reverts commit 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and partially reverts 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes"), which have incrementally removed support for the kernel memory accounting hard limit. Unfortunately it has turned out that there is still userspace depending on the existence of memory.kmem.limit_in_bytes [1]. The underlying functionality is not really required, but the non-existent file just confuses the userspace, which fails as a result. The patch to fix this on the userspace side has been submitted but it is hard to predict how it will propagate through the maze of 3rd party consumers of the software. Now, reverting 86327e8eb94c alone is not an option because there is another set of userspace which cannot cope with ENOTSUPP returned when writing to the file. Therefore we have to go and revisit 58056f77502f as well. There are two ways to go ahead. Either we give up on the deprecation and fully revert 58056f77502f as well or we can keep kmem.limit_in_bytes but make the write a noop and warn about the fact. This should work for both known breaking workloads which depend on the existence but do not depend on the hard limit enforcement. Note to backporters to stable trees: a8c49af3be5f ("memcg: add per-memcg total kernel memory stat") introduced in 4.18 has added memcg_account_kmem so the accounting is not done by obj_cgroup_charge_pages directly for v1 anymore. Prior kernels need to add it explicitly (thanks to Johannes for pointing this out).
[akpm@linux-foundation.org: fix build - remove unused local] Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net [1] Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes") Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-01 | Merge tag 'cgroup-for-6.6' of ↵ | Linus Torvalds | 1 | -3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Per-cpu cpu usage stats are now tracked. This currently isn't printed out in the cgroupfs interface and can only be accessed through e.g. BPF. Should decide on a not-too-ugly way to show per-cpu stats in cgroupfs - cpuset received some cleanups and preparatory patches for the pending cpus.exclusive patchset which will allow cpuset partitions to be created below non-partition parents, which should ease the management of partition cpusets - A lot of code and documentation cleanup patches - tools/testing/selftests/cgroup/test_cpuset.c added * tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (32 commits) cgroup: Avoid -Wstringop-overflow warnings cgroup:namespace: Remove unused cgroup_namespaces_init() cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants cgroup: clean up if condition in cgroup_pidlist_start() cgroup: fix obsolete function name in cgroup_destroy_locked() Documentation: cgroup-v2.rst: Correct number of stats entries cgroup: fix obsolete function name above css_free_rwork_fn() cgroup/cpuset: fix kernel-doc cgroup: clean up printk() cgroup: fix obsolete comment above cgroup_create() docs: cgroup-v1: fix typo docs: cgroup-v1: correct the term of Page Cache organization in inode cgroup/misc: Store atomic64_t reads to u64 cgroup/misc: Change counters to be explicit 64bit types cgroup/misc: update struct members descriptions cgroup: remove cgrp->kn check in css_populate_dir() cgroup: fix obsolete function name cgroup: use cached local variable parent in for loop cgroup: remove obsolete comment above struct cgroupstats cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED ...
2023-08-30 | Merge tag 'docs-6.6' of git://git.lwn.net/linux | Linus Torvalds | 1 | -1/+1
Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating problems for people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ...
2023-08-18 | memcg: drop kmem.limit_in_bytes | Michal Hocko | 1 | -2/+0
kmem.limit_in_bytes (the v1 way to limit kernel memory usage) has been deprecated since 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes") merged in 5.16. We haven't heard about any serious users since then but it seems that the mere presence of the file is causing more harm than good. We (SUSE) have had several bug reports from customers where Docker based containers started to fail because a write to kmem.limit_in_bytes has failed. This was unexpected because runc code only expects ENOENT (kmem disabled) or EBUSY (tasks already running within cgroup). So a new error code was unexpected and the whole container startup failed. This has been later addressed by https://github.com/opencontainers/runc/commit/52390d68040637dfc77f9fda6bbe70952423d380 so current Docker runtimes do not suffer from the problem anymore. There are still older versions of Docker in use, and they are likely hard to get rid of completely. Address this by wiping out the file completely and effectively getting back to the pre-4.5 era and the CONFIG_MEMCG_KMEM=n configuration. I would recommend backporting to stable trees which have picked up 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes"). [mhocko@suse.com: restore _KMEM switch case] Link: https://lkml.kernel.org/r/ZKe5wxdbvPi5Cwd7@dhcp22.suse.cz Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-31 | docs: cgroup-v1: replace dead CONFIG_MEM_RES_CTRL_SWAP symbol | Krzysztof Pawlaczyk | 1 | -1/+1
Commit e55b9f96860f ("mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol") removed MEMCG_SWAP as it was just a shorthand for CONFIG_MEMCG && CONFIG_SWAP. The commit also adjusted the documentation but missed the reference to CONFIG_MEM_RES_CTRL_SWAP, which was the name of CONFIG_MEMCG_SWAP before the renaming in commit c255a458055e ("memcg: rename config variables"). Update the reference to the current state. Signed-off-by: Krzysztof Pawlaczyk <krzpaw@gmail.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230728105723.251892-1-krzpaw@gmail.com
2023-07-21 | docs: cgroup-v1: fix typo | Xiongwei Song | 1 | -1/+1
"listers" -> "listeners" Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-07-21 | docs: cgroup-v1: correct the term of Page Cache organization in inode | Xiongwei Song | 1 | -2/+2
The radix-tree for Page Cache has been replaced with xarray, see commit eb797a8ee0ab ("page cache: Rearrange address_space"), so change "radix-tree" to "xarray". Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-06-19 | mm: kill lock|unlock_page_memcg() | Kefeng Wang | 1 | -1/+1
Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()"), there are no more users, so kill lock_page_memcg() and unlock_page_memcg(). Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-14 | docs: cgroup-v1/cpusets: update libcgroup project link | Kamalesh Babulal | 1 | -1/+1
Update the libcgroup project link, in the cgroup v1/cpusets documentation. Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-02-23 | Merge tag 'mm-stable-2023-02-20-13-37' of ↵ | Linus Torvalds | 1 | -2/+11
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". 
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in this series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". 
- Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-22 Merge tag 'docs-6.3' of git://git.lwn.net/linux Linus Torvalds 1 -1/+1
Pull documentation updates from Jonathan Corbet:

"It has been a moderately calm cycle for documentation; the significant changes include:

 - Some significant additions to the memory-management documentation
 - Some improvements to navigation in the HTML-rendered docs
 - More Spanish and Chinese translations

... and the usual set of typo fixes and such"

* tag 'docs-6.3' of git://git.lwn.net/linux: (68 commits)
  Documentation/watchdog/hpwdt: Fix Format
  Documentation/watchdog/hpwdt: Fix Reference
  Documentation: core-api: padata: correct spelling
  docs/mm: Physical Memory: correct spelling in reference to CONFIG_PAGE_EXTENSION
  docs: Use HTML comments for the kernel-toc SPDX line
  docs: Add more information to the HTML sidebar
  Documentation: KVM: Update AMD memory encryption link
  printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay=
  Documentation: userspace-api: correct spelling
  Documentation: sparc: correct spelling
  Documentation: driver-api: correct spelling
  Documentation: admin-guide: correct spelling
  docs: add workload-tracing document to admin-guide
  docs/admin-guide/mm: remove useless markup
  docs/mm: remove useless markup
  docs/mm: Physical Memory: remove useless markup
  docs/sp_SP: Add process magic-number translation
  docs: ftrace: always use canonical ftrace path
  Doc/damon: fix the data path error
  dma-buf: Add "dma-buf" to title of documentation
  ...
2023-02-02 Documentation: admin-guide: correct spelling Randy Dunlap 1 -1/+1
Correct spelling problems for Documentation/admin-guide/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: dm-devel@redhat.com Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-media@vger.kernel.org Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230129231053.20863-2-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-01-18 mm: memcontrol: deprecate charge moving Johannes Weiner 1 -2/+11
Charge moving mode in cgroup1 allows memory to follow tasks as they migrate between cgroups. This is, and always has been, a questionable thing to do - for several reasons.

First, it's expensive. Pages need to be identified, locked and isolated from various MM operations, and reassigned, one by one.

Second, it's unreliable. Once pages are charged to a cgroup, there isn't always a clear owner task anymore. Cache isn't moved at all, for example. Mapped memory is moved - but if trylocking or isolating a page fails, it's arbitrarily left behind. Frequent moving between domains may leave a task's memory scattered all over the place.

Third, it isn't really needed. Launcher tasks can kick off workload tasks directly in their target cgroup. Using dedicated per-workload groups allows fine-grained policy adjustments - no need to move tasks and their physical pages between control domains. The feature was never forward-ported to cgroup2, and it hasn't been missed.

Despite it being a niche usecase, the maintenance overhead of supporting it is enormous. Because pages are moved while they are live and subject to various MM operations, the synchronization rules are complicated. There are lock_page_memcg() in MM and FS code, which non-cgroup people don't understand. In some cases we've been able to shift code and cgroup API calls around such that we can rely on native locking as much as possible. But that's fragile, and sometimes we need to hold MM locks for longer than we otherwise would (pte lock e.g.).

Mark the feature deprecated. Hopefully we can remove it soon.

And backport into -stable kernels so that people who develop against earlier kernels are warned about this deprecation as early as possible.
[akpm@linux-foundation.org: fix memory.rst underlining] Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
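The commit message above notes that a launcher can start workload tasks directly in their target cgroup, so no charges ever need to move. The following is only a minimal sketch of that idea, not code from the kernel tree: the helper name start_in_cgroup() and the example cgroup path are hypothetical, and it assumes the target cgroup directory already exists and that the caller may write to its cgroup.procs file.

```python
import os
import subprocess
from pathlib import Path


def start_in_cgroup(cgroup_dir, argv):
    """Start argv as a child process that joins cgroup_dir before exec.

    Hypothetical helper for illustration. The child writes its own PID to
    <cgroup_dir>/cgroup.procs between fork() and exec(), so the workload's
    memory is charged to the target cgroup from the start.
    """
    procs_file = Path(cgroup_dir) / "cgroup.procs"

    def join_cgroup():
        # Runs in the child after fork() and before exec().
        procs_file.write_text(str(os.getpid()))

    return subprocess.Popen(argv, preexec_fn=join_cgroup)


if __name__ == "__main__":
    # Assumed example path; adjust to the local cgroup mount and group name.
    proc = start_in_cgroup("/sys/fs/cgroup/memory/workload-a", ["sleep", "1"])
    proc.wait()
```

Because the task enters the group before it allocates any workload memory, there is nothing to migrate afterwards, which is the point the deprecation argument relies on. Note that preexec_fn is not safe in multithreaded launchers; a real launcher might prefer a small wrapper program instead.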
2023-01-05 docs: cgroup-v1: use numbered lists for user interface setup Bagas Sanjaya 2 -15/+13
Setup instructions for memory resource controller UI uses a mix of section headings and normal paragraphs, whereas numbered lists are better fit for this purpose. While at it, also slightly reword the instructions and add reference to "Why are cgroups needed?" in the main cgroups documentation. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: add internal cross-references Bagas Sanjaya 1 -13/+30
The documentation contains references to other sections in the doc (internal). Add cross-references for them so that these can be accessed without having to manually search for them. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: make swap extension subsections subsections Bagas Sanjaya 1 -2/+4
Subsections text of swap extension section is marked up as bold text, whereas making them proper subsection is more appropriate. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: use bullet lists for list of stat file tables Bagas Sanjaya 1 -54/+53
The stat file section contains three tables, where the leading texts for them are subsection heading. Organize them in the bullet list, while demoting headings into normal text. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: move hierarchy of accounting caption Bagas Sanjaya 1 -2/+2
The caption for hierarchy of accounting figure is in the code block, which is quite odd. Move the caption into :caption: option of code-block:: directive instead. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: fix footnotes Bagas Sanjaya 1 -17/+19
The documentation contains external references, which some of them are marked as footnotes. Fix the syntax for them to be properly rendered as such. Non-footnote references aren't affected since the text for these is aligned the same to the footnotes. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: use code block for locking order schema Bagas Sanjaya 1 -1/+1
The locking order schema is a figure (like diagram), which should have been formatted with literal code block for consistency with other figures. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: wrap remaining admonitions in admonition blocks Bagas Sanjaya 1 -1/+1
Wrap two other admonitions in appropriate blocks in order for readers to pay more attention to block contents:

 * hint:: for editor's note
 * warning:: for move charges deprecation

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 docs: cgroup-v1: replace custom note constructs with appropriate admonition blocks Bagas Sanjaya 1 -27/+31
Admonition constructs on the documentation use definition lists, which isn't fit for the purpose. Replace them with appropriate blocks:

 * Use caution:: for outdated document notice
 * hint:: for memo
 * note:: for other constructs
 * warning:: for memory reclaim

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-11-30 docs: admin-guide: cgroup-v1: update description of inactive_file Jian Wen 1 -1/+2
MADV_FREE pages have been moved into the LRU_INACTIVE_FILE list by commit f7ad2a6cb9f7 ("mm: move MADV_FREE pages into LRU_INACTIVE_FILE list"). Link: https://lkml.kernel.org/r/20221111034639.3593380-1-wenjian1@xiaomi.com Signed-off-by: Jian Wen <wenjian1@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol Johannes Weiner 1 -3/+1
Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP. Update the sites accordingly and drop the symbol. [ While touching the docs, remove two references to CONFIG_MEMCG_KMEM, which hasn't been a user-visible symbol for over half a decade. ] Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@goo