summaryrefslogtreecommitdiff
path: root/Documentation/mm
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/mm')
-rw-r--r--Documentation/mm/active_mm.rst91
-rw-r--r--Documentation/mm/arch_pgtable_helpers.rst260
-rw-r--r--Documentation/mm/balance.rst102
-rw-r--r--Documentation/mm/bootmem.rst5
-rw-r--r--Documentation/mm/damon/api.rst20
-rw-r--r--Documentation/mm/damon/design.rst176
-rw-r--r--Documentation/mm/damon/faq.rst50
-rw-r--r--Documentation/mm/damon/index.rst29
-rw-r--r--Documentation/mm/free_page_reporting.rst40
-rw-r--r--Documentation/mm/frontswap.rst266
-rw-r--r--Documentation/mm/highmem.rst167
-rw-r--r--Documentation/mm/hmm.rst452
-rw-r--r--Documentation/mm/hugetlbfs_reserv.rst596
-rw-r--r--Documentation/mm/hwpoison.rst184
-rw-r--r--Documentation/mm/index.rst68
-rw-r--r--Documentation/mm/ksm.rst87
-rw-r--r--Documentation/mm/memory-model.rst177
-rw-r--r--Documentation/mm/mmu_notifier.rst99
-rw-r--r--Documentation/mm/numa.rst150
-rw-r--r--Documentation/mm/oom.rst5
-rw-r--r--Documentation/mm/overcommit-accounting.rst86
-rw-r--r--Documentation/mm/page_allocation.rst5
-rw-r--r--Documentation/mm/page_cache.rst5
-rw-r--r--Documentation/mm/page_frags.rst45
-rw-r--r--Documentation/mm/page_migration.rst195
-rw-r--r--Documentation/mm/page_owner.rst196
-rw-r--r--Documentation/mm/page_reclaim.rst5
-rw-r--r--Documentation/mm/page_table_check.rst56
-rw-r--r--Documentation/mm/page_tables.rst5
-rw-r--r--Documentation/mm/physical_memory.rst5
-rw-r--r--Documentation/mm/process_addrs.rst5
-rw-r--r--Documentation/mm/remap_file_pages.rst33
-rw-r--r--Documentation/mm/shmfs.rst5
-rw-r--r--Documentation/mm/slab.rst5
-rw-r--r--Documentation/mm/slub.rst452
-rw-r--r--Documentation/mm/split_page_table_lock.rst100
-rw-r--r--Documentation/mm/swap.rst5
-rw-r--r--Documentation/mm/transhuge.rst187
-rw-r--r--Documentation/mm/unevictable-lru.rst554
-rw-r--r--Documentation/mm/vmalloc.rst5
-rw-r--r--Documentation/mm/vmalloced-kernel-stacks.rst153
-rw-r--r--Documentation/mm/vmemmap_dedup.rst223
-rw-r--r--Documentation/mm/z3fold.rst30
-rw-r--r--Documentation/mm/zsmalloc.rst82
44 files changed, 5466 insertions, 0 deletions
diff --git a/Documentation/mm/active_mm.rst b/Documentation/mm/active_mm.rst
new file mode 100644
index 000000000000..6f8269c284ed
--- /dev/null
+++ b/Documentation/mm/active_mm.rst
@@ -0,0 +1,91 @@
+.. _active_mm:
+
+=========
+Active MM
+=========
+
+::
+
+ List: linux-kernel
+ Subject: Re: active_mm
+ From: Linus Torvalds <torvalds () transmeta ! com>
+ Date: 1999-07-30 21:36:24
+
+ Cc'd to linux-kernel, because I don't write explanations all that often,
+ and when I do I feel better about more people reading them.
+
+ On Fri, 30 Jul 1999, David Mosberger wrote:
+ >
+ > Is there a brief description someplace on how "mm" vs. "active_mm" in
+ > the task_struct are supposed to be used? (My apologies if this was
+ > discussed on the mailing lists---I just returned from vacation and
+ > wasn't able to follow linux-kernel for a while).
+
+ Basically, the new setup is:
+
+ - we have "real address spaces" and "anonymous address spaces". The
+ difference is that an anonymous address space doesn't care about the
+ user-level page tables at all, so when we do a context switch into an
+ anonymous address space we just leave the previous address space
+ active.
+
+ The obvious use for a "anonymous address space" is any thread that
+ doesn't need any user mappings - all kernel threads basically fall into
+ this category, but even "real" threads can temporarily say that for
+ some amount of time they are not going to be interested in user space,
+ and that the scheduler might as well try to avoid wasting time on
+ switching the VM state around. Currently only the old-style bdflush
+ sync does that.
+
+ - "tsk->mm" points to the "real address space". For an anonymous process,
+ tsk->mm will be NULL, for the logical reason that an anonymous process
+ really doesn't _have_ a real address space at all.
+
+ - however, we obviously need to keep track of which address space we
+ "stole" for such an anonymous user. For that, we have "tsk->active_mm",
+ which shows what the currently active address space is.
+
+ The rule is that for a process with a real address space (ie tsk->mm is
+ non-NULL) the active_mm obviously always has to be the same as the real
+ one.
+
+ For a anonymous process, tsk->mm == NULL, and tsk->active_mm is the
+ "borrowed" mm while the anonymous process is running. When the
+ anonymous process gets scheduled away, the borrowed address space is
+ returned and cleared.
+
+ To support all that, the "struct mm_struct" now has two counters: a
+ "mm_users" counter that is how many "real address space users" there are,
+ and a "mm_count" counter that is the number of "lazy" users (ie anonymous
+ users) plus one if there are any real users.
+
+ Usually there is at least one real user, but it could be that the real
+ user exited on another CPU while a lazy user was still active, so you do
+ actually get cases where you have a address space that is _only_ used by
+ lazy users. That is often a short-lived state, because once that thread
+ gets scheduled away in favour of a real thread, the "zombie" mm gets
+ released because "mm_count" becomes zero.
+
+ Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any
+ more. "init_mm" should be considered just a "lazy context when no other
+ context is available", and in fact it is mainly used just at bootup when
+ no real VM has yet been created. So code that used to check
+
+ if (current->mm == &init_mm)
+
+ should generally just do
+
+ if (!current->mm)
+
+ instead (which makes more sense anyway - the test is basically one of "do
+ we have a user context", and is generally done by the page fault handler
+ and things like that).
+
+ Anyway, I put a pre-patch-2.3.13-1 on ftp.kernel.org just a moment ago,
+ because it slightly changes the interfaces to accommodate the alpha (who
+ would have thought it, but the alpha actually ends up having one of the
+ ugliest context switch codes - unlike the other architectures where the MM
+ and register state is separate, the alpha PALcode joins the two, and you
+ need to switch both together).
+
+ (From http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
new file mode 100644
index 000000000000..cbaee9e59241
--- /dev/null
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -0,0 +1,260 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _arch_page_table_helpers:
+
+===============================
+Architecture Page Table Helpers
+===============================
+
+Generic MM expects architectures (with MMU) to provide helpers to create, access
+and modify page table entries at various level for different memory functions.
+These page table helpers need to conform to a common semantics across platforms.
+Following tables describe the expected semantics which can also be tested during
+boot via CONFIG_DEBUG_VM_PGTABLE option. All future changes in here or the debug
+test need to be in sync.
+
+
+PTE Page Table Helpers
+======================
+
++---------------------------+--------------------------------------------------+
+| pte_same | Tests whether both PTE entries are the same |
++---------------------------+--------------------------------------------------+
+| pte_bad | Tests a non-table mapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_present | Tests a valid mapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_young | Tests a young PTE |
++---------------------------+--------------------------------------------------+
+| pte_dirty | Tests a dirty PTE |
++---------------------------+--------------------------------------------------+
+| pte_write | Tests a writable PTE |
++---------------------------+--------------------------------------------------+
+| pte_special | Tests a special PTE |
++---------------------------+--------------------------------------------------+
+| pte_protnone | Tests a PROT_NONE PTE |
++---------------------------+--------------------------------------------------+
+| pte_devmap | Tests a ZONE_DEVICE mapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_soft_dirty | Tests a soft dirty PTE |
++---------------------------+--------------------------------------------------+
+| pte_swp_soft_dirty | Tests a soft dirty swapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkyoung | Creates a young PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkold | Creates an old PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkdirty | Creates a dirty PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkclean | Creates a clean PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkwrite | Creates a writable PTE |
++---------------------------+--------------------------------------------------+
+| pte_wrprotect | Creates a write protected PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkspecial | Creates a special PTE |
++---------------------------+--------------------------------------------------+
+| pte_mkdevmap | Creates a ZONE_DEVICE mapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_mksoft_dirty | Creates a soft dirty PTE |
++---------------------------+--------------------------------------------------+
+| pte_clear_soft_dirty | Clears a soft dirty PTE |
++---------------------------+--------------------------------------------------+
+| pte_swp_mksoft_dirty | Creates a soft dirty swapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_swp_clear_soft_dirty | Clears a soft dirty swapped PTE |
++---------------------------+--------------------------------------------------+
+| pte_mknotpresent | Invalidates a mapped PTE |
++---------------------------+--------------------------------------------------+
+| ptep_clear | Clears a PTE |
++---------------------------+--------------------------------------------------+
+| ptep_get_and_clear | Clears and returns PTE |
++---------------------------+--------------------------------------------------+
+| ptep_get_and_clear_full | Clears and returns PTE (batched PTE unmap) |
++---------------------------+--------------------------------------------------+
+| ptep_test_and_clear_young | Clears young from a PTE |
++---------------------------+--------------------------------------------------+
+| ptep_set_wrprotect | Converts into a write protected PTE |
++---------------------------+--------------------------------------------------+
+| ptep_set_access_flags | Converts into a more permissive PTE |
++---------------------------+--------------------------------------------------+
+
+
+PMD Page Table Helpers
+======================
+
++---------------------------+--------------------------------------------------+
+| pmd_same | Tests whether both PMD entries are the same |
++---------------------------+--------------------------------------------------+
+| pmd_bad | Tests a non-table mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_leaf | Tests a leaf mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_huge | Tests a HugeTLB mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_trans_huge | Tests a Transparent Huge Page (THP) at PMD |
++---------------------------+--------------------------------------------------+
+| pmd_present | Tests a valid mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_young | Tests a young PMD |
++---------------------------+--------------------------------------------------+
+| pmd_dirty | Tests a dirty PMD |
++---------------------------+--------------------------------------------------+
+| pmd_write | Tests a writable PMD |
++---------------------------+--------------------------------------------------+
+| pmd_special | Tests a special PMD |
++---------------------------+--------------------------------------------------+
+| pmd_protnone | Tests a PROT_NONE PMD |
++---------------------------+--------------------------------------------------+
+| pmd_devmap | Tests a ZONE_DEVICE mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_soft_dirty | Tests a soft dirty PMD |
++---------------------------+--------------------------------------------------+
+| pmd_swp_soft_dirty | Tests a soft dirty swapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkyoung | Creates a young PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkold | Creates an old PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkdirty | Creates a dirty PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkclean | Creates a clean PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkwrite | Creates a writable PMD |
++---------------------------+--------------------------------------------------+
+| pmd_wrprotect | Creates a write protected PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkspecial | Creates a special PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkdevmap | Creates a ZONE_DEVICE mapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mksoft_dirty | Creates a soft dirty PMD |
++---------------------------+--------------------------------------------------+
+| pmd_clear_soft_dirty | Clears a soft dirty PMD |
++---------------------------+--------------------------------------------------+
+| pmd_swp_mksoft_dirty | Creates a soft dirty swapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD |
++---------------------------+--------------------------------------------------+
+| pmd_mkinvalid | Invalidates a mapped PMD [1] |
++---------------------------+--------------------------------------------------+
+| pmd_set_huge | Creates a PMD huge mapping |
++---------------------------+--------------------------------------------------+
+| pmd_clear_huge | Clears a PMD huge mapping |
++---------------------------+--------------------------------------------------+
+| pmdp_get_and_clear | Clears a PMD |
++---------------------------+--------------------------------------------------+
+| pmdp_get_and_clear_full | Clears a PMD |
++---------------------------+--------------------------------------------------+
+| pmdp_test_and_clear_young | Clears young from a PMD |
++---------------------------+--------------------------------------------------+
+| pmdp_set_wrprotect | Converts into a write protected PMD |
++---------------------------+--------------------------------------------------+
+| pmdp_set_access_flags | Converts into a more permissive PMD |
++---------------------------+--------------------------------------------------+
+
+
+PUD Page Table Helpers
+======================
+
++---------------------------+--------------------------------------------------+
+| pud_same | Tests whether both PUD entries are the same |
++---------------------------+--------------------------------------------------+
+| pud_bad | Tests a non-table mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_leaf | Tests a leaf mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_huge | Tests a HugeTLB mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_trans_huge | Tests a Transparent Huge Page (THP) at PUD |
++---------------------------+--------------------------------------------------+
+| pud_present | Tests a valid mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_young | Tests a young PUD |
++---------------------------+--------------------------------------------------+
+| pud_dirty | Tests a dirty PUD |
++---------------------------+--------------------------------------------------+
+| pud_write | Tests a writable PUD |
++---------------------------+--------------------------------------------------+
+| pud_devmap | Tests a ZONE_DEVICE mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkyoung | Creates a young PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkold | Creates an old PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkdirty | Creates a dirty PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkclean | Creates a clean PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkwrite | Creates a writable PUD |
++---------------------------+--------------------------------------------------+
+| pud_wrprotect | Creates a write protected PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD |
++---------------------------+--------------------------------------------------+
+| pud_mkinvalid | Invalidates a mapped PUD [1] |
++---------------------------+--------------------------------------------------+
+| pud_set_huge | Creates a PUD huge mapping |
++---------------------------+--------------------------------------------------+
+| pud_clear_huge | Clears a PUD huge mapping |
++---------------------------+--------------------------------------------------+
+| pudp_get_and_clear | Clears a PUD |
++---------------------------+--------------------------------------------------+
+| pudp_get_and_clear_full | Clears a PUD |
++---------------------------+--------------------------------------------------+
+| pudp_test_and_clear_young | Clears young from a PUD |
++---------------------------+--------------------------------------------------+
+| pudp_set_wrprotect | Converts into a write protected PUD |
++---------------------------+--------------------------------------------------+
+| pudp_set_access_flags | Converts into a more permissive PUD |
++---------------------------+--------------------------------------------------+
+
+
+HugeTLB Page Table Helpers
+==========================
+
++---------------------------+--------------------------------------------------+
+| pte_huge | Tests a HugeTLB |
++---------------------------+--------------------------------------------------+
+| pte_mkhuge | Creates a HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_pte_dirty | Tests a dirty HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_pte_write | Tests a writable HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_pte_mkdirty | Creates a dirty HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_pte_mkwrite | Creates a writable HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_pte_wrprotect | Creates a write protected HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_ptep_get_and_clear | Clears a HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_ptep_set_wrprotect | Converts into a write protected HugeTLB |
++---------------------------+--------------------------------------------------+
+| huge_ptep_set_access_flags | Converts into a more permissive HugeTLB |
++---------------------------+--------------------------------------------------+
+
+
+SWAP Page Table Helpers
+========================
+
++---------------------------+--------------------------------------------------+
+| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE |
++---------------------------+--------------------------------------------------+
+| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) |
++---------------------------+--------------------------------------------------+
+| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD |
++---------------------------+--------------------------------------------------+
+| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) |
++---------------------------+--------------------------------------------------+
+| is_migration_entry | Tests a migration (read or write) swapped entry |
++-------------------------------+----------------------------------------------+
+| is_writable_migration_entry | Tests a write migration swapped entry |
++-------------------------------+----------------------------------------------+
+| make_readable_migration_entry | Creates a read migration swapped entry |
++-------------------------------+----------------------------------------------+
+| make_writable_migration_entry | Creates a write migration swapped entry |
++-------------------------------+----------------------------------------------+
+
+[1] https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/
diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
new file mode 100644
index 000000000000..6a1fadf3e173
--- /dev/null
+++ b/Documentation/mm/balance.rst
@@ -0,0 +1,102 @@
+.. _balance:
+
+================
+Memory Balancing
+================
+
+Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
+
+Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+well as for non __GFP_IO allocations.
+
+The first reason why a caller may avoid reclaim is that the caller can not
+sleep due to holding a spinlock or is in interrupt context. The second may
+be that the caller is willing to fail the allocation without incurring the
+overhead of page reclaim. This may happen for opportunistic high-order
+allocation requests that have order-0 fallback options. In such cases,
+the caller may also wish to avoid waking kswapd.
+
+__GFP_IO allocation requests are made to prevent file system deadlocks.
+
+In the absence of non sleepable allocation requests, it seems detrimental
+to be doing balancing. Page reclamation can be kicked off lazily, that
+is, only when needed (aka zone free memory is 0), instead of making it
+a proactive process.
+
+That being said, the kernel should try to fulfill requests for direct
+mapped pages from the direct mapped pool, instead of falling back on
+the dma pool, so as to keep the dma pool filled for dma requests (atomic
+or not). A similar argument applies to highmem and direct mapped pages.
+OTOH, if there is a lot of free dma pages, it is preferable to satisfy
+regular memory requests by allocating one from the dma pool, instead
+of incurring the overhead of regular zone balancing.
+
+In 2.2, memory balancing/page reclamation would kick off only when the
+_total_ number of free pages fell below 1/64 th of total memory. With the
+right ratio of dma and regular memory, it is quite possible that balancing
+would not be done even when the dma zone was completely empty. 2.2 has
+been running production machines of varying memory sizes, and seems to be
+doing fine even with the presence of this problem. In 2.3, due to
+HIGHMEM, this problem is aggravated.
+
+In 2.3, zone balancing can be done in one of two ways: depending on the
+zone size (and possibly of the size of lower class zones), we can decide
+at init time how many free pages we should aim for while balancing any
+zone. The good part is, while balancing, we do not need to look at sizes
+of lower class zones, the bad part is, we might do too frequent balancing
+due to ignoring possibly lower usage in the lower class zones. Also,
+with a slight change in the allocation routine, it is possible to reduce
+the memclass() macro to be a simple equality.
+
+Another possible solution is that we balance only when the free memory
+of a zone _and_ all its lower class zones falls below 1/64th of the
+total memory in the zone and its lower class zones. This fixes the 2.2
+balancing problem, and stays as close to 2.2 behavior as possible. Also,
+the balancing algorithm works the same way on the various architectures,
+which have different numbers and types of zones. If we wanted to get
+fancy, we could assign different weights to free pages in different
+zones in the future.
+
+Note that if the size of the regular zone is huge compared to dma zone,
+it becomes less significant to consider the free dma pages while
+deciding whether to balance the regular zone. The first solution
+becomes more attractive then.
+
+The appended patch implements the second solution. It also "fixes" two
+problems: first, kswapd is woken up as in 2.2 on low memory conditions
+for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
+so as to give a fighting chance for replace_with_highmem() to get a
+HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
+fall back into regular zone. This also makes sure that HIGHMEM pages
+are not leaked (for example, in situations where a HIGHMEM page is in
+the swapcache but is not being used by anyone)
+
+kswapd also needs to know about the zones it should balance. kswapd is
+primarily needed in a situation where balancing can not be done,
+probably because all allocation requests are coming from intr context
+and all process contexts are sleeping. For 2.3, kswapd does not really
+need to balance the highmem zone, since intr context does not request
+highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
+structure to decide whether a zone needs balancing.
+
+Page stealing from process memory and shm is done if stealing the page would
+alleviate memory pressure on any zone in the page's node that has fallen below
+its watermark.
+
+wat