-rw-r--r-- Documentation/cgroups/unified-hierarchy.txt | 79
-rw-r--r-- Documentation/filesystems/proc.txt | 23
-rw-r--r-- Documentation/sysctl/vm.txt | 12
-rw-r--r-- Documentation/vm/pagemap.txt | 8
-rw-r--r-- arch/alpha/include/asm/pgtable.h | 2
-rw-r--r-- arch/arc/include/asm/pgtable.h | 2
-rw-r--r-- arch/arm/include/asm/pgtable-2level.h | 2
-rw-r--r-- arch/arm/include/asm/pgtable-nommu.h | 2
-rw-r--r-- arch/arm/mm/hugetlbpage.c | 6
-rw-r--r-- arch/arm/mm/pgd.c | 4
-rw-r--r-- arch/arm64/include/asm/pgtable.h | 2
-rw-r--r-- arch/arm64/mm/hugetlbpage.c | 6
-rw-r--r-- arch/avr32/include/asm/pgtable.h | 2
-rw-r--r-- arch/cris/include/asm/pgtable.h | 2
-rw-r--r-- arch/frv/include/asm/pgtable.h | 2
-rw-r--r-- arch/hexagon/include/asm/pgtable.h | 2
-rw-r--r-- arch/ia64/include/asm/pgtable.h | 2
-rw-r--r-- arch/ia64/mm/hugetlbpage.c | 6
-rw-r--r-- arch/m32r/include/asm/pgtable.h | 2
-rw-r--r-- arch/m68k/include/asm/pgtable_mm.h | 2
-rw-r--r-- arch/metag/mm/hugetlbpage.c | 6
-rw-r--r-- arch/microblaze/include/asm/pgtable.h | 4
-rw-r--r-- arch/mips/include/asm/pgtable-32.h | 2
-rw-r--r-- arch/mips/mm/gup.c | 8
-rw-r--r-- arch/mips/mm/hugetlbpage.c | 18
-rw-r--r-- arch/mn10300/include/asm/pgtable.h | 2
-rw-r--r-- arch/nios2/include/asm/pgtable.h | 2
-rw-r--r-- arch/openrisc/include/asm/pgtable.h | 2
-rw-r--r-- arch/parisc/include/asm/pgtable.h | 2
-rw-r--r-- arch/powerpc/include/asm/pgtable-ppc32.h | 2
-rw-r--r-- arch/powerpc/include/asm/pgtable-ppc64.h | 2
-rw-r--r-- arch/powerpc/mm/hugetlbpage.c | 8
-rw-r--r-- arch/powerpc/mm/subpage-prot.c | 6
-rw-r--r-- arch/s390/include/asm/pgtable.h | 2
-rw-r--r-- arch/s390/mm/gup.c | 6
-rw-r--r-- arch/s390/mm/hugetlbpage.c | 20
-rw-r--r-- arch/score/include/asm/pgtable.h | 2
-rw-r--r-- arch/sh/include/asm/pgtable.h | 2
-rw-r--r-- arch/sh/mm/gup.c | 6
-rw-r--r-- arch/sh/mm/hugetlbpage.c | 12
-rw-r--r-- arch/sparc/include/asm/pgtable_32.h | 5
-rw-r--r-- arch/sparc/include/asm/pgtable_64.h | 2
-rw-r--r-- arch/sparc/mm/gup.c | 6
-rw-r--r-- arch/sparc/mm/hugetlbpage.c | 12
-rw-r--r-- arch/tile/include/asm/pgtable.h | 2
-rw-r--r-- arch/tile/mm/hugetlbpage.c | 28
-rw-r--r-- arch/um/include/asm/pgtable-2level.h | 2
-rw-r--r-- arch/um/include/asm/pgtable-3level.h | 2
-rw-r--r-- arch/unicore32/mm/pgd.c | 3
-rw-r--r-- arch/x86/include/asm/pgtable_types.h | 2
-rw-r--r-- arch/x86/mm/gup.c | 9
-rw-r--r-- arch/x86/mm/hugetlbpage.c | 20
-rw-r--r-- arch/x86/mm/pgtable.c | 14
-rw-r--r-- arch/xtensa/include/asm/pgtable.h | 2
-rw-r--r-- drivers/media/pci/ivtv/ivtv-udma.c | 6
-rw-r--r-- drivers/scsi/st.c | 7
-rw-r--r-- drivers/staging/android/lowmemorykiller.c | 7
-rw-r--r-- drivers/tty/sysrq.c | 23
-rw-r--r-- drivers/video/fbdev/pvr2fb.c | 6
-rw-r--r-- fs/btrfs/extent_io.c | 2
-rw-r--r-- fs/proc/page.c | 16
-rw-r--r-- fs/proc/task_mmu.c | 218
-rw-r--r-- include/asm-generic/4level-fixup.h | 1
-rw-r--r-- include/linux/compaction.h | 86
-rw-r--r-- include/linux/gfp.h | 12
-rw-r--r-- include/linux/huge_mm.h | 12
-rw-r--r-- include/linux/hugetlb.h | 8
-rw-r--r-- include/linux/kvm_host.h | 11
-rw-r--r-- include/linux/memcontrol.h | 50
-rw-r--r-- include/linux/mm.h | 69
-rw-r--r-- include/linux/mm_types.h | 11
-rw-r--r-- include/linux/mmzone.h | 15
-rw-r--r-- include/linux/oom.h | 18
-rw-r--r-- include/linux/page_counter.h | 3
-rw-r--r-- include/linux/page_ext.h | 2
-rw-r--r-- include/linux/swap.h | 15
-rw-r--r-- include/linux/swapops.h | 4
-rw-r--r-- include/trace/events/compaction.h | 209
-rw-r--r-- include/trace/events/kmem.h | 7
-rw-r--r-- include/uapi/linux/kernel-page-flags.h | 1
-rw-r--r-- kernel/exit.c | 3
-rw-r--r-- kernel/fork.c | 11
-rw-r--r-- kernel/power/process.c | 75
-rw-r--r-- mm/cma.c | 2
-rw-r--r-- mm/compaction.c | 156
-rw-r--r-- mm/debug.c | 3
-rw-r--r-- mm/gup.c | 228
-rw-r--r-- mm/huge_memory.c | 106
-rw-r--r-- mm/hugetlb.c | 158
-rw-r--r-- mm/hugetlb_cgroup.c | 2
-rw-r--r-- mm/internal.h | 22
-rw-r--r-- mm/memcontrol.c | 702
-rw-r--r-- mm/memory.c | 15
-rw-r--r-- mm/mempolicy.c | 277
-rw-r--r-- mm/migrate.c | 5
-rw-r--r-- mm/mincore.c | 166
-rw-r--r-- mm/mmap.c | 7
-rw-r--r-- mm/mmzone.c | 4
-rw-r--r-- mm/nommu.c | 37
-rw-r--r-- mm/oom_kill.c | 169
-rw-r--r-- mm/page-writeback.c | 17
-rw-r--r-- mm/page_alloc.c | 432
-rw-r--r-- mm/page_counter.c | 7
-rw-r--r-- mm/page_owner.c | 26
-rw-r--r-- mm/pagewalk.c | 238
-rw-r--r-- mm/process_vm_access.c | 7
-rw-r--r-- mm/rmap.c | 12
-rw-r--r-- mm/shmem.c | 2
-rw-r--r-- mm/util.c | 10
-rw-r--r-- mm/vmscan.c | 32
-rw-r--r-- mm/vmstat.c | 6
-rw-r--r-- net/ceph/pagevec.c | 6
-rw-r--r-- net/ipv4/tcp_memcontrol.c | 2
-rw-r--r-- tools/vm/page-types.c | 1
-rw-r--r-- virt/kvm/async_pf.c | 2
-rw-r--r-- virt/kvm/kvm_main.c | 50
116 files changed, 2491 insertions, 1717 deletions
diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 4f4563277864..71daa35ec2d9 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -327,6 +327,85 @@ supported and the interface files "release_agent" and
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
+- The original lower boundary, the soft limit, is defined as a limit
+ that is unset by default. As a result, the set of cgroups that
+ global reclaim prefers is opt-in, rather than opt-out. The costs
+ for optimizing these mostly negative lookups are so high that the
+ implementation, despite its enormous size, does not even provide the
+ basic desirable behavior. First off, the soft limit has no
+ hierarchical meaning. All configured groups are organized in a
+ global rbtree and treated like equal peers, regardless of where they
+ are located in the hierarchy. This makes subtree delegation
+ impossible. Second, the soft limit reclaim pass is so aggressive
+ that it not only introduces high allocation latencies into the
+ system, but also impacts system performance due to overreclaim, to
+ the point where the feature becomes self-defeating.
+
+ The memory.low boundary on the other hand is a top-down allocated
+ reserve. A cgroup enjoys reclaim protection when it and all its
+ ancestors are below their low boundaries, which makes delegation of
+ subtrees possible. Secondly, new cgroups have no reserve by
+ default and in the common case most cgroups are eligible for the
+ preferred reclaim pass. This allows the new low boundary to be
+ efficiently implemented with just a minor addition to the generic
+ reclaim code, without the need for out-of-band data structures and
+ reclaim passes. Because the generic reclaim code considers all
+ cgroups except for the ones running low in the preferred first
+ reclaim pass, overreclaim of individual groups is eliminated as
+ well, resulting in much better overall workload performance.
+
+- The original high boundary, the hard limit, is defined as a strict
+ limit that cannot budge, even if the OOM killer has to be called.
+ But this generally goes against the goal of making the most out of
+ the available memory. The memory consumption of workloads varies
+ during runtime, and that requires users to overcommit. But doing
+ that with a strict upper limit requires either a fairly accurate
+ prediction of the working set size or adding slack to the limit.
+ Since working set size estimation is hard and error prone, and
+ getting it wrong results in OOM kills, most users tend to err on the
+ side of a looser limit and end up wasting precious resources.
+
+ The memory.high boundary on the other hand can be set much more
+ conservatively. When hit, it throttles allocations by forcing them
+ into direct reclaim to work off the excess, but it never invokes the
+ OOM killer. As a result, a high boundary that is chosen too
+ aggressively will not terminate the processes, but instead it will
+ lead to gradual performance degradation. The user can monitor this
+ and make corrections until the minimal memory footprint that still
+ gives acceptable performance is found.
+
+ In extreme cases, with many concurrent allocations and a complete
+ breakdown of reclaim progress within the group, the high boundary
+ can be exceeded. But even then it's mostly better to satisfy the
+ allocation from the slack available in other groups or the rest of
+ the system than to kill the group. Otherwise, memory.max is there
+ to limit this type of spillover and ultimately contain buggy or even
+ malicious applications.
+
+- The original control file names are unwieldy and inconsistent in
+ many different ways. For example, the upper boundary hit count is
+ exported in the memory.failcnt file, but an OOM event count has to
+ be manually counted by listening to memory.oom_control events, and
+ lower boundary / soft limit events have to be counted by first
+ setting a threshold for that value and then counting those events.
+ Also, usage and limit files encode their units in the filename.
+ That makes the filenames very long, even though this is not
+ information that a user needs to be reminded of every time they type
+ out those names.
+
+ To address these naming issues, as well as to signal clearly that
+ the new interface carries a new configuration model, the naming
+ conventions in it necessarily differ from the old interface.
+
+- The original limit files indicate the state of an unset limit with a
+ Very High Number, and a configured limit can be unset by echoing -1
+ into those files. But that very high number is implementation and
+ architecture dependent and not very descriptive. And while -1 can
+ be understood as an underflow into the highest possible value, -2 or
+ -10M etc. do not work, so it's not consistent.
+
+ memory.low, memory.high, and memory.max will use the string
+ "infinity" to indicate and set the highest possible value.
5. Planned Changes
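Reader's note on the hunk above (not part of the patch): the two rules it documents, the "infinity" string for memory.low, memory.high and memory.max, and the requirement that a cgroup and all its ancestors be below their low boundaries, can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel's implementation: `parse_boundary`, `low_protected`, `struct cgroup_node` and `LIMIT_INFINITY` are hypothetical names, values are plain byte counts, unit suffixes and trailing newlines are not handled, and any special treatment of the root cgroup is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel for "no limit"; the value the kernel uses is not
 * shown in the documentation hunk. */
#define LIMIT_INFINITY UINT64_MAX

/*
 * Parse a boundary string: either the literal "infinity" or an unsigned
 * decimal byte count. Negative numbers such as "-1" are rejected, which
 * mirrors the text's point that "-1" is no longer the way to unset a limit.
 * Returns 0 on success and -1 on malformed input (assumed convention).
 */
static int parse_boundary(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    assert(s != NULL && out != NULL);

    if (strcmp(s, "infinity") == 0) {
        *out = LIMIT_INFINITY;
        return 0;
    }
    /* Require a leading digit so that "", "-1", "+5" and " 7" all fail. */
    if (s[0] < '0' || s[0] > '9')
        return -1;

    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;

    *out = (uint64_t)v;
    return 0;
}

/* Hypothetical, simplified view of one cgroup in the hierarchy. */
struct cgroup_node {
    const struct cgroup_node *parent; /* NULL at the top of the chain */
    uint64_t usage;                   /* current memory usage in bytes */
    uint64_t low;                     /* memory.low boundary, 0 if unset */
};

/*
 * A cgroup is protected only if it and every ancestor in the chain are
 * below their low boundaries. A fresh cgroup (low == 0) is never protected.
 */
static bool low_protected(const struct cgroup_node *cg)
{
    for (; cg != NULL; cg = cg->parent) {
        if (cg->usage >= cg->low)
            return false;
    }
    return true;
}
```

With these definitions, a child using 50 bytes under a low boundary of 80 is protected only while its parent also stays under its own boundary; once the parent exceeds it, the whole subtree becomes eligible for the preferred reclaim pass, which is what makes subtree delegation workable.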
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 79b3cc821e7b..cf8fc2f0b34b 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -42,6 +42,7 @@ Table of Contents
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
3.7 /proc/<pid>/task/<tid>/children - Information about task children
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
+ 3.9 /proc/<pid>/map_files - Information about memory mapped files
4 Configuring procfs
4.1 Mount options
@@ -1763,6 +1764,28 @@ pair provide additional information particular to the objects they represent.
with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
still exhibits timer's remaining time.
+3.9 /proc/<pid>/map_files - Information about memory mapped files
+---------------------------------------------------------------------
+This directory contains symbolic links which represent memory mapped files
+the process is maintaining. Example output: