Control Group v2
October, 2015 Tejun Heo <tj@kernel.org>
This is the authoritative documentation on the design, interface and
conventions of cgroup v2. It describes all userland-visible aspects
of cgroup including core and specific controller behaviors. All
future changes must be reflected in this document. Documentation for
v1 is available under Documentation/cgroup-v1/.
CONTENTS
1. Introduction
  1-1. Terminology
  1-2. What is cgroup?
2. Basic Operations
  2-1. Mounting
  2-2. Organizing Processes
  2-3. [Un]populated Notification
  2-4. Controlling Controllers
    2-4-1. Enabling and Disabling
    2-4-2. Top-down Constraint
    2-4-3. No Internal Process Constraint
  2-5. Delegation
    2-5-1. Model of Delegation
    2-5-2. Delegation Containment
  2-6. Guidelines
    2-6-1. Organize Once and Control
    2-6-2. Avoid Name Collisions
3. Resource Distribution Models
  3-1. Weights
  3-2. Limits
  3-3. Protections
  3-4. Allocations
4. Interface Files
  4-1. Format
  4-2. Conventions
  4-3. Core Interface Files
5. Controllers
  5-1. CPU
    5-1-1. CPU Interface Files
  5-2. Memory
    5-2-1. Memory Interface Files
    5-2-2. Usage Guidelines
    5-2-3. Memory Ownership
  5-3. IO
    5-3-1. IO Interface Files
    5-3-2. Writeback
6. Namespace
  6-1. Basics
  6-2. The Root and Views
  6-3. Migration and setns(2)
  6-4. Interaction with Other Namespaces
P. Information on Kernel Programming
  P-1. Filesystem Support for Writeback
D. Deprecated v1 Core Features
R. Issues with v1 and Rationales for v2
  R-1. Multiple Hierarchies
  R-2. Thread Granularity
  R-3. Competition Between Inner Nodes and Threads
  R-4. Other Interface Issues
  R-5. Controller Issues and Remedies
    R-5-1. Memory
1. Introduction
1-1. Terminology
"cgroup" stands for "control group" and is never capitalized. The
singular form is used to designate the whole feature and also as a
qualifier as in "cgroup controllers". When explicitly referring to
multiple individual control groups, the plural form "cgroups" is used.
1-2. What is cgroup?
cgroup is a mechanism to organize processes hierarchically and
distribute system resources along the hierarchy in a controlled and
configurable manner.
cgroup is largely composed of two parts - the core and controllers.
cgroup core is primarily responsible for hierarchically organizing
processes. A cgroup controller is usually responsible for
distributing a specific type of system resource along the hierarchy
although there are utility controllers which serve purposes other than
resource distribution.
cgroups form a tree structure and every process in the system belongs
to one and only one cgroup. All threads of a process belong to the
same cgroup. On creation, all processes are put in the cgroup that
the parent process belongs to at the time. A process can be migrated
to another cgroup. Migration of a process doesn't affect already
existing descendant processes.
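
The one-cgroup-per-process rule can be observed from userland. The
following is a minimal sketch, assuming the /proc/<PID>/cgroup file,
in which the v2 hierarchy is reported as a line of the form
"0::<path>". The function name is illustrative only and is not part
of any cgroup interface.

```shell
# v2_cgroup_path: read /proc/<PID>/cgroup-style text on stdin and print
# the path of the v2 entry, i.e. the line that starts with "0::".
# Lines for v1 hierarchies (non-zero hierarchy ID) are ignored.
v2_cgroup_path() {
    sed -n 's/^0:://p'
}

# Usage on a live system (not run here):
#   v2_cgroup_path < /proc/self/cgroup
```

Because a process belongs to exactly one cgroup in the v2 hierarchy,
at most one line is expected to match.
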
Following certain structural constraints, controllers may be enabled or
disabled selectively on a cgroup. All controller behaviors are
hierarchical - if a controller is enabled on a cgroup, it affects all
processes which belong to the cgroups that make up the inclusive
sub-hierarchy of the cgroup, that is, the cgroup itself and all of its
descendants. When a controller is enabled on a nested cgroup, it
always restricts the resource distribution further. The restrictions
set closer to the root in the hierarchy cannot be overridden from
further away.
2. Basic Operations
2-1. Mounting
Unlike v1, cgroup v2 has only a single hierarchy. The cgroup v2
hierarchy can be mounted with the following mount command.
  # mount -t cgroup2 none $MOUNT_POINT
cgroup2 filesystem has the magic number 0x63677270 ("cgrp"). All
controllers which support v2 and are not bound to a v1 hierarchy are
automatically bound to the v2 hierarchy and show up at the root.
Controllers which are not in active use in the v2 hierarchy can be
bound to other hierarchies. This allows mixing the v2 hierarchy with
the legacy v1 multiple hierarchies in a fully backward compatible way.
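
The magic number can be used to recognize a cgroup2 mount. The
following is a minimal sketch, assuming GNU coreutils stat(1), whose
"%t" format prints the filesystem type in hex. The function and
variable names are illustrative only.

```shell
# The cgroup2 magic number from the text, as lowercase hex without "0x".
CGROUP2_MAGIC_HEX=63677270

# is_cgroup2_magic HEX: succeed if HEX matches the cgroup2 magic number.
is_cgroup2_magic() {
    [ "$1" = "$CGROUP2_MAGIC_HEX" ]
}

# is_cgroup2_mount PATH: succeed if PATH lives on a cgroup2 filesystem.
# Assumes GNU stat; "-f" queries the filesystem, "%t" is its type in hex.
is_cgroup2_mount() {
    is_cgroup2_magic "$(stat -f -c %t "$1" 2>/dev/null)"
}

# magic_as_text: print the bytes 0x63 0x67 0x72 0x70 (given in octal
# here for portability) to show that the magic number spells "cgrp".
magic_as_text() {
    printf '\143\147\162\160\n'
}
```

For example, "is_cgroup2_mount $MOUNT_POINT && echo v2" would print
"v2" only when $MOUNT_POINT is on a cgroup2 filesystem.
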
A controller can be moved across hierarchies only after the controller
is no longer referenced in its current hierarchy. Because per-cgroup
controller states are destroyed asynchronously and controllers may
have lingering references, a controller may not show up immediately on
the v2 hierarchy after the final umount of the previous hierarchy.
Similarly, a controller should be fully disabled to be moved out of
the unified hierarchy and it may take some time for the disabled
controller to become available for other hierarchies; furthermore, due
to inter-controller dependencies, other controllers may need to be
disabled too.
While useful for development and manual configurations, moving
controllers dynamically between the v2 and other hierarchies is
strongly discouraged for production use. It is recommended to decide
the hierarchies and controller associations before starting to use the
controllers after system boot.
During the transition to v2, system management software might still
automount the v1 cgroup filesystem and so hijack all controllers
during boot, before manual intervention is possible. To make testing
and experimenting easier, the kernel parameter cgroup_no_v1= allows
disabling controllers in v1 and makes them always available in v2.
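
Whether cgroup_no_v1= was given at boot can be checked from the kernel
command line. The following is a minimal sketch, assuming the command
line is read from /proc/cmdline and that the value is either a
comma-separated controller list or "all", as described in the kernel
parameter documentation. The function name is illustrative only.

```shell
# cgroup_no_v1_values: read a kernel command line on stdin and print
# the value of every cgroup_no_v1= parameter found, one per line.
# Prints nothing if the parameter is absent.
cgroup_no_v1_values() {
    tr ' ' '\n' | sed -n 's/^cgroup_no_v1=//p'
}

# Usage on a live system (not run here):
#   cgroup_no_v1_values < /proc/cmdline
```
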
2-2. Organizing Processes
Initially, only the root cgroup exists, and all processes belong to it.
A child cgroup can be created by creating a sub-directory.
  # mkdir $CGROUP_NAME
A given cgroup may have multiple child cgroups forming a tree
structure. Each cgroup has a read-writable interface file
"cgroup.procs". When read, it lists the PIDs of all processes which
belong to the cgroup one-per-line. The PIDs are not ordered and the
same PID may show up more than once if the process got moved to
another cgroup and then back or the PID got recycled while reading.
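
Since "cgroup.procs" is unordered and may repeat a PID, a reader that
needs a clean list has to sort and deduplicate the output itself. The
following is a minimal sketch; the function name is illustrative only
and the cgroup directory is passed in by the caller.

```shell
# cgroup_pids DIR: print the PIDs listed in DIR/cgroup.procs, sorted
# numerically with duplicate entries removed.
cgroup_pids() {
    sort -n -u "$1/cgroup.procs"
}

# Usage on a live system (not run here):
#   cgroup_pids $MOUNT_POINT/$CGROUP_NAME
```

The result is still only a snapshot - processes may be migrated or
exit, and PIDs may be recycled, after the file has been read.
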
A process can be migrated into a cgroup by writing its PID to the
target cgroup's "cgroup.procs" file. Only one process can be migrated
on a single write(2) call. If a process is composed of multiple
threads, writing the PID of any thread migrates all threads of the
process.
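
Because only one process can be migrated per write(2) call, moving
several processes takes one write for each PID. The following is a
minimal sketch; the function name is illustrative only, and on a real
cgroup the caller needs write access to the target "cgroup.procs".

```shell
# migrate_pids DIR PID...: move each PID into the cgroup at DIR.
# Each echo is a separate write to DIR/cgroup.procs, so every call
# carries exactly one PID. Stops at the first failed write.
# Note: mp_dir and mp_pid are plain (global) shell variables.
migrate_pids() {
    mp_dir=$1
    shift
    for mp_pid in "$@"; do
        echo "$mp_pid" > "$mp_dir/cgroup.procs" || return 1
    done
}

# Usage on a live system (not run here):
#   migrate_pids $MOUNT_POINT/$CGROUP_NAME 1234 5678
```

Writing several PIDs in a single echo would not work, since only one
process is migrated per write.
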
When a process forks a child process, the new process is born into the
cgroup that the forking process belongs to at the time of the
operation. After exit, a process stays associated with the cgroup
that it belonged to at the time of exi