// SPDX-License-Identifier: GPL-2.0
/*
* Pressure stall information for CPU, memory and IO
*
* Copyright (c) 2018 Facebook, Inc.
* Author: Johannes Weiner <hannes@cmpxchg.org>
*
* Polling support by Suren Baghdasaryan <surenb@google.com>
* Copyright (c) 2018 Google, Inc.
*
* When CPU, memory and IO are contended, tasks experience delays that
* reduce throughput and introduce latencies into the workload. Memory
* and IO contention, in addition, can cause a full loss of forward
* progress in which the CPU goes idle.
*
* This code aggregates individual task delays into resource pressure
* metrics that indicate problems with both workload health and
* resource utilization.
*
* Model
*
* The time in which a task can execute on a CPU is our baseline for
* productivity. Pressure expresses the amount of time in which this
* potential cannot be realized due to resource contention.
*
* This concept of productivity has two components: the workload and
* the CPU. To measure the impact of pressure on both, we define two
* contention states for a resource: SOME and FULL.
*
* In the SOME state of a given resource, one or more tasks are
* delayed on that resource. This affects the workload's ability to
* perform work, but the CPU may still be executing other tasks.
*
* In the FULL state of a given resource, all non-idle tasks are
* delayed on that resource such that nobody is advancing and the CPU
* goes idle. This leaves both workload and CPU unproductive.
*
* SOME = nr_delayed_tasks != 0
* FULL = nr_delayed_tasks != 0 && nr_productive_tasks == 0
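*
* For example (an illustrative case, not a measurement): with three
* non-idle tasks, of which one is delayed on memory while the other
* two are running and not reclaiming, memory is in the SOME state but
* not in FULL. Once all three are delayed on memory,
* nr_productive_tasks drops to 0 and memory is in both the SOME and
* the FULL state.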
*
* What it means for a task to be productive is defined differently
* for each resource. For IO, productive means a running task. For
* memory, productive means a running task that isn't a reclaimer. For
* CPU, productive means an on-CPU task.
*
* Naturally, the FULL state doesn't exist for the CPU resource at the
* system level, but it does exist at the cgroup level. At the cgroup
* level, FULL means all non-idle tasks in the cgroup are delayed on
* the CPU resource, which is either being used by others outside of
* the cgroup or throttled by the cgroup cpu.max configuration.
*
* The percentage of wall clock time spent in those compound stall
* states gives pressure numbers between 0 and 100 for each resource,
* where the SOME percentage indicates workload slowdowns and the FULL
* percentage indicates reduced CPU utilization:
*
* %SOME = time(SOME) / period
* %FULL = time(FULL) / period
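*
* As a hypothetical illustration: if, over a 10s period, at least
* one task was delayed on memory for a total of 2s, and for 0.5s of
* that time no non-idle task was productive, then time(SOME) = 2s
* and time(FULL) = 0.5s. Expressed as percentages of the period,
* that is 20% SOME and 5% FULL.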
*
* Multiple CPUs
*
* The more tasks and availabl