linux.git/kernel/cgroup/cpuset.c, branch v6.6.39

sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level

2024-06-12T09:12:13+00:00

[ Upstream commit a1fd0b9d751f840df23ef0e75b691fc00cfd4743 ]

Change relax_domain_level checks so that it would be possible
to include or exclude all domains from newidle balancing.

This matches the behavior described in the documentation:

  -1   no request. use system default or follow request of others.
   0   no search.
   1   search siblings (hyperthreads in a core).

"2" enables levels 0 and 1, level_max excludes the last (level_max)
level, and level_max+1 includes all levels.

Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core")
Signed-off-by: Vitalii Bursov 
Signed-off-by: Ingo Molnar 
Tested-by: Dietmar Eggemann 
Reviewed-by: Vincent Guittot 
Reviewed-by: Valentin Schneider 
Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com
Signed-off-by: Sasha Levin

cgroup/cpuset: Fix retval in update_cpumask()

2024-04-03T13:28:40+00:00

commit 25125a4762835d62ba1e540c1351d447fc1f6c7c upstream.

The update_cpumask(), checks for newly requested cpumask by calling
validate_change(), which returns an error on passing an invalid set
of cpu(s). Independent of the error returned, update_cpumask() always
returns zero, suppressing the error and returning success to the user
on writing an invalid cpu range for a cpuset. Fix it by returning
retval instead, which is returned by validate_change().

Fixes: 99fe36ba6fc1 ("cgroup/cpuset: Improve temporary cpumasks handling")
Signed-off-by: Kamalesh Babulal 
Reviewed-by: Waiman Long 
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Tejun Heo 
Signed-off-by: Greg Kroah-Hartman

cgroup/cpuset: Fix load balance state in update_partition_sd_lb()

2023-11-20T10:58:53+00:00

[ Upstream commit 6fcdb0183bf024a70abccb0439321c25891c708d ]

Commit a86ce68078b2 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE
& CS_SCHED_LOAD_BALANCE handling") adds a new helper function
update_partition_sd_lb() to update the load balance state of the
cpuset. However the new load balance is determined by just looking at
whether the cpuset is a valid isolated partition root or not.  That is
not enough if the cpuset is not a valid partition root but its parent
is in the isolated state (load balance off). Update the function to
set the new state to be the same as its parent in this case like what
has been done in commit c8c926200c55 ("cgroup/cpuset: Inherit parent's
load balance state in v2").

Fixes: a86ce68078b2 ("cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling")
Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo 
Signed-off-by: Sasha Levin

cgroup/cpuset: fix kernel-doc

2023-08-02T19:37:03+00:00

Add kernel-doc of param @rotor to fix warnings:

kernel/cgroup/cpuset.c:4162: warning: Function parameter or member
'rotor' not described in 'cpuset_spread_node'
kernel/cgroup/cpuset.c:3771: warning: Function parameter or member
'work' not described in 'cpuset_hotplug_workfn'

Signed-off-by: Cai Xinchen 
Acked-by: Waiman Long 
Signed-off-by: Tejun Heo

cgroup/cpuset: Allow suppression of sched domain rebuild in update_cpumasks_hier()

2023-07-10T21:01:23+00:00

A single partition setup and tear-down operation can lead to
multiple rebuild_sched_domains_locked() calls which is a waste of
effort. This can partly be mitigated by adding a flag to suppress the
rebuild_sched_domains_locked() call in update_cpumasks_hier(). Since
a Boolean flag has already been passed as the 3rd argument to
update_cpumasks_hier(), we can extend that to a full flag word.

The sched domain rebuild suppression is now enabled in
update_sibling_cpumasks() as all it callers will do the sched domain
rebuild after its return later on anyway.

Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo

cgroup/cpuset: Improve temporary cpumasks handling

2023-07-10T21:01:03+00:00

The limitation that update_parent_subparts_cpumask() can only use
addmask & delmask in the given tmp cpumasks is fragile and may lead to
unexpected error.

Fix this problem by allocating/freeing a struct tmpmasks in
update_cpumask() to avoid reusing the cpumasks in trial_cs.

With this change, we can move the update_tasks_cpumask() for the
parent and update_sibling_cpumasks() for the sibling to inside
update_parent_subparts_cpumask().

Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo

cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling

2023-07-10T21:00:12+00:00

Extract out the setting of CS_CPU_EXCLUSIVE and CS_SCHED_LOAD_BALANCE
flags as well as the rebuilding of scheduling domains into the new
update_partition_exclusive() and update_partition_sd_lb() helper
functions to simplify the logic. The update_partition_exclusive()
helper is called mainly at the beginning of the caller, but it may be
called at the end too. The update_partition_sd_lb() helper is called
at the end of the caller.

This patch should reduce the chance that cpuset partition will end up
in an incorrect state.

Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo

cgroup/cpuset: Inherit parent's load balance state in v2

2023-07-10T20:59:27+00:00

Since commit f28e22441f35 ("cgroup/cpuset: Add a new isolated
cpus.partition type"), the CS_SCHED_LOAD_BALANCE bit of a v2 cpuset
can be on or off. The child cpusets of a partition root must have the
same setting as its parent or it may screw up the rebuilding of sched
domains. Fix this problem by making sure the a child v2 cpuset will
follows its parent cpuset load balance state unless the child cpuset
is a new partition root itself.

Fixes: f28e22441f35 ("cgroup/cpuset: Add a new isolated cpus.partition type")
Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo

cpuset: Allow setscheduler regardless of manipulated task

2023-07-10T20:28:43+00:00

When we migrate a task between two cgroups, one of the checks is a
verification whether we can modify task's scheduler settings
(cap_task_setscheduler()).

An implicit migration occurs also when enabling a controller on the
unified hierarchy (think of parent to child migration). The
aforementioned check may be problematic if the caller of the migration
(enabling a controller) has no permissions over migrated tasks.
For instance, a user's cgroup that ends up running a process of a
different user. Although cgroup permissions are configured favorably,
the enablement fails due to the foreign process [1].

Change the behavior by relaxing the permissions check on the unified
hierarchy when no effective change would happen.
This is in accordance with unified hierarchy attachment behavior when
permissions of the source to target cgroups are decisive whereas the
migrated task is opaque (as opposed to more restrictive check in
__cgroup1_procs_write()).

Notice that foreign task's affinity may still be modified if the user
can modify destination cgroup's cpuset attributes
(update_tasks_cpumask() does no permissions check). The permissions
check could thus be skipped on v2 even when affinity changes. Stay
conservative in this patch though.

[1] https://github.com/systemd/systemd/issues/18293#issuecomment-831205649

Signed-off-by: Michal Koutný 
Reviewed-by: Waiman Long 
Signed-off-by: Tejun Heo

cgroup/cpuset: avoid unneeded cpuset_mutex re-lock

2023-07-10T20:26:06+00:00

cpuset_mutex unlock and lock pair is only needed when transferring tasks
out of empty cpuset. Avoid unneeded cpuset_mutex re-lock when !is_empty
to save cpu cycles.

Signed-off-by: Miaohe Lin 
Reviewed-by: Waiman Long 
Signed-off-by: Tejun Heo