From 15383a0d63dbcd63dc7e8d9ec1bf3a0f7ebf64ac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Tue, 18 Mar 2025 17:14:37 +0100 Subject: landlock: Add the errata interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some fixes may require user space to check if they are applied on the running kernel before using a specific feature. For instance, this applies when a restriction was previously too restrictive and is now getting relaxed (e.g. for compatibility reasons). However, non-visible changes for legitimate use (e.g. security fixes) do not require an erratum. Because fixes are backported down to a specific Landlock ABI, we need a way to avoid cherry-pick conflicts. The solution is to only update a file related to the lower ABI impacted by this issue. All the ABI files are then used to create a bitmask of fixes. The new errata interface is similar to the one used to get the supported Landlock ABI version, but it returns a bitmask instead because the order of fixes may not match the order of versions, and not all fixes may apply to all versions. The actual errata will come with dedicated commits. The description is not actually used in the code but serves as documentation. Create the landlock_abi_version symbol and use its value to check errata consistency. Update test_base's create_ruleset_checks_ordering tests and add errata tests. This commit is backportable down to the first version of Landlock. Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features") Cc: Günther Noack Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net Signed-off-by: Mickaël Salaün --- include/uapi/linux/landlock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/uapi') diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index e1d2c27533b4..8806a132d7b8 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -57,9 +57,11 @@ struct landlock_ruleset_attr { * * - %LANDLOCK_CREATE_RULESET_VERSION: Get the highest supported Landlock ABI * version. + * - %LANDLOCK_CREATE_RULESET_ERRATA: Get a bitmask of fixed issues. */ /* clang-format off */ #define LANDLOCK_CREATE_RULESET_VERSION (1U << 0) +#define LANDLOCK_CREATE_RULESET_ERRATA (1U << 1) /* clang-format on */ /** -- cgit v1.2.3 From 33e65b0d3add6bdc731e9298995cbbc979349f51 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Thu, 20 Mar 2025 20:06:58 +0100 Subject: landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a new AUDIT_LANDLOCK_ACCESS record type dedicated to an access request denied by a Landlock domain. AUDIT_LANDLOCK_ACCESS indicates that something unexpected happened. For now, only denied access are logged, which means that any AUDIT_LANDLOCK_ACCESS record is always followed by a SYSCALL record with "success=no". However, log parsers should check this syscall property because this is the only sign that a request was denied. Indeed, we could have "success=yes" if Landlock would support a "permissive" mode. We could also add a new field to AUDIT_LANDLOCK_DOMAIN for this mode (see following commit). By default, the only logged access requests are those coming from the same executed program that enforced the Landlock restriction on itself. In other words, no audit record are created for a task after it called execve(2). This is required to avoid log spam because programs may only be aware of their own restrictions, but not the inherited ones. Following commits will allow to conditionally generate AUDIT_LANDLOCK_ACCESS records according to dedicated landlock_restrict_self(2)'s flags. The AUDIT_LANDLOCK_ACCESS message contains: - the "domain" ID restricting the action on an object, - the "blockers" that are missing to allow the requested access, - a set of fields identifying the related object (e.g. task identified with "opid" and "ocomm"). The blockers are implicit restrictions (e.g. ptrace), or explicit access rights (e.g. filesystem), or explicit scopes (e.g. signal). This field contains a list of at least one element, each separated with a comma. The initial blocker is "ptrace", which describe all implicit Landlock restrictions related to ptrace (e.g. deny tracing of tasks outside a sandbox). Add audit support to ptrace_access_check and ptrace_traceme hooks. For the ptrace_access_check case, we log the current/parent domain and the child task. For the ptrace_traceme case, we log the parent domain and the current/child task. Indeed, the requester and the target are the current task, but the action would be performed by the parent task. Audit event sample: type=LANDLOCK_ACCESS msg=audit(1729738800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd" type=SYSCALL msg=audit(1729738800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0 A following commit adds user documentation. Add KUnit tests to check reading of domain ID relative to layer level. The quick return for non-landlocked tasks is moved from task_ptrace() to each LSM hooks. It is not useful to inline the audit_enabled check because other computation are performed by landlock_log_denial(). Use scoped guards for RCU read-side critical sections. Cc: Günther Noack Acked-by: Paul Moore Link: https://lore.kernel.org/r/20250320190717.2287696-10-mic@digikod.net Signed-off-by: Mickaël Salaün --- include/uapi/linux/audit.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index d9a069b4a775..5dd53f416a4a 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -33,7 +33,7 @@ * 1100 - 1199 user space trusted application messages * 1200 - 1299 messages internal to the audit daemon * 1300 - 1399 audit event messages - * 1400 - 1499 SE Linux use + * 1400 - 1499 access control messages * 1500 - 1599 kernel LSPP events * 1600 - 1699 kernel crypto events * 1700 - 1799 kernel anomaly records @@ -146,6 +146,7 @@ #define AUDIT_IPE_ACCESS 1420 /* IPE denial or grant */ #define AUDIT_IPE_CONFIG_CHANGE 1421 /* IPE config change */ #define AUDIT_IPE_POLICY_LOAD 1422 /* IPE policy load */ +#define AUDIT_LANDLOCK_ACCESS 1423 /* Landlock denial */ #define AUDIT_FIRST_KERN_ANOM_MSG 1700 #define AUDIT_LAST_KERN_ANOM_MSG 1799 -- cgit v1.2.3 From 1d636984e088b17e8587eb5ed9d9d7a80b656c4c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Thu, 20 Mar 2025 20:06:59 +0100 Subject: landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Asynchronously log domain information when it first denies an access. This minimize the amount of generated logs, which makes it possible to always log denials for the current execution since they should not happen. These records are identified with the new AUDIT_LANDLOCK_DOMAIN type. The AUDIT_LANDLOCK_DOMAIN message contains: - the "domain" ID which is described; - the "status" which can either be "allocated" or "deallocated"; - the "mode" which is for now only "enforcing"; - for the "allocated" status, a minimal set of properties to easily identify the task that loaded the domain's policy with landlock_restrict_self(2): "pid", "uid", executable path ("exe"), and command line ("comm"); - for the "deallocated" state, the number of "denials" accounted to this domain, which is at least 1. This requires each domain to save these task properties at creation time in the new struct landlock_details. A reference to the PID is kept for the lifetime of the domain to avoid race conditions when investigating the related task. The executable path is resolved and stored to not keep a reference to the filesystem and block related actions. All these metadata are stored for the lifetime of the related domain and should then be minimal. The required memory is not accounted to the task calling landlock_restrict_self(2) contrary to most other Landlock allocations (see related comment). The AUDIT_LANDLOCK_DOMAIN record follows the first AUDIT_LANDLOCK_ACCESS record for the same domain, which is always followed by AUDIT_SYSCALL and AUDIT_PROCTITLE. This is in line with the audit logic to first record the cause of an event, and then add context with other types of record. Audit event sample for a first denial: type=LANDLOCK_ACCESS msg=audit(1732186800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd" type=LANDLOCK_DOMAIN msg=audit(1732186800.349:44): domain=195ba459b status=allocated mode=enforcing pid=300 uid=0 exe="/root/sandboxer" comm="sandboxer" type=SYSCALL msg=audit(1732186800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0 Audit event sample for a following denial: type=LANDLOCK_ACCESS msg=audit(1732186800.372:45): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd" type=SYSCALL msg=audit(1732186800.372:45): arch=c000003e syscall=101 success=no [...] pid=300 auid=0 Log domain deletion with the "deallocated" state when a domain was previously logged. This makes it possible for log parsers to free potential resources when a domain ID will never show again. The number of denied access requests is useful to easily check how many access requests a domain blocked and potentially if some of them are missing in logs because of audit rate limiting, audit rules, or Landlock log configuration flags (see following commit). Audit event sample for a deletion of a domain that denied something: type=LANDLOCK_DOMAIN msg=audit(1732186800.393:46): domain=195ba459b status=deallocated denials=2 Cc: Günther Noack Acked-by: Paul Moore Link: https://lore.kernel.org/r/20250320190717.2287696-11-mic@digikod.net [mic: Update comment and GFP flag for landlock_log_drop_domain()] Signed-off-by: Mickaël Salaün --- include/uapi/linux/audit.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/uapi') diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h index 5dd53f416a4a..9a4ecc9f6dc5 100644 --- a/include/uapi/linux/audit.h +++ b/include/uapi/linux/audit.h @@ -147,6 +147,7 @@ #define AUDIT_IPE_CONFIG_CHANGE 1421 /* IPE config change */ #define AUDIT_IPE_POLICY_LOAD 1422 /* IPE policy load */ #define AUDIT_LANDLOCK_ACCESS 1423 /* Landlock denial */ +#define AUDIT_LANDLOCK_DOMAIN 1424 /* Landlock domain status */ #define AUDIT_FIRST_KERN_ANOM_MSG 1700 #define AUDIT_LAST_KERN_ANOM_MSG 1799 -- cgit v1.2.3 From 12bfcda73ac2cf3083c9d6d05724af92da3a4b4b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Thu, 20 Mar 2025 20:07:06 +0100 Subject: landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Most of the time we want to log denied access because they should not happen and such information helps diagnose issues. However, when sandboxing processes that we know will try to access denied resources (e.g. unknown, bogus, or malicious binary), we might want to not log related access requests that might fill up logs. By default, denied requests are logged until the task call execve(2). If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied requests will not be logged for the same executed file. If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied requests from after an execve(2) call will be logged. The rationale is that a program should know its own behavior, but not necessarily the behavior of other programs. Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific Landlock domain, it makes it possible to selectively mask some access requests that would be logged by a parent domain, which might be handy for unprivileged processes to limit logs. However, system administrators should still use the audit filtering mechanism. There is intentionally no audit nor sysctl configuration to re-enable these logs. This is delegated to the user space program. Increment the Landlock ABI version to reflect this interface change. Cc: Günther Noack Cc: Paul Moore Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net [mic: Rename variables and fix __maybe_unused] Signed-off-by: Mickaël Salaün --- include/uapi/linux/landlock.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/uapi') diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index 8806a132d7b8..56b0094ef792 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -4,6 +4,7 @@ * * Copyright © 2017-2020 Mickaël Salaün * Copyright © 2018-2020 ANSSI + * Copyright © 2021-2025 Microsoft Corporation */ #ifndef _UAPI_LINUX_LANDLOCK_H @@ -64,6 +65,26 @@ struct landlock_ruleset_attr { #define LANDLOCK_CREATE_RULESET_ERRATA (1U << 1) /* clang-format on */ +/* + * sys_landlock_restrict_self() flags: + * + * - %LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF: Do not create any log related to the + * enforced restrictions. This should only be set by tools launching unknown + * or untrusted programs (e.g. a sandbox tool, container runtime, system + * service manager). Because programs sandboxing themselves should fix any + * denied access, they should not set this flag to be aware of potential + * issues reported by system's logs (i.e. audit). + * - %LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON: Explicitly ask to continue + * logging denied access requests even after an :manpage:`execve(2)` call. + * This flag should only be set if all the programs than can legitimately be + * executed will not try to request a denied access (which could spam audit + * logs). + */ +/* clang-format off */ +#define LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF (1U << 0) +#define LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON (1U << 1) +/* clang-format on */ + /** * enum landlock_rule_type - Landlock rule type * -- cgit v1.2.3 From ead9079f75696a028aea8860787770c80eddb8f9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Thu, 20 Mar 2025 20:07:07 +0100 Subject: landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF for the case of sandboxer tools, init systems, or runtime containers launching programs sandboxing themselves in an inconsistent way. Setting this flag should only depends on runtime configuration (i.e. not hardcoded). We don't create a new ruleset's option because this should not be part of the security policy: only the task that enforces the policy (not the one that create it) knows if itself or its children may request denied actions. This is the first and only flag that can be set without actually restricting the caller (i.e. without providing a ruleset). Extend struct landlock_cred_security with a u8 log_subdomains_off. struct landlock_file_security is still 16 bytes. Cc: Günther Noack Cc: Paul Moore Closes: https://github.com/landlock-lsm/linux/issues/3 Link: https://lore.kernel.org/r/20250320190717.2287696-19-mic@digikod.net [mic: Fix comment] Signed-off-by: Mickaël Salaün --- include/uapi/linux/landlock.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/uapi') diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index 56b0094ef792..d9d0cb827117 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -79,10 +79,22 @@ struct landlock_ruleset_attr { * This flag should only be set if all the programs than can legitimately be * executed will not try to request a denied access (which could spam audit * logs). + * - %LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF: Do not create any log related + * to the enforced restrictions coming from future nested domains created by + * the caller or its descendants. This should only be set according to a + * runtime configuration (i.e. not hardcoded) by programs launching other + * unknown or untrusted programs that may create their own Landlock domains + * and spam logs. The main use case is for container runtimes to enable users + * to mute buggy sandboxed programs for a specific container image. Other use + * cases include sandboxer tools and init systems. Unlike + * %LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF, + * %LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF does not impact the requested + * restriction (if any) but only the future nested domains. */ /* clang-format off */ #define LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF (1U << 0) #define LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON (1U << 1) +#define LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF (1U << 2) /* clang-format on */ /** -- cgit v1.2.3