summaryrefslogtreecommitdiff
path: root/kernel/signal.c
AgeCommit message (Collapse)AuthorFilesLines
2022-07-07signal handling: don't use BUG_ON() for debuggingLinus Torvalds1-4/+4
These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_trace()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusuable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-03Merge tag 'ptrace_stop-cleanup-for-v5.19' of ↵Linus Torvalds1-79/+61
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace_stop cleanups from Eric Biederman: "While looking at the ptrace problems with PREEMPT_RT and the problems Peter Zijlstra was encountering with ptrace in his freezer rewrite I identified some cleanups to ptrace_stop that make sense on their own and move make resolving the other problems much simpler. The biggest issue is the habit of the ptrace code to change task->__state from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No other code in the kernel does that and it is straight forward to update signal_wake_up and friends to make that unnecessary. Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying on the fact that all stopped states except the special stop states can tolerate spurious wake up and recover their state. The state of stopped and traced tasked is changed to be stored in task->jobctl as well as in task->__state. This makes it possible for the freezer to recover tasks in these special states, as well as serving as a general cleanup. With a little more work in that direction I believe TASK_STOPPED can learn to tolerate spurious wake ups and become an ordinary stop state. The TASK_TRACED state has to remain a special state as the registers for a process are only reliably available when the process is stopped in the scheduler. Fundamentally ptrace needs acess to the saved register values of a task. There are bunch of semi-random ptrace related cleanups that were found while looking at these issues. One cleanup that deserves to be called out is from commit 57b6de08b5f6 ("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This makes a change that is technically user space visible, in the handling of what happens to a tracee when a tracer dies unexpectedly. According to our testing and our understanding of userspace nothing cares that spurious SIGTRAPs can be generated in that case" * tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state ptrace: Always take siglock in ptrace_resume ptrace: Don't change __state ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs ptrace: Document that wait_task_inactive can't fail ptrace: Reimplement PTRACE_KILL by always sending SIGKILL signal: Use lockdep_assert_held instead of assert_spin_locked ptrace: Remove arch_ptrace_attach ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP signal: Replace __group_send_sig_info with send_signal_locked signal: Rename send_signal send_signal_locked
2022-05-11sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED statePeter Zijlstra1-2/+8
Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else. There's two spots of bother with this: - PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable. - An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour. As such, add additional state to task->jobctl to track this state outside of task->__state. NOTE: this doesn't actually fix anything yet, just adds extra state. --EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-05-11ptrace: Don't change __stateEric W. Biederman1-8/+6
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up. Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPsEric W. Biederman1-54/+38
Long ago and far away there was a BUG_ON at the start of ptrace_stop that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1]. The BUG_ON had never triggered but examination of the code showed that the BUG_ON could actually trigger. To complement removing the BUG_ON an attempt to better handle the race was added. The code detected the tracer had gone away and did not call do_notify_parent_cldstop. The code also attempted to prevent ptrace_report_syscall from sending spurious SIGTRAPs when the tracer went away. The code to detect when the tracer had gone away before sending a signal to tracer was a legitimate fix and continues to work to this date. The code to prevent sending spurious SIGTRAPs is a failure. At the time and until today the code only catches it when the tracer goes away after siglock is dropped and before read_lock is acquired. If the tracer goes away after read_lock is dropped a spurious SIGTRAP can still be sent to the tracee. The tracer going away after read_lock is dropped is the far likelier case as it is the bigger window. Given that the attempt to prevent the generation of a SIGTRAP was a failure and continues to be a failure remove the code that attempts to do that. This simplifies the code in ptrace_stop and makes ptrace_stop much easier to reason about. To successfully deal with the tracer going away, all of the tracer's instrumentation of the child would need to be removed, and reliably detecting when the tracer has set a signal to continue with would need to be implemented. [1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON") History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-9-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11signal: Use lockdep_assert_held instead of assert_spin_lockedEric W. Biederman1-2/+2
The distinction is that assert_spin_locked() checks if the lock is held *by*anyone* whereas lockdep_assert_held() asserts the current context holds the lock. Also, the check goes away if you build without lockdep. Suggested-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/UKU@hirez.programming.kicks-ass.net Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11signal: Replace __group_send_sig_info with send_signal_lockedEric W. Biederman1-7/+1
The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-05-11signal: Rename send_signal send_signal_lockedEric W. Biederman1-12/+12
Rename send_signal and __send_signal to send_signal_locked and __send_signal_locked to make send_signal usable outside of signal.c. Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-04-22signal: Deliver SIGTRAP on perf event asynchronously if blockedMarco Elver1-2/+16
With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case: <set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers> When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task. This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later. To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ] Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com
2022-03-31Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"Thomas Gleixner1-40/+0
Revert commit bf9ad37dc8a. It needs to be better encapsulated and generalized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2022-03-28Merge tag 'ptrace-cleanups-for-v5.18' of ↵Linus Torvalds1-30/+32
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace cleanups from Eric Biederman: "This set of changes removes tracehook.h, moves modification of all of the ptrace fields inside of siglock to remove races, adds a missing permission check to ptrace.c The removal of tracehook.h is quite significant as it has been a major source of confusion in recent years. Much of that confusion was around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the semantics clearer). For people who don't know tracehook.h is a vestiage of an attempt to implement uprobes like functionality that was never fully merged, and was later superseeded by uprobes when uprobes was merged. For many years now we have been removing what tracehook functionaly a little bit at a time. To the point where anything left in tracehook.h was some weird strange thing that was difficult to understand" * tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: Remove duplicated include in ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE ptrace: Return the signal to continue with from ptrace_stop ptrace: Move setting/clearing ptrace_message into ptrace_stop tracehook: Remove tracehook.h resume_user_mode: Move to resume_user_mode.h resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume signal: Move set_notify_signal and clear_notify_signal into sched/signal.h task_work: Decouple TIF_NOTIFY_SIGNAL and task_work task_work: Call tracehook_notify_signal from get_signal on all architectures task_work: Introduce task_work_pending task_work: Remove unnecessary include from posix_timers.h ptrace: Remove tracehook_signal_handler ptrace: Remove arch_syscall_{enter,exit}_tracehook ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h ptrace/arm: Rename tracehook_report_syscall report_syscall ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-18ptrace: Return the signal to continue with from ptrace_stopEric W. Biederman1-13/+19
The signal a task should continue with after a ptrace stop is inconsistently read, cleared, and sent. Solve this by reading and clearing the signal to be sent in ptrace_stop. In an ideal world everything except ptrace_signal would share a common implementation of continuing with the signal, so ptracers could count on the signal they ask to continue with actually being delivered. For now retain bug compatibility and just return with the signal number the ptracer requested the code continue with. Link: https://lkml.kernel.org/r/875yoe7qdp.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-18ptrace: Move setting/clearing ptrace_message into ptrace_stopEric W. Biederman1-9/+12
Today ptrace_message is easy to overlook as it not a core part of ptrace_stop. It has been overlooked so much that there are places that set ptrace_message and don't clear it, and places that never set it. So if you get an unlucky sequence of events the ptracer may be able to read a ptrace_message that does not apply to the current ptrace stop. Move setting of ptrace_message into ptrace_stop so that it always gets set before the stop, and always gets cleared after the stop. This prevents non-sense from being reported to userspace and makes ptrace_message more visible in the ptrace helper functions so that kernel developers can see it. Link: https://lkml.kernel.org/r/87bky67qfv.fsf_-_@email.froward.int.ebiederm.org Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10tracehook: Remove tracehook.hEric W. Biederman1-1/+1
Now that all of the definitions have moved out of tracehook.h into ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in tracehook.h so remove it. Update the few files that were depending upon tracehook.h to bring in definitions to use the headers they need directly. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10task_work: Call tracehook_notify_signal from get_signal on all architecturesEric W. Biederman1-11/+3
Always handle TIF_NOTIFY_SIGNAL in get_signal. With commit 35d0b389f3b2 ("task_work: unconditionally run task_work from get_signal()") always calling task_work_run all of the work of tracehook_notify_signal is already happening except clearing TIF_NOTIFY_SIGNAL. Factor clear_notify_signal out of tracehook_notify_signal and use it in get_signal so that get_signal only needs one call of task_work_run. To keep the semantics in sync update xfer_to_guest_mode_work (which does not call get_signal) to call tracehook_notify_signal if either _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10task_work: Introduce task_work_pendingEric W. Biederman1-2/+2
Wrap the test of task->task_works in a helper function to make it clear what is being tested. All of the other readers of task->task_work use READ_ONCE and this is even necessary on current as other processes can update task->task_work. So for consistency I have added READ_ONCE into task_work_pending. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10ptrace: Remove tracehook_signal_handlerEric W. Biederman1-1/+2
The two line function tracehook_signal_handler is only called from signal_delivered. Expand it inline in signal_delivered and remove it. Just to make it easier to understand what is going on. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-04signal, x86: Delay calling signals in atomic on RT enabled kernelsOleg Nesterov1-0/+40
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spinlock_t lock that has been converted to a sleeping lock. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ] [ tglx: Use a config option ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
2022-02-10signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLEKees Cook1-2/+3
Fatal SIGSYS signals (i.e. seccomp RET_KILL_* syscall filter actions) were not being delivered to ptraced pid namespace init processes. Make sure the SIGNAL_UNKILLABLE doesn't get set for these cases. Reported-by: Robert Święcki <robert@swiecki.net> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/lkml/878rui8u4a.fsf@email.froward.int.ebiederm.org
2022-01-17Merge branch 'signal-for-v5.17' of ↵Linus Torvalds1-25/+36
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ...
2022-01-08signal: clean up kernel-doc commentsRandy Dunlap1-2/+3
Fix kernel-doc warnings in kernel/signal.c: kernel/signal.c:1830: warning: Function parameter or member 'force_coredump' not described in 'force_sig_seccomp' kernel/signal.c:2873: warning: missing initial short description on line: * signal_delivered - Also add a closing parenthesis to the comments in signal_delivered(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Richard Weinberger <richard@nod.at> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marco Elver <elver@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20211222031027.29694-1-rdunlap@infradead.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2022-01-08signal: Remove the helper signal_group_exitEric W. Biederman1-3/+5
This helper is misleading. It tests for an ongoing exec as well as the process having received a fatal signal. Sometimes it is appropriate to treat an on-going exec differently than a process that is shutting down due to a fatal signal. In particular taking the fast path out of exit_signals instead of retargeting signals is not appropriate during exec, and not changing the the exit code in do_group_exit during exec. Removing the helper makes it more obvious what is going on as both cases must be coded for explicitly. While removing the helper fix the two cases where I have observed using signal_group_exit resulted in the wrong result. In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are retargetted during an exec. In do_group_exit use 0 as the exit code during an exec as de_thread does not set group_exit_code. As best as I can determine group_exit_code has been is set to 0 most of the time during de_thread. During a thread group stop group_exit_code is set to the stop signal and when the thread group receives SIGCONT group_exit_code is reset to 0. Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Remove SIGNAL_GROUP_COREDUMPEric W. Biederman1-1/+1
After the previous cleanups "signal->core_state" is set whenever SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested whenver the code wants to know if a coredump is in progress. The remaining tests of SIGNAL_GROUP_COREDUMP also test to see if SIGNAL_GROUP_EXIT is set. Similarly the only place that sets SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT. Which makes SIGNAL_GROUP_COREDUMP unecessary and redundant. So stop setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and remove it's definition. With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags by setting SIGNAL_GROUP_EXIT. Link: https://lkml.kernel.org/r/20211213225350.27481-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Make coredump handling explicit in complete_signalEric W. Biederman1-1/+1
Ever since commit 6cd8f0acae34 ("coredump: ensure that SIGKILL always kills the dumping thread") it has been possible for a SIGKILL received during a coredump to set SIGNAL_GROUP_EXIT and trigger a process shutdown (for a second time). Update the logic to explicitly allow coredumps so that coredumps can set SIGNAL_GROUP_EXIT and shutdown like an ordinary process. Link: https://lkml.kernel.org/r/87zgo6ytyf.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08signal: Have prepare_signal detect coredumps using signal->core_stateEric W. Biederman1-2/+2
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change prepare_signal to test signal->core_state instead of the flag SIGNAL_GROUP_COREDUMP. Both fields are protected by siglock and both live in signal_struct so there are no real tradeoffs here, just a change to which field is being tested. Link: https://lkml.kernel.org/r/20211213225350.27481-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/875yqu14co.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-14signal: Skip the altstack update when not neededChang S. Bae1-0/+9
== Background == Support for large, "dynamic" fpstates was recently merged. This included code to ensure that sigaltstacks are sufficiently sized for these large states. A new lock was added to remove races between enabling large features and setting up sigaltstacks. == Problem == The new lock (sigaltstack_lock()) is acquired in the sigreturn path before restoring the old sigaltstack. Unfortunately, contention on the new lock causes a measurable signal handling performance regression [1]. However, the common case is that no *changes* are made to the sigaltstack state at sigreturn. == Solution == do_sigaltstack() acquires sigaltstack_lock() and is used for both sys_sigaltstack() and restoring the sigaltstack in sys_sigreturn(). Check for changes to the sigaltstack before taking the lock. If no changes were made, return before acquiring the lock. This removes lock contention from the common-case sigreturn path. [1] https://lore.kernel.org/lkml/20211207012128.GA16074@xsang-OptiPlex-9020/ Fixes: 3aac3ebea08f ("x86/signal: Implement sigaltstack size validation") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20211210225503.12734-1-chang.seok.bae@intel.com
2021-12-03Merge SA_IMMUTABLE-fixes-for-v5.16-rc2Eric W. Biederman1-7/+29
I completed the first batch of signal changes for v5.17 against v5.16-rc1 before the SA_IMMUTABLE fixes where completed. Which leaves me with two lines of development that I want on my signal development branch both rooted at v5.16-rc1. Especially as I am hoping to reach the point of being able to remove SA_IMMUTABLE. Linus merged my SA_IMUTABLE fixes as: 7af959b5d5c8 ("Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace") To avoid rebasing the development changes that are currently complete I am merging the work I sent upstream to Linus to make my life simpler. The SA_IMMUTABLE changes as they are described in Linus's merge commit. Pull exit-vs-signal handling fixes from Eric Biederman: "This is a small set of changes where debuggers were no longer able to intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit cleanups. This is essentially the change you suggested with all of i's dotted and the t's crossed so that ptrace can intercept all of the cases it has been able to intercept the past, and all of the cases that made it to exit without giving ptrace a chance still don't give ptrace a chance" * 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Replace force_fatal_sig with force_exit_sig when in doubt signal: Don't always set SA_IMMUTABLE for forced signals Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19signal: Replace force_fatal_sig with force_exit_sig when in doubtEric W. Biederman1-0/+13
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals. This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept. In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis. Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-19signal: Don't always set SA_IMMUTABLE for forced signalsEric W. Biederman1-7/+16
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace know about the signal until the process has exited. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Link: https://lkml.kernel.org/r/877dd5qfw5.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: Requeue ptrace signalsEric W. Biederman1-1/+2
Kyle Huey <me@kylehuey.com> writes: > rr, a userspace record and replay debugger[0], uses the recorded register > state at PTRACE_EVENT_EXIT to find the point in time at which to cease > executing the program during replay. > > If a SIGKILL races with processing another signal in get_signal, it is > possible for the kernel to decline to notify the tracer of the original > signal. But if the original signal had a handler, the kernel proceeds > with setting up a signal handler frame as if the tracer had chosen to > deliver the signal unmodified to the tracee. When the kernel goes to > execute the signal handler that it has now modified the stack and registers > for, it will discover the pending SIGKILL, and terminate the tracee > without executing the handler. When PTRACE_EVENT_EXIT is delivered to > the tracer, however, the effects of handler setup will be visible to > the tracer. > > Because rr (the tracer) was never notified of the signal, it is not aware > that a signal handler frame was set up and expects the state of the program > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally > by allowing the program to execute from the last event. When that fails > to happen during replay, rr will assert and die. > > The following patches add an explicit check for a newly pending SIGKILL > after the ptracer has been notified and the siglock has been reacquired. > If this happens, we stop processing the current signal and proceed > immediately to handling the SIGKILL. This makes the state reported at > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the > work to set up a signal handler frame that will never be used. > > [0] https://rr-project.org/ The problem is that while the traced process makes it into ptrace_stop, the tracee is killed before the tracer manages to wait for the tracee and discover which signal was about to be delivered. More generally the problem is that while siglock was dropped a signal with process wide effect is short cirucit delivered to the entire process killing it, but the process continues to try and deliver another signal. In general it impossible to avoid all cases where work is performed after the process has been killed. In particular if the process is killed after get_signal returns the code will simply not know it has been killed until after delivering the signal frame to userspace. On the other hand when the code has already discovered the process has been killed and taken user space visible action that shows the kernel knows the process has been killed, it is just silly to then write the signal frame to the user space stack. Instead of being silly detect the process has been killed in ptrace_signal and requeue the signal so the code can pretend it was simply never dequeued for delivery. To test the process has been killed I use fatal_signal_pending rather than signal_group_exit to match the test in signal_pending_state which is used in schedule which is where ptrace_stop detects the process has been killed. Requeuing the signal so the code can pretend it was simply never dequeued improves the user space visible behavior that has been present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4"). Kyle Huey verified that this change in behavior and makes rr happy. Reported-by: Kyle Huey <khuey@kylehuey.com> Reported-by: Marko Mäkelä <marko.makela@mariadb.com> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87tugcd5p2.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: Requeue signals in the appropriate queueEric W. Biederman1-7/+14
In the event that a tracer changes which signal needs to be delivered and that signal is currently blocked then the signal needs to be requeued for later delivery. With the advent of CLONE_THREAD the kernel has 2 signal queues per task. The per process queue and the per task queue. Update the code so that if the signal is removed from the per process queue it is requeued on the per process queue. This is necessary to make it appear the signal was never dequeued. The rr debugger reasonably believes that the state of the process from the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated by simply letting a process run. If a SIGKILL interrupts a ptrace_stop this is not true today. So return signals to their original queue in ptrace_signal so that signals that are not delivered appear like they were never dequeued. Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87zgq4d5r4.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-17signal: In get_signal test for signal_group_exit every time through the loopEric W. Biederman1-10/+10
Recently while investigating a problem with rr and signals I noticed that siglock is dropped in ptrace_signal and get_signal does not jump to relock. Looking farther to see if the problem is anywhere else I see that do_signal_stop also returns if signal_group_exit is true. I believe that test can now never be true, but it is a bit hard to trace through and be certain. Testing signal_group_exit is not expensive, so move the test for signal_group_exit into the for loop inside of get_signal to ensure the test is never skipped improperly. This has been a potential problem since I added the test for signal_group_exit was added. Fixes: 35634ffa1751 ("signal: Always notice exiting tasks") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/875yssekcd.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-11-10Merge branch 'exit-cleanups-for-v5.16' of ↵Linus Torvalds1-10/+24
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull exit cleanups from Eric Biederman: "While looking at some issues related to the exit path in the kernel I found several instances where the code is not using the existing abstractions properly. This set of changes introduces force_fatal_sig a way of sending a signal and not allowing it to be caught, and corrects the misuse of the existing abstractions that I found. A lot of the misuse of the existing abstractions are silly things such as doing something after calling a no return function, rolling BUG by hand, doing more work than necessary to terminate a kernel thread, or calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL). In the review a deficiency in force_fatal_sig and force_sig_seccomp where ptrace or sigaction could prevent the delivery of the signal was found. I have added a change that adds SA_IMMUTABLE to change that makes it impossible to interrupt the delivery of those signals, and allows backporting to fix force_sig_seccomp And Arnd found an issue where a function passed to kthread_run had the wrong prototype, and after my cleanup was failing to build." * 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) soc: ti: fix wkup_m3_rproc_boot_thread return type signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) exit/r8188eu: Replace the macro thread_exit with a simple return 0 exit/rtl8712: Replace the macro thread_exit with a simple return 0 exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 signal/x86: In emulate_vsyscall force a signal instead of calling do_exit signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails exit/syscall_user_dispatch: Send ordinary signals on failure signal: Implement force_fatal_sig exit/kthread: Have kernel threads return instead of calling do_exit signal/s390: Use force_sigsegv in default_trap_handler signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved. signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON signal/sparc: In setup_tsb_params convert open coded BUG into BUG signal/powerpc: On swapcontext failure force SIGSEGV signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT signal/sparc32: Remove unreachable do_exit in do_sparc_fault ...
2021-11-03Merge branch 'per_signal_struct_coredumps-for-v5.16' of ↵Linus Torvalds1-42/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull per signal_struct coredumps from Eric Biederman: "Current coredumps are mixed up with the exit code, the signal handling code, and the ptrace code making coredumps much more complicated than necessary and difficult to follow. This series of changes starts with ptrace_stop and cleans it up, making it easier to follow what is happening in ptrace_stop. Then cleans up the exec interactions with coredumps. Then cleans up the coredump interactions with exit. Finally the coredump interactions with the signal handling code is cleaned up. The first and last changes are bug fixes for minor bugs. I believe the fact that vfork followed by execve can kill the process the called vfork if exec fails is sufficient justification to change the userspace visible behavior. In previous discussions some of these changes were organized differently and individually appeared to make the code base worse. As currently written I believe they all stand on their own as cleanups and bug fixes. Which means that even if the worst should happen and the last change needs to be reverted for some unimaginable reason, the code base will still be improved. If the worst does not happen there are a more cleanups that can be made. Signals that generate coredumps can easily become eligible for short circuit delivery in complete_signal. The entire rendezvous for generating a coredump can move into get_signal. The function force_sig_info_to_task be written in a way that does not modify the signal handling state of the target task (because coredumps are eligible for short circuit delivery). Many of these future cleanups can be done another way but nothing so cleanly as