summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2026-01-23ftrace: Do not over-allocate ftrace memoryGuenter Roeck1-14/+15
commit be55257fab181b93af38f8c4b1b3cb453a78d742 upstream. The pg_remaining calculation in ftrace_process_locs() assumes that ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE (integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g. 4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages) have significantly more capacity than 256 * 170. This leads to pg_remaining being underestimated, which in turn makes skip (derived from skipped - pg_remaining) larger than expected, causing the WARN(skip != remaining) to trigger. Extra allocated pages for ftrace: 2 with 654 skipped WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0 A similar problem in ftrace_allocate_records() can result in allocating too many pages. This can trigger the second warning in ftrace_process_locs(). Extra allocated pages for ftrace WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580 Use the actual capacity of a page group to determine the number of pages to allocate. Have ftrace_allocate_pages() return the number of allocated pages to avoid having to calculate it. Use the actual page group capacity when validating the number of unused pages due to skipped entries. Drop the definition of ENTRIES_PER_PAGE since it is no longer used. Cc: stable@vger.kernel.org Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries") Link: https://patch.msgid.link/20260113152243.3557219-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17trace: ftrace_dump_on_oops[] is not exported, make it staticBen Dooks1-1/+1
[ Upstream commit 1e2ed4bfd50ace3c4272cfab7e9aa90956fb7ae0 ] The ftrace_dump_on_oops string is not used outside of trace.c so make it static to avoid the export warning from sparse: kernel/trace/trace.c:141:6: warning: symbol 'ftrace_dump_on_oops' was not declared. Should it be static? Fixes: dd293df6395a2 ("tracing: Move trace sysctls into trace.c") Link: https://patch.msgid.link/20260106231054.84270-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17tracing: Add recursion protection in kernel stack trace recordingSteven Rostedt1-0/+6
commit 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb upstream. A bug was reported about an infinite recursion caused by tracing the rcu events with the kernel stack trace trigger enabled. The stack trace code called back into RCU which then called the stack trace again. Expand the ftrace recursion protection to add a set of bits to protect events from recursion. Each bit represents the context that the event is in (normal, softirq, interrupt and NMI). Have the stack trace code use the interrupt context to protect against recursion. Note, the bug showed an issue in both the RCU code as well as the tracing stacktrace code. This only handles the tracing stack trace side of the bug. The RCU fix will be handled separately. Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home Reported-by: Yao Kai <yaokai34@huawei.com> Tested-by: Yao Kai <yaokai34@huawei.com> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08fgraph: Check ftrace_pids_enabled on registration for early filteringShengming Hu1-2/+7
commit 1650a1b6cb1ae6cb99bb4fce21b30ebdf9fc238e upstream. When registering ftrace_graph, check if ftrace_pids_enabled is active. If enabled, assign entryfunc to fgraph_pid_func to ensure filtering is performed before executing the saved original entry function. Cc: stable@vger.kernel.org Cc: <wang.yaxin@zte.com.cn> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.yang29@zte.com.cn> Link: https://patch.msgid.link/20251126173331679XGVF98NLhyLJRdtNkVZ6w@zte.com.cn Fixes: df3ec5da6a1e7 ("function_graph: Add pid tracing back to function graph tracer") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08fgraph: Initialize ftrace_ops->private for function graph opsShengming Hu1-0/+1
commit b5d6d3f73d0bac4a7e3a061372f6da166fc6ee5c upstream. The ftrace_pids_enabled(op) check relies on op->private being properly initialized, but fgraph_ops's underlying ftrace_ops->private was left uninitialized. This caused ftrace_pids_enabled() to always return false, effectively disabling PID filtering for function graph tracing. Fix this by copying src_ops->private to dst_ops->private in fgraph_init_ops(), ensuring PID filter state is correctly propagated. Cc: stable@vger.kernel.org Cc: <wang.yaxin@zte.com.cn> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.yang29@zte.com.cn> Fixes: c132be2c4fcc1 ("function_graph: Have the instances use their own ftrace_ops for filtering") Link: https://patch.msgid.link/20251126172926004y3hC8QyU4WFOjBkU_UxLC@zte.com.cn Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02tracing: Fix fixed array of synthetic eventSteven Rostedt1-1/+0
commit 47ef834209e5981f443240d8a8b45bf680df22aa upstream. The commit 4d38328eb442d ("tracing: Fix synth event printk format for str fields") replaced "%.*s" with "%s" but missed removing the number size of the dynamic and static strings. The commit e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events") fixed the dynamic part but did not fix the static part. That is, with the commands: # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger That caused the output of: <idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155 sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58 <idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91 The commit e1a453a57bc7 fixed the part where the synthetic event had "char[] wakee". But if one were to replace that with a static size string: # echo 's:wake_lat char[16] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events Where "wakee" is defined as "char[16]" and not "char[]" making it a static size, the code triggered the "(efaul)" again. Remove the added STR_VAR_LEN_MAX size as the string is still going to be nul terminated. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Link: https://patch.msgid.link/20251204151935.5fa30355@gandalf.local.home Fixes: e1a453a57bc7 ("tracing: Do not add length to print format in synthetic events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02tracing: Do not register unsupported perf eventsSteven Rostedt1-0/+2
commit ef7f38df890f5dcd2ae62f8dbde191d72f3bebae upstream. Synthetic events currently do not have a function to register perf events. This leads to calling the tracepoint register functions with a NULL function pointer which triggers: ------------[ cut here ]------------ WARNING: kernel/tracepoint.c:175 at tracepoint_add_func+0x357/0x370, CPU#2: perf/2272 Modules linked in: kvm_intel kvm irqbypass CPU: 2 UID: 0 PID: 2272 Comm: perf Not tainted 6.18.0-ftest-11964-ge022764176fc-dirty #323 PREEMPTLAZY Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:tracepoint_add_func+0x357/0x370 Code: 28 9c e8 4c 0b f5 ff eb 0f 4c 89 f7 48 c7 c6 80 4d 28 9c e8 ab 89 f4 ff 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc <0f> 0b 49 c7 c6 ea ff ff ff e9 ee fe ff ff 0f 0b e9 f9 fe ff ff 0f RSP: 0018:ffffabc0c44d3c40 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff9380aa9e4060 RCX: 0000000000000000 RDX: 000000000000000a RSI: ffffffff9e1d4a98 RDI: ffff937fcf5fd6c8 RBP: 0000000000000001 R08: 0000000000000007 R09: ffff937fcf5fc780 R10: 0000000000000003 R11: ffffffff9c193910 R12: 000000000000000a R13: ffffffff9e1e5888 R14: 0000000000000000 R15: ffffabc0c44d3c78 FS: 00007f6202f5f340(0000) GS:ffff93819f00f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d3162281a8 CR3: 0000000106a56003 CR4: 0000000000172ef0 Call Trace: <TASK> tracepoint_probe_register+0x5d/0x90 synth_event_reg+0x3c/0x60 perf_trace_event_init+0x204/0x340 perf_trace_init+0x85/0xd0 perf_tp_event_init+0x2e/0x50 perf_try_init_event+0x6f/0x230 ? perf_event_alloc+0x4bb/0xdc0 perf_event_alloc+0x65a/0xdc0 __se_sys_perf_event_open+0x290/0x9f0 do_syscall_64+0x93/0x7b0 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e ? trace_hardirqs_off+0x53/0xc0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Instead, have the code return -ENODEV, which doesn't warn and has perf error out with: # perf record -e synthetic:futex_wait Error: The sys_perf_event_open() syscall returned with 19 (No such device) for event (synthetic:futex_wait). "dmesg | grep -i perf" may provide additional information. Ideally perf should support synthetic events, but for now just fix the warning. The support can come later. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251216182440.147e4453@gandalf.local.home Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events") Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02bpf: Fix verifier assumptions of bpf_d_path's output bufferShuran Liu1-1/+1
[ Upstream commit ac44dcc788b950606793e8f9690c30925f59df02 ] Commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") started distinguishing read vs write accesses performed by helpers. The second argument of bpf_d_path() is a pointer to a buffer that the helper fills with the resulting path. However, its prototype currently uses ARG_PTR_TO_MEM without MEM_WRITE. Before 37cce22dbd51, helper accesses were conservatively treated as potential writes, so this mismatch did not cause issues. Since that commit, the verifier may incorrectly assume that the buffer contents are unchanged across the helper call and base its optimizations on this wrong assumption. This can lead to misbehaviour in BPF programs that read back the buffer, such as prefix comparisons on the returned path. Fix this by marking the second argument of bpf_d_path() as ARG_PTR_TO_MEM | MEM_WRITE so that the verifier correctly models the write to the caller-provided buffer. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20251206141210.3148-2-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-26Merge tag 'trace-ringbuffer-v6.18-rc7' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Do not allow mmapped ring buffer to be split When the ring buffer VMA is split by a partial munmap or a MAP_FIXED, the kernel calls vm_ops->close() on each portion. This causes the ring_buffer_unmap() to be called multiple times. This causes subsequent calls to return -ENODEV and triggers a warning. There's no reason to allow user space to split up memory mapping of the ring buffer. Have it return -EINVAL when that happens. * tag 'trace-ringbuffer-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix WARN_ON in tracing_buffers_mmap_close for split VMAs
2025-11-25tracing: Fix WARN_ON in tracing_buffers_mmap_close for split VMAsDeepanshu Kartikey1-0/+10
When a VMA is split (e.g., by partial munmap or MAP_FIXED), the kernel calls vm_ops->close on each portion. For trace buffer mappings, this results in ring_buffer_unmap() being called multiple times while ring_buffer_map() was only called once. This causes ring_buffer_unmap() to return -ENODEV on subsequent calls because user_mapped is already 0, triggering a WARN_ON. Trace buffer mappings cannot support partial mappings because the ring buffer structure requires the complete buffer including the meta page. Fix this by adding a may_split callback that returns -EINVAL to prevent VMA splits entirely. Cc: stable@vger.kernel.org Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer") Link: https://patch.msgid.link/20251119064019.25904-1-kartikey406@gmail.com Closes: https://syzkaller.appspot.com/bug?extid=a72c325b042aae6403c7 Tested-by: syzbot+a72c325b042aae6403c7@syzkaller.appspotmail.com Reported-by: syzbot+a72c325b042aae6403c7@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds1-15/+45
Pull bpf fixes from Alexei Starovoitov: - Fix interaction between livepatch and BPF fexit programs (Song Liu) With Steven and Masami acks. - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa) With Steven and Masami acks. - Fix out of bounds access in widen_imprecise_scalars() in the verifier (Eduard Zingerman) - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen) - Fix net_sched storage collision with BPF data_meta/data_end (Eric Dumazet) - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test widen_imprecise_scalars() with different stack depth bpf: account for current allocated stack depth in widen_imprecise_scalars() bpf: Add bpf_prog_run_data_pointers() selftests/bpf: Add mptcp test with sockmap mptcp: Fix proto fallback detection with BPF mptcp: Disallow MPTCP subflows from sockmap selftests/bpf: Add stacktrace ips test for raw_tp selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" bpf: add _impl suffix for bpf_stream_vprintk() kfunc bpf:add _impl suffix for bpf_task_work_schedule* kfuncs selftests/bpf: Add tests for livepatch + bpf trampoline ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct() ftrace: Fix BPF fexit with livepatch
2025-11-07Merge tag 'trace-v6.18-rc4' of ↵Linus Torvalds2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Check for reader catching up in ring_buffer_map_get_reader() If the reader catches up to the writer in the memory mapped ring buffer then calling rb_get_reader_page() will return NULL as there's no pages left. But this isn't checked for before calling rb_get_reader_page() and the return of NULL causes a warning. If it is detected that the reader caught up to the writer, then simply exit the routine - Fix memory leak in histogram create_field_var() The couple of the error paths in create_field_var() did not properly clean up what was allocated. Make sure everything is freed properly on error - Fix help message of tools latency_collector The help message incorrectly stated that "-t" was the same as "--threads" whereas "--threads" is actually represented by "-e" * tag 'trace-v6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/tools: Fix incorrcet short option in usage text for --threads tracing: Fix memory leaks in create_field_var() ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches up
2025-11-06tracing: Fix memory leaks in create_field_var()Zilin Guan1-2/+4
The function create_field_var() allocates memory for 'val' through create_hist_field() inside parse_atom(), and for 'var' through create_var(), which in turn allocates var->type and var->var.name internally. Simply calling kfree() to release these structures will result in memory leaks. Use destroy_hist_field() to properly free 'val', and explicitly release the memory of var->type and var->var.name before freeing 'var' itself. Link: https://patch.msgid.link/20251106120132.3639920-1-zilin@seu.edu.cn Fixes: 02205a6752f22 ("tracing: Add support for 'field variables'") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-06ring-buffer: Do not warn in ring_buffer_map_get_reader() when reader catches upSteven Rostedt1-0/+4
The function ring_buffer_map_get_reader() is a bit more strict than the other get reader functions, and except for certain situations the rb_get_reader_page() should not return NULL. If it does, it triggers a warning. This warning was triggering but after looking at why, it was because another acceptable situation was happening and it wasn't checked for. If the reader catches up to the writer and there's still data to be read on the reader page, then the rb_get_reader_page() will return NULL as there's no new page to get. In this situation, the reader page should not be updated and no warning should trigger. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Reported-by: syzbot+92a3745cea5ec6360309@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/690babec.050a0220.baf87.0064.GAE@google.com/ Link: https://lore.kernel.org/20251016132848.1b11bb37@gandalf.local.home Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-07tracing: tprobe-events: Fix to put tracepoint_user when disable the tprobeMasami Hiramatsu (Google)1-0/+4
__unregister_trace_fprobe() checks tf->tuser to put it when removing tprobe. However, disable_trace_fprobe() does not use it and only calls unregister_fprobe(). Thus it forgets to disable tracepoint_user. If the trace_fprobe has tuser, put it for unregistering the tracepoint callbacks when disabling tprobe correctly. Link: https://lore.kernel.org/all/176244794466.155515.3971904050506100243.stgit@devnote2/ Fixes: 2867495dea86 ("tracing: tprobe-events: Register tracepoint when enable tprobe event") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Beau Belgrave <beaub@linux.microsoft.com>
2025-11-07tracing: tprobe-events: Fix to register tracepoint correctlyMasami Hiramatsu (Google)1-1/+2
Since __tracepoint_user_init() calls tracepoint_user_register() without initializing tuser->tpoint with given tracpoint, it does not register tracepoint stub function as callback correctly, and tprobe does not work. Initializing tuser->tpoint correctly before tracepoint_user_register() so that it sets up tracepoint callback. I confirmed below example works fine again. echo "t sched_switch preempt prev_pid=prev->pid next_pid=next->pid" > /sys/kernel/tracing/dynamic_events echo 1 > /sys/kernel/tracing/events/tracepoints/sched_switch/enable cat /sys/kernel/tracing/trace_pipe Link: https://lore.kernel.org/all/176244793514.155515.6466348656998627773.stgit@devnote2/ Fixes: 2867495dea86 ("tracing: tprobe-events: Register tracepoint when enable tprobe event") Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Beau Belgrave <beaub@linux.microsoft.com>
2025-11-03ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()Song Liu1-9/+31
ftrace_hash_ipmodify_enable() checks IPMODIFY and DIRECT ftrace_ops on the same kernel function. When needed, ftrace_hash_ipmodify_enable() calls ops->ops_func() to prepare the direct ftrace (BPF trampoline) to share the same function as the IPMODIFY ftrace (livepatch). ftrace_hash_ipmodify_enable() is called in register_ftrace_direct() path, but not called in modify_ftrace_direct() path. As a result, the following operations will break livepatch: 1. Load livepatch to a kernel function; 2. Attach fentry program to the kernel function; 3. Attach fexit program to the kernel function. After 3, the kernel function being used will not be the livepatched version, but the original version. Fix this by adding __ftrace_hash_update_ipmodify() to __modify_ftrace_direct() and adjust some logic around the call. Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20251027175023.1521602-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-03ftrace: Fix BPF fexit with livepatchSong Liu1-6/+14
When livepatch is attached to the same function as bpf trampoline with a fexit program, bpf trampoline code calls register_ftrace_direct() twice. The first time will fail with -EAGAIN, and the second time it will succeed. This requires register_ftrace_direct() to unregister the address on the first attempt. Otherwise, the bpf trampoline cannot attach. Here is an easy way to reproduce this issue: insmod samples/livepatch/livepatch-sample.ko bpftrace -e 'fexit:cmdline_proc_show {}' ERROR: Unable to attach probe: fexit:vmlinux:cmdline_proc_show... Fix this by cleaning up the hash when register_ftrace_function_nolock hits errors. Also, move the code that resets ops->func and ops->trampoline to the error path of register_ftrace_direct(); and add a helper function reset_direct() in register_ftrace_direct() and unregister_ftrace_direct(). Fixes: d05cb470663a ("ftrace: Fix modification of direct_function hash while in use") Cc: stable@vger.kernel.org # v6.6+ Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Closes: https://lore.kernel.org/live-patching/c5058315a39d4615b333e485893345be@crowdstrike.com/ Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20251027175023.1521602-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-20rv: Make rtapp/pagefault monitor depends on CONFIG_MMUNam Cao1-0/+1
There is no page fault without MMU. Compiling the rtapp/pagefault monitor without CONFIG_MMU fails as page fault tracepoints' definitions are not available. Make rtapp/pagefault monitor depends on CONFIG_MMU. Fixes: 9162620eb604 ("rv: Add rtapp_pagefault monitor") Signed-off-by: Nam Cao <namcao@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509260455.6Z9Vkty4-lkp@intel.com/ Cc: stable@vger.kernel.org Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20251002082317.973839-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-10-20rv: Fully convert enabled_monitors to use list_head as iteratorNam Cao1-6/+6
The callbacks in enabled_monitors_seq_ops are inconsistent. Some treat the iterator as struct rv_monitor *, while others treat the iterator as struct list_head *. This causes a wrong type cast and crashes the system as reported by Nathan. Convert everything to use struct list_head * as iterator. This also makes enabled_monitors consistent with available_monitors. Fixes: de090d1ccae1 ("rv: Fix wrong type cast in enabled_monitors_next()") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20250923002004.GA2836051@ax162/ Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20251002082235.973099-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-10-11Merge tag 'trace-v6.18-3' of ↵Linus Torvalds1-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "The previous fix to trace_marker required updating trace_marker_raw as well. The difference between trace_marker_raw from trace_marker is that the raw version is for applications to write binary structures directly into the ring buffer instead of writing ASCII strings. This is for applications that will read the raw data from the ring buffer and get the data structures directly. It's a bit quicker than using the ASCII version. Unfortunately, it appears that our test suite has several tests that test writes to the trace_marker file, but lacks any tests to the trace_marker_raw file (this needs to be remedied). Two issues came about the update to the trace_marker_raw file that syzbot found: - Fix tracing_mark_raw_write() to use per CPU buffer The fix to use the per CPU buffer to copy from user space was needed for both the trace_maker and trace_maker_raw file. The fix for reading from user space into per CPU buffers properly fixed the trace_marker write function, but the trace_marker_raw file wasn't fixed properly. The user space data was correctly written into the per CPU buffer, but the code that wrote into the ring buffer still used the user space pointer and not the per CPU buffer that had the user space data already written. - Stop the fortify string warning from writing into trace_marker_raw After converting the copy_from_user_nofault() into a memcpy(), another issue appeared. As writes to the trace_marker_raw expects binary data, the first entry is a 4 byte identifier. The entry structure is defined as: struct { struct trace_entry ent; int id; char buf[]; }; The size of this structure is reserved on the ring buffer with: size = sizeof(*entry) + cnt; Then it is copied from the buffer into the ring buffer with: memcpy(&entry->id, buf, cnt); This use to be a copy_from_user_nofault(), but now converting it to a memcpy() triggers the fortify-string code, and causes a warning. The allocated space is actually more than what is copied, as the cnt used also includes the entry->id portion. Allocating sizeof(*entry) plus cnt is actually allocating 4 bytes more than what is needed. Change the size function to: size = struct_size(entry, buf, cnt - sizeof(entry->id)); And update the memcpy() to unsafe_memcpy()" * tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Stop fortify-string from warning in tracing_mark_raw_write() tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
2025-10-11tracing: Stop fortify-string from warning in tracing_mark_raw_write()Steven Rostedt1-2/+6
The way tracing_mark_raw_write() records its data is that it has the following structure: struct { struct trace_entry; int id; char buf[]; }; But memcpy(&entry->id, buf, size) triggers the following warning when the size is greater than the id: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4) WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0 Modules linked in: CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0 Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48 RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001 RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40 R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010 R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000 FS: 00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0 Call Trace: <TASK> tracing_mark_raw_write+0x1fe/0x290 ? __pfx_tracing_mark_raw_write+0x10/0x10 ? security_file_permission+0x50/0xf0 ? rw_verify_area+0x6f/0x4b0 vfs_write+0x1d8/0xdd0 ? __pfx_vfs_write+0x10/0x10 ? __pfx_css_rstat_updated+0x10/0x10 ? count_memcg_events+0xd9/0x410 ? fdget_pos+0x53/0x5e0 ksys_write+0x182/0x200 ? __pfx_ksys_write+0x10/0x10 ? do_user_addr_fault+0x4af/0xa30 do_syscall_64+0x63/0x350 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa61d318687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687 RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001 RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006 R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- This is because fortify string sees that the size of entry->id is only 4 bytes, but it is writing more than that. But this is OK as the dynamic_array is allocated to handle that copy. The size allocated on the ring buffer was actually a bit too big: size = sizeof(*entry) + cnt; But cnt includes the 'id' and the buffer data, so adding cnt to the size of *entry actually allocates too much on the ring buffer. Change the allocation to: size = struct_size(entry, buf, cnt - sizeof(entry->id)); and the memcpy() to unsafe_memcpy() with an added justification. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-10tracing: Fix tracing_mark_raw_write() to use buf and not ubufSteven Rostedt1-2/+2
The fix to use a per CPU buffer to read user space tested only the writes to trace_marker. But it appears that the selftests are missing tests to the trace_maker_raw file. The trace_maker_raw file is used by applications that writes data structures and not strings into the file, and the tools read the raw ring buffer to process the structures it writes. The fix that reads the per CPU buffers passes the new per CPU buffer to the trace_marker file writes, but the update to the trace_marker_raw write read the data from user space into the per CPU buffer, but then still used then passed the user space address to the function that records the data. Pass in the per CPU buffer and not the user space address. TODO: Add a test to better test trace_marker_raw. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011035243.386098147@kernel.org Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-09Merge tag 'trace-v6.18-2' of ↵Linus Torvalds5-79/+241
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing clean up and fixes from Steven Rostedt: - Have osnoise tracer use memdup_user_nul() The function osnoise_cpus_write() open codes a kmalloc() and then a copy_from_user() and then adds a nul byte at the end which is the same as simply using memdup_user_nul(). - Fix wakeup and irq tracers when failing to acquire calltime When the wakeup and irq tracers use the function graph tracer for tracing function times, it saves a timestamp into the fgraph shadow stack. It is possible that this could fail to be stored. If that happens, it exits the routine early. These functions also disable nesting of the operations by incremeting the data "disable" counter. But if the calltime exits out early, it never increments the counter back to what it needs to be. Since there's only a couple of lines of code that does work after acquiring the calltime, instead of exiting out early, reverse the if statement to be true if calltime is acquired, and place the code that is to be done within that if block. The clean up will always be done after that. - Fix ring_buffer_map() return value on failure of __rb_map_vma() If __rb_map_vma() fails in ring_buffer_map(), it does not return an error. This means the caller will be working against a bad vma mapping. Have ring_buffer_map() return an error when __rb_map_vma() fails. - Fix regression of writing to the trace_marker file A bug fix was made to change __copy_from_user_inatomic() to copy_from_user_nofault() in the trace_marker write function. The trace_marker file is used by applications to write into it (usually with a file descriptor opened at the start of the program) to record into the tracing system. It's usually used in critical sections so the write to trace_marker is highly optimized. The reason for copying in an atomic section is that the write reserves space on the ring buffer and then writes directly into it. After it writes, it commits the event. The time between reserve and commit must have preemption disabled. The trace marker write does not have any locking nor can it allocate due to the nature of it being a critical path. Unfortunately, converting __copy_from_user_inatomic() to copy_from_user_nofault() caused a regression in Android. Now all the writes from its applications trigger the fault that is rejected by the _nofault() version that wasn't rejected by the _inatomic() version. Instead of getting data, it now just gets a trace buffer filled with: tracing_mark_write: <faulted> To fix this, on opening of the trace_marker file, allocate per CPU buffers that can be used by the write call. Then when entering the write call, do the following: preempt_disable(); cpu = smp_processor_id(); buffer = per_cpu_ptr(cpu_buffers, cpu); do { cnt = nr_context_switches_cpu(cpu); migrate_disable(); preempt_enable(); ret = copy_from_user(buffer, ptr, size); preempt_disable(); migrate_enable(); } while (!ret && cnt != nr_context_switches_cpu(cpu)); if (!ret) ring_buffer_write(buffer); preempt_enable(); This works similarly to seqcount. As it must enabled preemption to do a copy_from_user() into a per CPU buffer, if it gets preempted, the buffer could be corrupted by another task. To handle this, read the number of context switches of the current CPU, disable migration, enable preemption, copy the data from user space, then immediately disable preemption again. If the number of context switches is the same, the buffer is still valid. Otherwise it must be assumed that the buffer may have been corrupted and it needs to try again. Now the trace_marker write can get the user data even if it has to fault it in, and still not grab any locks of its own. * tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Have trace_marker use per-cpu data to read user space ring buffer: Propagate __rb_map_vma return value to caller tracing: Fix irqoff tracers on failure of acquiring calltime tracing: Fix wakeup tracers on failure of acquiring calltime tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
2025-10-08tracing: Have trace_marker use per-cpu data to read user spaceSteven Rostedt1-48/+220
It was reported that using __copy_from_user_inatomic() can actually schedule. Which is bad when preemption is disabled. Even though there's logic to check in_atomic() is set, but this is a nop when the kernel is configured with PREEMPT_NONE. This is due to page faulting and the code could schedule with preemption disabled. Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud.com/ The solution was to change the __copy_from_user_inatomic() to copy_from_user_nofault(). But then it was reported that this caused a regression in Android. There's several applications writing into trace_marker() in Android, but now instead of showing the expected data, it is showing: tracing_mark_write: <faulted> After reverting the conversion to copy_from_user_nofault(), Android was able to get the data again. Writes to the trace_marker is a way to efficiently and quickly enter data into the Linux tracing buffer. It takes no locks and was designed to be as non-intrusive as possible. This means it cannot allocate memory, and must use pre-allocated data. A method that is actively being worked on to have faultable system call tracepoints read user space data is to allocate per CPU buffers, and use them in the callback. The method uses a technique similar to seqcount. That is something like this: preempt_disable(); cpu = smp_processor_id(); buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu); do { cnt = nr_context_switches_cpu(cpu); migrate_disable(); preempt_enable(); ret = copy_from_user(buffer, ptr, size); preempt_disable(); migrate_enable(); } while (!ret && cnt != nr_context_switches_cpu(cpu)); if (!ret) ring_buffer_write(buffer); preempt_enable(); It's a little more involved than that, but the above is the basic logic. The idea is to acquire the current CPU buffer, disable migration, and then enable preemption. At this moment, it can safely use copy_from_user(). After reading the data from user space, it disables preemption again. It then checks to see if there was any new scheduling on this CPU. If there was, it must assume that the buffer was corrupted by another task. If there wasn't, then the buffer is still valid as only tasks in preemptable context can write to this buffer and only those that are running on the CPU. By using this method, where trace_marker open allocates the per CPU buffers, trace_marker writes can access user space and even fault it in, without having to allocate or take any locks of its own. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Luo Gengkun <luogengkun@huaweicloud.com> Cc: Wattson CI <wattson-external@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home Fixes: 3d62ab32df065 ("tracing: Fix tracing_marker may trigger page fault during preempt_disable") Reported-by: Runping Lai <runpinglai@google.com> Tested-by: Runping Lai <runpinglai@google.com> Closes: https://lore.kernel.org/linux-trace-kernel/20251007003417.3470979-2-runpinglai@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08ring buffer: Propagate __rb_map_vma return value to callerAnkit Khushwaha1-1/+1
The return value from `__rb_map_vma()`, which rejects writable or executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being ignored. As a result the caller of `__rb_map_vma` always returned 0 even when the mapping had actually failed, allowing it to proceed with an invalid VMA. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions") Reported-by: syzbot+ddc001b92c083dbf2b97@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714bdb Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08tracing: Fix irqoff tracers on failure of acquiring calltimeSteven Rostedt1-13/+10
The functions irqsoff_graph_entry() and irqsoff_graph_return() both call func_prolog_dec() that will test if the data->disable is already set and if not, increment it and return. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-2-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08tracing: Fix wakeup tracers on failure of acquiring calltimeSteven Rostedt1-10/+6
The functions wakeup_graph_entry() and wakeup_graph_return() both call func_prolog_preempt_disable() that will test if the data->disable is already set and if not, increment it and disable preemption. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nulThorsten Blum1-7/+4
Replace kmalloc() followed by copy_from_user() with memdup_user_nul() to simplify and improve osnoise_cpus_write(). Remove the manual NUL-termination. No functional changes intended. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251001130907.364673-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-05Merge tag 'trace-v6.18' of ↵Linus Torvalds8-20/+26
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints Individual system call trace events are pseudo events attached to the raw_syscall trace events that just trace the entry and exit of all system calls. When any of these individual system call trace events get enabled, an element in an array indexed by the system call number is assigned to the trace file that defines how to trace it. When the trace event triggers, it reads this array and if the array has an element, it uses that trace file to know what to write it (the trace file defines the output format of the corresponding system call). The issue is that it uses rcu_dereference_ptr() and marks the elements of the array as using RCU. This is incorrect. There is no RCU synchronization here. The event file that is pointed to has a completely different way to make sure its freed properly. The reading of the array during the system call trace event is only to know if there is a value or not. If not, it does nothing (it means this system call isn't being traced). If it does, it uses the information to store the system call data. The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE() macros. - Have the system call trace events use "0x" for hex values Some system call trace events display hex values but do not have "0x" in front of it. Seeing "count: 44" can be assumed that it is 44 decimal when in actuality it is 44 hex (68 decimal). Display "0x44" instead. - Use vmalloc_array() in tracing_map_sort_entries() The function tracing_map_sort_entries() used array_size() and vmalloc() when it could have simply used vmalloc_array(). - Use for_each_online_cpu() in trace_osnoise.c() Instead of open coding for_each_cpu(cpu, cpu_online_mask), use for_each_online_cpu(). - Move the buffer field in struct trace_seq to the end The buffer field in struct trace_seq is architecture dependent in size, and caused padding for the fields after it. By moving the buffer to the end of the structure, it compacts the trace_seq structure better. - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer() The structure that contains cmdline_idx is zeroed by memset(), no need to explicitly zero any of its fields after that. - Use system_percpu_wq instead of system_wq in user_event_mm_remove() As system_wq is being deprecated, use the new wq. - Add cond_resched() is ftrace_module_enable() Some modules have a lot of functions (thousands of them), and the enabling of those functions can take some time. On non preemtable kernels, it was triggering a watchdog timeout. Add a cond_resched()