summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2026-04-18tracing: Fix potential deadlock in cpu hotplug with osnoiseLuo Haiyang1-4/+4
[ Upstream commit 1f9885732248d22f788e4992c739a98c88ab8a55 ] The following sequence may leads deadlock in cpu hotplug: task1 task2 task3 ----- ----- ----- mutex_lock(&interface_lock) [CPU GOING OFFLINE] cpus_write_lock(); osnoise_cpu_die(); kthread_stop(task3); wait_for_completion(); osnoise_sleep(); mutex_lock(&interface_lock); cpus_read_lock(); [DEAD LOCK] Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock). Cc: stable@vger.kernel.org Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.tao172@zte.com.cn> Cc: <ran.xiaokai@zte.com.cn> Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer") Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> [ adapted guard() macros to lock/unlock calls ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-18tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2GCalvin Owens1-3/+3
commit d008ba8be8984760e36d7dcd4adbd5a41a645708 upstream. Some of the sizing logic through tracer_alloc_buffers() uses int internally, causing unexpected behavior if the user passes a value that does not fit in an int (on my x86 machine, the result is uselessly tiny buffers). Fix by plumbing the parameter's real type (unsigned long) through to the ring buffer allocation functions, which already use unsigned long. It has always been possible to create larger ring buffers via the sysfs interface: this only affects the cmdline parameter. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: 73c5162aa362 ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04tracing: Fix to set write permission to per-cpu buffer_size_kbMasami Hiramatsu (Google)1-1/+1
[ Upstream commit f844282deed7481cf2f813933229261e27306551 ] Since the per-cpu buffer_size_kb file is writable for changing per-cpu ring buffer size, the file should have the write access permission. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/177071301597.2293046.11683339475076917920.stgit@mhiramat.tok.corp.google.com Fixes: 21ccc9cd7211 ("tracing: Disable "other" permission bits in the tracefs files") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04tracing: Fix false sharing in hwlat get_sample()Colin Lord1-8/+7
[ Upstream commit f743435f988cb0cf1f521035aee857851b25e06d ] The get_sample() function in the hwlat tracer assumes the caller holds hwlat_data.lock, but this is not actually happening. The result is unprotected data access to hwlat_data, and in per-cpu mode can result in false sharing which may show up as false positive latency events. The specific case of false sharing observed was primarily between hwlat_data.sample_width and hwlat_data.count. These are separated by just 8B and are therefore likely to share a cache line. When one thread modifies count, the cache line is in a modified state so when other threads read sample_width in the main latency detection loop, they fetch the modified cache line. On some systems, the fetch itself may be slow enough to count as a latency event, which could set up a self reinforcing cycle of latency events as each event increments count which then causes more latency events, continuing the cycle. The other result of the unprotected data access is that hwlat_data.count can end up with duplicate or missed values, which was observed on some systems in testing. Convert hwlat_data.count to atomic64_t so it can be safely modified without locking, and prevent false sharing by pulling sample_width into a local variable. One system this was tested on was a dual socket server with 32 CPUs on each numa node. With settings of 1us threshold, 1000us width, and 2000us window, this change reduced the number of latency events from 500 per second down to approximately 1 event per minute. Some machines tested did not exhibit measurable latency from the false sharing. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260210074810.6328-1-clord@mykolab.com Signed-off-by: Colin Lord <clord@mykolab.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04tracing: Remove duplicate ENABLE_EVENT_STR and DISABLE_EVENT_STR macrosSteven Rostedt1-5/+0
[ Upstream commit 9df0e49c5b9b8d051529be9994e4f92f2d20be6f ] The macros ENABLE_EVENT_STR and DISABLE_EVENT_STR were added to trace.h so that more than one file can have access to them, but was never removed from their original location. Remove the duplicates. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260126130037.4ba201f9@gandalf.local.home Fixes: d0bad49bb0a09 ("tracing: Add enable_hist/disable_hist triggers") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11tracing: Fix ftrace event field alignmentsSteven Rostedt3-15/+27
[ Upstream commit 033c55fe2e326bea022c3cc5178ecf3e0e459b82 ] The fields of ftrace specific events (events used to save ftrace internal events like function traces and trace_printk) are generated similarly to how normal trace event fields are generated. That is, the fields are added to a trace_events_fields array that saves the name, offset, size, alignment and signness of the field. It is used to produce the output in the format file in tracefs so that tooling knows how to parse the binary data of the trace events. The issue is that some of the ftrace event structures are packed. The function graph exit event structures are one of them. The 64 bit calltime and rettime fields end up 4 byte aligned, but the algorithm to show to userspace shows them as 8 byte aligned. The macros that create the ftrace events has one for embedded structure fields. There's two macros for theses fields: __field_desc() and __field_packed() The difference of the latter macro is that it treats the field as packed. Rename that field to __field_desc_packed() and create replace the __field_packed() to be a normal field that is packed and have the calltime and rettime use those. This showed up on 32bit architectures for function graph time fields. It had: ~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format [..] field:unsigned long func; offset:8; size:4; signed:0; field:unsigned int depth; offset:12; size:4; signed:0; field:unsigned int overrun; offset:16; size:4; signed:0; field:unsigned long long calltime; offset:24; size:8; signed:0; field:unsigned long long rettime; offset:32; size:8; signed:0; Notice that overrun is at offset 16 with size 4, where in the structure calltime is at offset 20 (16 + 4), but it shows the offset at 24. That's because it used the alignment of unsigned long long when used as a declaration and not as a member of a structure where it would be aligned by word size (in this case 4). By using the proper structure alignment, the format has it at the correct offset: ~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format [..] field:unsigned long func; offset:8; size:4; signed:0; field:unsigned int depth; offset:12; size:4; signed:0; field:unsigned int overrun; offset:16; size:4; signed:0; field:unsigned long long calltime; offset:20; size:8; signed:0; field:unsigned long long rettime; offset:28; size:8; signed:0; Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: "jempty.liang" <imntjempty@163.com> Link: https://patch.msgid.link/20260204113628.53faec78@gandalf.local.home Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()") Closes: https://lore.kernel.org/all/20260130015740.212343-1-imntjempty@163.com/ Closes: https://lore.kernel.org/all/20260202123342.2544795-1-imntjempty@163.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> [ Context / renames ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-11ring-buffer: Avoid softlockup in ring_buffer_resize() during memory freeWupeng Ma1-0/+2
[ Upstream commit 6435ffd6c7fcba330dfa91c58dc30aed2df3d0bf ] When user resize all trace ring buffer through file 'buffer_size_kb', then in ring_buffer_resize(), kernel allocates buffer pages for each cpu in a loop. If the kernel preemption model is PREEMPT_NONE and there are many cpus and there are many buffer pages to be freed, it may not give up cpu for a long time and finally cause a softlockup. To avoid it, call cond_resched() after each cpu buffer free as Commit f6bd2c92488c ("ring-buffer: Avoid softlockup in ring_buffer_resize()") does. Detailed call trace as follow: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 24-....: (14837 ticks this GP) idle=521c/1/0x4000000000000000 softirq=230597/230597 fqs=5329 rcu: (t=15004 jiffies g=26003221 q=211022 ncpus=96) CPU: 24 UID: 0 PID: 11253 Comm: bash Kdump: loaded Tainted: G EL 6.18.2+ #278 NONE pc : arch_local_irq_restore+0x8/0x20 arch_local_irq_restore+0x8/0x20 (P) free_frozen_page_commit+0x28c/0x3b0 __free_frozen_pages+0x1c0/0x678 ___free_pages+0xc0/0xe0 free_pages+0x3c/0x50 ring_buffer_resize.part.0+0x6a8/0x880 ring_buffer_resize+0x3c/0x58 __tracing_resize_ring_buffer.part.0+0x34/0xd8 tracing_resize_ring_buffer+0x8c/0xd0 tracing_entries_write+0x74/0xd8 vfs_write+0xcc/0x288 ksys_write+0x74/0x118 __arm64_sys_write+0x24/0x38 Cc: <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251228065008.2396573-1-mawupeng1@huawei.com Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19tracing: Do not register unsupported perf eventsSteven Rostedt1-0/+2
commit ef7f38df890f5dcd2ae62f8dbde191d72f3bebae upstream. Synthetic events currently do not have a function to register perf events. This leads to calling the tracepoint register functions with a NULL function pointer which triggers: ------------[ cut here ]------------ WARNING: kernel/tracepoint.c:175 at tracepoint_add_func+0x357/0x370, CPU#2: perf/2272 Modules linked in: kvm_intel kvm irqbypass CPU: 2 UID: 0 PID: 2272 Comm: perf Not tainted 6.18.0-ftest-11964-ge022764176fc-dirty #323 PREEMPTLAZY Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:tracepoint_add_func+0x357/0x370 Code: 28 9c e8 4c 0b f5 ff eb 0f 4c 89 f7 48 c7 c6 80 4d 28 9c e8 ab 89 f4 ff 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc <0f> 0b 49 c7 c6 ea ff ff ff e9 ee fe ff ff 0f 0b e9 f9 fe ff ff 0f RSP: 0018:ffffabc0c44d3c40 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff9380aa9e4060 RCX: 0000000000000000 RDX: 000000000000000a RSI: ffffffff9e1d4a98 RDI: ffff937fcf5fd6c8 RBP: 0000000000000001 R08: 0000000000000007 R09: ffff937fcf5fc780 R10: 0000000000000003 R11: ffffffff9c193910 R12: 000000000000000a R13: ffffffff9e1e5888 R14: 0000000000000000 R15: ffffabc0c44d3c78 FS: 00007f6202f5f340(0000) GS:ffff93819f00f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d3162281a8 CR3: 0000000106a56003 CR4: 0000000000172ef0 Call Trace: <TASK> tracepoint_probe_register+0x5d/0x90 synth_event_reg+0x3c/0x60 perf_trace_event_init+0x204/0x340 perf_trace_init+0x85/0xd0 perf_tp_event_init+0x2e/0x50 perf_try_init_event+0x6f/0x230 ? perf_event_alloc+0x4bb/0xdc0 perf_event_alloc+0x65a/0xdc0 __se_sys_perf_event_open+0x290/0x9f0 do_syscall_64+0x93/0x7b0 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e ? trace_hardirqs_off+0x53/0xc0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Instead, have the code return -ENODEV, which doesn't warn and has perf error out with: # perf record -e synthetic:futex_wait Error: The sys_perf_event_open() syscall returned with 19 (No such device) for event (synthetic:futex_wait). "dmesg | grep -i perf" may provide additional information. Ideally perf should support synthetic events, but for now just fix the warning. The support can come later. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251216182440.147e4453@gandalf.local.home Fixes: 4b147936fa509 ("tracing: Add support for 'synthetic' events") Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07tracing: Fix memory leaks in create_field_var()Zilin Guan1-2/+4
[ Upstream commit 80f0d631dcc76ee1b7755bfca1d8417d91d71414 ] The function create_field_var() allocates memory for 'val' through create_hist_field() inside parse_atom(), and for 'var' through create_var(), which in turn allocates var->type and var->var.name internally. Simply calling kfree() to release these structures will result in memory leaks. Use destroy_hist_field() to properly free 'val', and explicitly release the memory of var->type and var->var.name before freeing 'var' itself. Link: https://patch.msgid.link/20251106120132.3639920-1-zilin@seu.edu.cn Fixes: 02205a6752f22 ("tracing: Add support for 'field variables'") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19minmax: make generic MIN() and MAX() macros available everywhereLinus Torvalds1-2/+0
[ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19tracing: Fix race condition in kprobe initialization causing NULL pointer ↵Yuan Chen3-10/+22
dereference [ Upstream commit 9cf9aa7b0acfde7545c1a1d912576e9bab28dc6f ] There is a critical race condition in kprobe initialization that can lead to NULL pointer dereference and kernel crash. [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000 ... [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO) [1135630.269239] pc : kprobe_perf_func+0x30/0x260 [1135630.277643] lr : kprobe_dispatcher+0x44/0x60 [1135630.286041] sp : ffffaeff4977fa40 [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400 [1135630.302837] x27: 0000000000000000 x26: 0000000000000000 [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528 [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50 [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50 [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000 [1135630.349985] x17: 0000000000000000 x16: 0000000000000000 [1135630.359285] x15: 0000000000000000 x14: 0000000000000000 [1135630.368445] x13: 0000000000000000 x12: 0000000000000000 [1135630.377473] x11: 0000000000000000 x10: 0000000000000000 [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000 [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000 [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000 [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006 [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000 [1135630.429410] Call trace: [1135630.434828] kprobe_perf_func+0x30/0x260 [1135630.441661] kprobe_dispatcher+0x44/0x60 [1135630.448396] aggr_pre_handler+0x70/0xc8 [1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0 [1135630.462435] brk_handler+0xbc/0xd8 [1135630.468437] do_debug_exception+0x84/0x138 [1135630.475074] el1_dbg+0x18/0x8c [1135630.480582] security_file_permission+0x0/0xd0 [1135630.487426] vfs_write+0x70/0x1c0 [1135630.493059] ksys_write+0x5c/0xc8 [1135630.498638] __arm64_sys_write+0x24/0x30 [1135630.504821] el0_svc_common+0x78/0x130 [1135630.510838] el0_svc_handler+0x38/0x78 [1135630.516834] el0_svc+0x8/0x1b0 kernel/trace/trace_kprobe.c: 1308 0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120] include/linux/compiler.h: 294 0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0] kernel/trace/trace_kprobe.c 1308: head = this_cpu_ptr(call->perf_events); 1309: if (hlist_empty(head)) 1310: return 0; crash> struct trace_event_call -o struct trace_event_call { ... [120] struct hlist_head *perf_events; //(call->perf_event) ... } crash> struct trace_event_call ffffaf015340e528 struct trace_event_call { ... perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0 ... } Race Condition Analysis: The race occurs between kprobe activation and perf_events initialization: CPU0 CPU1 ==== ==== perf_kprobe_init perf_trace_event_init tp_event->perf_events = list;(1) tp_event->class->reg (2)← KPROBE ACTIVE Debug exception triggers ... kprobe_dispatcher kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE) head = this_cpu_ptr(call->perf_events)(3) (perf_events is still NULL) Problem: 1. CPU0 executes (1) assigning tp_event->perf_events = list 2. CPU0 executes (2) enabling kprobe functionality via class->reg() 3. CPU1 triggers and reaches kprobe_dispatcher 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed) 5. CPU1 calls kprobe_perf_func() and crashes at (3) because call->perf_events is still NULL CPU1 sees that kprobe functionality is enabled but does not see that perf_events has been assigned. Add pairing read and write memory barriers to guarantee that if CPU1 sees that kprobe functionality is enabled, it must also see that perf_events has been assigned. Link: https://lore.kernel.org/all/20251001022025.44626-1-chenyuan_fl@163.com/ Fixes: 50d780560785 ("tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use") Cc: stable@vger.kernel.org Signed-off-by: Yuan Chen <chenyuan@kylinos.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> [ Drop fprobe changes + context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02tracing: dynevent: Add a missing lockdown check on dyneventMasami Hiramatsu (Google)1-0/+4
commit 456c32e3c4316654f95f9d49c12cbecfb77d5660 upstream. Since dynamic_events interface on tracefs is compatible with kprobe_events and uprobe_events, it should also check the lockdown status and reject if it is set. Link: https://lore.kernel.org/all/175824455687.45175.3734166065458520748.stgit@devnote2/ Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02tracing: Fix tracing_marker may trigger page fault during preempt_disableLuo Gengkun1-2/+2
[ Upstream commit 3d62ab32df065e4a7797204a918f6489ddb8a237 ] Both tracing_mark_write and tracing_mark_raw_write call __copy_from_user_inatomic during preempt_disable. But in some case, __copy_from_user_inatomic may trigger page fault, and will call schedule() subtly. And if a task is migrated to other cpu, the following warning will be trigger: if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing))) An example can illustrate this issue: process flow CPU --------------------------------------------------------------------- tracing_mark_raw_write(): cpu:0 ... ring_buffer_lock_reserve(): cpu:0 ... cpu = raw_smp_processor_id() cpu:0 cpu_buffer = buffer->buffers[cpu] cpu:0 ... ... __copy_from_user_inatomic(): cpu:0 ... # page fault do_mem_abort(): cpu:0 ... # Call schedule schedule() cpu:0 ... # the task schedule to cpu1 __buffer_unlock_commit(): cpu:1 ... ring_buffer_unlock_commit(): cpu:1 ... cpu = raw_smp_processor_id() cpu:1 cpu_buffer = buffer->buffers[cpu] cpu:1 As shown above, the process will acquire cpuid twice and the return values are not the same. To fix this problem using copy_from_user_nofault instead of __copy_from_user_inatomic, as the former performs 'access_ok' before copying. Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02tracing: Do not add length to print format in synthetic eventsSteven Rostedt1-2/+0
[ Upstream commit e1a453a57bc76be678bd746f84e3d73f378a9511 ] The following causes a vsnprintf fault: # echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events # echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger Because the synthetic event's "wakee" field is created as a dynamic string (even though the string copied is not). The print format to print the dynamic string changed from "%*s" to "%s" because another location (__set_synth_event_print_fmt()) exported this to user space, and user space did not need that. But it is still used in print_synth_event(), and the output looks like: <idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155 sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58 <idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91 bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21 <idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129 sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50 The length isn't needed as the string is always nul terminated. Just print the string and not add the length (which was hard coded to the max string length anyway). Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Douglas Raillard <douglas.raillard@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields"); Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> [ offset calculations instead of union-based data structures ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-04ftrace: Fix potential warning in trace_printk_seq during ftrace_dumpTengda Wu1-2/+2
[ Upstream commit 4013aef2ced9b756a410f50d12df9ebe6a883e4a ] When calling ftrace_dump_one() concurrently with reading trace_pipe, a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race condition. The issue occurs because: CPU0 (ftrace_dump) CPU1 (reader) echo z > /proc/sysrq-trigger !trace_empty(&iter) trace_iterator_reset(&iter) <- len = size = 0 cat /sys/kernel/tracing/trace_pipe trace_find_next_entry_inc(&iter) __find_next_entry ring_buffer_empty_cpu <- all empty return NULL trace_printk_seq(&iter.seq) WARN_ON_ONCE(s->seq.len >= s->seq.size) In the context between trace_empty() and trace_find_next_entry_inc() during ftrace_dump, the ring buffer data was consumed by other readers. This caused trace_find_next_entry_inc to return NULL, failing to populate `iter.seq`. At this point, due to the prior trace_iterator_reset, both `iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal, the WARN_ON_ONCE condition is triggered. Move the trace_printk_seq() into the if block that checks to make sure the return value of trace_find_next_entry_inc() is non-NULL in ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before subsequent operations. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ingo Molnar <mingo@elte.hu> Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com Fixes: d769041f8653 ("ring_buffer: implement new locking") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28tracing: Limit access to parser->buffer when trace_get_user failedPu Lehui2-7/+19
[ Upstream commit 6a909ea83f226803ea0e718f6e88613df9234d58 ] When the length of the string written to set_ftrace_filter exceeds FTRACE_BUFF_MAX, the following KASAN alarm will be triggered: BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0 Read of size 1 at addr ffff0000d00bd5ba by task ash/165 CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x34/0x50 (C) dump_stack_lvl+0xa0/0x158 print_address_description.constprop.0+0x88/0x398 print_report+0xb0/0x280 kasan_report+0xa4/0xf0 __asan_report_load1_noabort+0x20/0x30 strsep+0x18c/0x1b0 ftrace_process_regex.isra.0+0x100/0x2d8 ftrace_regex_release+0x484/0x618 __fput+0x364/0xa58 ____fput+0x28/0x40 task_work_run+0x154/0x278 do_notify_resume+0x1f0/0x220 el0_svc+0xec/0xf0 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1ac/0x1b0 The reason is that trace_get_user will fail when processing a string longer than FTRACE_BUFF_MAX, but not set the end of parser->buffer to 0. Then an OOB access will be triggered in ftrace_regex_release-> ftrace_process_regex->strsep->strpbrk. We can solve this problem by limiting access to parser->buffer when trace_get_user failed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com Fixes: 8c9af478c06b ("ftrace: Handle commands when closing set_ftrace_filter file") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28tracing: Remove unneeded goto out logicSteven Rostedt1-23/+14
[ Upstream commit c89504a703fb779052213add0e8ed642f4a4f1c8 ] Several places in the trace.c file there's a goto out where the out is simply a return. There's no reason to jump to the out label if it's not doing any more logic but simply returning from the function. Replace the goto outs with a return and remove the out labels. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.538726745@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28ftrace: Also allocate and copy hash for reading of filter filesSteven Rostedt1-9/+10
commit bfb336cf97df7b37b2b2edec0f69773e06d11955 upstream. Currently the reader of set_ftrace_filter and set_ftrace_notrace just adds the pointer to the global tracer hash to its iterator. Unlike the writer that allocates a copy of the hash, the reader keeps the pointer to the filter hashes. This is problematic because this pointer is static across function calls that release the locks that can update the global tracer hashes. This can cause UAF and similar bugs. Allocate and copy the hash for reading the filter files like it is done for the writers. This not only fixes UAF bugs, but also makes the code a bit simpler as it doesn't have to differentiate when to free the iterator's hash between writers and readers. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250822183606.12962cc3@batman.local.home Fixes: c20489dad156 ("ftrace: Assign iter->hash to filter or notrace hashes on seq read") Closes: https://lore.kernel.org/all/20250813023044.2121943-1-wutengda@huaweicloud.com/ Closes: https://lore.kernel.org/all/20250822192437.GA458494@ax162/ Reported-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28kernel: trace: preemptirq_delay_test: use offstack cpu maskArnd Bergmann1-4/+9
[ Upstream commit adc353c0bfb243ebfd29b6222fa3bf149169a6de ] A CPU mask on the stack is broken for large values of CONFIG_NR_CPUS: kernel/trace/preemptirq_delay_test.c: In function ‘preemptirq_delay_run’: kernel/trace/preemptirq_delay_test.c:143:1: error: the frame size of 8512 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] Fall back to dynamic allocation here. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Chen <chensong_2000@189.cn> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250620111215.3365305-1-arnd@kernel.org Fixes: 4b9091e1c194 ("kernel: trace: preemptirq_delay_test: add cpu affinity") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28tracing: Add down_write(trace_event_sem) when adding trace eventSteven Rostedt1-0/+5
commit b5e8acc14dcb314a9b61ff19dcd9fdd0d88f70df upstream. When a module is loaded, it adds trace events defined by the module. It may also need to modify the modules trace printk formats to replace enum names with their values. If two modules are loaded at the same time, the adding of the event to the ftrace_events list can corrupt the walking of the list in the code that is modifying the printk format strings and crash the kernel. The addition of the event should take the trace_event_sem for write while it adds the new event. Also add a lockdep_assert_held() on that semaphore in __trace_add_event_dirs() as it iterates the list. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/20250718223158.799bfc0c@batman.local.home Reported-by: Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com> Closes: https://lore.kernel.org/all/20250717105007.46ccd18f@batman.local.home/ Fixes: 110bf2b764eb6 ("tracing: add protection around module events unload") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27ftrace: Fix UAF when lookup kallsym after ftrace disabledYe Bin1-3/+7
commit f914b52c379c12288b7623bb814d0508dbe7481d upstream. The following issue happens with a buggy module: BUG: unable to handle page fault for address: ffffffffc05d0218 PGD 1bd66f067 P4D 1bd66f067 PUD 1bd671067 PMD 101808067 PTE 0 Oops: Oops: 0000 [#1] SMP KASAN PTI Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS RIP: 0010:sized_strscpy+0x81/0x2f0 RSP: 0018:ffff88812d76fa08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffffc0601010 RCX: dffffc0000000000 RDX: 0000000000000038 RSI: dffffc0000000000 RDI: ffff88812608da2d RBP: 8080808080808080 R08: ffff88812608da2d R09: ffff88812608da68 R10: ffff88812608d82d R11: ffff88812608d810 R12: 0000000000000038 R13: ffff88812608da2d R14: ffffffffc05d0218 R15: fefefefefefefeff FS: 00007fef552de740(0000) GS:ffff8884251c7000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc05d0218 CR3: 00000001146f0000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ftrace_mod_get_kallsym+0x1ac/0x590 update_iter_mod+0x239/0x5b0 s_next+0x5b/0xa0 seq_read_iter+0x8c9/0x1070 seq_read+0x249/0x3b0 proc_reg_read+0x1b0/0x280 vfs_read+0x17f/0x920 ksys_read+0xf3/0x1c0 do_syscall_64+0x5f/0x2e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue may happen as follows: (1) Add kprobe tracepoint; (2) insmod test.ko; (3) Module triggers ftrace disabled; (4) rmmod test.ko; (5) cat /proc/kallsyms; --> Will trigger UAF as test.ko already removed; ftrace_mod_get_kallsym() ... strscpy(module_name, mod_map->mod->name, MODULE_NAME_LEN); ... The problem is when a module triggers an issue with ftrace and sets ftrace_disable. The ftrace_disable is set when an anomaly is discovered and to prevent any more damage, ftrace stops all text modification. The issue that happened was that the ftrace_disable stops more than just the text modification. When a module is loaded, its init functions can also be traced. Because kallsyms deletes the init functions after a module has loaded, ftrace saves them when the module is loaded and function tracing is enabled. This allows the output of the function trace to show the init function names instead of just their raw memory addresses. When a module is removed, ftrace_release_mod() is called, and if ftrace_disable is set, it just returns without doing anything more. The problem here is that it leaves the mod_list still around and if kallsyms is called, it will call into this code and access the module memory that has already been freed as it will return: strscpy(module_name, mod_map->mod->name, MODULE_NAME_LEN); Where the "mod" no longer exists and triggers a UAF bug. Link: https://lore.kernel.org/all/20250523135452.626d8dcd@gandalf.local.home/ Cc: stable@vger.kernel.org Fixes: aba4b5c22cba ("ftrace: Save module init functions kallsyms symbols for tracing") Link: https://lore.kernel.org/20250529111955.2349189-2-yebin@huaweicloud.com Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27bpf: Fix WARN() in get_bpf_raw_tp_regsTao Chen1-1/+1
[ Upstream commit 3880cdbed1c4607e378f58fa924c5d6df900d1d3 ] syzkaller reported an issue: WARNING: CPU: 3 PID: 5971 at kernel/trace/bpf_trace.c:1861 get_bpf_raw_tp_regs+0xa4/0x100 kernel/trace/bpf_trace.c:1861 Modules linked in: CPU: 3 UID: 0 PID: 5971 Comm: syz-executor205 Not tainted 6.15.0-rc5-syzkaller-00038-g707df3375124 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:get_bpf_raw_tp_regs+0xa4/0x100 kernel/trace/bpf_trace.c:1861 RSP: 0018:ffffc90003636fa8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff81c6bc4c RDX: ffff888032efc880 RSI: ffffffff81c6bc83 RDI: 0000000000000005 RBP: ffff88806a730860 R08: 0000000000000005 R09: 0000000000000003 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000004 R13: 0000000000000001 R14: ffffc90003637008 R15: 0000000000000900 FS: 0000000000000000(0000) GS:ffff8880d6cdf000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7baee09130 CR3: 0000000029f5a000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1934 [inline] bpf_get_stack_raw_tp+0x24/0x160 kernel/trace/bpf_trace.c:1931 bpf_prog_ec3b2eefa702d8d3+0x43/0x47 bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline] __bpf_prog_run include/linux/filter.h:718 [inline] bpf_prog_run include/linux/filter.h:725 [inline] __bpf_trace_run kernel/trace/bpf_trace.c:2363 [inline] bpf_trace_run3+0x23f/0x5a0 kernel/trace/bpf_trace.c:2405 __bpf_trace_mmap_lock_acquire_returned+0xfc/0x140 include/trace/events/mmap_lock.h:47 __traceiter_mmap_lock_acquire_returned+0x79/0xc0 include/trace/events/mmap_lock.h:47 __do_trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline] trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline] __mmap_lock_do_trace_acquire_returned+0x138/0x1f0 mm/mmap_lock.c:35 __mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline] mmap_read_trylock include/linux/mmap_lock.h:204 [inline] stack_map_get_build_id_offset+0x535/0x6f0 kernel/bpf/stackmap.c:157 __bpf_get_stack+0x307/0xa10 kernel/bpf/stackmap.c:483 ____bpf_get_stack kernel/bpf/stackmap.c:499 [inline] bpf_get_stack+0x32/0x40 kernel/bpf/stackmap.c:496 ____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1941 [inline] bpf_get_stack_raw_tp+0x124/0x160 kernel/trace/bpf_trace.c:1931 bpf_prog_ec3b2eefa702d8d3+0x43/0x47 Tracepoint like trace_mmap_lock_acquire_returned may cause nested call as the corner case show above, which will be resolved with more general method in the future. As a result, WARN_ON_ONCE will be triggered. As Alexei suggested, remove the WARN_ON_ONCE first. Fixes: 9594dc3c7e71 ("bpf: fix nested bpf tracepoints with per-cpu data") Reported-by: syzbot+45b0c89a0fc7ae8dbadc@syzkaller.appspotmail.com Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250513042747.757042-1-chen.dylane@linux.dev Closes: https://lore.kernel.org/bpf/8bc2554d-1052-4922-8832-e0078a033e1d@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27tracing: Fix compilation warning on arm32Pan Taixi1-1/+1
commit 2fbdb6d8e03b70668c0876e635506540ae92ab05 upstream. On arm32, size_t is defined to be unsigned int, while PAGE_SIZE is unsigned long. This hence triggers a compilation warning as min() asserts the type of two operands to be equal. Casting PAGE_SIZE to size_t solves this issue and works on other target architectures as well. Compilation warning details: kernel/trace/trace.c: In function 'tracing_splice_read_pipe': ./include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ ./include/linux/minmax.h:26:4: note: in expansion of macro '__typecheck' (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ ... kernel/trace/trace.c:6771:8: note: in expansion of macro 'min' min((size_t)trace_seq_used(&iter->seq), ^~~ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250526013731.1198030-1-pantaixi@huaweicloud.com Fixes: f5178c41bb43 ("tracing: Fix oob write in trace_seq_to_buffer()") Reviewed-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Pan Taixi <pantaixi@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04tracing: Mark binary printing functions with __printf() attributeAndy Shevchenko2-15/+12
[ Upstream commit 196a062641fe68d9bfe0ad36b6cd7628c99ad22c ] Binary printing functions are using printf() type of format, and compiler is not happy about them as is: kernel/trace/trace.c:3292:9: error: function ‘trace_vbprintk’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] kernel/trace/trace_seq.c:182:9: error: function ‘trace_seq_bprintf’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] Fix the compilation errors by adding __printf() attribute. While at it, move existing __printf() attributes from the implementations to the declarations. IT also fixes incorrect attribute parameters that are used for trace_array_printk(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250321144822.324050-4-andriy.shevchenko@linux.intel.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-22ftrace: Fix preemption accounting for stacktrace filter commandpengdonglin1-5/+1
commit 11aff32439df6ca5b3b891b43032faf88f4a6a29 upstream. The preemption count of the stacktrace filter command to trace ksys_read is consistently incorrect: $ echo ksys_read:stacktrace > set_ftrace_filter <...>-453 [004] ...1. 38.308956: <stack trace> => ksys_read => do_syscall_64 => entry_SYSCALL_64_after_hwframe The root cause is that the trace framework disables preemption when invoking the filter command callback in function_trace_probe_call: preempt_disable_notrace(); probe_ops->func(ip, parent_ip, probe_opsbe->tr, probe_ops, probe->data); preempt_enable_notrace(); Use tracing_gen_ctx_dec() to account for the preempt_disable_notrace(), which will output the correct preemption count: $ echo ksys_read:stacktrace > set_ftrace_filter <...>-410 [006] ..... 31.420396: <stack trace> => ksys_read => do_syscall_64 => entry_SYSCALL_64_after_hwframe Cc: stable@vger.kernel.org Fixes: 36590c50b2d07 ("tracing: Merge irqflags + preempt counter.") Link: https://lore.kernel.org/20250512094246.1167956-2-dolinux.peng@gmail.com Signed-off-by: pengdonglin <dolinux.peng@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22ftrace: Fix preemption accounting for stacktrace trigger commandpengdonglin1-1/+1
commit e333332657f615ac2b55aa35565c4a882018bbe9 upstream. When using the stacktrace trigger command to trace syscalls, the preemption count was consistently reported as 1 when the system call event itself had 0 ("."). For example: root@ubuntu22-vm:/sys/kernel/tracing/events/syscalls/sys_enter_read $ echo stacktrace > trigger $ echo 1 > enable sshd-416 [002] ..... 232.864910: sys_read(fd: a, buf: 556b1f3221d0, count: 8000) sshd-416 [002] ...1. 232.864913: <stack trace> => ftrace_syscall_enter => syscall_trace_enter => do_syscall_64 => entry_SYSCALL_64_after_hwframe The root cause is that the trace framework disables preemption in __DO_TRACE before invoking the trigger callback. Use the tracing_gen_ctx_dec() that will accommodate for the increase of the preemption count in __DO_TRACE when calling the callback. The result is the accurate reporting of: sshd-410 [004] ..... 210.117660: sys_read(fd: 4, buf: 559b725ba130, count: 40000) sshd-410 [004] ..... 210.117662: <stack trace> => ftrace_syscall_enter => syscall_trace_enter => do_syscall_64 => entry_SYSCALL_64_after_hwframe Cc: stable@vger.kernel.org Fixes: ce33c845b030c ("tracing: Dump stacktrace trigger to the corresponding instance") Link: https://lore.kernel.org/20250512094246.1167956-1-dolinux.peng@gmail.com Signed-off-by: pengdonglin <dolinux.peng@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22tracing: probes: Fix a possible race in trace_probe_log APIsMasami Hiramatsu (Google)5-3/+27
[ Upstream commit fd837de3c9cb1a162c69bc1fb1f438467fe7f2f5 ] Since the shared trace_probe_log variable can be accessed and modified via probe event create operation of kprobe_events, uprobe_events, and dynamic_events, it should be protected. In the dynamic_events, all operations are serialized by `dyn_event_ops_mutex`. But kprobe_events and uprobe_events interfaces are not serialized. To solve this issue, introduces dyn_event_create(), which runs create() operation under the mutex, for kprobe_events and uprobe_events. This also uses lockdep to check the mutex is held when using trace_probe_log* APIs. Link: https://lore.kernel.org/all/174684868120.551552.3068655787654268804.stgit@devnote2/ Reported-by: Paul Cacheux <paulcacheux@gmail.com> Closes: https://lore.kernel.org/all/20250510074456.805a16872b591e2971a4d221@kernel.org/ Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09tracing: Fix oob write in trace_seq_to_buffer()Jeongjun Park1-2/+3
commit f5178c41bb43444a6008150fe6094497135d07cb upstream. syzbot reported this bug: ================================================================== BUG: KASAN: slab-out-of-bounds in trace_seq_to_buffer kernel/trace/trace.c:1830 [inline] BUG: KASAN: slab-out-of-bounds in tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822 Write of size 4507 at addr ffff888032b6b000 by task syz.2.320/7260 CPU: 1 UID: 0 PID: 7260 Comm: syz.2.320 Not tainted 6.15.0-rc1-syzkaller-00301-g3bde70a2c827 #0 PREEMPT(full) Hardware name: Google Google Compute Engi