summaryrefslogtreecommitdiff
path: root/kernel/trace/trace.c
AgeCommit message (Collapse)AuthorFilesLines
2026-04-18tracing: Fix trace_buf_size= cmdline parameter with sizes >= 2GCalvin Owens1-3/+3
commit d008ba8be8984760e36d7dcd4adbd5a41a645708 upstream. Some of the sizing logic through tracer_alloc_buffers() uses int internally, causing unexpected behavior if the user passes a value that does not fit in an int (on my x86 machine, the result is uselessly tiny buffers). Fix by plumbing the parameter's real type (unsigned long) through to the ring buffer allocation functions, which already use unsigned long. It has always been possible to create larger ring buffers via the sysfs interface: this only affects the cmdline parameter. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org Fixes: 73c5162aa362 ("tracing: keep ring buffer to minimum size till used") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02tracing: Fix tracing_marker may trigger page fault during preempt_disableLuo Gengkun1-2/+2
[ Upstream commit 3d62ab32df065e4a7797204a918f6489ddb8a237 ] Both tracing_mark_write and tracing_mark_raw_write call __copy_from_user_inatomic during preempt_disable. But in some case, __copy_from_user_inatomic may trigger page fault, and will call schedule() subtly. And if a task is migrated to other cpu, the following warning will be trigger: if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing))) An example can illustrate this issue: process flow CPU --------------------------------------------------------------------- tracing_mark_raw_write(): cpu:0 ... ring_buffer_lock_reserve(): cpu:0 ... cpu = raw_smp_processor_id() cpu:0 cpu_buffer = buffer->buffers[cpu] cpu:0 ... ... __copy_from_user_inatomic(): cpu:0 ... # page fault do_mem_abort(): cpu:0 ... # Call schedule schedule() cpu:0 ... # the task schedule to cpu1 __buffer_unlock_commit(): cpu:1 ... ring_buffer_unlock_commit(): cpu:1 ... cpu = raw_smp_processor_id() cpu:1 cpu_buffer = buffer->buffers[cpu] cpu:1 As shown above, the process will acquire cpuid twice and the return values are not the same. To fix this problem using copy_from_user_nofault instead of __copy_from_user_inatomic, as the former performs 'access_ok' before copying. Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04ftrace: Fix potential warning in trace_printk_seq during ftrace_dumpTengda Wu1-2/+2
[ Upstream commit 4013aef2ced9b756a410f50d12df9ebe6a883e4a ] When calling ftrace_dump_one() concurrently with reading trace_pipe, a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race condition. The issue occurs because: CPU0 (ftrace_dump) CPU1 (reader) echo z > /proc/sysrq-trigger !trace_empty(&iter) trace_iterator_reset(&iter) <- len = size = 0 cat /sys/kernel/tracing/trace_pipe trace_find_next_entry_inc(&iter) __find_next_entry ring_buffer_empty_cpu <- all empty return NULL trace_printk_seq(&iter.seq) WARN_ON_ONCE(s->seq.len >= s->seq.size) In the context between trace_empty() and trace_find_next_entry_inc() during ftrace_dump, the ring buffer data was consumed by other readers. This caused trace_find_next_entry_inc to return NULL, failing to populate `iter.seq`. At this point, due to the prior trace_iterator_reset, both `iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal, the WARN_ON_ONCE condition is triggered. Move the trace_printk_seq() into the if block that checks to make sure the return value of trace_find_next_entry_inc() is non-NULL in ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before subsequent operations. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ingo Molnar <mingo@elte.hu> Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com Fixes: d769041f8653 ("ring_buffer: implement new locking") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28tracing: Limit access to parser->buffer when trace_get_user failedPu Lehui1-6/+12
[ Upstream commit 6a909ea83f226803ea0e718f6e88613df9234d58 ] When the length of the string written to set_ftrace_filter exceeds FTRACE_BUFF_MAX, the following KASAN alarm will be triggered: BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0 Read of size 1 at addr ffff0000d00bd5ba by task ash/165 CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x34/0x50 (C) dump_stack_lvl+0xa0/0x158 print_address_description.constprop.0+0x88/0x398 print_report+0xb0/0x280 kasan_report+0xa4/0xf0 __asan_report_load1_noabort+0x20/0x30 strsep+0x18c/0x1b0 ftrace_process_regex.isra.0+0x100/0x2d8 ftrace_regex_release+0x484/0x618 __fput+0x364/0xa58 ____fput+0x28/0x40 task_work_run+0x154/0x278 do_notify_resume+0x1f0/0x220 el0_svc+0xec/0xf0 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1ac/0x1b0 The reason is that trace_get_user will fail when processing a string longer than FTRACE_BUFF_MAX, but not set the end of parser->buffer to 0. Then an OOB access will be triggered in ftrace_regex_release-> ftrace_process_regex->strsep->strpbrk. We can solve this problem by limiting access to parser->buffer when trace_get_user failed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com Fixes: 8c9af478c06b ("ftrace: Handle commands when closing set_ftrace_filter file") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28tracing: Remove unneeded goto out logicSteven Rostedt1-23/+14
[ Upstream commit c89504a703fb779052213add0e8ed642f4a4f1c8 ] Several places in the trace.c file there's a goto out where the out is simply a return. There's no reason to jump to the out label if it's not doing any more logic but simply returning from the function. Replace the goto outs with a return and remove the out labels. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.538726745@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27tracing: Fix compilation warning on arm32Pan Taixi1-1/+1
commit 2fbdb6d8e03b70668c0876e635506540ae92ab05 upstream. On arm32, size_t is defined to be unsigned int, while PAGE_SIZE is unsigned long. This hence triggers a compilation warning as min() asserts the type of two operands to be equal. Casting PAGE_SIZE to size_t solves this issue and works on other target architectures as well. Compilation warning details: kernel/trace/trace.c: In function 'tracing_splice_read_pipe': ./include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ ./include/linux/minmax.h:26:4: note: in expansion of macro '__typecheck' (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ ... kernel/trace/trace.c:6771:8: note: in expansion of macro 'min' min((size_t)trace_seq_used(&iter->seq), ^~~ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250526013731.1198030-1-pantaixi@huaweicloud.com Fixes: f5178c41bb43 ("tracing: Fix oob write in trace_seq_to_buffer()") Reviewed-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Pan Taixi <pantaixi@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04tracing: Fix oob write in trace_seq_to_buffer()Jeongjun Park1-2/+3
commit f5178c41bb43444a6008150fe6094497135d07cb upstream. syzbot reported this bug: ================================================================== BUG: KASAN: slab-out-of-bounds in trace_seq_to_buffer kernel/trace/trace.c:1830 [inline] BUG: KASAN: slab-out-of-bounds in tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822 Write of size 4507 at addr ffff888032b6b000 by task syz.2.320/7260 CPU: 1 UID: 0 PID: 7260 Comm: syz.2.320 Not tainted 6.15.0-rc1-syzkaller-00301-g3bde70a2c827 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc3/0x670 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189 __asan_memcpy+0x3c/0x60 mm/kasan/shadow.c:106 trace_seq_to_buffer kernel/trace/trace.c:1830 [inline] tracing_splice_read_pipe+0x6be/0xdd0 kernel/trace/trace.c:6822 .... ================================================================== It has been reported that trace_seq_to_buffer() tries to copy more data than PAGE_SIZE to buf. Therefore, to prevent this, we should use the smaller of trace_seq_used(&iter->seq) and PAGE_SIZE as an argument. Link: https://lore.kernel.org/20250422113026.13308-1-aha310510@gmail.com Reported-by: syzbot+c8cd2d2c412b868263fb@syzkaller.appspotmail.com Fixes: 3c56819b14b0 ("tracing: splice support for tracing_pipe") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-09tracing: Prevent bad count for tracing_cpumask_writeLizhi Xu1-0/+3
[ Upstream commit 98feccbf32cfdde8c722bc4587aaa60ee5ac33f0 ] If a large count is provided, it will trigger a warning in bitmap_parse_user. Also check zero for it. Cc: stable@vger.kernel.org Fixes: 9e01c1b74c953 ("cpumask: convert kernel trace functions") Link: https://lore.kernel.org/20241216073238.2573704-1-lizhi.xu@windriver.com Reported-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0aecfd34fb878546f3fd Tested-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09kernel: Initialize cpumask before parsingTetsuo Handa1-1/+1
[ Upstream commit c5e3a41187ac01425f5ad1abce927905e4ac44e4 ] KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized. [ 148.133411][ T5509] ===================================================== [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable ----new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] ===================================================== Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20210401055823.3929-1-penguin-kernel@I-love.SAKURA.ne.jp Stable-dep-of: 98feccbf32cf ("tracing: Prevent bad count for tracing_cpumask_write") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17tracing: Have saved_cmdlines arrays all in one allocationSteven Rostedt (Google)1-10/+8
[ Upstream commit 0b18c852cc6fb8284ac0ab97e3e840974a6a8a64 ] The saved_cmdlines have three arrays for mapping PIDs to COMMs: - map_pid_to_cmdline[] - map_cmdline_to_pid[] - saved_cmdlines The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index into the other arrays. The map_cmdline_to_pid[] is a mapping back to the full pid as it can be larger than PID_MAX_DEFAULT. And the saved_cmdlines[] just holds the COMMs associated to the pids. Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated together (in reality the saved_cmdlines is just in the memory of the rounding of the allocation of the structure as it is always allocated in powers of two). The map_cmdline_to_pid[] array is allocated separately. Since the rounding to a power of two is rather large (it allows for 8000 elements in saved_cmdlines), also include the map_cmdline_to_pid[] array. (This drops it to 6000 by default, which is still plenty for most use cases). This saves even more memory as the map_cmdline_to_pid[] array doesn't need to be allocated. Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240220140703.182330529@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Mete Durlu <meted@linux.ibm.com> Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12tracing: Avoid possible softlockup in tracing_iter_reset()Zheng Yejian1-0/+2
commit 49aa8a1f4d6800721c7971ed383078257f12e8f9 upstream. In __tracing_open(), when max latency tracers took place on the cpu, the time start of its buffer would be updated, then event entries with timestamps being earlier than start of the buffer would be skipped (see tracing_iter_reset()). Softlockup will occur if the kernel is non-preemptible and too many entries were skipped in the loop that reset every cpu buffer, so add cond_resched() to avoid it. Cc: stable@vger.kernel.org Fixes: 2f26ebd549b9a ("tracing: use timestamp to determine start of latency traces") Link: https://lore.kernel.org/20240827124654.3817443-1-zhengyejian@huaweicloud.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23tracing: Inform kmemleak of saved_cmdlines allocationSteven Rostedt (Google)1-0/+3
commit 2394ac4145ea91b92271e675a09af2a9ea6840b7 upstream. The allocation of the struct saved_cmdlines_buffer structure changed from: s = kmalloc(sizeof(*s), GFP_KERNEL); s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL); to: orig_size = sizeof(*s) + val * TASK_COMM_LEN; order = get_order(orig_size); size = 1 << (order + PAGE_SHIFT); page = alloc_pages(GFP_KERNEL, order); if (!page) return NULL; s = page_address(page); memset(s, 0, sizeof(*s)); s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL); Where that s->saved_cmdlines allocation looks to be a dangling allocation to kmemleak. That's because kmemleak only keeps track of kmalloc() allocations. For allocations that use page_alloc() directly, the kmemleak needs to be explicitly informed about it. Add kmemleak_alloc() and kmemleak_free() around the page allocation so that it doesn't give the following false positive: unreferenced object 0xffff8881010c8000 (size 32760): comm "swapper", pid 0, jiffies 4294667296 hex dump (first 32 bytes): ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ backtrace (crc ae6ec1b9): [<ffffffff86722405>] kmemleak_alloc+0x45/0x80 [<ffffffff8414028d>] __kmalloc_large_node+0x10d/0x190 [<ffffffff84146ab1>] __kmalloc+0x3b1/0x4c0 [<ffffffff83ed7103>] allocate_cmdlines_buffer+0x113/0x230 [<ffffffff88649c34>] tracer_alloc_buffers.isra.0+0x124/0x460 [<ffffffff8864a174>] early_trace_init+0x14/0xa0 [<ffffffff885dd5ae>] start_kernel+0x12e/0x3c0 [<ffffffff885f5758>] x86_64_start_reservations+0x18/0x30 [<ffffffff885f582b>] x86_64_start_kernel+0x7b/0x80 [<ffffffff83a001c3>] secondary_startup_64_no_verify+0x15e/0x16b Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r.fsf@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240214112046.09a322d6@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic") Reported-by: Kalle Valo <kvalo@kernel.org> Tested-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23tracing: Fix wasted memory in saved_cmdlines logicSteven Rostedt (Google)1-38/+37
commit 44dc5c41b5b1267d4dd037d26afc0c4d3a568acb upstream. While looking at improving the saved_cmdlines cache I found a huge amount of wasted memory that should be used for the cmdlines. The tracing data saves pids during the trace. At sched switch, if a trace occurred, it will save the comm of the task that did the trace. This is saved in a "cache" that maps pids to comms and exposed to user space via the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by default 128 comms. The structure that uses this creates an array to store the pids using PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure to be of the size of 131104 bytes on 64 bit machines. In hex: 131104 = 0x20020, and since the kernel allocates generic memory in powers of two, the kernel would allocate 0x40000 or 262144 bytes to store this structure. That leaves 131040 bytes of wasted space. Worse, the structure points to an allocated array to store the comm names, which is 16 bytes times the amount of names to save (currently 128), which is 2048 bytes. Instead of allocating a separate array, make the structure end with a variable length string and use the extra space for that. This is similar to a recommendation that Linus had made about eventfs_inode names: https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/ Instead of allocating a separate string array to hold the saved comms, have the structure end with: char saved_cmdlines[]; and round up to the next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN It will use this extra space for the saved_cmdline portion. Now, instead of saving only 128 comms by default, by using this wasted space at the end of the structure it can save over 8000 comms and even saves space by removing the need for allocating the other array. Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Mete Durlu <meted@linux.ibm.com> Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-25tracing: Have large events show up as '[LINE TOO BIG]' instead of nothingSteven Rostedt (Google)1-1/+5
[ Upstream commit b55b0a0d7c4aa2dac3579aa7e6802d1f57445096 ] If a large event was added to the ring buffer that is larger than what the trace_seq can handle, it just drops the output: ~# cat /sys/kernel/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-859 [001] ..... 141.118951: tracing_mark_write <...>-859 [001] ..... 141.148201: tracing_mark_write: 78901234 Instead, catch this case and add some context: ~# cat /sys/kernel/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-852 [001] ..... 121.550551: tracing_mark_write[LINE TOO BIG] <...>-852 [001] ..... 121.550581: tracing_mark_write: 78901234 This now emulates the same output as trace_pipe. Link: https://lore.kernel.org/linux-trace-kernel/20231209171058.78c1a026@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-05tracing: Fix blocked reader of snapshot bufferSteven Rostedt (Google)1-3/+17
commit 39a7dc23a1ed0fe81141792a09449d124c5953bd upstream. If an application blocks on the snapshot or snapshot_raw files, expecting to be woken up when a snapshot occurs, it will not happen. Or it may happen with an unexpected result. That result is that the application will be reading the main buffer instead of the snapshot buffer. That is because when the snapshot occurs, the main and snapshot buffers are swapped. But the reader has a descriptor still pointing to the buffer that it originally connected to. This is fine for the main buffer readers, as they may be blocked waiting for a watermark to be hit, and when a snapshot occurs, the data that the main readers want is now on the snapshot buffer. But for waiters of the snapshot buffer, they are waiting for an event to occur that will trigger the snapshot and they can then consume it quickly to save the snapshot before the next snapshot occurs. But to do this, they need to read the new snapshot buffer, not the old one that is now receiving new data. Also, it does not make sense to have a watermark "buffer_percent" on the snapshot buffer, as the snapshot buffer is static and does not receive new data except all at once. Link: https://lore.kernel.org/linux-trace-kernel/20231228095149.77f5b45d@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: debdd57f5145f ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-20tracing: Update snapshot buffer on resize if it is allocatedSteven Rostedt (Google)1-2/+2
commit d06aff1cb13d2a0d52b48e605462518149c98c81 upstream. The snapshot buffer is to mimic the main buffer so that when a snapshot is needed, the snapshot and main buffer are swapped. When the snapshot buffer is allocated, it is set to the minimal size that the ring buffer may be at and still functional. When it is allocated it becomes the same size as the main ring buffer, and when the main ring buffer changes in size, it should do. Currently, the resize only updates the snapshot buffer if it's used by the current tracer (ie. the preemptirqsoff tracer). But it needs to be updated anytime it is allocated. When changing the size of the main buffer, instead of looking to see if the current tracer is utilizing the snapshot buffer, just check if it is allocated to know if it should be updated or not. Also fix typo in comment just above the code change. Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13tracing: Stop current tracer when resizing bufferSteven Rostedt (Google)1-3/+7
[ Upstream commit d78ab792705c7be1b91243b2544d1a79406a2ad7 ] When the ring buffer is being resized, it can cause side effects to the running tracer. For instance, there's a race with irqsoff tracer that swaps individual per cpu buffers between the main buffer and the snapshot buffer. The resize operation modifies the main buffer and then the snapshot buffer. If a swap happens in between those two operations it will break the tracer. Simply stop the running tracer before resizing the buffers and enable it again when finished. Link: https://lkml.kernel.org/r/20231205220010.748996423@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 3928a8a2d9808 ("ftrace: make work with new ring buffer") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13tracing: Set actual size after ring buffer resizeZheng Yejian1-8/+11
[ Upstream commit 6d98a0f2ac3c021d21be66fa34e992137cd25bcb ] Currently we can resize trace ringbuffer by writing a value into file 'buffer_size_kb', then by reading the file, we get the value that is usually what we wrote. However, this value may be not actual size of trace ring buffer because of the round up when doing resize in kernel, and the actual size would be more useful. Link: https://lore.kernel.org/linux-trace-kernel/20230705002705.576633-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Stable-dep-of: d78ab792705c ("tracing: Stop current tracer when resizing buffer") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13tracing: Fix a possible race when disabling buffered eventsPetr Pavlu1-4/+8
commit c0591b1cccf708a47bc465c62436d669a4213323 upstream. Function trace_buffered_event_disable() is responsible for freeing pages backing buffered events and this process can run concurrently with trace_event_buffer_lock_reserve(). The following race is currently possible: * Function trace_buffered_event_disable() is called on CPU 0. It increments trace_buffered_event_cnt on each CPU and waits via synchronize_rcu() for each user of trace_buffered_event to complete. * After synchronize_rcu() is finished, function trace_buffered_event_disable() has the exclusive access to trace_buffered_event. All counters trace_buffered_event_cnt are at 1 and all pointers trace_buffered_event are still valid. * At this point, on a different CPU 1, the execution reaches trace_event_buffer_lock_reserve(). The function calls preempt_disable_notrace() and only now enters an RCU read-side critical section. The function proceeds and reads a still valid pointer from trace_buffered_event[CPU1] into the local variable "entry". However, it doesn't yet read trace_buffered_event_cnt[CPU1] which happens later. * Function trace_buffered_event_disable() continues. It frees trace_buffered_event[CPU1] and decrements trace_buffered_event_cnt[CPU1] back to 0. * Function trace_event_buffer_lock_reserve() continues. It reads and increments trace_buffered_event_cnt[CPU1] from 0 to 1. This makes it believe that it can use the "entry" that it already obtained but the pointer is now invalid and any access results in a use-after-free. Fix the problem by making a second synchronize_rcu() call after all trace_buffered_event values are set to NULL. This waits on all potential users in trace_event_buffer_lock_reserve() that still read a previous pointer from trace_buffered_event. Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/ Link: https://lkml.kernel.org/r/20231205161736.19663-4-petr.pavlu@suse.com Cc: stable@vger.kernel.org Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13tracing: Fix incomplete locking when disabling buffered eventsPetr Pavlu1-8/+4
commit 7fed14f7ac9cf5e38c693836fe4a874720141845 upstream. The following warning appears when using buffered events: [ 203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420 [...] [ 203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G E 6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a [ 203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 [ 203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420 [ 203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff [ 203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202 [ 203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000 [ 203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400 [ 203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000 [ 203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008 [ 203.781846] FS: 00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000 [ 203.781851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0 [ 203.781862] Call Trace: [ 203.781870] <TASK> [ 203.851949] trace_event_buffer_commit+0x1ea/0x250 [ 203.851967] trace_event_raw_event_sys_enter+0x83/0xe0 [ 203.851983] syscall_trace_enter.isra.0+0x182/0x1a0 [ 203.851990] do_syscall_64+0x3a/0xe0 [ 203.852075] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 203.852090] RIP: 0033:0x7f4cd870fa77 [ 203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48 [ 203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089 [ 203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77 [ 203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0 [ 203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0 [ 203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40 [ 204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0 [ 204.049256] </TASK> For instance, it can be triggered by running these two commands in parallel: $ while true; do echo hist:key=id.syscall:val=hitcount > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger; done $ stress-ng --sysinfo $(nproc) The warning indicates that the current ring_buffer_per_cpu is not in the committing state. It happens because the active ring_buffer_event doesn't actually come from the ring_buffer_per_cpu but is allocated from trace_buffered_event. The bug is in function trace_buffered_event_disable() where the following normally happens: * The code invokes disable_trace_buffered_event() via smp_call_function_many() and follows it by synchronize_rcu(). This increments the per-CPU variable trace_buffered_event_cnt on each target CPU and grants trace_buffered_event_disable() the exclusive access to the per-CPU variable trace_buffered_event. * Maintenance is performed on trace_buffered_event, all per-CPU event buffers get freed. * The code invokes enable_trace_buffered_event() via smp_call_function_many(). This decrements trace_buffered_event_cnt and releases the access to trace_buffered_event. A problem is that smp_call_function_many() runs a given function on all target CPUs except on the current one. The following can then occur: * Task X executing trace_buffered_event_disable() runs on CPU 0. * The control reaches synchronize_rcu() and the task gets rescheduled on another CPU 1. * The RCU synchronization finishes. At this point, trace_buffered_event_disable() has the exclusive access to all trace_buffered_event variables except trace_buffered_event[CPU0] because trace_buffered_event_cnt[CPU0] is never incremented and if the buffer is currently unused, remains set to 0. * A different task Y is scheduled on CPU 0 and hits a trace event. The code in trace_event_buffer_lock_reserve() sees that trace_buffered_event_cnt[CPU0] is set to 0 and decides the use the buffer provided by trace_buffered_event[CPU0]. * Task X continues its execution in trace_buffered_event_disable(). The code incorrectly frees the event buffer pointed by trace_buffered_event[CPU0] and resets the variable to NULL. * Task Y writes event data to the now freed buffer and later detects the created inconsistency. The issue is observable since commit dea499781a11 ("tracing: Fix warning in trace_buffered_event_disable()") which moved the call of trace_buffered_event_disable() in __ftrace_event_enable_disable() earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..). The underlying problem in trace_buffered_event_disable() is however present since the original implementation in commit 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events"). Fix the problem by replacing the two smp_call_function_many() calls with on_each_cpu_mask() which invokes a given callback on all CPUs. Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/ Link: https://lkml.kernel.org/r/20231205161736.19663-2-petr.pavlu@suse.com Cc: stable@vger.kernel.org Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") Fixes: dea499781a11 ("tracing: Fix warning in trace_buffered_event_disable()") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13tracing: Disable snapshot buffer when stopping instance tracersSteven Rostedt (Google)1-76/+34
commit b538bf7d0ec11ca49f536dfda742a5f6db90a798 upstream. It use to be that only the top level instance had a snapshot buffer (for latency tracers like wakeup and irqsoff). When stopping a tracer in an instance would not disable the snapshot buffer. This could have some unintended consequences if the irqsoff tracer is enabled. Consolidate the tracing_start/stop() with tracing_start/stop_tr() so that all instances behave the same. The tracing_start/stop() functions will just call their respective tracing_start/stop_tr() with the global_array passed in. Link: https://lkml.kernel.org/r/20231205220011.041220035@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 6d9b3fa5e7f6 ("tracing: Move tracing_max_latency into trace_array") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13tracing: Always update snapshot buffer sizeSteven Rostedt (Google)1-2/+1
commit 7be76461f302ec05cbd62b90b2a05c64299ca01f upstream. It use to be that only the top level instance had a snapshot buffer (for latency tracers like wakeup and irqsoff). The update of the ring buffer size would check if the instance was the top level and if so, it would also update the snapshot buffer as it needs to be the same as the main buffer. Now that lower level instances also has a snapshot buffer, they too need to update their snapshot buffer sizes when the main buffer is changed, otherwise the following can be triggered: # cd /sys/kernel/tracing # echo 1500 > buffer_size_kb # mkdir instances/foo # echo irqsoff > instances/foo/current_tracer # echo 1000 > instances/foo/buffer_size_kb Produces: WARNING: CPU: 2 PID: 856 at kernel/trace/trace.c:1938 update_max_tr_single.part.0+0x27d/0x320 Which is: ret = ring_buffer_swap_cpu(tr->max_buffer.buffer, tr->array_buffer.buffer, cpu); if (ret == -EBUSY) { [..] } WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY); <== here That's because ring_buffer_swap_cpu() has: int ret = -EINVAL; [..] /* At least make sure the two buffers are somewhat the same */ if (cpu_buffer_a->nr_pages != cpu_buffer_b->nr_pages) goto out; [..] out: return ret; } Instead, update all instances' snapshot buffer sizes when their main buffer size is updated. Link: https://lkml.kernel.org/r/20231205220010.454662151@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 6d9b3fa5e7f6 ("tracing: Move tracing_max_latency into trace_array") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13tracing: Fix a warning when allocating buffered events failsPetr Pavlu1-6/+5
[ Upstream commit 34209fe83ef8404353f91ab4ea4035dbc9922d04 ] Function trace_buffered_event_disable() produces an unexpected warning when the previous call to trace_buffered_event_enable() fails to allocate pages for buffered events. The situation can occur as follows: * The counter trace_buffered_event_ref is at 0. * The soft mode gets enabled for some event and trace_buffered_event_enable() is called. The function increments trace_buffered_event_ref to 1 and starts allocating event pages. * The allocation fails for some page and trace_buffered_event_disable() is called for cleanup. * Function trace_buffered_event_disable() decrements trace_buffered_event_ref back to 0, recognizes that it was the last use of buffered events and frees all allocated pages. * The control goes back to trace_buffered_event_enable() which returns. The caller of trace_buffered_event_enable() has no information that the function actually failed. * Some time later, the soft mode is disabled for the same event. Function trace_buffered_event_disable() is called. It warns on "WARN_ON_ONCE(!trace_buffered_event_ref)" and returns. Buffered events are just an optimization and can handle failures. Make trace_buffered_event_enable() exit on the first failure and left any cleanup later to when trace_buffered_event_disable() is called. Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/ Link: https://lkml.kernel.org/r/20231205161736.19663-3-petr.pavlu@suse.com Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28tracing: Have trace_event_file have ref countersSteven Rostedt (Google)1-0/+15
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream. The following can crash the kernel: # cd /sys/kernel/tracing # echo 'p:sched schedule' > kprobe_events # exec 5>>events/kprobes/sched/enable # > kprobe_events # exec 5>&- The above commands: 1. Change directory to the tracefs directory 2. Create a kprobe event (doesn't matter what one) 3. Open bash file descriptor 5 on the enable file of the kprobe event 4. Delete the kprobe event (removes the files too) 5. Close the bash file descriptor 5 The above causes a crash! BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:tracing_release_file_tr+0xc/0x50 What happens here is that the kprobe event creates a trace_event_file "file" descriptor that represents the file in tracefs to the event. It maintains state of the event (is it enabled for the given instance?). Opening the "enable" file gets a reference to the event "file" descriptor via the open file descriptor. When the kprobe event is deleted, the file is also deleted from the tracefs system which also frees the event "file" descriptor. But as the tracefs file is still opened by user space, it will not be totally removed until the final dput() is called on it. But this is not true with the event "file" descriptor that is already freed. If the user does a write to or simply closes the file descriptor it will reference the event "file" descriptor that was just freed, causing a use-after-free bug. To solve this, add a ref count to the event "file" descriptor as well as a new flag called "FREED". The "file" will not be freed until the last reference is released. But the FREE flag will be set when the event is removed to prevent any more modifications to that event from happening, even if there's still a reference to the event "file" descriptor. Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files") Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10tracing: Increase trace array ref count on enable and filter filesSteven Rostedt (Google)1-0/+27
[ Upstream commit f5ca233e2e66dc1c249bf07eefa37e34a6c9346a ] When the trace event enable and filter files are opened, increment the trace array ref counter, otherwise they can be accessed when the trace array is being deleted. The ref counter keeps the trace array from being deleted while those files are opened. Link: https://lkml.kernel.org/r/20230907024803.456187066@goodmis.org Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23tracing: Have option files inc the trace array ref countSteven Rostedt (Google)1-1/+22
commit 7e2cfbd2d3c86afcd5c26b5c4b1dd251f63c5838 upstream. The option files update the options for a given trace array. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace_array when an option file is opened. Link: https://lkml.kernel.org/r/20230907024804.086679464@goodmis.org Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yejian <zhengyejian1@huawei.com> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-23tracing: Have current_trace inc the trace array ref countSteven Rostedt (Google)1-1/+2
commit 9b37febc578b2e1ad76a105aab11d00af5ec3d27 upstream. The current_trace updates the trace array tracer. For an instance, if the file is opened and the instance is deleted, reading or writing to the file will cause a use after free. Up the ref count of the trace array when current_trace is opened. Link: https://lkml.kernel.org/r/20230907024803.877687227@goodmis.org Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yejian <zhengyejian1@huawei.com> Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()") Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-19tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSYBrian Foster1-2/+2
commit 3d07fa1dd19035eb0b13ae6697efd5caa9033e74 upstream. The pipe cpumask used to serialize opens between the main and percpu trace pipes is not zeroed or initialized. This can result in spurious -EBUSY returns if underlying memory is not fully zeroed. This has been observed by immediate failure to read the main trace_pipe file on an otherwise newly booted and idle system: # cat /sys/kernel/debug/tracing/trace_pipe cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy Zero the allocation of pipe_cpumask to avoid the problem. Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com Cc: stable@vger.kernel.org Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes") Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>