summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2017-08-30tracing: Fix freeing of filter in create_filter() when set_str is falseSteven Rostedt (VMware)1-0/+4
commit 8b0db1a5bdfcee0dbfa89607672598ae203c9045 upstream. Performing the following task with kmemleak enabled: # cd /sys/kernel/tracing/events/irq/irq_handler_entry/ # echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger # echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger # echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800b9290308 (size 32): comm "bash", pid 1114, jiffies 4294848451 (age 141.139s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290 [<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940 [<ffffffff812639c9>] create_filter+0xa9/0x160 [<ffffffff81263bdc>] create_event_filter+0xc/0x10 [<ffffffff812655e5>] set_trigger_filter+0xe5/0x210 [<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490 [<ffffffff812652e2>] event_trigger_write+0x1a2/0x260 [<ffffffff8138cf87>] __vfs_write+0xd7/0x380 [<ffffffff8138f421>] vfs_write+0x101/0x260 [<ffffffff8139187b>] SyS_write+0xab/0x130 [<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe [<ffffffffffffffff>] 0xffffffffffffffff The function create_filter() is passed a 'filterp' pointer that gets allocated, and if "set_str" is true, it is up to the caller to free it, even on error. The problem is that the pointer is not freed by create_filter() when set_str is false. This is a bug, and it is not up to the caller to free the filter on error if it doesn't care about the string. Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com Fixes: 38b78eb85 ("tracing: Factorize filter creation") Reported-by: Chunyu Hu <chuhu@redhat.com> Tested-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27tracing: Fix kmemleak in instance_rmdirChunyu Hu1-0/+1
commit db9108e054700c96322b0f0028546aa4e643cf0b upstream. Hit the kmemleak when executing instance_rmdir, it forgot releasing mem of tracing_cpumask. With this fix, the warn does not appear any more. unreferenced object 0xffff93a8dfaa7c18 (size 8): comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s) hex dump (first 8 bytes): ff ff ff ff ff ff ff ff ........ backtrace: [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280 [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30 [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10 [<ffffffff88571ab0>] instance_mkdir+0x90/0x240 [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70 [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0 [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100 [<ffffffff88403857>] do_syscall_64+0x67/0x150 [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com Fixes: ccfe9e42e451 ("tracing: Make tracing_cpumask available for all instances") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate resultsPavankumar Kondeti1-1/+1
commit c59f29cb144a6a0dfac16ede9dc8eafc02dc56ca upstream. The 's' flag is supposed to indicate that a softirq is running. This can be detected by testing the preempt_count with SOFTIRQ_OFFSET. The current code tests the preempt_count with SOFTIRQ_MASK, which would be true even when softirqs are disabled but not serving a softirq. Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15tracing/kprobes: Allow to create probe with a module name starting with a digitSabrina Dubroca1-13/+8
commit 9e52b32567126fe146f198971364f68d3bc5233f upstream. Always try to parse an address, since kstrtoul() will safely fail when given a symbol as input. If that fails (which will be the case for a symbol), try to parse a symbol instead. This allows creating a probe such as: p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0 Which is necessary for this command to work: perf probe -m 8021q -a vlan_gro_receive Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25tracing/kprobes: Enforce kprobes teardown after testingThomas Gleixner1-0/+5
commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream. Enabling the tracer selftest triggers occasionally the warning in text_poke(), which warns when the to be modified page is not marked reserved. The reason is that the tracer selftest installs kprobes on functions marked __init for testing. These probes are removed after the tests, but that removal schedules the delayed kprobes_optimizer work, which will do the actual text poke. If the work is executed after the init text is freed, then the warning triggers. The bug can be reproduced reliably when the work delay is increased. Flush the optimizer work and wait for the optimizing/unoptimizing lists to become empty before returning from the kprobes tracer selftest. That ensures that all operations which were queued due to the probes removal have completed. Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 6274de498 ("kprobes: Support delayed unoptimizing") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30perf: Avoid horrible stack usagePeter Zijlstra (Intel)4-6/+8
commit 86038c5ea81b519a8a1fcfcd5e4599aab0cdd119 upstream. Both Linus (most recent) and Steve (a while ago) reported that perf related callbacks have massive stack bloat. The problem is that software events need a pt_regs in order to properly report the event location and unwind stack. And because we could not assume one was present we allocated one on stack and filled it with minimal bits required for operation. Now, pt_regs is quite large, so this is undesirable. Furthermore it turns out that most sites actually have a pt_regs pointer available, making this even more onerous, as the stack space is pointless waste. This patch addresses the problem by observing that software events have well defined nesting semantics, therefore we can use static per-cpu storage instead of on-stack. Linus made the further observation that all but the scheduler callers of perf_sw_event() have a pt_regs available, so we change the regular perf_sw_event() to require a valid pt_regs (where it used to be optional) and add perf_sw_event_sched() for the scheduler. We have a scheduler specific call instead of a more generic _noregs() like construct because we can assume non-recursion from the scheduler and thereby simplify the code further (_noregs would have to put the recursion context call inline in order to assertain which __perf_regs element to use). One last note on the implementation of perf_trace_buf_prepare(); we allow .regs = NULL for those cases where we already have a pt_regs pointer available and do not need another. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30ring-buffer: Have ring_buffer_iter_empty() return true when emptySteven Rostedt (VMware)1-2/+14
commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream. I noticed that reading the snapshot file when it is empty no longer gives a status. It suppose to show the status of the snapshot buffer as well as how to allocate and use it. For example: ># cat snapshot # tracer: nop # # # * Snapshot is allocated * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') But instead it just showed an empty buffer: ># cat snapshot # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | What happened was that it was using the ring_buffer_iter_empty() function to see if it was empty, and if it was, it showed the status. But that function was returning false when it was empty. The reason was that the iter header page was on the reader page, and the reader page was empty, but so was the buffer itself. The check only tested to see if the iter was on the commit page, but the commit page was no longer pointing to the reader page, but as all pages were empty, the buffer is also. Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30tracing: Allocate the snapshot buffer before enabling probeSteven Rostedt (VMware)1-3/+5
commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream. Currently the snapshot trigger enables the probe and then allocates the snapshot. If the probe triggers before the allocation, it could cause the snapshot to fail and turn tracing off. It's best to allocate the snapshot buffer first, and then enable the trigger. If something goes wrong in the enabling of the trigger, the snapshot buffer is still allocated, but it can also be freed by the user by writting zero into the snapshot buffer file. Also add a check of the return status of alloc_snapshot(). Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-22ring-buffer: Fix return value check in test_ringbuffer()Wei Yongjun1-4/+4
commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream. In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-22ftrace: Fix removing of second function probeSteven Rostedt (VMware)1-4/+16
commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream. When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-15fgraph: Handle a case where a tracer ignores set_graph_notraceSteven Rostedt (Red Hat)1-3/+14
[ Upstream commit 794de08a16cf1fc1bf785dc48f66d36218cf6d88 ] Both the wakeup and irqsoff tracers can use the function graph tracer when the display-graph option is set. The problem is that they ignore the notrace file, and record the entry of functions that would be ignored by the function_graph tracer. This causes the trace->depth to be recorded into the ring buffer. The set_graph_notrace uses a trick by adding a large negative number to the trace->depth when a graph function is to be ignored. On trace output, the graph function uses the depth to record a stack of functions. But since the depth is negative, it accesses the array with a negative number and causes an out of bounds access that can cause a kernel oops or corrupt data. Have the print functions handle cases where a tracer still records functions even when they are in set_graph_notrace. Also add warnings if the depth is below zero before accessing the array. Note, the function graph logic will still prevent the return of these functions from being recorded, which means that they will be left hanging without a return. For example: # echo '*spin*' > set_graph_notrace # echo 1 > options/display-graph # echo wakeup > current_tracer # cat trace [...] _raw_spin_lock() { preempt_count_add() { do_raw_spin_lock() { update_rq_clock(); Where it should look like: _raw_spin_lock() { preempt_count_add(); do_raw_spin_lock(); } update_rq_clock(); Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
2016-07-12tracing: Handle NULL formats in hold_module_trace_bprintk_format()Steven Rostedt (Red Hat)1-1/+6
[ Upstream commit 70c8217acd4383e069fe1898bbad36ea4fcdbdcc ] If a task uses a non constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section. The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saves them to the list if it is new). But this function calls lookup_format() that does a strcmp() to the value (which is now NULL) and can cause a kernel oops. This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") which added "__used" to the trace_printk_fmt variable, and before that, the kernel simply optimized it out (no NULL value was saved). The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL. Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com Reported-by: xingzhen <zhengjun.xing@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-06-06ring-buffer: Prevent overflow of size in ring_buffer_resize()Steven Rostedt (Red Hat)1-5/+4
[ Upstream commit 59643d1535eb220668692a5359de22545af579f6 ] If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here's the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599 where 18446744073709551599 is still smaller than 2^64 2^64 - 18446744073709551599 = 17 But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360 This is checked to make sure its still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE) but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823 Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0 an nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just result of historical code changes), clean up the code and fix this bug. Cc: stable@vger.kernel.org # 3.5+ Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-06-06ring-buffer: Use long for nr_pages to avoid overflow failuresSteven Rostedt (Red Hat)1-12/+14
[ Upstream commit 9b94a8fba501f38368aef6ac1b30e7335252a220 ] The size variable to change the ring buffer in ftrace is a long. The nr_pages used to update the ring buffer based on the size is int. On 64 bit machines this can cause an overflow problem. For example, the following will cause the ring buffer to crash: # cd /sys/kernel/debug/tracing # echo 10 > buffer_size_kb # echo 8556384240 > buffer_size_kb Then you get the warning of: WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260 Which is: RB_WARN_ON(cpu_buffer, nr_removed); Note each ring buffer page holds 4080 bytes. This is because: 1) 10 causes the ring buffer to have 3 pages. (10kb requires 3 * 4080 pages to hold) 2) (2^31 / 2^10 + 1) * 4080 = 8556384240 The value written into buffer_size_kb is shifted by 10 and then passed to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE which is 4080. 8761737461760 / 4080 = 2147484672 4) nr_pages is subtracted from the current nr_pages (3) and we get: 2147484669. This value is saved in a signed integer nr_pages_to_update 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int turns into the value of -2147482627 6) As the value is a negative number, in update_pages_handler() it is negated and passed to rb_remove_pages() and 2147482627 pages will be removed, which is much larger than 3 and it causes the warning because not all the pages asked to be removed were removed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001 Cc: stable@vger.kernel.org # 2.6.28+ Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer") Reported-by: Hao Qin <QEver.cn@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-06-06ring-buffer: Move recursive check to per_cpu descriptorSteven Rostedt (Red Hat)1-18/+19
[ Upstream commit 58a09ec6e3ec88c9c7e061479f1ef7fe93324a87 ] Instead of using a global per_cpu variable to perform the recursive checks into the ring buffer, use the already existing per_cpu descriptor that is part of the ring buffer itself. Not only does this simplify the code, it also allows for one ring buffer to be used within the guts of the use of another ring buffer. For example trace_printk() can now be used within the ring buffer to record changes done by an instance into the main ring buffer. The recursion checks will prevent the trace_printk() itself from causing recursive issues with the main ring buffer (it is just ignored), but the recursive checks wont prevent the trace_printk() from recording other ring buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-06-06ring-buffer: Remove duplicate use of '&' in recursive codeSteven Rostedt (Red Hat)1-4/+1
[ Upstream commit d631c8cceb1d1d06f372878935949d421585186b ] A clean up of the recursive protection code changed val = this_cpu_read(current_context); val--; val &= this_cpu_read(current_context); to val = this_cpu_read(current_context); val &= val & (val - 1); Which has a duplicate use of '&' as the above is the same as val = val & (val - 1); Actually, it would be best to remove that line altogether and just add it to where it is used. And Christoph even mentioned that it can be further compacted to just a single line: __this_cpu_and(current_context, __this_cpu_read(current_context) - 1); Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.org Suggested-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-06-06ring-buffer: Add unlikelys to make fast path the defaultSteven Rostedt (Red Hat)1-5/+5
[ Upstream commit 3205f8063b6cc54b20d5080fb79dfcbd9c39e93d ] I was running the trace_event benchmark and noticed that the times to record a trace_event was all over the place. I looked at the assembly of the ring_buffer_lock_reserver() and saw this: <ring_buffer_lock_reserve>: 31 c0 xor %eax,%eax 48 83 3d 76 47 bd 00 cmpq $0x1,0xbd4776(%rip) # ffffffff81d10d60 <ring_buffer_flags> 01 55 push %rbp 48 89 e5 mov %rsp,%rbp 75 1d jne ffffffff8113c60d <ring_buffer_lock_reserve+0x2d> 65 ff 05 69 e3 ec 7e incl %gs:0x7eece369(%rip) # a960 <__preempt_count> 8b 47 08 mov 0x8(%rdi),%eax 85 c0 test %eax,%eax +---- 74 12 je ffffffff8113c610 <ring_buffer_lock_reserve+0x30> | 65 ff 0d 5b e3 ec 7e decl %gs:0x7eece35b(%rip) # a960 <__preempt_count> | 0f 84 85 00 00 00 je ffffffff8113c690 <ring_buffer_lock_reserve+0xb0> | 31 c0 xor %eax,%eax | 5d pop %rbp | c3 retq | 90 nop +---> 65 44 8b 05 48 e3 ec mov %gs:0x7eece348(%rip),%r8d # a960 <__preempt_count> 7e 41 81 e0 ff ff ff 7f and $0x7fffffff,%r8d b0 08 mov $0x8,%al 65 8b 0d 58 36 ed 7e mov %gs:0x7eed3658(%rip),%ecx # fc80 <current_context> 41 f7 c0 00 ff 1f 00 test $0x1fff00,%r8d 74 1e je ffffffff8113c64f <ring_buffer_lock_reserve+0x6f> 41 f7 c0 00 00 10 00 test $0x100000,%r8d b0 01 mov $0x1,%al 75 13 jne ffffffff8113c64f <ring_buffer_lock_reserve+0x6f> 41 81 e0 00 00 0f 00 and $0xf0000,%r8d 49 83 f8 01 cmp $0x1,%r8 19 c0 sbb %eax,%eax 83 e0 02 and $0x2,%eax 83 c0 02 add $0x2,%eax 85 c8 test %ecx,%eax 75 ab jne ffffffff8113c5fe <ring_buffer_lock_reserve+0x1e> 09 c8 or %ecx,%eax 65 89 05 24 36 ed 7e mov %eax,%gs:0x7eed3624(%rip) # fc80 <current_context> The arrow is the fast path. After adding the unlikely's, the fast path looks a bit better: <ring_buffer_lock_reserve>: 31 c0 xor %eax,%eax 48 83 3d 76 47 bd 00 cmpq $0x1,0xbd4776(%rip) # ffffffff81d10d60 <ring_buffer_flags> 01 55 push %rbp 48 89 e5 mov %rsp,%rbp 75 7b jne ffffffff8113c66b <ring_buffer_lock_reserve+0x8b> 65 ff 05 69 e3 ec 7e incl %gs:0x7eece369(%rip) # a960 <__preempt_count> 8b 47 08 mov 0x8(%rdi),%eax 85 c0 test %eax,%eax 0f 85 9f 00 00 00 jne ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1> 65 8b 0d 57 e3 ec 7e mov %gs:0x7eece357(%rip),%ecx # a960 <__preempt_count> 81 e1 ff ff ff 7f and $0x7fffffff,%ecx b0 08 mov $0x8,%al 65 8b 15 68 36 ed 7e mov %gs:0x7eed3668(%rip),%edx # fc80 <current_context> f7 c1 00 ff 1f 00 test $0x1fff00,%ecx 75 50 jne ffffffff8113c670 <ring_buffer_lock_reserve+0x90> 85 d0 test %edx,%eax 75 7d jne ffffffff8113c6a1 <ring_buffer_lock_reserve+0xc1> 09 d0 or %edx,%eax 65 89 05 53 36 ed 7e mov %eax,%gs:0x7eed3653(%rip) # fc80 <current_context> 65 8b 05 fc da ec 7e mov %gs:0x7eecdafc(%rip),%eax # a130 <cpu_number> 89 c2 mov %eax,%edx Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-05-17tracing: Don't display trigger file for events that can't be enabledChunyu Hu1-2/+7
[ Upstream commit 854145e0a8e9a05f7366d240e2f99d9c1ca6d6dd ] Currently register functions for events will be called through the 'reg' field of event class directly without any check when seting up triggers. Triggers for events that don't support register through debug fs (events under events/ftrace are for trace-cmd to read event format, and most of them don't have a register function except events/ftrace/functionx) can't be enabled at all, and an oops will be hit when setting up trigger for those events, so just not creating them is an easy way to avoid the oops. Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com Cc: stable@vger.kernel.org # 3.14+ Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-04-18tracing: Fix trace_printk() to print when not using bprintk()Steven Rostedt (Red Hat)1-0/+3
[ Upstream commit 3debb0a9ddb16526de8b456491b7db60114f7b5e ] The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()" Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-04-18tracing: Fix crash from reading trace_pipe with sendfileSteven Rostedt (Red Hat)1-1/+4
[ Upstream commit a29054d9478d0435ab01b7544da4f674ab13f533 ] If tracing contains data and the trace_pipe file is read with sendfile(), then it can trigger a NULL pointer dereference and various BUG_ON within the VM code. There's a patch to fix this in the splice_to_pipe() code, but it's also a good idea to not let that happen from trace_pipe either. Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 2.6.30+ Reported-by: Rabin Vincent <rabin.vincent@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-03-10tracing: Fix showing function event in available_eventsSteven Rostedt (Red Hat)1-1/+2
[ Upstream commit d045437a169f899dfb0f6f7ede24cc042543ced9 ] The ftrace:function event is only displayed for parsing the function tracer data. It is not used to enable function tracing, and does not include an "enable" file in its event directory. Originally, this event was kept separate from other events because it did not have a ->reg parameter. But perf added a "reg" parameter for its use which caused issues, because it made the event available to functions where it was not compatible for. Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable" added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event from being enabled by normal trace events. But this commit missed keeping the function event from being displayed by the "available_events" directory, which is used to show what events can be enabled by set_event. One documented way to enable all events is to: cat available_events > set_event But because the function event is displayed in the available_events, this now causes an INVALID error: cat: write error: Invalid argument Reported-by: Chunyu Hu <chuhu@redhat.com> Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable" Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-04tracing: Have branch tracer use recursive field of task structSteven Rostedt (Red Hat)2-7/+11
[ Upstream commit 6224beb12e190ff11f3c7d4bf50cb2922878f600 ] Fengguang Wu's tests triggered a bug in the branch tracer's start up test when CONFIG_DEBUG_PREEMPT set. This was because that config adds some debug logic in the per cpu field, which calls back into the branch tracer. The branch tracer has its own recursive checks, but uses a per cpu variable to implement it. If retrieving the per cpu variable calls back into the branch tracer, you can see how things will break. Instead of using a per cpu variable, use the trace_recursion field of the current task struct. Simply set a bit when entering the branch tracing and clear it when leaving. If the bit is set on entry, just don't do the tracing. There's also the case with lockdep, as the local_irq_save() called before the recursion can also trigger code that can call back into the function. Changing that to a raw_local_irq_save() will protect that as well. This prevents the recursion and the inevitable crash that follows. Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com Cc: stable@vger.kernel.org # 3.10+ Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-08-04tracing/filter: Do not WARN on operand count going below zeroSteven Rostedt (Red Hat)1-1/+3
[ Upstream commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 ] When testing the fix for the trace filter, I could not come up with a scenario where the operand count goes below zero, so I added a WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case that it can happen (although the filter would be wrong). # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter That is, a single operation without any operands will hit the path where the WARN_ON_ONCE() can trigger. Although this is harmless, and the filter is reported as a error. But instead of spitting out a warning to the kernel dmesg, just fail nicely and report it via the proper channels. Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-13tracing: Have filter check for balanced opsSteven Rostedt1-2/+7
[ Upstream commit 2cf30dc180cea808077f003c5116388183e54f9e ] When the following filter is used it causes a warning to trigger: # cd /sys/kernel/debug/tracing # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter -bash: echo: write error: Invalid argument # cat events/ext4/ext4_truncate_exit/filter ((dev==1)blocks==2) ^ parse_error: No error ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990() Modules linked in: bnep lockd grace bluetooth ... CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0 0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea Call Trace: [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0 [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80 [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81159065>] replace_preds+0x3c5/0x990 [<ffffffff811596b2>] create_filter+0x82/0xb0 [<ffffffff81159944>] apply_event_filter+0xd4/0x180 [<ffffffff81152bbf>] event_filter_write+0x8f/0x120 [<ffffffff811db2a8>] __vfs_write+0x28/0xe0 [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0 [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0 [<ffffffff811dc408>] vfs_write+0xb8/0x1b0 [<ffffffff811dc72f>] SyS_write+0x4f/0xb0 [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a ---[ end trace e11028bd95818dcd ]--- Worse yet, reading the error message (the filter again) it says that there was no error, when there clearly was. The issue is that the code that checks the input does not check for balanced ops. That is, having an op between a closed parenthesis and the next token. This would only cause a warning, and fail out before doing any real harm, but it should still not caues a warning, and the error reported should work: # cd /sys/kernel/debug/tracing # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter -bash: echo: write error: Invalid argument # cat events/ext4/ext4_truncate_exit/filter ((dev==1)blocks==2) ^ parse_error: Meaningless filter expression And give no kernel warning. Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: stable@vger.kernel.org # 2.6.31+ Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-05tracing/filter: Do not allow infix to exceed end of stringSteven Rostedt (Red Hat)1-0/+6
[ Upstream commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 ] While debugging a WARN_ON() for filtering, I found that it is possible for the filter string to be referenced after its end. With the filter: # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter The filter_parse() function can call infix_get_op() which calls infix_advance() that updates the infix filter pointers for the cnt and tail without checking if the filter is already at the end, which will put the cnt to zero and the tail beyond the end. The loop then calls infix_next() that has ps->infix.cnt--; return ps->infix.string[ps->infix.tail++]; The cnt will now be below zero, and the tail that is returned is already passed the end of the filter string. So far the allocation of the filter string usually has some buffer that is zeroed out, but if the filter string is of the exact size of the allocated buffer there's no guarantee that the charater after the nul terminating character will be zero. Luckily, only root can write to the filter. Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-07-03ring-buffer-benchmark: Fix the wrong sched_priority of producerWang Long1-1/+1
[ Upstream commit 108029323910c5dd1ef8fa2d10da1ce5fbce6e12 ] The producer should be used producer_fifo as its sched_priority, so correct it. Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17tracing: Handle ftrace_dump() atomic context in graph_trace_open()Rabin Vincent1-2/+6
[ Upstream commit ef99b88b16bee753fa51207abdc58ae660453ec6 ] graph_trace_open() can be called in atomic context from ftrace_dump(). Use GFP_ATOMIC for the memory allocations when that's the case, in order to avoid the following splat. BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0 Backtrace: .. [<8004dc94>] (__might_sleep) from [<801371f4>] (kmem_cache_alloc_trace+0x160/0x238) r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0 [<80137094>] (kmem_cache_alloc_trace) from [<800cbd60>] (graph_trace_open+0x30/0xd0) r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8 r4:810d16f0 [<800cbd30>] (graph_trace_open) from [<800c79c4>] (trace_init_global_iter+0x50/0x9c) r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30 [<800c7974>] (trace_init_global_iter) from [<800c7aa0>] (ftrace_dump+0x90/0x2ec) r4:810d2580 r3:00000000 [<800c7a10>] (ftrace_dump) from [<80414b2c>] (sysrq_ftrace_dump+0x1c/0x20) r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a r4:808d5394 [<80414b10>] (sysrq_ftrace_dump) from [<800169b8>] (return_to_handler+0x0/0x18) [<80415498>] (__handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210 [<80415668>] (handle_sysrq) from [<800169b8>] (return_to_handler+0x0/0x18) [<8042a760>] (pl011_int) from [<800169b8>] (return_to_handler+0x0/0x18) r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60 r4:8454fe00 [<8007782c>] (handle_irq_event_percpu) from [<80077b44>] (handle_irq_event+0x4c/0x6c) r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60 r4:84f01e00 [<80077af8>] (handle_irq_event) from [<8007aa28>] (handle_fasteoi_irq+0xf0/0x1ac) r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000 [<8007a938>] (handle_fasteoi_irq) from [<80076dc0>] (generic_handle_irq+0x3c/0x4c) r6:00000026 r5:00000000 r4:00000026 r3:8007a938 [<80076d84>] (generic_handle_irq) from [<80077128>] (__handle_domain_irq+0x8c/0xfc) r4:808c1e38 r3:0000002e [<8007709c>] (__handle_domain_irq) from [<800087b8>] (gic_handle_irq+0x34/0x6c) r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015 r4:8880210c r3:808c7ef0 [<80008784>] (gic_handle_irq) from [<80014044>] (__irq_svc+0x44/0x7c) Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ring-buffer: Replace this_cpu_*() with __this_cpu_*()Steven Rostedt1-6/+5
[ Upstream commit 80a9b64e2c156b6523e7a01f2ba6e5d86e722814 ] It has come to my attention that this_cpu_read/write are horrible on architectures other than x86. Worse yet, they actually disable preemption or interrupts! This caused some unexpected tracing results on ARM. 101.356868: preempt_count_add <-ring_buffer_lock_reserve 101.356870: preempt_count_sub <-ring_buffer_lock_reserve The ring_buffer_lock_reserve has recursion protection that requires accessing a per cpu variable. But since preempt_disable() is traced, it too got traced while accessing the variable that is suppose to prevent recursion like this. The generic version of this_cpu_read() and write() are: #define this_cpu_generic_read(pcp) \ ({ typeof(pcp) ret__; \ preempt_disable(); \ ret__ = *this_cpu_ptr(&(pcp)); \ preempt_enable(); \ ret__; \ }) #define this_cpu_generic_to_op(pcp, val, op) \ do { \ unsigned long flags; \ raw_local_irq_save(flags); \ *__this_cpu_ptr(&(pcp)) op val; \ raw_local_irq_restore(flags); \ } while (0) Which is unacceptable for locations that know they are within preempt disabled or interrupt disabled locations. Paul McKenney stated that __this_cpu_() versions produce much better code on other architectures than this_cpu_() does, if we know that the call is done in a preempt disabled location. I also changed the recursive_unlock() to use two local variables instead of accessing the per_cpu variable twice. Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home Cc: stable@vger.kernel.org Acked-by: Christoph Lameter <cl@linux.com> Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28ftrace: Fix ftrace enable ordering of sysctl ftrace_enabledSteven Rostedt (Red Hat)1-3/+3
[ Upstream commit 524a38682573b2e15ab6317ccfe50280441514be ] Some archs (specifically PowerPC), are sensitive with the ordering of the enabling of the calls to function tracing and setting of the function to use to be traced. That is, update_ftrace_function() sets what function the ftrace_caller trampoline should call. Some archs require this to be set before calling ftrace_run_update_code(). Another bug was discovered, that ftrace_startup_sysctl() called ftrace_run_update_code() directly. If the function the ftrace_caller trampoline changes, then it will not be updated. Instead a call to ftrace_startup_enable() should be called because it tests to see if the callback changed since the code was disabled, and will tell the arch to update appropriately. Most archs do not need this notification, but PowerPC does. The problem could be seen by the following commands: # echo 0 > /proc/sys/kernel/ftrace_enabled # echo function > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # cat /sys/kernel/debug/tracing/trace The trace will show that function tracing was not active. Cc: stable@vger.kernel.org # 2.6.27+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctlPratyush Anand1-6/+22
[ Upstream commit 1619dc3f8f555ee1cdd3c75db3885d5715442b12 ] When ftrace is enabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when ftrace is disabled globally through the proc interface, we must check if ftrace_graph_active is set. If it is set, then we should also pass the FTRACE_STOP_FUNC_RET command to ftrace_run_update_code(). Consider the following situation. # echo 0 > /proc/sys/kernel/ftrace_enabled After this ftrace_enabled = 0. # echo function_graph > /sys/kernel/debug/tracing/current_tracer Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never called. # echo 1 > /proc/sys/kernel/ftrace_enabled Now ftrace_enabled will be set to true, but still ftrace_