Update the function graph code to modify the function addresses recorded in a
previous boot's buffer so that they match the current kallsyms (note: this
does not handle module addresses yet).
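The translation is conceptually simple: the persistent buffer saves the
previous boot's text offset, and the output code shifts each recorded address
by the delta before resolving it with kallsyms. A hedged sketch of that step
(the helper name and the text_delta field follow this series, but the exact
form here is illustrative):

  /* Translate an address recorded in a previous boot's buffer into
   * the current boot's address space before the kallsyms lookup.
   */
  static unsigned long trace_adjust_address(struct trace_array *tr,
                                            unsigned long addr)
  {
      /* text_delta: current text base minus the previous boot's */
      if (!tr->text_delta)
          return addr;
      return addr + tr->text_delta;
  }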
After a reboot, instead of seeing:
# trace-cmd show -B boot_mapped | tail -n30
swapper/0-1 [000] d..2. 56.286470: 0) 0.481 us | 0xffffffff925da5c4();
swapper/0-1 [000] d.... 56.286471: 0) 4.065 us | }
swapper/0-1 [000] d.... 56.286471: 0) 4.920 us | }
swapper/0-1 [000] d..1. 56.286472: 0) | 0xffffffff92536254() {
swapper/0-1 [000] d..1. 56.286472: 0) + 28.974 us | 0xffffffff92534e30();
swapper/0-1 [000] d.... 56.286516: 0) + 43.881 us | }
swapper/0-1 [000] d..1. 56.286517: 0) | 0xffffffff925136c4() {
swapper/0-1 [000] d..1. 56.286518: 0) | 0xffffffff92514a14() {
swapper/0-1 [000] d..1. 56.286518: 0) 6.003 us | 0xffffffff92514200();
swapper/0-1 [000] d.... 56.286529: 0) + 11.510 us | }
swapper/0-1 [000] d.... 56.286529: 0) + 12.895 us | }
swapper/0-1 [000] d.... 56.286530: 0) ! 382.884 us | }
swapper/0-1 [000] d..1. 56.286530: 0) | 0xffffffff92536444() {
swapper/0-1 [000] d..1. 56.286531: 0) | 0xffffffff92536254() {
swapper/0-1 [000] d..1. 56.286531: 0) + 26.335 us | 0xffffffff92534e30();
swapper/0-1 [000] d.... 56.286560: 0) + 29.511 us | }
swapper/0-1 [000] d.... 56.286561: 0) + 30.452 us | }
swapper/0-1 [000] d..1. 56.286562: 0) | 0xffffffff9253c014() {
swapper/0-1 [000] d..1. 56.286562: 0) | 0xffffffff9253bed4() {
swapper/0-1 [000] d..1. 56.286563: 0) + 13.465 us | 0xffffffff92536684();
swapper/0-1 [000] d.... 56.286577: 0) + 14.651 us | }
swapper/0-1 [000] d.... 56.286577: 0) + 15.821 us | }
swapper/0-1 [000] d..1. 56.286578: 0) 0.667 us | 0xffffffff92547074();
swapper/0-1 [000] d..1. 56.286579: 0) 0.453 us | 0xffffffff924f35c4();
swapper/0-1 [000] d.... 56.286580: 0) # 3906.348 us | }
swapper/0-1 [000] d..1. 56.286581: 0) | 0xffffffff92531a14() {
swapper/0-1 [000] d..1. 56.286581: 0) 0.518 us | 0xffffffff92505cb4();
swapper/0-1 [000] d..1. 56.286595: 0) | 0xffffffff92db83c4() {
swapper/0-1 [000] d..1. 56.286596: 0) | 0xffffffff92dec2e4() {
swapper/0-1 [000] d..1. 56.286597: 0) | 0xffffffff92db5304() {
It now shows:
# trace-cmd show -B boot_mapped | tail -n30
swapper/0-1 [000] d..2. 363.079099: 0) 0.483 us | preempt_count_sub();
swapper/0-1 [000] d.... 363.079100: 0) 4.112 us | }
swapper/0-1 [000] d.... 363.079101: 0) 4.979 us | }
swapper/0-1 [000] d..1. 363.079101: 0) | disable_local_APIC() {
swapper/0-1 [000] d..1. 363.079102: 0) + 29.153 us | clear_local_APIC.part.0();
swapper/0-1 [000] d.... 363.079148: 0) + 46.517 us | }
swapper/0-1 [000] d..1. 363.079149: 0) | mcheck_cpu_clear() {
swapper/0-1 [000] d..1. 363.079149: 0) | mce_intel_feature_clear() {
swapper/0-1 [000] d..1. 363.079150: 0) 5.871 us | lmce_supported();
swapper/0-1 [000] d.... 363.079161: 0) + 11.340 us | }
swapper/0-1 [000] d.... 363.079161: 0) + 12.638 us | }
swapper/0-1 [000] d.... 363.079162: 0) ! 383.518 us | }
swapper/0-1 [000] d..1. 363.079162: 0) | lapic_shutdown() {
swapper/0-1 [000] d..1. 363.079163: 0) | disable_local_APIC() {
swapper/0-1 [000] d..1. 363.079163: 0) + 26.144 us | clear_local_APIC.part.0();
swapper/0-1 [000] d.... 363.079192: 0) + 29.424 us | }
swapper/0-1 [000] d.... 363.079192: 0) + 30.376 us | }
swapper/0-1 [000] d..1. 363.079193: 0) | restore_boot_irq_mode() {
swapper/0-1 [000] d..1. 363.079194: 0) | native_restore_boot_irq_mode() {
swapper/0-1 [000] d..1. 363.079194: 0) + 13.863 us | disconnect_bsp_APIC();
swapper/0-1 [000] d.... 363.079209: 0) + 14.933 us | }
swapper/0-1 [000] d.... 363.079209: 0) + 16.009 us | }
swapper/0-1 [000] d..1. 363.079210: 0) 0.694 us | hpet_disable();
swapper/0-1 [000] d..1. 363.079211: 0) 0.511 us | iommu_shutdown_noop();
swapper/0-1 [000] d.... 363.079212: 0) # 3980.260 us | }
swapper/0-1 [000] d..1. 363.079212: 0) | native_machine_emergency_restart() {
swapper/0-1 [000] d..1. 363.079213: 0) 0.495 us | tboot_shutdown();
swapper/0-1 [000] d..1. 363.079230: 0) | acpi_reboot() {
swapper/0-1 [000] d..1. 363.079231: 0) | acpi_reset() {
swapper/0-1 [000] d..1. 363.079232: 0) | acpi_os_write_port() {
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20240813171257.478901820@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The use of task->trace_recursion for the function graph no-trace logic was a
bit of an abuse of that variable. Now that there exist global variables that
are per stack for registered graph tracers, use those instead.
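A hedged sketch of the direction, assuming this series' per-stack accessor
(fgraph_get_task_var()); the emit helper name is illustrative:

  static int trace_graph_entry(struct ftrace_graph_ent *trace,
                               struct fgraph_ops *gops)
  {
      /* per-stack global variable reserved for this registered gops */
      unsigned long *task_var = fgraph_get_task_var(gops);

      if (ftrace_graph_notrace_addr(trace->func)) {
          *task_var |= TRACE_GRAPH_NOTRACE;
          return 1;    /* run the return hook, but emit nothing */
      }
      return __trace_graph_emit(trace, gops);    /* illustrative name */
  }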
Link: https://lore.kernel.org/linux-trace-kernel/171509107907.162236.6564679266777519065.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.796709456@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The use of task->trace_recursion for the set_graph_function logic was a bit
of an abuse of that variable. Now that there exist global variables that are
per stack for registered graph tracers, use those instead.
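The depth-window logic moves the same way; a hedged sketch (the depth
accessors here are illustrative names):

  /* Trace a function if it matches set_graph_function, or if we are
   * already inside a matched function (deeper than its recorded depth).
   */
  static bool ftrace_graph_addr(unsigned long *task_var,
                                struct ftrace_graph_ent *trace)
  {
      if (ftrace_lookup_ip(ftrace_graph_hash, trace->func)) {
          *task_var |= TRACE_GRAPH_FL;
          fgraph_set_depth(task_var, trace->depth);    /* illustrative */
          return true;
      }
      return (*task_var & TRACE_GRAPH_FL) &&
             trace->depth > fgraph_get_depth(task_var);    /* illustrative */
  }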
Link: https://lore.kernel.org/linux-trace-kernel/171509105520.162236.10339831553995971290.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.472955399@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow instances to have their own ftrace_ops as part of the fgraph_ops, which
makes the function_graph tracer filter on the set_ftrace_filter file of the
instance and not the top instance.
This uses the new ftrace_startup_subops(), with graph_ops as the
"manager ops" that defines the callback function, and adds the functions
defined by the filters of the ops for each trace instance. The callback
defined by the manager ops will call the registered fgraph ops that were
added to the fgraph_array.
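A hedged sketch of the registration path described above (bookkeeping
simplified):

  int register_ftrace_graph(struct fgraph_ops *gops)
  {
      int ret;

      /* graph_ops is the single "manager ops"; each fgraph_ops carries
       * its own ftrace_ops whose filter hashes are folded in as subops.
       */
      ret = ftrace_startup_subops(&graph_ops, &gops->ops, 0);
      if (ret)
          return ret;

      fgraph_array[fgraph_array_cnt++] = gops;    /* simplified */
      return 0;
  }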
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509102088.162236.15758883237657317789.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.832946261@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note that the filtering of the functions
is still tied to the top level set_ftrace_filter and friends, as well as the
graph and nograph files.
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509099743.162236.1699959255446248163.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.190630762@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that will
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.
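The resulting callback shape, sketched; struct my_private and the ->private
field are illustrative, as the field itself only arrives in a later patch:

  static int my_entry(struct ftrace_graph_ent *trace,
                      struct fgraph_ops *gops)
  {
      struct my_private *data = gops->private;    /* future field */

      return data->enabled;
  }

  static void my_return(struct ftrace_graph_ret *trace,
                        struct fgraph_ops *gops)
  {
      struct my_private *data = gops->private;

      data->returns++;
  }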
Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509098588.162236.4787930115997357578.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.035147698@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Analyzing system call failures with the function_graph tracer can be a
time-consuming process, particularly when locating the kernel function
that first returns an error in the trace logs. This change aims to
simplify the process by recording the function return value to the
'retval' member of 'ftrace_graph_ret' and printing it when outputting
the trace log.
We have introduced new trace options: funcgraph-retval and
funcgraph-retval-hex. The former controls whether to display the return
value, while the latter controls the display format.
Please note that even if a function's return type is void, a return
value will still be printed. You can simply ignore it.
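On the output side, the printed comment is gated by the two options; a
simplified sketch of that logic (flag names follow this patch):

  if (flags & TRACE_GRAPH_PRINT_RETVAL) {
      if (flags & TRACE_GRAPH_PRINT_RETVAL_HEX)
          trace_seq_printf(s, " /* = 0x%lx */", graph_ret->retval);
      else
          trace_seq_printf(s, " /* = %ld */", graph_ret->retval);
  }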
This patch only establishes the fundamental infrastructure. Subsequent
patches will make this feature available on some commonly used processor
architectures.
Here is an example:
I attempted to attach the demo process to a cpu cgroup, but it failed:
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument
The strace logs indicate that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...
To capture trace logs during a write system call, use the following
commands:
cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > options/funcgraph-retval
echo 0 > options/funcgraph-retval-hex
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
cat trace > ~/trace.log
To locate the root cause, search for error code -22 directly in the file
trace.log and identify the first function that returned -22. Once you
have identified this function, examine its code to determine the root
cause.
For example, in the trace log below, cpu_cgroup_can_attach
returned -22 first, so we can focus our analysis on this function to
identify the root cause.
...
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
...
Link: https://lkml.kernel.org/r/1fc502712c981e0e6742185ba242992170ac9da8.1680954589.git.pengdonglin@sangfor.com.cn
Tested-by: Florian Kauer <florian.kauer@linutronix.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Replace the obsolete and ambiguous macro in_irq() with the new
macro in_hardirq().
Link: https://lkml.kernel.org/r/20210930000342.6016-1-changbin.du@gmail.com
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When building the files in the tracefs file system, do not by default set
any permissions for OTH (other). This will make it easier for admins who
want to define a group for accessing tracefs and not having to first
disable all the permission bits for "other" in the file system.
As tracing can leak sensitive information, it should never by default allow
all users access. An admin can still set the permission bits for others to
have access, which may be useful for creating a honeypot and seeing who
takes advantage of it and roots the machine.
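In practice this amounts to creating the files with modes whose "other" bits
are clear; a hedged sketch using the mode macros this series introduces:

  /* default modes with the "other" permission bits cleared */
  #define TRACE_MODE_READ     0440
  #define TRACE_MODE_WRITE    0640

  trace_create_file("trace", TRACE_MODE_WRITE, d_tracer,
                    tr, &tracing_fops);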
Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix ~59 single-word typos in the tracing code comments, and fix
the grammar in a handful of places.
Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com
Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The state of the interrupts (irqflags) and the preemption counter are
both passed down to tracing_generic_entry_update(). Only one bit of
irqflags is actually required: the on/off state. The complete 32 bits
of the preemption counter aren't needed either; only whether the upper bits
(softirq, hardirq and NMI) are set, plus the preemption depth, is needed.
The irqflags and the preemption counter can be evaluated early and the
information stored in an integer `trace_ctx'.
tracing_generic_entry_update() would use the upper bits as the
TRACE_FLAG_* bits and the lower 8 bits as the disabled-preemption depth
(considering that one must be subtracted from the counter in one
special case).
The actual preemption value is not used except for the tracing record.
The `irqflags' variable is mostly used only for the tracing record. An
exception is, for instance, wakeup_tracer_call() or
probe_wakeup_sched_switch(), which explicitly disable interrupts and use
that `irqflags' to save (and restore) the IRQ state and to record the
state.
Struct trace_event_buffer also has the `pc' and `flags' members, which can
be replaced with `trace_ctx' since their actual values are not used
outside of trace recording.
This will reduce tracing_generic_entry_update() to simply assign values
to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved
to _tracing_gen_ctx_flags() which replaces preempt_count() and
local_save_flags() invocations.
As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
-> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
-> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()
In this case the TRACE_FLAG_* bits were evaluated three times. By using
the `trace_ctx' they are evaluated once and assigned three times.
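A sketch of the packing, loosely following the resulting helpers:

  unsigned int trace_ctx = tracing_gen_ctx();    /* evaluated once */

  /* tracing_generic_entry_update() is then reduced to assignments: */
  entry->preempt_count = trace_ctx & 0xff;    /* disabled-preemption depth */
  entry->flags         = trace_ctx >> 16;     /* TRACE_FLAG_* bits */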
A build with all tracers enabled on x86-64 with and without the patch:
text data bss dec hex filename
21970669 17084168 7639260 46694097 2c87ed1 vmlinux.old
21970293 17084168 7639260 46693721 2c87d59 vmlinux.new
text shrank by 379 bytes, data remained constant.
Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Inspecting the data structures of the function graph tracer, I found that
the overrun value is an unsigned long, which is 8 bytes on a 64-bit machine,
while the depth is an int (4 bytes). The overrun can simply be an
unsigned int (4 bytes), which packs the ftrace_graph_ret structure better.
The depth is moved up next to func, as it is used more often with func,
and this improves cache locality.
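The repacked structure, sketched (member set abbreviated from the real one):

  struct ftrace_graph_ret {
      unsigned long func;             /* Current function */
      int depth;                      /* moved up next to func */
      unsigned int overrun;           /* was unsigned long */
      unsigned long long calltime;
      unsigned long long rettime;
  };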
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Currently tracing_init_dentry() returns a d_entry pointer, which is not
necessary. The function returns NULL on success or an error pointer on
failure, so there is never a valid d_entry pointer to return.
Let's return 0 on success and a negative value on error.
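Callers then check an integer status instead of a dentry; a minimal sketch:

  int ret = tracing_init_dentry();

  if (ret)    /* 0 on success, negative on error */
      return ret;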
Link: https://lkml.kernel.org/r/20200712011036.70948-5-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When the ring buffer was first created, the iterator followed the normal
producer/consumer operations where it had both a peek() operation, that just
returned the event at the current location, and a read(), that would return
the event at the current location and also increment the iterator such that
the next peek() or read() will return the next event.
The only use of ring_buffer_read() currently is to move the iterator to the
next location; nothing actually reads the event it returns. Rename this
function for its actual use case, ring_buffer_iter_advance(), which also
adds the "iter" part to the name, making it more meaningful. As the
timestamp returned by ring_buffer_read() was never used, there's no reason
the new version should bother returning it; it also becomes a void function.
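The before and after prototypes, for contrast:

  /* before */
  struct ring_buffer_event *
  ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts);

  /* after */
  void ring_buffer_iter_advance(struct ring_buffer_iter *iter);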
Link: http://lkml.kernel.org/r/20200317213416.018928618@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As there are two struct ring_buffers in the kernel, the other one being the
perf ring buffer, it causes some confusion. It was agreed upon that, as
neither of the ring buffers is generic enough to be used globally, they
should be renamed as:
perf's ring_buffer -> perf_buffer
ftrace's ring_buffer -> trace_buffer
This implements the changes to the ring buffer that ftrace uses.
Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As we are working to remove the generic "ring_buffer" name that is used by
both tracing and perf, the ring_buffer name for tracing will be renamed to
trace_buffer, and perf's ring buffer will be renamed to perf_buffer.
As there already exists a trace_buffer that is used by the trace_arrays, it
needs to be first renamed to array_buffer.
Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
We have already tested it earlier, so the second test is redundant and
should be removed. With this change, performance should improve a little.
Link: http://lkml.kernel.org/r/20190730140850.7927-1-changbin.du@gmail.com
Cc: stable@vger.kernel.org
Fixes: 9cd2992f2d6c ("fgraph: Have set_graph_notrace only affect function_graph tracer")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Don't mix context flags with function duration info.
Instead of this:
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+
# --------------------------------------------------------------------
# latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 0) <idle>-0 | dNh5 | /* 0:120:R + [000] 11: 0:R migration/0 */
2 us | 0) <idle>-0 | dNh5 0.000 us | (null)();
4 us | 0) <idle>-0 | dNh4 | _raw_spin_unlock() {
4 us | 0) <idle>-0 | dNh4 0.304 us | preempt_count_sub();
5 us | 0) <idle>-0 | dNh3 1.063 us | }
5 us | 0) <idle>-0 | dNh3 0.266 us | ttwu_stat();
6 us | 0) <idle>-0 | dNh3 | _raw_spin_unlock_irqrestore() {
6 us | 0) <idle>-0 | dNh3 0.273 us | preempt_count_sub();
6 us | 0) <idle>-0 | dNh2 0.818 us | }
Show this:
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 4.20.0+
# --------------------------------------------------------------------
# latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0)
# -----------------
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 0) <idle>-0 | dNs. | | /* 0:120:R + [000] 339:100:R kworker/0:1H */
3 us | 0) <idle>-0 | dNs. | 0.000 us | (null)();
67 us | 0) <idle>-0 | dNs. | 0.721 us | ttwu_stat();
69 us | 0) <idle>-0 | dNs. | 0.607 us | _raw_spin_unlock_irqrestore();
71 us | 0) <idle>-0 | .Ns. | 0.598 us | _raw_spin_lock_irq();
72 us | 0) <idle>-0 | .Ns. | 0.584 us | _raw_spin_lock_irq();
73 us | 0) <idle>-0 | dNs. | + 11.118 us | __next_timer_interrupt();
75 us | 0) <idle>-0 | dNs. | | call_timer_fn() {
76 us | 0) <idle>-0 | dNs. | | delayed_work_timer_fn() {
76 us | 0) <idle>-0 | dNs. | | __queue_work() {
...
Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When function_graph is used for the latency tracers, a relative timestamp
is more straightforward than the absolute timestamp that the function
tracer uses. This change adds relative timestamp support to function_graph
and applies it to the latency tracers (wakeup and irqsoff).
Instead of:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 5.0.0-rc1-test
# --------------------------------------------------------------------
# latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __schedule
# => ended at: _raw_spin_unlock_irq
#
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
124.974306 | 2) systemd-693 | d..1 0.000 us | __schedule();
124.974307 | 2) systemd-693 | d..1 | rcu_note_context_switch() {
124.974308 | 2) systemd-693 | d..1 0.487 us | rcu_preempt_deferred_qs();
124.974309 | 2) systemd-693 | d..1 0.451 us | rcu_qs();
124.974310 | 2) systemd-693 | d..1 2.301 us | }
[..]
124.974826 | 2) <idle>-0 | d..2 | finish_task_switch() {
124.974826 | 2) <idle>-0 | d..2 | _raw_spin_unlock_irq() {
124.974827 | 2) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq();
124.974828 | 2) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on();
<idle>-0 2d..2 552us : <stack trace>
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64
Show:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+
# --------------------------------------------------------------------
# latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __schedule
# => ended at: _raw_spin_unlock_irq
#
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| /
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 7) sshd-1704 | d..1 0.000 us | __schedule();
1 us | 7) sshd-1704 | d..1 | rcu_note_context_switch() {
1 us | 7) sshd-1704 | d..1 0.611 us | rcu_preempt_deferred_qs();
2 us | 7) sshd-1704 | d..1 0.484 us | rcu_qs();
3 us | 7) sshd-1704 | d..1 2.599 us | }
[..]
509 us | 7) <idle>-0 | d..2 | finish_task_switch() {
510 us | 7) <idle>-0 | d..2 | _raw_spin_unlock_irq() {
510 us | 7) <idle>-0 | d..2 0.000 us | _raw_spin_unlock_irq();
512 us | 7) <idle>-0 | d..2 0.000 us | tracer_hardirqs_on();
<idle>-0 7d..2 543us : <stack trace>
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64
Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Move the function function_graph_ret_addr() to fgraph.c, as the management
of curr_ret_stack is going to change, and all accesses to ret_stack
need to be done in fgraph.c.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Currently, registering the function graph hooks means passing in an entry
and a return function separately. We need a way to associate those functions
together, so that the entry can determine whether to run the return hook.
Having a structure that contains both functions will facilitate the process
of converting the code to be able to do so.
This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.
The unregister function is now passed in the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.
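A sketch of the resulting interface:

  struct fgraph_ops {
      trace_func_graph_ent_t    entryfunc;
      trace_func_graph_ret_t    retfunc;
  };

  static struct fgraph_ops my_ops = {
      .entryfunc = &my_graph_entry,
      .retfunc   = &my_graph_return,
  };

  register_ftrace_graph(&my_ops);
  /* ... */
  unregister_ftrace_graph(&my_ops);    /* the handle says which to remove */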
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When the function profiler is not configured, the "graph_time" option is
meaningless, as the function profiler is the only thing that makes use of
it. Do not expose it if the profiler is not configured.
Link: http://lkml.kernel.org/r/20181123061133.GA195223@google.com
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The curr_ret_stack is no longer set to a negative value when a function is
not to be traced by the function graph tracer. Remove the usage of
FTRACE_NOTRACE_DEPTH, as it is no longer needed.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In order to make the function graph infrastructure more generic, there can
not be code specific to the function_graph tracer in the generic code. This
includes the set_graph_notrace logic, which stops all graph calls when a
function in set_graph_notrace is hit.
By using the trace_recursion mask, we can use a bit in the current
task_struct to implement the notrace code, move the logic out of fgraph.c
and into trace_functions_graph.c, and keep it affecting only the tracer
and not all call graph callbacks.
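In the entry handler this amounts to, roughly:

  if (ftrace_graph_notrace_addr(trace->func)) {
      trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
      return 1;    /* keep the return hook so the bit gets cleared */
  }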
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As the function graph infrastructure can be used by things other than
tracing, moving the code to its own file, out of trace_functions_graph.c,
makes more sense.
The fgraph.c file will only contain the infrastructure required to hook into
functions and their return code.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Commit 588ca1786f2dd ("function_graph: Use new curr_ret_depth to manage
depth instead of curr_ret_stack") removed a parameter from the call to
ftrace_push_return_trace() that made the entire call fit under 80
characters, but it did not remove the line break. There's no reason to keep
that line broken up, so make it a single line.
Link: http://lkml.kernel.org/r/20181122100322.GN2131@hirez.programming.kicks-ass.net
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The tracefs file set_graph_function is used to make the function graph
tracer trace only the functions that are listed in that file (or all
functions if the file is empty). The way this is implemented is that the
function graph tracer looks at every function, and if the current depth is
zero and the function matches something in the file, then it will trace that
function. When other functions are called, the depth will be greater than
zero (because the original function will be at depth zero), and all
functions will be traced where the depth is greater than zero.
The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. Because the logic only looks at depth, it may trace
interrupts when it should not.
The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the window for this race. The bug
was actually there for a long time, but because the race window was so small
it seldom happened. The Fixes tag below is for the commit that widened the
race window, because that commit belongs to a series that will also help
fix the original bug.
Cc: stable@kernel.org
Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The function graph profiler uses the ret_stack to store the "subtime", which
is reused by nested functions and also on the return. But the current logic
has the profiler callback called before the ret_stack is updated, so it is
modifying a ret_stack entry that will only later be allocated (it's just
lucky that the "subtime" is not touched when it is allocated).
This could also cause a crash if we are at the end of the ret_stack when
this happens.
By reversing the order, allocating the ret_stack entry first and then
calling the callbacks attached to the function being traced, the ret_stack
entry is no longer used before it is allocated.
Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In the past, curr_ret_stack served two functions. One was to denote the
depth of the call graph, the other to keep track of where on the ret_stack
the data is in use. Although they may be slightly related, there are two
cases where they need to be used differently.
One case is keeping the ret_stack data from being corrupted by an interrupt
coming in and overwriting data still in use. The other is simply knowing
where the depth of the stack currently is.
The function profiler uses the ret_stack to save a "subtime" variable that
is part of the data on the ret_stack. If curr_ret_stack is modified too
early, then this variable can be corrupted.
The "max_depth" option, when set to 1, will record the first functions going
into the kernel. To see all top functions (when dealing with timings), the
depth variable needs to be lowered before calling the return hook. But by
lowering the curr_ret_stack, it makes the data on the ret_stack still being
used by the return hook susceptible to being overwritten.
Now that there's two variables to handle both cases (curr_ret_depth), we can
move them to the locations where they can handle both cases.
Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Currently, the depth of the ret_stack is determined by curr_ret_stack index.
The issue is that there's a race between setting of the curr_ret_stack and
calling of the callback attached to the return of the function.
Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
trace return callback") moved the calling of the callback to after the
setting of the curr_ret_stack, even stating that it was safe to do so, when
in fact, it was the reason there was a barrier() there (yes, I should have
commented that barrier()).
Not only does curr_ret_stack keep track of the current call graph depth,
it also keeps the ret_stack content from being overwritten by new data.
The function profiler uses the "subtime" variable of the ret_stack
structure, and by moving curr_ret_stack early, interrupts are allowed to
reuse the structure that is still in use, corrupting the data and breaking
the profiler.
To fix this, there need to be two variables to handle the call stack depth
and the pointer to where the ret_stack is being used, as they need to change
at two different locations.
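A hedged sketch of the return-path ordering that the two variables permit:

  trace.depth = current->curr_ret_depth--;    /* depth may drop early */
  ftrace_graph_return(&trace);        /* may still read the ret_stack entry */
  barrier();
  current->curr_ret_stack--;          /* only now can the entry be reused */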
Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As all architectures now call function_graph_enter() to do the entry work,
no architecture should ever call ftrace_push_return_trace(). Make it static.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Currently all the architectures do basically the same thing to prepare the
function graph tracer on entry to a function. This code can be pulled into a
generic location, which will then allow the function graph tracer to be
fixed as well as extended.
Create a new function graph helper function_graph_enter() that will call the
hook function (ftrace_graph_entry) and the shadow stack operation
(ftrace_push_return_trace), and remove the need of the architecture code to
manage the shadow stack.
This is needed to prepare for a fix of a design bug on how the curr_ret_stack
is used.
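A hedged sketch of the new helper, close to the shape described above:

  int function_graph_enter(unsigned long ret, unsigned long func,
                           unsigned long frame_pointer,
                           unsigned long *retp)
  {
      struct ftrace_graph_ent trace;

      trace.func = func;
      trace.depth = current->curr_ret_stack + 1;

      /* Only trace if the calling function expects to */
      if (!ftrace_graph_entry(&trace))
          return -EBUSY;

      return ftrace_push_return_trace(ret, func, frame_pointer, retp);
  }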
Cc: stable@kernel.org
Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The function_graph tracer does not show the interrupt return marker for
leaf entries. On leaf entries we see an unbalanced interrupt marker (the
interrupt was entered, but never left).
Before:
1) | SyS_write() {
1) | __fdget_pos() {
1) 0.061 us | __fget_light();
1) 0.289 us | }
1) | vfs_write() {
1) 0.049 us | rw_verify_area();
1) + 15.424 us | __vfs_write();
1) ==========> |
1) 6.003 us | smp_apic_timer_interrupt();
1) 0.055 us | __fsnotify_parent();
1) 0.073 us | fsnotify();
1) + 23.665 us | }
1) + 24.501 us | }
After:
0) | SyS_write() {
0) | __fdget_pos() {
0) 0.052 us | __fget_light();
0) 0.328 us | }
0) | vfs_write() {
0) 0.057 us | rw_verify_area();
0) | __vfs_write() {
0) ==========> |
0) 8.548 us | smp_apic_timer_interrupt();
0) <========== |
0) + 36.507 us | } /* __vfs_write */
0) 0.049 us | __fsnotify_parent();
0) 0.066 us | fsnotify();
0) + 50.064 us | }
0) + 50.952 us | }
Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel.com
Cc: stable@vger.kernel.org
Fixes: f8b755ac8e0cc ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which have no license information with the 'GPL-2.0'
SPDX license identifier.