summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
AgeCommit message (Collapse)AuthorFilesLines
2025-03-14perf script: Update brstack syntax documentationYujie Liu1-7/+16
The following commits added new fields/flags to the branch stack field list: commit 1f48989cdc7d ("perf script: Output branch sample type") commit 6ade6c646035 ("perf script: Show branch speculation info") commit 1e66dcff7b9b ("perf script: Add not taken event for branch stack") Update brstack syntax documentation to be consistent with the latest branch stack field list. Improve the descriptions to help users interpret the fields accurately. Signed-off-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20250312072329.419020-1-yujie.liu@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13perf annotate: Add --code-with-type option.Namhyung Kim1-0/+4
This option is to show data type info in the regular (code) annotation. It tries to find data type for each (memory) instruction in the function. It'd be useful to see function-level memory access pattern and also to debug the data type profiling result. The output would be added at the end of the line and have "# data-type:" prefix. For now, it only works with --stdio mode for simplicity. I can work on enabling it for TUI later. $ perf annotate --stdio --code-with-type Percent | Source code & Disassembly of vmlinux for cpu/mem-loads/ppk (253 samples, percent: local period) --------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81baa000 <check_preemption_disabled>: 0.00 : ffffffff81baa000: pushq %r12 # data-type: (stack operation) 0.00 : ffffffff81baa002: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81baa003: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff81baa004: subq $0x8, %rsp 18.00 : ffffffff81baa008: movl %gs:0x7e48893d(%rip), %ebx # 0x3294c <pcpu_hot+0xc> # data-type: struct pcpu_hot +0xc (cpu_number) 12.58 : ffffffff81baa00f: movl %gs:0x7e488932(%rip), %eax # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa016: testl $0x7fffffff, %eax 0.00 : ffffffff81baa01b: je 0xffffffff81baa02c <check_preemption_disabled+0x2c> 0.00 : ffffffff81baa01d: addq $0x8, %rsp 0.00 : ffffffff81baa021: movl %ebx, %eax 14.19 : ffffffff81baa023: popq %rbx # data-type: (stack operation) 18.86 : ffffffff81baa024: popq %rbp # data-type: (stack operation) 12.10 : ffffffff81baa025: popq %r12 # data-type: (stack operation) 17.78 : ffffffff81baa027: jmp 0xffffffff81bc1170 <__x86_return_thunk> 6.49 : ffffffff81baa02c: callq *0xc9139e(%rip) # 0xffffffff8283b3d0 <pv_ops+0xf0> # data-type: (stack operation) 0.00 : ffffffff81baa032: testb $0x2, %ah 0.00 : ffffffff81baa035: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa037: movq %rdi, %rbp 0.00 : ffffffff81baa03a: movq %gs:0x32940, %rax # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa043: testb $0x4, 0x2f(%rax) # data-type: struct task_struct +0x2f (flags) 0.00 : ffffffff81baa047: je 0xffffffff81baa052 <check_preemption_disabled+0x52> 0.00 : ffffffff81baa049: cmpl $0x1, 0x3d0(%rax) # data-type: struct task_struct +0x3d0 (nr_cpus_allowed) 0.00 : ffffffff81baa050: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa052: movq %gs:0x32940, %r12 # data-type: struct pcpu_hot +0 (current_task) 0.00 : ffffffff81baa05b: cmpw $0x0, 0x7f0(%r12) # data-type: struct task_struct +0x7f0 (migration_disabled) 0.00 : ffffffff81baa065: movq %rsi, (%rsp) 0.00 : ffffffff81baa069: jne 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa06b: movl 0xe8dd13(%rip), %eax # 0xffffffff82a37d84 <system_state> # data-type: enum system_states +0 0.00 : ffffffff81baa071: testl %eax, %eax 0.00 : ffffffff81baa073: je 0xffffffff81baa01d <check_preemption_disabled+0x1d> 0.00 : ffffffff81baa075: incl %gs:0x7e4888cc(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa07c: movq $-0x7e14a100, %rdi 0.00 : ffffffff81baa083: callq 0xffffffff81148c40 <__printk_ratelimit> # data-type: (stack operation) 0.00 : ffffffff81baa088: testl %eax, %eax 0.00 : ffffffff81baa08a: je 0xffffffff81baa0d5 <check_preemption_disabled+0xd5> 0.00 : ffffffff81baa08c: movl 0x958(%r12), %r9d # data-type: struct task_struct +0x958 (pid) 0.00 : ffffffff81baa094: movq (%rsp), %rdx # data-type: char* +0 0.00 : ffffffff81baa098: movq %rbp, %rsi 0.00 : ffffffff81baa09b: leaq 0xb88(%r12), %r8 # data-type: struct task_struct +0xb88 (comm) 0.00 : ffffffff81baa0a3: movl %gs:0x7e48889e(%rip), %ecx # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0aa: andl $0x7fffffff, %ecx 0.00 : ffffffff81baa0b0: movq $-0x7dd3cdf0, %rdi 0.00 : ffffffff81baa0b7: subl $0x1, %ecx 0.00 : ffffffff81baa0ba: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0bf: movq 0x20(%rsp), %rsi 0.00 : ffffffff81baa0c4: movq $-0x7ddb8c7e, %rdi 0.00 : ffffffff81baa0cb: callq 0xffffffff81149340 <_printk> # data-type: (stack operation) 0.00 : ffffffff81baa0d0: callq 0xffffffff81b7ab60 <dump_stack> # data-type: (stack operation) 0.00 : ffffffff81baa0d5: decl %gs:0x7e48886c(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count) 0.00 : ffffffff81baa0dc: jmp 0xffffffff81baa01d <check_preemption_disabled+0x1d> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf lock: Report owner stack in usermodeChun-Tse Shao1-2/+3
This patch parses `owner_lock_stat` into a RB tree, enabling ordered reporting of owner lock statistics with stack traces. It also updates the documentation for the `-o` option in contention mode, decouples `-o` from `-t`, and issues a warning to inform users about the new behavior of `-ov`. Example output: $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe ... contended total wait max wait avg wait type caller 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 0xffffffffac6318e7 pipe_read+0x57 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 0xffffffffac631ee0 pipe_write+0x50 0xffffffffac6241db vfs_write+0x3bb 0xffffffffac6244ab ksys_write+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac69249b do_epoll_pwait.part.0+0xb 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 === owner stack trace === 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 0xffffffffac631bd8 pipe_read+0x348 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 ... Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-26perf list: Document -v option deduplication featureJames Clark1-1/+1
-v disables deduplication of similarly suffixed PMUs so add it to the help and doc strings. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-19perf tools: Fix up some comments and code to properly use the event_source busGreg Kroah-Hartman2-7/+7
In sysfs, the perf events are all located in /sys/bus/event_source/devices/ but some places ended up hard-coding the location to be at the root of /sys/devices/ which could be very risky as you do not exactly know what type of device you are accessing in sysfs at that location. So fix this all up by properly pointing everything at the bus device list instead of the root of the sysfs devices/ tree. Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/2025021955-implant-excavator-179d@gregkh Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Add latency and parallelism profiling documentationDmitry Vyukov4-19/+124
Describe latency and parallelism profiling, related flags, and differences with the currently only supported CPU-consumption-centric profiling. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/a13f270ed33cedb03ce9ebf9ddbd064854ca0f19.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Add --latency flagDmitry Vyukov2-0/+9
Add record/report --latency flag that allows to capture and show latency-centric profiles rather than the default CPU-consumption-centric profiles. For latency profiles record captures context switch events, and report shows Latency as the first column. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-12perf trace: Add --summary-mode optionNamhyung Kim1-0/+4
The --summary-mode option will select how to show the syscall summary at the end. By default, it'll show the summary for each thread and it's the same as if --summary-mode=thread is passed. The other option is to show total summary, which is --summary-mode=total. I'd like to have this instead of a separate option like --total-summary because we may want to add a new summary mode (by cgroup) later. $ sudo ./perf trace -as --summary-mode=total sleep 1 Summary of events: total, 21580 events syscall calls errors total min avg max stddev (msec) (msec) (msec) (msec) (%) --------------- -------- ------ -------- --------- --------- --------- ------ epoll_wait 1305 0 14716.712 0.000 11.277 551.529 8.87% futex 1256 89 13331.197 0.000 10.614 733.722 15.49% poll 669 0 6806.618 0.000 10.174 459.316 11.77% ppoll 220 0 3968.797 0.000 18.040 516.775 25.35% clock_nanosleep 1 0 1000.027 1000.027 1000.027 1000.027 0.00% epoll_pwait 21 0 592.783 0.000 28.228 522.293 88.29% nanosleep 16 0 60.515 0.000 3.782 10.123 33.33% ioctl 510 0 4.284 0.001 0.008 0.182 8.84% recvmsg 1434 775 3.497 0.001 0.002 0.174 6.37% write 1393 0 2.854 0.001 0.002 0.017 1.79% read 1063 100 2.236 0.000 0.002 0.083 5.11% ... Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250205205443.1986408-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-24Merge tag 'perf-tools-for-v6.14-2025-01-21' of ↵Linus Torvalds8-233/+419
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf-tools updates from Namhyung Kim: "There are a lot of changes in the perf tools in this cycle. build: - Use generic syscall table to generate syscall numbers on supported archs - This also enables to get rid of libaudit which was used for syscall numbers - Remove python2 support as it's deprecated for years - Fix issues on static build with libzstd perf record: - Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing: # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires kernel support (which was added in v6.13) perf lock: - 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires kernel/BPF support (which was added in v6.13) perf ftrace: - 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets: # perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION | COUNT | GRAPH | 0 - 200 ns | 186 | ### | 200 - 300 ns | 256 | ##### | 300 - 400 ns | 364 | ####### | 400 - 500 ns | 223 | #### | 500 - 600 ns | 111 | ## | 600 - 700 ns | 41 | | 700 - 800 ns | 141 | ## | 800 - ... ns | 169 | ### | # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 - As you can see in the above example, it nows shows the statistics at the end so that users can see the avg/max/min latencies easily - 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold perf script: - Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file # perf script mem-phys-addr -- find / ... Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others: - 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky - Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) - Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat' - Update ARM Neoverse N2/V2 JSON events and metrics" * tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits) perf test: Update event_groups test to use instructions perf bench: Fix undefined behavior in cmpworker() perf annotate: Prefer passing evsel to evsel->core.idx perf lock: Rename fields in lock_type_table perf lock: Add percpu-rwsem for type filter perf lock: Fix parse_lock_type which only retrieve one lock flag perf lock: Fix return code for functions in __cmd_contention perf hist: Fix width calculation in hpp__fmt() perf hist: Fix bogus profiles when filters are enabled perf hist: Deduplicate cmp/sort/collapse code perf test: Improve verbose documentation perf test: Add a runs-per-test flag perf test: Fix parallel/sequential option documentation perf test: Send list output to stdout rather than stderr perf test: Rename functions and variables for better clarity perf tools: Expose quiet/verbose variables in Makefile.perf perf config: Add a function to set one variable in .perfconfig perf test perftool_testsuite: Return correct value for skipping perf test perftool_testsuite: Add missing description perf test record+probe_libc_inet_pton: Make test resilient ...
2025-01-17perf lock: Add percpu-rwsem for type filterChun-Tse Shao1-2/+2
percpu-rwsem was missing in man page. And for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: 4f701063bfa2 ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Improve verbose documentationIan Rogers1-1/+4
Add a little more detail on the output expectations for each verbose level. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Add a runs-per-test flagIan Rogers1-0/+5
To detect flakes it is useful to run tests more than once. Add a runs-per-test flag that will run each test multiple times. Example output: ``` $ perf test -r 3 lbr -v 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok ``` Update the documentation for the runs-per-test option. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Fix parallel/sequential option documentationIan Rogers1-7/+3
The parallel option was removed in commit 94d1a913bdc4 ("perf test: Make parallel testing the default"). Update the sequential documentation to reflect it isn't the default except for "exclusive" tests. Fixes: 94d1a913bdc4 ("perf test: Make parallel testing the default") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-10perf docs: arm_spe: Document new discard modeJames Clark1-0/+26
Document the flag along with PMU events to hint what it's used for and give an example with other useful options to get minimal output. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2025-01-10perf tools: Remove dependency on libauditCharlie Jenkins1-2/+0
All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is no longer needed. With the removal of the flag, the related GENERIC_SYSCALL_TABLE can also be removed. libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT was not defined, so libaudit is also no longer needed for any architecture. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-01-08perf ftrace profile: Add --graph-opts optionNamhyung Kim1-0/+8
Like trace subcommand, it should be able to pass some options to control the tracing behavior for the function graph tracer. But some options are limited in order to maintain the internal behavior. For example, it can limit the function call depth like below: # perf ftrace profile --graph-opts depth=5 -- myprog Committer testing: root@number:~# perf ftrace profile --graph-opts thresh=1000 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001419.301 500709.650 1000032.000 2 x64_sys_call 1000032.000 1000032.000 1000032.000 1 __x64_sys_clock_nanosleep 1000032.000 1000032.000 1000032.000 1 common_nsleep 1000031.000 1000031.000 1000031.000 1 do_nanosleep 1000031.000 1000031.000 1000031.000 1 hrtimer_nanosleep 1000024.000 1000024.000 1000024.000 1 schedule 1387.208 1387.208 1387.208 1 __x64_sys_execve 1386.691 1386.691 1386.691 1 do_execveat_common.isra.0 1334.170 1334.170 1334.170 1 bprm_execve 1258.413 1258.413 1258.413 1 load_elf_binary 1123.068 1123.068 1123.068 1 begin_new_exec 1113.550 1113.550 1113.550 1 mmput 1109.237 1109.237 1109.237 1 exit_mmap root@number:~# perf ftrace profile --graph-opts thresh=1200 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001448.204 500724.102 1000018.000 2 x64_sys_call 1000017.000 1000017.000 1000017.000 1 __x64_sys_clock_nanosleep 1000017.000 1000017.000 1000017.000 1 common_nsleep 1000017.000 1000017.000 1000017.000 1 hrtimer_nanosleep 1000016.000 1000016.000 1000016.000 1 do_nanosleep 1000012.000 1000012.000 1000012.000 1 schedule 1430.112 1430.112 1430.112 1 __x64_sys_execve 1429.581 1429.581 1429.581 1 do_execveat_common.isra.0 1376.289 1376.289 1376.289 1 bprm_execve 1301.743 1301.743 1301.743 1 load_elf_binary root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-26perf docs: Add documentation for --force-btf optionHoward Chu1-0/+5
The --force-btf option is intended for debugging purposes and is currently undocumented. Add documentation for it. Committer notes: We need a follow up patch expanding on what can be done via BTF and what isn't possible and thus needs further work to convert kernel C source code into tables that can then be associated with syscall integer args and struct members, as discussed in: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-3-howardchu95@gmail.com Link: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf intel-pt: Add documentation for pause / resumeAdrian Hunter1-0/+108
Document the use of aux-action config term and provide a simple example. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf intel-pt: Improve man page formatAdrian Hunter1-219/+267
Improve format of config terms and section references. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf tools: Parse aux-actionAdrian Hunter1-0/+4
Add parsing for aux-action to accept "pause", "resume" or "start-paused" values. "start-paused" is valid only for AUX area events. "pause" and "resume" are valid only for events grouped with an AUX area event as the group leader. However, like with aux-output, the events will be automatically grouped if they are not currently in a group, and the AUX area event precedes the other events. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Add --max-latency optionGabriele Monaco1-0/+4
This patch adds a max-latency option as discussed, in case the number of buckets is more than 22, we don't observe the setting (for now, let's say). By default or if 0 is passed, the value is automatically determined based on the number of buckets, range and minimum, so that we fill all available buffers (equivalent to the behaviour before this patch). We now get something like this: # perf ftrace latency --bucket-range=20 \ --min-latency 10 \ --max-latency=100 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 10 us | 1731 | ################ | 10 - 30 us | 1 | | 30 - 50 us | 0 | | 50 - 70 us | 0 | | 70 - 90 us | 0 | | 90 - 100 us | 0 | | 100 - ... us | 0 | | Note the maximum is observed also if it doesn't cover completely a full range (the second to last range is 10us long to let the last start at 100 sharp), this looks to me more sensible and eases the computations, since we don't need to account for the range while filling the buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Introduce --min-latency to narrow down into a latency rangeArnaldo Carvalho de Melo1-0/+4
Things below and over will be in the first and last, outlier, buckets. Without it: # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 200 ns | 0 | | 200 - 400 ns | 44 | | 400 - 600 ns | 291 | # | 600 - 800 ns | 506 | ## | 800 - 1000 ns | 148 | | 1.00 - 1.20 us | 581 | ## | 1.20 - 1.40 us | 2199 | ########## | 1.40 - 1.60 us | 1048 | #### | 1.60 - 1.80 us | 1448 | ###### | 1.80 - 2.00 us | 1091 | ##### | 2.00 - 2.20 us | 517 | ## | 2.20 - 2.40 us | 318 | # | 2.40 - 2.60 us | 370 | # | 2.60 - 2.80 us | 271 | # | 2.80 - 3.00 us | 150 | | 3.00 - 3.20 us | 85 | | 3.20 - 3.40 us | 48 | | 3.40 - 3.60 us | 40 | | 3.60 - 3.80 us | 22 | | 3.80 - 4.00 us | 13 | | 4.00 - 4.20 us | 14 | | 4.20 - ... us | 626 | ## | # # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=20 --min-latency=1200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1200 ns | 1243 | ##### | 1.20 - 1.22 us | 141 | | 1.22 - 1.24 us | 202 | | 1.24 - 1.26 us | 209 | | 1.26 - 1.28 us | 219 | | 1.28 - 1.30 us | 208 | | 1.30 - 1.32 us | 245 | # | 1.32 - 1.34 us | 246 | # | 1.34 - 1.36 us | 224 | # | 1.36 - 1.38 us | 219 | | 1.38 - 1.40 us | 206 | | 1.40 - 1.42 us | 190 | | 1.42 - 1.44 us | 190 | | 1.44 - 1.46 us | 146 | | 1.46 - 1.48 us | 140 | | 1.48 - 1.50 us | 125 | | 1.50 - 1.52 us | 115 | | 1.52 - 1.54 us | 102 | | 1.54 - 1.56 us | 87 | | 1.56 - 1.58 us | 90 | | 1.58 - 1.60 us | 85 | | 1.60 - ... us | 5487 | ######################## | # Now we want focus on the latencies starting at 1.2us, with a finer grained range of 20ns: This is all on a live system, so statistically interesting, but not narrowing down on the same numbers, so a 'perf ftrace latency record' seems interesting to then use all on the same snapshot of latencies. A --max-latency counterpart should come next, at first limiting the max-latency to 20 * bucket-size, as we have a fixed buckets array with 20 + 2 entries (+ for the outliers) and thus would need to make it larger for higher latencies. We also may need a way to ask for not considering the out of range values (first and last buckets) when drawing the buckets bars. Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Introduce --bucket-range to ask for linear bucketingArnaldo Carvalho de Melo1-0/+3
In addition to showing it exponentially, using log2() to figure out the histogram index, allow for showing it linearly: The preexisting more, the default: # perf ftrace latency --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1 ns | 0 | | 1 - 2 ns | 0 | | 2 - 4 ns | 0 | | 4 - 8 ns | 0 | | 8 - 16 ns | 0 | | 16 - 32 ns | 0 | | 32 - 64 ns | 0 | | 64 - 128 ns | 238 | # | 128 - 256 ns | 1704 | ########## | 256 - 512 ns | 672 | ### | 512 - 1024 ns | 4458 | ########################## | 1 - 2 us | 677 | #### | 2 - 4 us | 5 | | 4 - 8 us | 0 | | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - ... ms | 0 | | # The new histogram mode: # perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1 ns | 0 | | 1 - 151 ns | 265 | # | 151 - 301 ns | 1797 | ########### | 301 - 451 ns | 258 | # | 451 - 601 ns | 289 | # | 601 - 751 ns | 2049 | ############# | 751 - 901 ns | 967 | ###### | 901 - 1051 ns | 513 | ### | 1.05 - 1.20 us | 114 | | 1.20 - 1.35 us | 559 | ### | 1.35 - 1.50 us | 189 | # | 1.50 - 1.65 us | 137 | | 1.65 - 1.80 us | 32 | | 1.80 - 1.95 us | 2 | | 1.95 - 2.10 us | 0 | | 2.10 - 2.25 us | 1 | | 2.25 - 2.40 us | 1 | | 2.40 - 2.55 us | 0 | | 2.55 - 2.70 us | 0 | | 2.70 - 2.85 us | 0 | | 2.85 - 3.00 us | 1 | | 3.00 - ... us | 4 | | # Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-3-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf config: Fix trival typo 'an' -> 'can'Arnaldo Carvalho de Melo1-1/+1
Just a trivial typo, should be 'can', did a spell check on the rest of the file just in case, nothing more stood out. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13perf disasm: Allow configuring what disassemblers to useArnaldo Carvalho de Melo1-0/+13
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then parse and augment it with samples, allow navigation, etc. More recently disassemblers from the capstone and llvm (libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced. So when all those methods are available, there is a static preference for a series of attempts of disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded. This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which and in what order disassemblers should be attempted. As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, elliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods. root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8 /* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp } /* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~# After adding a way to select the disassembler from the command line a 'perf test' comparing the output of the various diassemblers should be introduced, to test these codebases. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-09perf docs: Document tool and hwmon eventsIan Rogers1-0/+15
Add a few paragraphs on tool and hwmon events. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-29perf arm-spe: Update --itrace help textGraham Woodward2-2/+2
The --itrace help now needs updating to reflect that the --itrace=b argument sythesises branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-5-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-21perf test: Document the -w/--workload optionArn