summaryrefslogtreecommitdiff
path: root/tools/perf/builtin-annotate.c
AgeCommit message (Collapse)AuthorFilesLines
2024-08-14perf annotate: Display the branch counter histogramKan Liang1-3/+7
Display the branch counter histogram in the annotation view. Press 'B' to display the branch counter's abbreviation list as well. Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }', 4000 Hz, Event count (approx.): f3 /home/sdp/test/tchain_edit [Percent: local period] Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%) │ 0000000000401755 <f3>: 0.00 0.00 │ endbr64 │ push %rbp │ mov %rsp,%rbp │ movl $0x0,-0x4(%rbp) 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25 11.03 11.03 │ 11: mov -0x4(%rbp),%eax │ and $0x1,%eax │ test %eax,%eax 17.13 17.13 │2.41 1 |A |- | ↓ je 21 │ addl $0x1,-0x4(%rbp) 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp) 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp) 11.03 11.03 │0.61 3 |A |- | ↑ jle 11 │ nop │ pop %rbp 0.00 0.00 │0.24 20 |AA |B | ← ret Originally-by: Tinghao Zhang <tinghao.zhang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-14perf annotate: Save branch counters for each blockKan Liang1-1/+2
When annotating a basic block, it's useful to display the occurrences of other events in the block. The branch counter feature is only available for newer Intel platforms. So a dedicated option to display the branch counters is not introduced. Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation. When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation. Accounting the branch counters as well when accounting the cycles in hist__account_cycles(). In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter. In a sample, all the branch counters for a branch are saved in a u64 space. Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3. Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-5-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf annotate: Use perf_tool__init()Ian Rogers1-22/+20
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-18-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tool: Constify tool pointersIan Rogers1-1/+1
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf annotate: Fix --group behavior when leader has no samplesNamhyung Kim1-2/+12
When --group option is used, it should display all events together. But the current logic only checks if the first (leader) event has samples or not. Let's check the member events as well. Also it missed to put the linked samples from member evsels to the output RB-tree so that it can be displayed in the output. For example, take a look at this example. $ ./perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u It has three events but 'path_put' function has samples only for mem-stores (second) event. $ sudo ./perf annotate --stdio -f path_put Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period) ---------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 : ffffffffae600020: endbr64 0.00 : ffffffffae600024: nopl (%rax, %rax) 91.22 : ffffffffae600029: pushq %rbx 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 8.78 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 : ffffffffae600039: popq %rbx 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 : ffffffffae60003f: nop Therefore, it didn't show up when --group option is used since the leader ("mem-loads") event has no samples. But now it checks both events. Before: $ sudo ./perf annotate --stdio -f --group path_put (no output) After: $ sudo ./perf annotate --stdio -f --group path_put Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period) ------------------------------------------------------------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 0.00 0.00 : ffffffffae600020: endbr64 0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax) 0.00 91.22 0.00 : ffffffffae600029: pushq %rbx 0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 0.00 0.00 : ffffffffae600039: popq %rbx 0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 0.00 0.00 : ffffffffae60003f: nop Committer testing: Before: root@number:~# perf annotate --group --stdio2 clear_page_erms root@number:~# After: root@number:~# perf annotate --group --stdio2 clear_page_erms Samples: 125 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period] clear_page_erms() /proc/kcore Percent 0xffffffff990c6cc0 <clear_page_erms>: endbr64 movl $0x1000,%ecx xorl %eax,%eax 0.00 100.00 0.00 rep stosb %al, (%rdi) ← retq int3 int3 int3 int3 nop nop root@number:~# Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Add --skip-empty optionNamhyung Kim1-0/+2
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report. For example, the following command would use 3 events including dummy. $ perf mem record -a -- perf test -w noploop $ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u Just using perf annotate with --group will show the all 3 events. $ perf annotate --group --stdio | head Percent | Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d Now with --skip-empty, it'll hide the last dummy event. $ perf annotate --group --stdio --skip-empty | head Percent | Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d Committer testing: root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# Before: root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# After: root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-31perf annotate: Set instruction name to be used with insn-stat when using raw ↵Athira Rajeev1-2/+2
instruction Since the "ins.name" is not set while using raw instruction, 'perf annotate' with insn-stat gives wrong data: Result from "./perf annotate --data-type --insn-stat": Annotate Instruction stats total 615, ok 419 (68.1%), bad 196 (31.9%) Name : Good Bad ----------------------------------------------------------- : 419 196 This patch sets "dl->ins.name" in arch specific function "check_ppc_insn" while initialising "struct disasm_line". Also update "ins_find" function to pass "struct disasm_line" as a parameter so as to set its name field in arch specific call. With the patch changes: Annotate Instruction stats total 609, ok 446 (73.2%), bad 163 (26.8%) Name/opcode : Good Bad ----------------------------------------------------------- 58 : 323 80 32 : 49 43 34 : 33 11 OP_31_XOP_LDX : 8 20 40 : 23 0 OP_31_XOP_LWARX : 5 1 OP_31_XOP_LWZX : 2 3 OP_31_XOP_LDARX : 3 0 33 : 0 2 OP_31_XOP_LBZX : 0 1 OP_31_XOP_LWAX : 0 1 OP_31_XOP_LHZX : 0 1 Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-16-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-15perf hist: Add symbol_conf.skip_emptyNamhyung Kim1-2/+2
Add the skip_empty flag to symbol_conf and set the value from the report command to preserve the existing behavior. This makes the code simpler and will be needed other code which is hard to add a new argument. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-4-namhyung@kernel.org
2024-05-06perf dso: Add reference count checking and accessor functionsIan Rogers1-3/+3
Add reference count checking to struct dso, this can help with implementing correct reference counting discipline. To avoid RC_CHK_ACCESS everywhere, add accessor functions for the variables in struct dso. The majority of the change is mechanical in nature and not easy to split up. Committer testing: 'perf test' up to this patch shows no regressions. But: util/symbol.c: In function ‘dso__load_bfd_symbols’: util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’ 1683 | dso__set_adjust_symbols(dso); | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from util/symbol.c:21: util/dso.h:268:20: note: declared here 268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val) | ^~~~~~~~~~~~~~~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1 MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/ make[6]: *** Waiting for unfinished jobs.... This was updated: - symbols__fixup_end(&dso->symbols, false); - symbols__fixup_duplicate(&dso->symbols); - dso->adjust_symbols = 1; + symbols__fixup_end(dso__symbols(dso), false); + symbols__fixup_duplicate(dso__symbols(dso)); + dso__set_adjust_symbols(dso); But not build tested with BUILD_NONDISTRO and libbfd devel files installed (binutils-devel on fedora). Add the missing argument: symbols__fixup_end(dso__symbols(dso), false); symbols__fixup_duplicate(dso__symbols(dso)); - dso__set_adjust_symbols(dso); + dso__set_adjust_symbols(dso, true); Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf annotate: Fix data type profiling on stdioNamhyung Kim1-1/+1
The loop in hists__find_annotations() never set the 'nd' pointer to NULL and it makes stdio output repeating the last element forever. I think it doesn't set to NULL for TUI to prevent it from exiting unexpectedly. But it should just set on stdio mode. Fixes: d001c7a7f4736743 ("perf annotate-data: Add hist_entry__annotate_data_tui()") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240423020643.740029-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Add hist_entry__annotate_data_tui()Namhyung Kim1-4/+26
Support data type profiling output on TUI. Testing from Arnaldo: First make sure that the debug information for your workload binaries in embedded in them by building it with '-g' or install the debuginfo packages, since our workload is 'find': root@number:~# type find find is hashed (/usr/bin/find) root@number:~# rpm -qf /usr/bin/find findutils-4.9.0-5.fc39.x86_64 root@number:~# dnf debuginfo-install findutils <SNIP> root@number:~# Then collect some data: root@number:~# echo 1 > /proc/sys/vm/drop_caches root@number:~# perf mem record find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ] root@number:~# Finally do data-type annotation with the following command, that will default, as 'perf report' to the --tui mode, with lines colored to highlight the hotspots, etc. root@number:~# perf annotate --data-type Annotate type: 'struct predicate' (58 samples) Percent Offset Size Field 100.00 0 312 struct predicate { 0.00 0 8 PRED_FUNC pred_func; 0.00 8 8 char* p_name; 0.00 16 4 enum predicate_type p_type; 0.00 20 4 enum predicate_precedence p_prec; 0.00 24 1 _Bool side_effects; 0.00 25 1 _Bool no_default_print; 0.00 26 1 _Bool need_stat; 0.00 27 1 _Bool need_type; 0.00 28 1 _Bool need_inum; 0.00 32 4 enum EvaluationCost p_cost; 0.00 36 4 float est_success_rate; 0.00 40 1 _Bool literal_control_chars; 0.00 41 1 _Bool artificial; 0.00 48 8 char* arg_text; <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Add hist_entry__annotate_data_tty()Namhyung Kim1-105/+1
And move the related code into util/annotate-data.c file. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate: Show progress of sample processingNamhyung Kim1-2/+13
Like 'perf report', it can take a while to process samples. Show a progress window to inform users how that it is not stuck. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Honor output options with --data-typeNamhyung Kim1-6/+38
For data type profiling output, it should be in sync with normal output so make it display percentage for each field. Also use coloring scheme for users to identify fields with big overhead easily. Users can use --show-total-period or --show-nr-samples to change the output style like in the normal perf annotate output. Before: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ samples offset size field 34 0 9792 struct task_struct { 2 0 24 struct thread_info thread_info { 0 0 8 long unsigned int flags; 1 8 8 long unsigned int syscall_work; 0 16 4 u32 status; 1 20 4 u32 cpu; }; After: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ Percent offset size field 100.00 0 9792 struct task_struct { 3.55 0 24 struct thread_info thread_info { 0.00 0 8 long unsigned int flags; 1.63 8 8 long unsigned int syscall_work; 0.00 16 4 u32 status; 1.91 20 4 u32 cpu; }; Committer testing: First collect a suitable perf.data file for use with 'perf annotate --data-type': root@number:~# perf mem record -a sleep 1s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ] root@number:~# Then, before: root@number:~# perf annotate --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ samples offset size field 6 0 40 union { 6 0 40 struct __pthread_mutex_s __data { 2 0 4 int __lock; 0 4 4 unsigned int __count; 0 8 4 int __owner; 1 12 4 unsigned int __nusers; 2 16 4 int __kind; 1 20 2 short int __spins; 0 22 2 short int __elision; 0 24 16 __pthread_list_t __list { 0 24 8 struct __pthread_internal_list* __prev; 0 32 8 struct __pthread_internal_list* __next; }; }; 0 0 0 char* __size; 2 0 8 long int __align; }; <SNIP> And after: Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ Percent offset size field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 31.27 0 4 int __lock; 0.00 4 4 unsigned int __count; 0.00 8 4 int __owner; 7.67 12 4 unsigned int __nusers; 53.10 16 4 int __kind; 7.96 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0 0 char* __size; 31.27 0 8 long int __align; }; <SNIP> The lines with percentages >= 7.67 have its percentages red colored. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Get rid of duplicate --group option itemNamhyung Kim1-2/+0
The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other. $ perf annotate -h 2>&1 | grep group --group Show event group information together --group Show event group information together Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Implement instruction trackingNamhyung Kim1-0/+1
If it failed to find a variable for the location directly, it might be due to a missing variable in the source code. For example, accessing pointer variables in a chain can result in the case like below: struct foo *foo = ...; int i = foo->bar->baz; The DWARF debug information is created for each variable so it'd have one for 'foo'. But there's no variable for 'foo->bar' and then it cannot know the type of 'bar' and 'baz'. The above source code can be compiled to the follow x86 instructions: mov 0x8(%rax), %rcx mov 0x4(%rcx), %rdx <=== PMU sample mov %rdx, -4(%rbp) Let's say 'foo' is located in the %rax and it has a pointer to struct foo. But perf sample is captured in the second instruction and there is no variable or type info for the %rcx. It'd be great if compiler could generate debug info for %rcx, but we should handle it on our side. So this patch implements the logic to iterate instructions and update the type table for each location. As it already collected a list of scopes including the target instruction, we can use it to construct the type table smartly. +---------------- scope[0] subprogram | | +-------------- scope[1] lexical_block | | | | +------------ scope[2] inlined_subroutine | | | | | | +---------- scope[3] inlined_subroutine | | | | | | | | +-------- scope[4] lexical_block | | | | | | | | | | *** target instruction ... Image the target instruction has 5 scopes, each scope will have its own variables and parameters. Then it can start with the innermost scope (4). So it'd search the shortest path from the start of scope[4] to the target address and build a list of basic blocks. Then it iterates the basic blocks with the variables in the scope and update the table. If it finds a type at the target instruction, then returns it. Otherwise, it moves to the upper scope[3]. Now it'd search the shortest path from the start of scope[3] to the start of scope[4]. Then connect it to the existing basic block list. Then it'd iterate the blocks with variables for both scopes. It can repeat this until it finds a type at the target instruction or reaches to the top scope[0]. As the basic blocks contain the shortest path, it won't worry about branches and can update the table simply. The final check will be done by find_matching_type() in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add --insn-stat option for debuggingNamhyung Kim1-0/+41
This is for a debugging purpose. It'd be useful to see per-instrucion level success/failure stats. $ perf annotate --data-type --insn-stat Annotate Instruction stats total 264, ok 143 (54.2%), bad 121 (45.8%) Name : Good Bad ----------------------------------------------------------- movq : 45 31 movl : 22 11 popq : 0 19 cmpl : 16 3 addq : 8 7 cmpq : 11 3 cmpxchgl : 3 7 cmpxchgq : 8 0 incl : 3 3 movzbl : 4 2 incq : 4 2 decl : 6 0 ... Committer notes: So these are about being able to find the type for accesses from these instructions, we should improve the naming, but it is for debugging, we can improve this later: @@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he) continue; mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset); + if (mem_type) + istat->good++; + else + istat->bad++; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add --type-stat option for debuggingNamhyung Kim1-1/+43
The --type-stat option is to be used with --data-type and to print detailed failure reasons for the data type annotation. $ perf annotate --data-type --type-stat Annotate data type stats: total 294, ok 116 (39.5%), bad 178 (60.5%) ----------------------------------------------------------- 30 : no_sym 40 : no_insn_ops 33 : no_mem_ops 63 : no_var 4 : no_typeinfo 8 : bad_offset Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Support event group displayNamhyung Kim1-12/+77
When events are grouped together, it'd be natural to show them at once like in other mode. Handle group leaders with members to collect the number of samples together and display like below: $ perf annotate --data-type --group ... Annotate type: 'struct page' in vmlinux (1 samples): event[0] = cpu/mem-loads,ldlat=30/P event[1] = cpu/mem-stores/P event[2] = dummy:u ============================================================================ samples offset size field 1 0 0 0 64 struct page { 0 0 0 0 8 long unsigned int flags; 0 0 0 8 40 union { 0 0 0 8 40 struct { 0 0 0 8 16 union { 0 0 0 8 16 struct list_head lru { 0 0 0 8 8 struct list_head* next; 0 0 0 16 8 struct list_head* prev; }; 0 0 0 8 16 struct { 0 0 0 8 8 void* __filler; 0 0 0 16 4 unsigned int mlock_count; }; 0 0 0 8 16 struct list_head buddy_list { 0 0 0 8 8 struct list_head* next; 0 0 0 16 8 struct list_head* prev; }; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-16-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add --data-type optionNamhyung Kim1-1/+96
Support data type annotation with new --data-type option. It internally uses type sort key to collect sample histogram for the type and display every members like below. $ perf annotate --data-type ... Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples): ============================================================================ samples offset size field 13 0 640 struct cfs_rq { 2 0 16 struct load_weight load { 2 0 8 unsigned long weight; 0 8 4 u32 inv_weight; }; 0 16 8 unsigned long runnable_weight; 0 24 4 unsigned int nr_running; 1 28 4 unsigned int h_nr_running; ... For simplicity it prints the number of samples per field for now. But it should be easy to show the overhead percentage instead. The number at the outer struct is a sum of the numbers of the inner members. For example, struct cfs_rq got total 13 samples, and 2 came from the load (struct load_weight) and 1 from h_nr_running. Similarly, the struct load_weight got total 2 samples and they all came from the weight field. I've added two new flags in the symbol_conf for this. The annotate_data_member is to get the members of the type. This is also needed for perf report with typeoff sort key. The annotate_data_sample is to update sample stats for each offset and used only in annotate. Currently it only support stdio output mode, TUI support can be added later. Committer testing: With the perf.data from the previous csets, a very simple, short duration one: # perf annotate --data-type Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 16 struct list_head { 0 0 8 struct list_head* next; 1 8 8 struct list_head* prev; }; Annotate type: 'char' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 1 char ; # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Ensure init/exit for global optionsNamhyung Kim1-4/+4
Now it only cares about the global options so it can just handle it without the argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf ui/browser/annotate: Use global annotation_optionsNamhyung Kim1-1/+1
Now it can use the global options and no need save local browser options separately. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Use global annotation_optionsNamhyung Kim1-4/+3
Now it can directly use the global options and no need to pass it as an argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-5-namhyung@kernel.org [ Fixup build with GTK2=1 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Introduce global annotation_optionsNamhyung Kim1-22/+21
The annotation options are to control the behavior of objdump and the output. It's basically used by 'perf annotate' but 'perf report' and 'perf top' can call it on TUI dynamically. But it doesn't need to have a copy of annotation options in many places. As most of the work is done in the util/annotate.c file, add a global variable and set/use it instead of having their own copies. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Move max_coverage from 'struct annotation' to 'struct ↵Namhyung Kim1-2/+5
annotated_branch' The max_coverage field is only used when branch stack info is available so it'd be natural to move to 'struct annotated_branch'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf addr_location: Add init/exit/copy functionsIan Rogers1-9/+19
struct addr_location holds references to multiple reference counted objects. Add init/exit functions to make maintenance of those more consistent with the rest of the code and to try to avoid leaks. Modification of thread reference counts isn't included in this change. Committer notes: I needed to initialize result to sample->ip to make sure is set to something, fixing a compile time error, mostly keeping the previous logic as build_alloc_func_list() already does debugging/error prints about what went wrong if it takes the 'goto out'. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com&