Age | Commit message (Collapse) | Author | Files | Lines |
|
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When annotating a basic block, it's useful to display the occurrences
of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters
are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in
hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the
accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64
space.
Because the saturation of a branch counter is small, e.g., for Intel
Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter
once saturated. That can be used to indicate a potential event lost
because of the saturation.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-5-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-18-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to lower the possibilities of what could
happen with a tool.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When --group option is used, it should display all events together. But
the current logic only checks if the first (leader) event has samples or
not. Let's check the member events as well.
Also it missed to put the linked samples from member evsels to the
output RB-tree so that it can be displayed in the output.
For example, take a look at this example.
$ ./perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
It has three events but 'path_put' function has samples only for
mem-stores (second) event.
$ sudo ./perf annotate --stdio -f path_put
Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
----------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 : ffffffffae600020: endbr64
0.00 : ffffffffae600024: nopl (%rax, %rax)
91.22 : ffffffffae600029: pushq %rbx
0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
8.78 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 : ffffffffae600039: popq %rbx
0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 : ffffffffae60003f: nop
Therefore, it didn't show up when --group option is used since the
leader ("mem-loads") event has no samples. But now it checks both
events.
Before:
$ sudo ./perf annotate --stdio -f --group path_put
(no output)
After:
$ sudo ./perf annotate --stdio -f --group path_put
Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 0.00 0.00 : ffffffffae600020: endbr64
0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax)
0.00 91.22 0.00 : ffffffffae600029: pushq %rbx
0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 0.00 0.00 : ffffffffae600039: popq %rbx
0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 0.00 0.00 : ffffffffae60003f: nop
Committer testing:
Before:
root@number:~# perf annotate --group --stdio2 clear_page_erms
root@number:~#
After:
root@number:~# perf annotate --group --stdio2 clear_page_erms
Samples: 125 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period]
clear_page_erms() /proc/kcore
Percent 0xffffffff990c6cc0 <clear_page_erms>:
endbr64
movl $0x1000,%ecx
xorl %eax,%eax
0.00 100.00 0.00 rep stosb %al, (%rdi)
← retq
int3
int3
int3
int3
nop
nop
root@number:~#
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This is consistent when the option is set in perf report.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show the all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction
Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad
-----------------------------------------------------------
: 419 196
This patch sets "dl->ins.name" in arch specific function
"check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a
parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad
-----------------------------------------------------------
58 : 323 80
32 : 49 43
34 : 33 11
OP_31_XOP_LDX : 8 20
40 : 23 0
OP_31_XOP_LWARX : 5 1
OP_31_XOP_LWZX : 2 3
OP_31_XOP_LDARX : 3 0
33 : 0 2
OP_31_XOP_LBZX : 0 1
OP_31_XOP_LWAX : 0 1
OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-16-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add the skip_empty flag to symbol_conf and set the value from the report
command to preserve the existing behavior. This makes the code simpler
and will be needed other code which is hard to add a new argument.
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240607202918.2357459-4-namhyung@kernel.org
|
|
Add reference count checking to struct dso, this can help with
implementing correct reference counting discipline. To avoid
RC_CHK_ACCESS everywhere, add accessor functions for the variables in
struct dso.
The majority of the change is mechanical in nature and not easy to
split up.
Committer testing:
'perf test' up to this patch shows no regressions.
But:
util/symbol.c: In function ‘dso__load_bfd_symbols’:
util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’
1683 | dso__set_adjust_symbols(dso);
| ^~~~~~~~~~~~~~~~~~~~~~~
In file included from util/symbol.c:21:
util/dso.h:268:20: note: declared here
268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val)
| ^~~~~~~~~~~~~~~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1
MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/
make[6]: *** Waiting for unfinished jobs....
This was updated:
- symbols__fixup_end(&dso->symbols, false);
- symbols__fixup_duplicate(&dso->symbols);
- dso->adjust_symbols = 1;
+ symbols__fixup_end(dso__symbols(dso), false);
+ symbols__fixup_duplicate(dso__symbols(dso));
+ dso__set_adjust_symbols(dso);
But not build tested with BUILD_NONDISTRO and libbfd devel files installed
(binutils-devel on fedora).
Add the missing argument:
symbols__fixup_end(dso__symbols(dso), false);
symbols__fixup_duplicate(dso__symbols(dso));
- dso__set_adjust_symbols(dso);
+ dso__set_adjust_symbols(dso, true);
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The loop in hists__find_annotations() never set the 'nd' pointer to NULL
and it makes stdio output repeating the last element forever. I think
it doesn't set to NULL for TUI to prevent it from exiting unexpectedly.
But it should just set on stdio mode.
Fixes: d001c7a7f4736743 ("perf annotate-data: Add hist_entry__annotate_data_tui()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240423020643.740029-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support data type profiling output on TUI.
Testing from Arnaldo:
First make sure that the debug information for your workload binaries
in embedded in them by building it with '-g' or install the debuginfo
packages, since our workload is 'find':
root@number:~# type find
find is hashed (/usr/bin/find)
root@number:~# rpm -qf /usr/bin/find
findutils-4.9.0-5.fc39.x86_64
root@number:~# dnf debuginfo-install findutils
<SNIP>
root@number:~#
Then collect some data:
root@number:~# echo 1 > /proc/sys/vm/drop_caches
root@number:~# perf mem record find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ]
root@number:~#
Finally do data-type annotation with the following command, that will
default, as 'perf report' to the --tui mode, with lines colored to
highlight the hotspots, etc.
root@number:~# perf annotate --data-type
Annotate type: 'struct predicate' (58 samples)
Percent Offset Size Field
100.00 0 312 struct predicate {
0.00 0 8 PRED_FUNC pred_func;
0.00 8 8 char* p_name;
0.00 16 4 enum predicate_type p_type;
0.00 20 4 enum predicate_precedence p_prec;
0.00 24 1 _Bool side_effects;
0.00 25 1 _Bool no_default_print;
0.00 26 1 _Bool need_stat;
0.00 27 1 _Bool need_type;
0.00 28 1 _Bool need_inum;
0.00 32 4 enum EvaluationCost p_cost;
0.00 36 4 float est_success_rate;
0.00 40 1 _Bool literal_control_chars;
0.00 41 1 _Bool artificial;
0.00 48 8 char* arg_text;
<SNIP>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And move the related code into util/annotate-data.c file.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like 'perf report', it can take a while to process samples.
Show a progress window to inform users how that it is not stuck.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For data type profiling output, it should be in sync with normal output
so make it display percentage for each field. Also use coloring scheme
for users to identify fields with big overhead easily.
Users can use --show-total-period or --show-nr-samples to change the
output style like in the normal perf annotate output.
Before:
$ perf annotate --data-type
Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
============================================================================
samples offset size field
34 0 9792 struct task_struct {
2 0 24 struct thread_info thread_info {
0 0 8 long unsigned int flags;
1 8 8 long unsigned int syscall_work;
0 16 4 u32 status;
1 20 4 u32 cpu;
};
After:
$ perf annotate --data-type
Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
============================================================================
Percent offset size field
100.00 0 9792 struct task_struct {
3.55 0 24 struct thread_info thread_info {
0.00 0 8 long unsigned int flags;
1.63 8 8 long unsigned int syscall_work;
0.00 16 4 u32 status;
1.91 20 4 u32 cpu;
};
Committer testing:
First collect a suitable perf.data file for use with 'perf annotate --data-type':
root@number:~# perf mem record -a sleep 1s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ]
root@number:~#
Then, before:
root@number:~# perf annotate --data-type
Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
============================================================================
samples offset size field
6 0 40 union {
6 0 40 struct __pthread_mutex_s __data {
2 0 4 int __lock;
0 4 4 unsigned int __count;
0 8 4 int __owner;
1 12 4 unsigned int __nusers;
2 16 4 int __kind;
1 20 2 short int __spins;
0 22 2 short int __elision;
0 24 16 __pthread_list_t __list {
0 24 8 struct __pthread_internal_list* __prev;
0 32 8 struct __pthread_internal_list* __next;
};
};
0 0 0 char* __size;
2 0 8 long int __align;
};
<SNIP>
And after:
Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
============================================================================
Percent offset size field
100.00 0 40 union {
100.00 0 40 struct __pthread_mutex_s __data {
31.27 0 4 int __lock;
0.00 4 4 unsigned int __count;
0.00 8 4 int __owner;
7.67 12 4 unsigned int __nusers;
53.10 16 4 int __kind;
7.96 20 2 short int __spins;
0.00 22 2 short int __elision;
0.00 24 16 __pthread_list_t __list {
0.00 24 8 struct __pthread_internal_list* __prev;
0.00 32 8 struct __pthread_internal_list* __next;
};
};
0.00 0 0 char* __size;
31.27 0 8 long int __align;
};
<SNIP>
The lines with percentages >= 7.67 have its percentages red colored.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The options array in cmd_annotate() has duplicate --group options. It
only needs one and let's get rid of the other.
$ perf annotate -h 2>&1 | grep group
--group Show event group information together
--group Show event group information together
Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If it failed to find a variable for the location directly, it might be
due to a missing variable in the source code. For example, accessing
pointer variables in a chain can result in the case like below:
struct foo *foo = ...;
int i = foo->bar->baz;
The DWARF debug information is created for each variable so it'd have
one for 'foo'. But there's no variable for 'foo->bar' and then it
cannot know the type of 'bar' and 'baz'.
The above source code can be compiled to the follow x86 instructions:
mov 0x8(%rax), %rcx
mov 0x4(%rcx), %rdx <=== PMU sample
mov %rdx, -4(%rbp)
Let's say 'foo' is located in the %rax and it has a pointer to struct
foo. But perf sample is captured in the second instruction and there
is no variable or type info for the %rcx.
It'd be great if compiler could generate debug info for %rcx, but we
should handle it on our side. So this patch implements the logic to
iterate instructions and update the type table for each location.
As it already collected a list of scopes including the target
instruction, we can use it to construct the type table smartly.
+---------------- scope[0] subprogram
|
| +-------------- scope[1] lexical_block
| |
| | +------------ scope[2] inlined_subroutine
| | |
| | | +---------- scope[3] inlined_subroutine
| | | |
| | | | +-------- scope[4] lexical_block
| | | | |
| | | | | *** target instruction
...
Image the target instruction has 5 scopes, each scope will have its own
variables and parameters. Then it can start with the innermost scope
(4). So it'd search the shortest path from the start of scope[4] to
the target address and build a list of basic blocks. Then it iterates
the basic blocks with the variables in the scope and update the table.
If it finds a type at the target instruction, then returns it.
Otherwise, it moves to the upper scope[3]. Now it'd search the shortest
path from the start of scope[3] to the start of scope[4]. Then connect
it to the existing basic block list. Then it'd iterate the blocks with
variables for both scopes. It can repeat this until it finds a type at
the target instruction or reaches to the top scope[0].
As the basic blocks contain the shortest path, it won't worry about
branches and can update the table simply.
The final check will be done by find_matching_type() in the next patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is for a debugging purpose. It'd be useful to see per-instrucion
level success/failure stats.
$ perf annotate --data-type --insn-stat
Annotate Instruction stats
total 264, ok 143 (54.2%), bad 121 (45.8%)
Name : Good Bad
-----------------------------------------------------------
movq : 45 31
movl : 22 11
popq : 0 19
cmpl : 16 3
addq : 8 7
cmpq : 11 3
cmpxchgl : 3 7
cmpxchgq : 8 0
incl : 3 3
movzbl : 4 2
incq : 4 2
decl : 6 0
...
Committer notes:
So these are about being able to find the type for accesses from these
instructions, we should improve the naming, but it is for debugging, we
can improve this later:
@@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
continue;
mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset);
+ if (mem_type)
+ istat->good++;
+ else
+ istat->bad++;
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --type-stat option is to be used with --data-type and to print
detailed failure reasons for the data type annotation.
$ perf annotate --data-type --type-stat
Annotate data type stats:
total 294, ok 116 (39.5%), bad 178 (60.5%)
-----------------------------------------------------------
30 : no_sym
40 : no_insn_ops
33 : no_mem_ops
63 : no_var
4 : no_typeinfo
8 : bad_offset
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When events are grouped together, it'd be natural to show them at once
like in other mode. Handle group leaders with members to collect the
number of samples together and display like below:
$ perf annotate --data-type --group
...
Annotate type: 'struct page' in vmlinux (1 samples):
event[0] = cpu/mem-loads,ldlat=30/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
============================================================================
samples offset size field
1 0 0 0 64 struct page {
0 0 0 0 8 long unsigned int flags;
0 0 0 8 40 union {
0 0 0 8 40 struct {
0 0 0 8 16 union {
0 0 0 8 16 struct list_head lru {
0 0 0 8 8 struct list_head* next;
0 0 0 16 8 struct list_head* prev;
};
0 0 0 8 16 struct {
0 0 0 8 8 void* __filler;
0 0 0 16 4 unsigned int mlock_count;
};
0 0 0 8 16 struct list_head buddy_list {
0 0 0 8 8 struct list_head* next;
0 0 0 16 8 struct list_head* prev;
};
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-16-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support data type annotation with new --data-type option. It internally
uses type sort key to collect sample histogram for the type and display
every members like below.
$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...
For simplicity it prints the number of samples per field for now.
But it should be easy to show the overhead percentage instead.
The number at the outer struct is a sum of the numbers of the inner
members. For example, struct cfs_rq got total 13 samples, and 2 came
from the load (struct load_weight) and 1 from h_nr_running. Similarly,
the struct load_weight got total 2 samples and they all came from the
weight field.
I've added two new flags in the symbol_conf for this. The
annotate_data_member is to get the members of the type. This is also
needed for perf report with typeoff sort key. The annotate_data_sample
is to update sample stats for each offset and used only in annotate.
Currently it only support stdio output mode, TUI support can be added
later.
Committer testing:
With the perf.data from the previous csets, a very simple, short
duration one:
# perf annotate --data-type
Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples):
============================================================================
samples offset size field
1 0 16 struct list_head {
0 0 8 struct list_head* next;
1 8 8 struct list_head* prev;
};
Annotate type: 'char' in [kernel.kallsyms] (1 samples):
============================================================================
samples offset size field
1 0 1 char ;
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now it only cares about the global options so it can just handle it
without the argument.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now it can use the global options and no need save local browser
options separately.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now it can directly use the global options and no need to pass it as an
argument.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-5-namhyung@kernel.org
[ Fixup build with GTK2=1 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The annotation options are to control the behavior of objdump and the
output. It's basically used by 'perf annotate' but 'perf report' and
'perf top' can call it on TUI dynamically.
But it doesn't need to have a copy of annotation options in many places.
As most of the work is done in the util/annotate.c file, add a global
variable and set/use it instead of having their own copies.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
annotated_branch'
The max_coverage field is only used when branch stack info is available
so it'd be natural to move to 'struct annotated_branch'.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
struct addr_location holds references to multiple reference counted
objects. Add init/exit functions to make maintenance of those more
consistent with the rest of the code and to try to avoid
leaks. Modification of thread reference counts isn't included in this
change.
Committer notes:
I needed to initialize result to sample->ip to make sure is set to
something, fixing a compile time error, mostly keeping the previous
logic as build_alloc_func_list() already does debugging/error prints
about what went wrong if it takes the 'goto out'.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com& |