summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2024-10-28perf cap: Add __NR_capget to arch/x86 unistdIan Rogers1-7/+3
As there are duplicated kernel headers in tools/include libc can pick up the wrong definitions. This was causing the wrong system call for capget in perf. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: e25ebda78e230283 ("perf cap: Tidy up and improve capability testing") Closes: https://lore.kernel.org/lkml/cc7d6bdf-1aeb-4179-9029-4baf50b59342@intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241026055448.312247-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers: Update the linux/unaligned.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+1
To pick up the changes in: 7f053812dab3946c ("random: vDSO: minimize and simplify header includes") That required adding a copy of include/vdso/unaligned.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/unaligned.h include/linux/unaligned.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORTArnaldo Carvalho de Melo1-0/+3
Noticed while building on a raspbian arm 32-bit system. There was also this other case, fixed by adding a missing util/stat.h with the prototypes: /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1396:6: error: no previous prototype for ‘perf_stat__set_no_csv_summary’ [-Werror=missing-prototypes] 1396 | void perf_stat__set_no_csv_summary(int set __maybe_unused) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1400:6: error: no previous prototype for ‘perf_stat__set_big_num’ [-Werror=missing-prototypes] 1400 | void perf_stat__set_big_num(int set __maybe_unused) | ^~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors In other architectures this must be building due to some lucky indirect inclusion of that header. Fixes: 9dabf4003423c8d3 ("perf python: Switch module to linking libraries from building source") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZxllAtpmEw5fg9oy@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf test: Handle perftool-testsuite_probe failure due to broken DWARFVeronika Molnarova1-15/+54
Test case test_adding_blacklisted ends in failure if the blacklisted probe is of an assembler function with no DWARF available. At the same time, probing the blacklisted function with ASM DWARF doesn't test the blacklist itself as the failure is a result of the broken DWARF. When the broken DWARF output is encountered, check if the probed function was compiled by the assembler. If so, the broken DWARF message is expected and does not report a perf issue, else report a failure. If the ASM DWARF affected the probe, try the next probe on the blacklist. If the first 5 probes are defective due to broken DWARF, skip the test case. Fixes: def5480d63c1e847 ("perf testsuite probe: Add test for blacklisted kprobes handling") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241017161555.236769-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace: Fix non-listed archs in the syscalltbl routinesJiri Slaby1-0/+10
This fixes a build breakage on 32-bit arm, where the syscalltbl__id_at_idx() function was missing. Committer notes: Generating a proper syscall table from a copy of arch/arm/tools/syscall.tbl ends up being too big a patch for this rc stage, I started doing it but while testing noticed some other problems with using BPF to collect pointer args on arm7 (32-bit) will maybe continue trying to make it work on the next cycle... Fixes: 7a2fb5619cc1fb53 ("perf trace: Fix iteration of syscall ids in syscalltbl->entries") Suggested-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: <jslaby@suse.cz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/lkml/3a592835-a14f-40be-8961-c0cee7720a94@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf build: Change the clang check back to 12.0.1Howard Chu1-2/+2
This serves as a revert for this patch: https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/ Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace augmented_raw_syscalls: Add more checks to pass the verifierHoward Chu1-2/+18
Add some more checks to pass the verifier in more kernels. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com [ Reduced the patch removing things that can be done later ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace augmented_raw_syscalls: Add extra array index bounds checking to ↵Arnaldo Carvalho de Melo1-0/+2
satisfy some BPF verifiers In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise kernels go, have backports from modern kernels, the verifier complains about lack of bounds check for the index into the array of syscall arguments, on a BPF bytecode generated by clang 17, with: ; } else if (size < 0 && size >= -6) { /* buffer */ 116: (b7) r1 = -6 117: (2d) if r1 > r6 goto pc-30 R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value ; index = -(size + 1); 118: (a7) r6 ^= -1 119: (67) r6 <<= 32 120: (77) r6 >>= 32 ; aug_size = args->args[index]; 121: (67) r6 <<= 3 122: (79) r1 = *(u64 *)(r10 -24) 123: (0f) r1 += r6 last_idx 123 first_idx 116 regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24) regs=40 stack=0 before 121: (67) r6 <<= 3 regs=40 stack=0 before 120: (77) r6 >>= 32 regs=40 stack=0 before 119: (67) r6 <<= 32 regs=40 stack=0 before 118: (a7) r6 ^= -1 regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30 regs=42 stack=0 before 116: (b7) r1 = -6 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value parent didn't have regs=40 stack=0 marks last_idx 110 first_idx 98 regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5 regs=42 stack=0 before 109: (b7) r1 = 1 regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7 regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value parent already had regs=40 stack=0 marks 124: (79) r6 = *(u64 *)(r1 +16) R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value R1 unbounded memory access, make sure to bounds check any such access processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3 If we add this line, as used in other BPF programs, to cap that index: index &= 7; The generated BPF program is considered safe by that version of the BPF verifier, allowing perf to collect the syscall args in one more kernel using the BPF based pointer contents collector. With the above one-liner it works with that kernel: [root@dell-per740-01 ~]# uname -a Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux [root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890 0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }) = 0 [root@dell-per740-01 ~]# As well as with the one in Fedora 40: root@number:~# uname -a Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux root@number:~# perf trace -e *sleep* sleep 1.234567890 0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0 root@number:~# Song Liu reported that this one-liner was being optimized out by clang 18, so I suggested and he tested that adding a compiler barrier before it made clang v18 to keep it and the verifier in the kernel in Song's case (Meta's 5.12 based kernel) also was happy with the resulting bytecode. I'll investigate using virtme-ng[1] to have all the perf BPF based functionality thoroughly tested over multiple kernels and clang versions. [1] https://kernel-recipes.org/en/2024/virtme-ng/ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <andrea.righi@linux.dev> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-17perf trace: The return from 'write' isn't a pidArnaldo Carvalho de Melo1-1/+1
When adding a explicit beautifier for the 'write' syscall when the BPF based buffer collector was introduced there was a cut'n'paste error that carried the syscall_fmt->errpid setting from a nearby syscall (waitid) that returns a pid. So the write return was being suppressed by the return pretty printer, remove that field, reverting it back to the default return handler, that prints positive numbers as-is and interpret negative values as errnos. I actually introduced the problem while making Howard's original patch work just with the 'write' syscall, as we couldn't just look for any buffers, the ones that are filled in by the kernel couldn't use the same sys_enter BPF collector. Fixes: b257fac12f38d7f5 ("perf trace: Pretty print buffer data") Reported-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/lkml/bcf50648-3c7e-4513-8717-0d14492c53b9@linaro.org Link: https://lore.kernel.org/all/Zt8jTfzDYgBPvFCd@x1/#t Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-08Merge tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of ↵Linus Torvalds14-32/+161
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix an assert() to handle captured and unprocessed ARM CoreSight CPU traces - Fix static build compilation error when libdw isn't installed or is too old - Add missing include when building with !HAVE_DWARF_GETLOCATIONS_SUPPORT - Add missing refcount put on 32-bit DSOs - Fix disassembly of user space binaries by setting the binary_type of DSO when loading - Update headers with the kernel sources, including asound.h, sched.h, fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's cputype.h * tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace perf build: Fix build feature-dwarf_getlocations fail for old libdw perf build: Fix static compilation error when libdw is not installed perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT tools headers arm64: Sync arm64's cputype.h with the kernel sources perf tools: Cope with differences for lib/list_sort.c copy from the kernel tools check_headers.sh: Add check variant that excludes some hunks perf beauty: Update copy of linux/socket.h with the kernel sources tools headers UAPI: Sync the linux/in.h with the kernel sources perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools include UAPI: Sync linux/fcntl.h copy with the kernel sources tools include UAPI: Sync linux/sched.h copy with the kernel sources tools include UAPI: Sync sound/asound.h copy with the kernel sources perf vdso: Missed put on 32-bit dsos perf symbol: Set binary_type of dso when loading
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro3-3/+3
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02perf cs-etm: Fix the assert() to handle captured and unprocessed cpu traceIlkka Koskinen1-1/+1
If one builds perf with DEBUG=1, captures data on multiple CPUs and finally runs 'perf report -C <cpu>' for only one of the cpus, assert() aborts the program. This happens because there are empty queues with format set. This patch changes the condition to abort only if a queue is not empty and if the format is unset. $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1 $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1 Aborted (core dumped) Fixes: 57880a7966be510c ("perf: cs-etm: Allocate queues for all CPUs") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02perf build: Fix build feature-dwarf_getlocations fail for old libdwYang Jihong1-1/+4
For libdw versions below 0.177, need to link libdl.a in addition to libbebl.a during static compilation, otherwise feature-dwarf_getlocations compilation will fail. Before: $ make LDFLAGS=-static BUILD: Doing 'make -j20' parallel build <SNIP> Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157 <SNIP> $ cat ../build/feature/test-dwarf_getlocations.make.output /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend': (.text+0x20): undefined reference to `dlclose' collect2: error: ld returned 1 exit status After: $ make LDFLAGS=-static <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> $ ./perf probe Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...] or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...] or: perf probe [<options>] --del '[GROUP:]EVENT' ... or: perf probe --list [GROUP:]EVENT ... <SNIP> Fixes: 536661da6ea18fe6 ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02perf build: Fix static compilation error when libdw is not installedYang Jihong1-1/+1
If libdw is not installed in build environment, the output of 'pkg-config --modversion libdw' is empty, causing LIBDW_VERSION_2 to be empty and the shell test will have the following error: /bin/sh: 1: test: -lt: unexpected operator Before: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found /bin/sh: 1: test: -lt: unexpected operator After: 1. libdw is not installed: $ pkg-config --modversion libdw Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Package libdw was not found in the pkg-config search path. Perhaps you should add the directory containing `libdw.pc' to the PKG_CONFIG_PATH environment variable No package 'libdw' found Makefile.config:473: No libdw DWARF unwind found, Please install elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR 2. libdw version is lower than 0.177 $ pkg-config --modversion libdw 0.176 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> INSTALL libsubcmd_headers INSTALL libapi_headers INSTALL libperf_headers INSTALL libsymbol_headers INSTALL libbpf_headers LINK perf 3. libdw version is higher than 0.177 $ pkg-config --modversion libdw 0.186 $ make LDFLAGS=-static -j16 BUILD: Doing 'make -j20' parallel build <SNIP> Auto-detecting system features: ... dwarf: [ on ] <SNIP> CC util/bpf-utils.o CC util/pfm.o LD util/perf-util-in.o LD perf-util-in.o AR libperf-util.a LINK perf Fixes: 536661da6ea18fe6 ("perf: build: Only link libebl.a for old libdw") Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240919013513.118527-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORTJames Clark1-0/+1
The linked fixes commit added an #include "dwarf-aux.h" to disasm.h which gets picked up in a lot of places. Without HAVE_DWARF_GETLOCATIONS_SUPPORT the stubs return an errno, so include errno.h to fix the following build error: In file included from util/disasm.h:8, from util/annotate.h:16, from builtin-top.c:23: util/dwarf-aux.h: In function 'die_get_var_range': util/dwarf-aux.h:183:10: error: 'ENOTSUP' undeclared (first use in this function) 183 | return -ENOTSUP; | ^~~~~~~ Fixes: 782959ac248ac3cb ("perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking") Signed-off-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241001123625.1063153-1-james.clark@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02perf tools: Cope with differences for lib/list_sort.c copy from the kernelArnaldo Carvalho de Melo2-1/+35
With 6d74e1e371d43a7b ("tools/lib/list_sort: remove redundant code for cond_resched handling") we need to use the newly added hunk based exceptions when comparing the copy we carry in tools/lib/ to the original file, do it by adding the hunks that we know will be the expected diff. If at some point the original file is updated in other parts, then we should flag and check the file for update. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-02tools check_headers.sh: Add check variant that excludes some hunksArnaldo Carvalho de Melo1-0/+24
With 6d74e1e371d43a7b ("tools/lib/list_sort: remove redundant code for cond_resched handling") we end up with a multi-line variation in the merge_final() implementation, one that the simple line based exceptions we had so far can't cope. Thus this check has been failing: Warning: Kernel ABI header differences: diff -u tools/lib/list_sort.c lib/list_sort.c So add a new check routine that uses grep -vf to exclude some hunks that we store in the tools/perf/check-header_ignore_hunks/ directory. This first patch is just the new check routine, the next one will use it to check lib/list_sort.c. Acked-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20240930202136.16904-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30perf beauty: Update copy of linux/socket.h with the kernel sourcesArnaldo Carvalho de Melo2-0/+5
To pick the changes in: 8f0b3cc9a4c102c2 ("tcp: RX path for devmem TCP") That don't result in any changes in the tables generated from that header. But while updating I noticed we need to support the new MSG_SOCK_DEVMEM flag in the hard coded table for the msg flags table, add it. This silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Please see tools/include/uapi/README for details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZvrO_eT9e_41xrNv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with ↵Arnaldo Carvalho de Melo1-2/+2
the kernel sources To pick up the change in: a1fab3e69d9d0e9b ("x86/irq: Fix comment on IRQ vector layout") That just adds some comments, so no changes in perf tooling, just silences this build warning: diff -u tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h arch/x86/include/asm/irq_vectors.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZvrKT7oQc1AOv6Vk@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30tools include UAPI: Sync linux/fcntl.h copy with the kernel sourcesArnaldo Carvalho de Melo2-24/+65
Picking the changes from: 4356d575ef0f39a3 ("fhandle: expose u64 mount id to name_to_handle_at(2)") b4fef22c2fb97fa2 ("uapi: explain how per-syscall AT_* flags should be allocated") 820a185896b77814 ("fcntl: add F_CREATED_QUERY") It just moves AT_REMOVEDIR around, and adds a bunch more AT_ for renameat2() and name_to_handle_at(). We need to improve this situation, as not all AT_ defines are applicable to all fs flags... This adds support for those new AT_ defines, addressing this build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/lkml/ZvrIKL3cREoRHIQd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30tools include UAPI: Sync linux/sched.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+1
Picking the changes from: f0e1a0643a59bf1f ("sched_ext: Implement BPF extensible scheduler class") The inclusion of the SCHED_EXT define doesn't cause any change in behaviour in tools/perf. This just silences this perf tools build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/lkml/ZvrDShNVXotZpiwk@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-30tools include UAPI: Sync sound/asound.h copy with the kernel sourcesArnaldo Carvalho de Melo1-1/+16
Picking the changes from: 37745918e0e7575b ("ALSA: timer: Introduce virtual userspace-driven timers") Which entails no changes in the tooling side as it only introduces new SNDRV_TIMER_IOCTL_ ioctls, and the ones tracked by scripts in tools/perf/trace/beauty/ are only SNDRV_PCM_IOCTL_ and SNDRV_CTL_IOCTL_, we still need to support SNDRV_TIMER_IOCTL_ ones, but that probably will be one of the first for a BTF enumeration based approach :-) This silences this perf tools build warning: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ivan Orlov <ivan.orlov0322@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/lkml/ZvrB-g_E7g2ArlYW@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-27perf vdso: Missed put on 32-bit dsosIan Rogers1-1/+3
If the dso type doesn't match then NULL is returned but the dso should be put first. Fixes: f649ed80f3cabbf1 ("perf dsos: Tidy reference counting and locking") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240912182757.762369-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-22perf symbol: Set binary_type of dso when loadingNamhyung Kim1-0/+3
For the kernel dso, it sets the binary type of dso when loading the symbol table. But it seems not to do that for user DSOs. Actually it sets the symtab type only. It's not clear why we want to maintain the two separately but it uses the binary type info before getting the disassembly. Let's use the symtab type as binary type too if it's not set. I think it's ok to set the binary type when it founds a symsrc whether or not it has actual symbols. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Alexander Monakov <amonakov@ispras.ru> Link: https://lore.kernel.org/r/20240426215139.1271039-1-namhyung@kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: LKML <linux-kernel@vger.kernel.org> Cc: <linux-perf-users@vger.kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf trace: Mark the 'head' arg in the set_robust_list syscall as coming ↵Arnaldo Carvalho de Melo1-0/+2
from user space With that it uses the generic BTF based pretty printer: This one we need to think about, not being acquainted with this syscall, should we _traverse_ that list somehow? Would that be useful? root@number:~# perf trace -e set_robust_list sleep 1 0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) = root@number:~# strace prints the default integer args: root@number:~# strace -e set_robust_list sleep 1 set_robust_list(0x7efd99559a20, 24) = 0 +++ exited with 0 +++ root@number:~# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user spaceArnaldo Carvalho de Melo1-0/+2
With that it uses the generic BTF based pretty printer: root@number:~# grep -w rseq /sys/kernel/tracing/events/syscalls/sys_enter_rseq/format field:struct rseq * rseq; offset:16; size:8; signed:0; print fmt: "rseq: 0x%08lx, rseq_len: 0x%08lx, flags: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->rseq)), ((unsigned long)(REC->rseq_len)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->sig)) root@number:~# Before: root@number:~# perf trace -e rseq 0.000 ( 0.017 ms): Isolated Web C/1195452 rseq(rseq: 0x7ff0ecfe6fe0, rseq_len: 32, sig: 1392848979) = 0 74.018 ( 0.006 ms): :1195453/1195453 rseq(rseq: 0x7f2af20fffe0, rseq_len: 32, sig: 1392848979) = 0 1817.220 ( 0.009 ms): Isolated Web C/1195454 rseq(rseq: 0x7f5c9ec7dfe0, rseq_len: 32, sig: 1392848979) = 0 2515.526 ( 0.034 ms): :1195455/1195455 rseq(rseq: 0x7f61503fffe0, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# After: root@number:~# perf trace -e rseq 0.000 ( 0.019 ms): Isolated Web C/1197258 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)4,.cpu_id = (__u32)4,.mm_cid = (__u32)5,}, rseq_len: 32, sig: 1392848979) = 0 1663.835 ( 0.019 ms): Isolated Web C/1197259 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)24,.cpu_id = (__u32)24,.mm_cid = (__u32)2,}, rseq_len: 32, sig: 1392848979) = 0 4750.444 ( 0.018 ms): Isolated Web C/1197260 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)8,.cpu_id = (__u32)8,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4994.132 ( 0.018 ms): Isolated Web C/1197261 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)10,.cpu_id = (__u32)10,.mm_cid = (__u32)1,}, rseq_len: 32, sig: 1392848979) = 0 4997.578 ( 0.011 ms): Isolated Web C/1197263 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)16,.cpu_id = (__u32)16,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4997.462 ( 0.014 ms): Isolated Web C/1197262 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)17,.cpu_id = (__u32)17,.mm_cid = (__u32)3,}, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# We'll probably need to come up with some way for using the BTF info to synthesize a test that then gets used and captures the output of the 'perf trace' output to check if the arguments are the ones synthesized, randomically, for now, lets make do manually: root@number:~# cat ~acme/c/rseq.c #include <sys/syscall.h> /* Definition of SYS_* constants */ #include <linux/rseq.h> #include <errno.h> #include <string.h> #include <unistd.h> #include <stdint.h> #include <stdio.h> /* Provide own rseq stub because glibc doesn't */ __attribute__((weak)) int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig) { return syscall(SYS_rseq, rseq, rseq_len, flags, sig); } int main(int argc, char *argv[]) { struct rseq rseq = { .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }; int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf); printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno)); return err; } root@number:~# perf trace -e rseq ~acme/c/rseq sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) 0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument) root@number:~#root@number:~# cat ~acme/c/rseq.c #include <sys/syscall.h> /* Definition of SYS_* constants */ #include <linux/rseq.h> #include <errno.h> #include <string.h> #include <unistd.h> #include <stdint.h> #include <stdio.h> /* Provide own rseq stub because glibc doesn't */ __attribute__((weak)) int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig) { return syscall(SYS_rseq, rseq, rseq_len, flags, sig); } int main(int argc, char *argv[]) { struct rseq rseq = { .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }; int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf); printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno)); return err; } root@number:~# perf trace -e rseq ~acme/c/rseq sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) 0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument) root@number:~# Interesting, glibc seems to be using rseq here, as in addition to the totally fake one this test case uses, we have this one, around these other syscalls: 0.175 ( 0.001 ms): rseq/1201095 set_tid_address(tidptr: 0x7f6def759a10) = 1201095 (rseq) 0.177 ( 0.001 ms): rseq/1201095 set_robust_list(head: 0x7f6def759a20, len: 24) = 0 0.178 ( 0.001 ms): rseq/1201095 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.231 ( 0.005 ms): rseq/1201095 mprotect(start: 0x7f6def93f000, len: 16384, prot: READ) = 0 0.238 ( 0.003 ms): rseq/1201095 mprotect(start: 0x403000, len: 4096, prot: READ) = 0 0.244 ( 0.004 ms): rseq/1201095 mprotect(start: 0x7f6def99c000, len: 8192, prot: READ) Matches strace (well, not really as the strace in fedora:40 doesn't know about rseq, printing just integer values in hex): set_robust_list(0x7fbc6acc7a20, 24) = 0 rseq(0x7fbc6acc8060, 0x20, 0, 0x53053053) = 0 mprotect(0x7fbc6aead000, 16384, PROT_READ) = 0 mprotect(0x403000, 4096, PROT_READ) = 0 mprotect(0x7fbc6af0a000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7fbc6aebd000, 81563) = 0 rseq(0x7fff15bb9920, 0x20, 0x181cd, 0xdeadbeaf) = -1 EINVAL (Invalid argument) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x9), ...}) = 0 getrandom("\xd0\x34\x97\x17\x61\xc2\x2b\x10", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x18ff4000 brk(0x19015000) = 0x19015000 write(1, "sys_rseq({ .cpu_id_start = 12, ."..., 136sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) ) = 136 exit_group(-1) = ? +++ exited with 255 +++ root@number:~# And also the focus for the v6.13 should be to have a better, strace like BTF pretty printer as one of the outputs we can get from the libbpf BTF dumper. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf env: Find correct branch counter info on hybridKan Liang5-6/+28
No event is printed in the "Branch Counter" column on hybrid machines. For example, $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter $ perf report --total-cycles # Branch counter abbr list: # cpu_core/branch-instructions/pp = A # cpu_core/branches/ = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter # ............... .............. ........... .......... .............. 44.54% 727.1K 0.00% 1 |+ |+ | 36.31% 592.7K 0.00% 2 |+ |+ | 17.83% 291.1K 0.00% 1 |+ |+ | The branch counter information (br_cntr_width and br_cntr_nr) in the perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS is not available on hybrid machines. Without the width information, the number of occurrences of an event cannot be calculated. For a hybrid machine, the caps information should be retrieved from the PMU_CAPS, and stored in the perf_env->pmu_caps. Add a perf_env__find_br_cntr_info() to return the correct branch counter information from the corresponding fields. Committer notes: While testing I couldn't s ee those "Branch counter" columns enabled by pressing 'B' on the TUI, after reporting it to the list Kan explained the situation: <quote Kan Liang> For a hybrid client, the "Branch Counter" feature is only supported starting from the just released Lunar Lake. Perf falls back to only "ANY" on your Raptor Lake. The "The branch counter is not available" message is expected. Here is the 'perf evlist' result from my Lunar Lake machine, # perf evlist -v cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS # </quote> Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf evlist: Print hint for groupKan Liang1-1/+7
An event group is a critical relationship. There is a -g option that can display the relationship. But it's hard for a user to know when should this option be applied. If there is an event group in the perf record, print a hint to suggest the user apply the -g to display the group information. With the patch, $ perf record -e "{cycles,instructions},instructions" sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (4 samples) ] $ $ perf evlist cycles instructions instructions # Tip: use 'perf evlist -g' to show group information $ perf evlist -g {cycles,instructions} instructions $ Committer testing: So for a perf.data file _with_ a group: root@number:~# perf evlist -g {cpu_core/branch-instructions/pp,cpu_core/branches/} dummy:u root@number:~# perf evlist cpu_core/branch-instructions/pp cpu_core/branches/ dummy:u # Tip: use 'perf evlist -g' to show group information root@number:~# Then for something _without_ a group, no hint: root@number:~# perf record ls <SNIP> [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ] root@number:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P dummy:u root@number:~# No suggestion, good. Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/lkml/ZttgvduaKsVn1r4p@x1/ Link: https://lore.kernel.org/r/20240908202847.176280-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11tools: Drop nonsensical -O6Sam James1-5/+1
-O6 is very much not-a-thing. Really, this should've been dropped entirely in 49b3cd306e60b9d8 ("tools: Set the maximum optimization level according to the compiler being used") instead of just passing it for not-Clang. Just collapse it down to -O3, instead of "-O6 unless Clang, in which case -O3". GCC interprets > -O3 as -O3. It doesn't even interpret > -O3 as -Ofast, which is a good thing, given -Ofast has specific (non-)requirements for code built using it. So, this does nothing except look a bit daft. Remove the silliness and also save a few lines in the Makefiles accordingly. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com> Signed-off-by: Sam James <sam@gentoo.org> Acked-by: Namhyung Kim <namhyung@kerne