linux.git/tools/lib/perf, branch v6.18.21

libperf build: Always place libperf includes first

2026-03-04T12:19:27+00:00

[ Upstream commit 8c5b40678c63be6b85f1c2dc8c8b89d632faf988 ]

When building tools/perf the CFLAGS can contain a directory for the
installed headers.

As the headers may be being installed while building libperf.a this can
cause headers to be partially installed and found in the include path
while building an object file for libperf.a.

The installed header may reference other installed headers that are
missing given the partial nature of the install and then the build fails
with a missing header file.

Avoid this by ensuring the libperf source headers are always first in
the CFLAGS.

Fixes: 3143504918105156 ("libperf: Make libperf.a part of the perf build")
Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Sasha Levin

libperf cpumap: Fix perf_cpu_map__max for an empty/NULL map

2026-01-02T11:57:02+00:00

[ Upstream commit a0a4173631bfcfd3520192c0a61cf911d6a52c3a ]

Passing an empty map to perf_cpu_map__max triggered a SEGV. Explicitly
test for the empty map.

Reported-by: Ingo Molnar 
Closes: https://lore.kernel.org/linux-perf-users/aSwt7yzFjVJCEmVp@gmail.com/
Tested-by: Ingo Molnar 
Signed-off-by: Ian Rogers 
Tested-by: Thomas Richter 
Signed-off-by: Namhyung Kim 
Signed-off-by: Sasha Levin

libperf mmap: In user mmap rdpmc avoid undefined behavior

2025-10-02T18:02:47+00:00

A shift left of a signed 64-bit s64 may overflow and result in
undefined behavior caught by ubsan. Switch to a u64 instead.

Signed-off-by: Ian Rogers 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Signed-off-by: Arnaldo Carvalho de Melo

libperf event: Ensure tracing data is multiple of 8 sized

2025-09-03T15:34:54+00:00

Perf's synthetic-events.c will ensure 8-byte alignment of tracing
data, writing it after a perf_record_header_tracing_data event.

Add padding to struct perf_record_header_tracing_data to make it 16-byte
rather than 12-byte sized.

Fixes: 055c67ed39887c55 ("perf tools: Move event synthesizing routines to separate .c file")
Reviewed-by: James Clark 
Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Athira Rajeev 
Cc: Blake Jones 
Cc: Chun-Tse Shao 
Cc: Collin Funk 
Cc: Howard Chu 
Cc: Ingo Molnar 
Cc: Jan Polensky 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Li Huafei 
Cc: Mark Rutland 
Cc: Nam Cao 
Cc: Peter Zijlstra 
Cc: Steinar H. Gunderson 
Cc: Thomas Gleixner 
Link: https://lore.kernel.org/r/20250821163820.1132977-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf parse-events: Support user CPUs mixed with threads/processes

2025-07-24T20:41:35+00:00

Counting events system-wide with a specified CPU prior to this change
worked:
```
$ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1

  Performance counter stats for 'system wide':

     59,393,419,099      msr/tsc/
     33,927,965,927      msr/tsc,cpu=cpu_core/
     25,465,608,044      msr/tsc,cpu=cpu_atom/
```

However, when counting with process the counts became system wide:
```
$ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
 10.1: Basic parsing test                                            : Ok
 10.2: Parsing without PMU name                                      : Ok
 10.3: Parsing with PMU name                                         : Ok

 Performance counter stats for 'perf test -F 10':

        59,233,549      msr/tsc/
        59,227,556      msr/tsc,cpu=cpu_core/
        59,224,053      msr/tsc,cpu=cpu_atom/
```

Make the handling of CPU maps with event parsing clearer. When an
event is parsed creating an evsel the cpus should be either the PMU's
cpumask or user specified CPUs.

Update perf_evlist__propagate_maps so that it doesn't clobber the user
specified CPUs. Try to make the behavior clearer, firstly fix up
missing cpumasks. Next, perform sanity checks and adjustments from the
global evlist CPU requests and for the PMU including simplifying to
the "any CPU"(-1) value. Finally remove the event if the cpumask is
empty.

So that events are opened with a CPU and a thread change stat's
create_perf_stat_counter to give both.

With the change things are fixed:
```
$ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
 10.1: Basic parsing test                                            : Ok
 10.2: Parsing without PMU name                                      : Ok
 10.3: Parsing with PMU name                                         : Ok

 Performance counter stats for 'perf test -F 10':

        63,704,975      msr/tsc/
        47,060,704      msr/tsc,cpu=cpu_core/                        (4.62%)
        16,640,591      msr/tsc,cpu=cpu_atom/                        (2.18%)
```

However, note the "--no-scale" option is used. This is necessary as
the running time for the event on the counter isn't the same as the
enabled time because the thread doesn't necessarily run on the CPUs
specified for the counter. All counter values are scaled with:

  scaled_value = value * time_enabled / time_running

and so without --no-scale the scaled_value becomes very large. This
problem already exists on hybrid systems for the same reason. Here are
2 runs of the same code with an instructions event that counts the
same on both types of core, there is no real multiplexing happening on
the event:

```
$ perf stat -e instructions perf test -F 10
...
 Performance counter stats for 'perf test -F 10':

        87,896,447      cpu_atom/instructions/                       (14.37%)
        98,171,964      cpu_core/instructions/                       (85.63%)
...
$ perf stat --no-scale -e instructions perf test -F 10
...
 Performance counter stats for 'perf test -F 10':

        13,069,890      cpu_atom/instructions/                       (19.32%)
        83,460,274      cpu_core/instructions/                       (80.68%)
...
```
The scaling has inflated per-PMU instruction counts and the overall
count by 2x.

To fix this the kernel needs changing when a task+CPU event (or just
task event on hybrid) is scheduled out. A fix could be that the state
isn't inactive but off for such events, so that time_enabled counts
don't accumulate on them.

Reviewed-by: Thomas Falcon 
Signed-off-by: Ian Rogers 
Link: https://lore.kernel.org/r/20250719030517.1990983-13-irogers@google.com
Signed-off-by: Namhyung Kim

libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete

2025-07-24T20:41:35+00:00

This allows the perf_evsel__exit to be called when the struct
perf_evsel is embedded inside another struct, such as struct evsel in
perf.

Reviewed-by: Thomas Falcon 
Signed-off-by: Ian Rogers 
Tested-by: James Clark 
Link: https://lore.kernel.org/r/20250719030517.1990983-8-irogers@google.com
Signed-off-by: Namhyung Kim

libperf evsel: Rename own_cpus to pmu_cpus

2025-07-24T20:41:35+00:00

own_cpus is generally the cpumask from the PMU. Rename to pmu_cpus to
try to make this clearer. Variable rename with no other changes.

Reviewed-by: Thomas Falcon 
Signed-off-by: Ian Rogers 
Tested-by: James Clark 
Link: https://lore.kernel.org/r/20250719030517.1990983-7-irogers@google.com
Signed-off-by: Namhyung Kim

libperf evsel: Add missed puts and asserts

2025-06-24T17:27:51+00:00

A missed evsel__close before evsel__delete was the source of leaking
perf events due to a hybrid test. Add asserts in debug builds so that
this shouldn't happen in the future. Add puts missing on the cpu map
and thread maps.

Signed-off-by: Ian Rogers 
Link: https://lore.kernel.org/r/20250617223356.2752099-4-irogers@google.com
Signed-off-by: Namhyung Kim

perf record: collect BPF metadata from existing BPF programs

2025-06-20T21:48:35+00:00

Look for .rodata maps, find ones with 'bpf_metadata_' variables, extract
their values as strings, and create a new PERF_RECORD_BPF_METADATA
synthetic event using that data. The code gets invoked from the existing
routine perf_event__synthesize_one_bpf_prog().

For example, a BPF program with the following variables:

    const char bpf_metadata_version[] SEC(".rodata") = "3.14159";
    int bpf_metadata_value[] SEC(".rodata") = 42;

would generate a PERF_RECORD_BPF_METADATA record with:

    .prog_name        = 
    .nr_entries       = 2
    .entries[0].key   = "version"
    .entries[0].value = "3.14159"
    .entries[1].key   = "value"
    .entries[1].value = "42"

Each of the BPF programs and subprograms that share those variables would
get a distinct PERF_RECORD_BPF_METADATA record, with the ".prog_name"
showing the name of each program or subprogram. The prog_name is
deliberately the same as the ".name" field in the corresponding
PERF_RECORD_KSYMBOL record.

This code only gets invoked if support for displaying BTF char arrays
as strings is detected.

Signed-off-by: Blake Jones 
Link: https://lore.kernel.org/r/20250612194939.162730-3-blakejones@google.com
Signed-off-by: Namhyung Kim

libperf threadmap: Add perf_thread_map__idx()

2025-05-21T18:07:13+00:00

Allow computation of thread map index from a PID.

Note, with a 'struct perf_cpu_map' the sorted nature allows for a binary
search to compute the index which isn't currently possible with a
'struct perf_thread_map' as they aren't guaranteed sorted.

Signed-off-by: Ian Rogers 
Acked-by: Gautam Menghani 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Howard Chu 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Madhavan Srinivasan 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20250519195148.1708988-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo