Age | Commit message (Collapse) | Author | Files | Lines |
|
I got a strange error on ARM to fail on processing FINISHED_ROUND
record. It turned out that it was failing in symbol__alloc_hist()
because the symbol size is too big.
When a sample is captured on a specific BPF program, it failed. I've
added a debug code and found the end address of the symbol is from
the next module which is placed far way.
ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock [bpf]
ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt [bpf]
ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress [bpf] <---------- here
ffffd69b7af74048-ffffd69b7af74048: $x.5 [sha3_generic]
ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init [sha3_generic]
ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update [sha3_generic]
The logic in symbols__fixup_end() just uses curr->start to update the
prev->end. But in this case, it won't work as it's too different.
I think ARM has a different kernel memory layout for modules and BPF
than on x86. Actually there's a logic to handle kernel and module
boundary. Let's do the same for symbols between different modules.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Leo Yan <leo.yan@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-31-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-30-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-29-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- tma_c01_wait and tma_c02_wait metrics measure power-performance
states.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-28-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Add metrics tma_fp_vector_128b, tma_fp_vector_256b and
tma_info_system_cpus_utilized.
- Remove metrics tma_info_system_mem_parallel_requests,
tma_info_system_core_frequency and
tma_info_system_mem_request_latency.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-27-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-26-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-25-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-24-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-23-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-22-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-21-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-20-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and
tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-19-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-18-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-17-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-16-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc and tma_info_inst_mix_ipflop.
- Removal of tma_info_bad_spec_branch_misprediction_cost.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- Tuned thresholds for tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-15-irogers@google.com
|
|
Top-Down Microarchitecture Analysis (TMA) metrics simplify
cycle-accounting using microarchitecture-abstracted metrics
organized in one hierarchy. This update is from version 4.5 to
4.7.
The update includes:
- tma_info_bottleneck* metrics, an abstraction or summarization of
the 100+ TMA tree nodes into 12-entry familiar performance metrics.
- tma_c01_wait and tma_c02_wait metrics measure power-performance
states.
- Reduce number of events (multiplexing) for tma_info_system_gflops,
tma_info_core_flopc, tma_info_inst_mix_ipflop and tma_ports_utilized_0.
- Fixes for tma_info_bottleneck_mispredictions and
tma_info_bad_spec_branch_misprediction_cost.
- New tma_info_inst_mix_ippause metric.
- tma_serializing_operation is raised to level 3.
- Swapped tma_info_core_ilp (becomes per SMT thread) and
tma_info_pipeline_execute (per physical core).
- tma_nop_instructions and tma_shuffles_256b are lowered to level 4
under tma_other_light_ops_group.
- Reduced number of events when SMT is off.
- Tuned thresholds for tma_info_bottleneck_branching_overhead,
tma_fetch_bandwidth and tma_ports_utilized_3m.
The update came from:
https://github.com/intel/perfmon/pull/140
https://github.com/intel/perfmon/pull/138
Running the script:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-14-irogers@google.com
|
|
Update alderlake events to v1.15 released in:
https://github.com/intel/perfmon/commit/282a6951fd9f025cff6c8c0ea16b1fcec786a4cd
Documentation fixes, removal of TOPDOWN.BR_MISPREDICT_SLOTS,
deprecation of UNC_ARB_DAT_REQUESTS.RD, UNC_ARB_DAT_REQUESTS.RD and
UNC_ARB_IFA_OCCUPANCY.ALL.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-13-irogers@google.com
|
|
Update skylake events to v58 released in:
https://github.com/intel/perfmon/commit/625fb7507373fef8297052c5f9af9ffe78d460c0
Improves documentation.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-12-irogers@google.com
|
|
Update sierraforest events to v1.01 released in:
https://github.com/intel/perfmon/commit/582bca24aa0d742306cd4697c5bd1b1b529aa3ce
Adds the majority of core and uncore events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-11-irogers@google.com
|
|
Update alderlake events to v1.02 released in:
https://github.com/intel/perfmon/commit/4931178d1ede1099a3e4ac7e04ed9f073e03d219
Improves documentation and removes TOPDOWN.BR_MISPREDICT_SLOTS.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-10-irogers@google.com
|
|
Update meteorlake events to v1.07 released in:
https://github.com/intel/perfmon/commit/62517223080e46bfa9a905a1195c7febae7fdb3e
Umask changed on atom mem_bound events. Adds atom events
ARITH.FPDIV_ACTIVE, FP_FLOPS_RETIRED.ALL, FP_FLOPS_RETIRED.DP,
FP_FLOPS_RETIRED.FP32, ARITH.DIV_ACTIVE, BR_INST_RETIRED.COND,
BR_INST_RETIRED.COND_TAKEN, BR_INST_RETIRED.INDIRECT,
BR_INST_RETIRED.INDIRECT_CALL, BR_INST_RETIRED.IND_CALL,
BR_INST_RETIRED.NEAR_RETURN, DTLB_LOAD_MISSES.WALK_COMPLETED_4K,
DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M,
DTLB_STORE_MISSES.WALK_COMPLETED_4K, ITLB_MISSES.WALK_COMPLETED_4K,
and alias events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-9-irogers@google.com
|
|
Update icelake events to v1.21 released in:
https://github.com/intel/perfmon/commit/54f1246b0496112c1d2b2a49e4859c85caa3dbf4
Improves descriptions, removes TOPDOWN.BR_MISPREDICT_SLOTS.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-8-irogers@google.com
|
|
Update haswell events to v35 released in:
https://github.com/intel/perfmon/commit/c0f9b34d421941bc3e13c6ca5554e6a54e8bd574
Updates "must be precise" on RTM_RETIRED.ABORTED.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-7-irogers@google.com
|
|
Update grandridge events to v1.01 released in:
https://github.com/intel/perfmon/commit/211d60716509d8248e57450e434de98cc6e511d8
Adds the majority of core and uncore events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-6-irogers@google.com
|
|
Update emeraldrapids events to v1.03 released in:
https://github.com/intel/perfmon/commit/c7c6f72dae07fee35d5982232829c0cd37f9e28e
Adds uncore CHA events.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-5-irogers@google.com
|
|
Update broadwell events to v29 released in:
https://github.com/intel/perfmon/commit/47117146c6b9e38811618beca31eba4e41c3d874
Updates "must be precise" on RTM_RETIRED.ABORTED.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-4-irogers@google.com
|
|
Update alderlaken events to v1.24 released in:
https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c
Adds LBR_INSERTS.ANY/MISC_RETIRED.LBR_INSERTS event.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-3-irogers@google.com
|
|
Update alderlake events to v1.24 released in:
https://github.com/intel/perfmon/commit/e627dd8d89e2d2110f1d499608dd6f37aae37a8c
Adds aliased events, improves documentation and fix some event fields.
Event json automatically generated by:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240214011820.644458-2-irogers@google.com
|
|
If we instead decide to generate vmlinux.h from BTF info, it will be
there:
$ pahole timespec64
struct timespec64 {
time64_t tv_sec; /* 0 8 */
long int tv_nsec; /* 8 8 */
/* size: 16, cachelines: 1, members: 2 */
/* last cacheline: 16 bytes */
};
$
pahole manages to find it from /sys/kernel/btf/vmlinux, that is
generated from the kernel types.
With this linux/bpf.h doesn't need to be included, as its already in the
minimalistic tools/perf/util/bpf_skel/vmlinux/vmlinux.h file or what we
need comes when generating a vmlinux.h file from BTF info, i.e. when
using GEN_VMLINUX_H=1, as noticed by Namyung in a build break before
removing linux/bpf.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/Zc_fp6CgDClPhS_O@x1
|
|
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-8-mpetlan@redhat.com
|
|
Test perf interface to kprobes: listing, adding and removing probes. It
is run as a part of perftool-testsuite_probe test case.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-7-mpetlan@redhat.com
|
|
As a form of validation, it is a common practice to check the outputs
of commands whether they contain expected patterns or match a certain
regex.
Add helpers for verifying that all regexes are found in the output, that
all lines match any pattern from a set and that a certain expression is
not present in the output.
In verbose mode these helpers log mismatches for easier failure
investigation.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-6-mpetlan@redhat.com
|
|
Add new perf probe test case that acts as an entry element in perf test
list. Runs multiple subtests from directory "base_probe", which will be
added in incomming patches and can be expanded without further editing.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-5-mpetlan@redhat.com
|
|
Initialize reporting and logging functions that unifies formatting
of the test output used for shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-4-mpetlan@redhat.com
|
|
Add settings defining sample commands later shared by shell tests. This
adds the possibility to globally adjust the default values for the whole
testsuite.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240215110231.15385-3-mpetlan@redhat.com
|
|
Unify perf regexes for checking testing output into a single file
to reduce duplicates and prevent errors when editing.
This will be used in upcomming patches in shell tests.
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: kjain@linux.ibm.com
Cc: a |