linux.git/tools/perf/Documentation/perf-ftrace.txt, branch v6.18

perf: ftrace: add graph tracer options args/retval/retval-hex/retaddr

2025-07-23T00:47:22+00:00

This change adds support for new funcgraph tracer options funcgraph-args,
funcgraph-retval, funcgraph-retval-hex and funcgraph-retaddr.

The newly added options are:
  - args       : Show function arguments.
  - retval     : Show function return value.
  - retval-hex : Show function return value in hexadecimal format.
  - retaddr    : Show function return address.

 # ./perf ftrace -G vfs_write --graph-opts retval,retaddr
 # tracer: function_graph
 #
 # CPU  DURATION                  FUNCTION CALLS
 # |     |   |                     |   |   |   |
 5)               |  mutex_unlock() { /* <-rb_simple_write+0xda/0x150 */
 5)   0.188 us    |    local_clock(); /* <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e */
 5)               |    rt_mutex_slowunlock() { /* <-rb_simple_write+0xda/0x150 */
 5)               |      _raw_spin_lock_irqsave() { /* <-rt_mutex_slowunlock+0x4f/0x200 */
 5)   0.123 us    |        preempt_count_add(); /* <-_raw_spin_lock_irqsave+0x23/0x90 ret=0x0 */
 5)   0.128 us    |        local_clock(); /* <-__lock_acquire.isra.0+0x17a/0x740 ret=0x3bf2a3cfc8b */
 5)   0.086 us    |        do_raw_spin_trylock(); /* <-_raw_spin_lock_irqsave+0x4a/0x90 ret=0x1 */
 5)   0.845 us    |      } /* _raw_spin_lock_irqsave ret=0x292 */
 5)               |      _raw_spin_unlock_irqrestore() { /* <-rt_mutex_slowunlock+0x191/0x200 */
 5)   0.097 us    |        local_clock(); /* <-lock_release+0x2ad/0x440 ret=0x3bf2a3cff1f */
 5)   0.086 us    |        do_raw_spin_unlock(); /* <-_raw_spin_unlock_irqrestore+0x23/0x60 ret=0x1 */
 5)   0.104 us    |        preempt_count_sub(); /* <-_raw_spin_unlock_irqrestore+0x35/0x60 ret=0x0 */
 5)   0.726 us    |      } /* _raw_spin_unlock_irqrestore ret=0x80000000 */
 5)   1.881 us    |    } /* rt_mutex_slowunlock ret=0x0 */
 5)   2.931 us    |  } /* mutex_unlock ret=0x0 */
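
When post-processing this output, the annotations added by retaddr and retval can be pulled out of the trailing comments. The following is only a minimal sketch, assuming the comment layout shown in the trace above (`<-symbol+off/size` for the return address and `ret=0x...` for the return value); the function name `parse_annotations` is hypothetical and not part of perf.

```python
import re

# Patterns inferred from the sample trace above (an assumption, not a perf API).
RETADDR = re.compile(r"<-(\S+)")
RETVAL = re.compile(r"ret=(0x[0-9a-fA-F]+)")

def parse_annotations(line):
    """Return (return_address, return_value) for one trace line; None if absent."""
    addr = RETADDR.search(line)
    ret = RETVAL.search(line)
    return (
        addr.group(1) if addr else None,
        int(ret.group(1), 16) if ret else None,
    )

leaf = " 5)   0.188 us    |    local_clock(); /* <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e */"
entry = " 5)               |  mutex_unlock() { /* <-rb_simple_write+0xda/0x150 */"
exit_ = " 5)   0.845 us    |      } /* _raw_spin_lock_irqsave ret=0x292 */"

leaf_info = parse_annotations(leaf)
entry_info = parse_annotations(entry)
exit_info = parse_annotations(exit_)
```

Entry lines carry only the return address and exit lines only the return value, so either element of the tuple may be None.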

Signed-off-by: Changbin Du 
Reviewed-by: Ian Rogers 
Link: https://lore.kernel.org/r/20250613114048.132336-1-changbin.du@huawei.com
Signed-off-by: Namhyung Kim

perf ftrace latency: Add -e option to measure time between two events

2025-07-15T05:51:58+00:00

In addition to function latency, it can now measure event latencies.
Some kernel tracepoints are paired and it's meaningful to measure how
long it takes between the two events.  The latency is tracked for the
same thread.

Currently it only uses BPF to do the work, but that restriction can be
lifted later.  Instead of having a separate BPF program for each
tracepoint, it only uses generic 'event_begin' and 'event_end' programs
to attach to any (raw) tracepoints.

  $ sudo perf ftrace latency -a -b --hide-empty \
    -e i915_request_wait_begin,i915_request_wait_end -- sleep 1
  #   DURATION     |      COUNT | GRAPH                                |
     256 -  512 us |          4 | ######                               |
       2 -    4 ms |          2 | ###                                  |
       4 -    8 ms |         12 | ###################                  |
       8 -   16 ms |         10 | ################                     |

  # statistics  (in usec)
    total time:               194915
      avg time:                 6961
      max time:                12855
      min time:                  373
         count:                   28
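
The per-thread pairing described above can be illustrated outside of BPF. This is a rough sketch in Python rather than the actual BPF programs: the `Event` record, the event names and the timestamps are made up for illustration, and letting a later begin event overwrite a pending one is an assumption.

```python
from collections import namedtuple

# Hypothetical record layout: thread id, event name, timestamp in usec.
Event = namedtuple("Event", "tid name ts_us")

def pair_latencies(events, begin, end):
    """Return latencies between `begin` and the next `end` on the same thread."""
    start = {}        # tid -> timestamp of the pending begin event
    latencies = []
    for ev in events:
        if ev.name == begin:
            start[ev.tid] = ev.ts_us
        elif ev.name == end and ev.tid in start:
            latencies.append(ev.ts_us - start.pop(ev.tid))
    return latencies

def summarize(latencies):
    """Compute the fields of the statistics block (integer usec)."""
    total = sum(latencies)
    return {
        "total": total,
        "avg": total // len(latencies),
        "max": max(latencies),
        "min": min(latencies),
        "count": len(latencies),
    }

events = [
    Event(1, "wait_begin", 100),
    Event(2, "wait_begin", 150),
    Event(1, "wait_end", 473),
    Event(2, "wait_end", 1150),
    Event(3, "wait_end", 2000),   # no begin seen on thread 3: ignored
]
lat = pair_latencies(events, "wait_begin", "wait_end")
stats = summarize(lat)
```

Keying the pending timestamp by thread id is what keeps two overlapping waits on different threads from being mixed up.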

Reviewed-by: Ian Rogers 
Link: https://lore.kernel.org/r/20250714052143.342851-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim

perf ftrace profile: Add --graph-opts option

2025-01-08T20:20:42+00:00

Like the trace subcommand, the profile subcommand should accept options
that control the tracing behavior of the function graph tracer.

However, some options are restricted in order to preserve the internal
behavior.

For example, it can limit the function call depth like below:

  # perf ftrace profile --graph-opts depth=5 -- myprog

Committer testing:

  root@number:~# perf ftrace profile --graph-opts thresh=1000 -- sleep 1
  # Total (us)   Avg (us)   Max (us)      Count   Function
   1001419.301 500709.650 1000032.000          2   x64_sys_call
   1000032.000 1000032.000 1000032.000          1   __x64_sys_clock_nanosleep
   1000032.000 1000032.000 1000032.000          1   common_nsleep
   1000031.000 1000031.000 1000031.000          1   do_nanosleep
   1000031.000 1000031.000 1000031.000          1   hrtimer_nanosleep
   1000024.000 1000024.000 1000024.000          1   schedule
      1387.208   1387.208   1387.208          1   __x64_sys_execve
      1386.691   1386.691   1386.691          1   do_execveat_common.isra.0
      1334.170   1334.170   1334.170          1   bprm_execve
      1258.413   1258.413   1258.413          1   load_elf_binary
      1123.068   1123.068   1123.068          1   begin_new_exec
      1113.550   1113.550   1113.550          1   mmput
      1109.237   1109.237   1109.237          1   exit_mmap
  root@number:~# perf ftrace profile --graph-opts thresh=1200 -- sleep 1
  # Total (us)   Avg (us)   Max (us)      Count   Function
   1001448.204 500724.102 1000018.000          2   x64_sys_call
   1000017.000 1000017.000 1000017.000          1   __x64_sys_clock_nanosleep
   1000017.000 1000017.000 1000017.000          1   common_nsleep
   1000017.000 1000017.000 1000017.000          1   hrtimer_nanosleep
   1000016.000 1000016.000 1000016.000          1   do_nanosleep
   1000012.000 1000012.000 1000012.000          1   schedule
      1430.112   1430.112   1430.112          1   __x64_sys_execve
      1429.581   1429.581   1429.581          1   do_execveat_common.isra.0
      1376.289   1376.289   1376.289          1   bprm_execve
      1301.743   1301.743   1301.743          1   load_elf_binary
  root@number:~#

Reviewed-by: James Clark 
Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20250107224352.1128669-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace latency: Add --max-latency option

2024-12-10T18:16:40+00:00

This patch adds a --max-latency option, as discussed.  If the number of
buckets would be more than 22, the setting is not observed (for now,
let's say).

By default, or if 0 is passed, the value is determined automatically
from the number of buckets, the range and the minimum, so that all
available buckets are filled (equivalent to the behaviour before this
patch).

We now get something like this:

  # perf ftrace latency --bucket-range=20 \
			--min-latency 10 \
			--max-latency=100 \
			-T switch_mm_irqs_off -a sleep 2
  #   DURATION     |      COUNT | GRAPH             |
       0 -   10 us |       1731 | ################  |
      10 -   30 us |          1 |                   |
      30 -   50 us |          0 |                   |
      50 -   70 us |          0 |                   |
      70 -   90 us |          0 |                   |
      90 -  100 us |          0 |                   |
     100 -  ... us |          0 |                   |

Note that the maximum is observed even if it does not complete a full
range (the second-to-last bucket is 10us wide so that the last one
starts at exactly 100).  This looks more sensible to me and eases the
computations, since we don't need to account for the range while
filling the buckets.
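
The boundaries in the listing above can be reproduced with a few lines. This is only a sketch of the arithmetic, not the perf implementation; `bucket_edges` is a hypothetical helper name.

```python
def bucket_edges(bucket_range, min_latency, max_latency):
    """Return (low, high) pairs; high is None for the open-ended last bucket."""
    edges = [(0, min_latency)]            # first outlier bucket
    low = min_latency
    while low < max_latency:
        # The final regular bucket is truncated so the last one starts at max.
        high = min(low + bucket_range, max_latency)
        edges.append((low, high))
        low = high
    edges.append((max_latency, None))     # last outlier bucket
    return edges

# Same parameters as the command above: range 20, min 10, max 100.
edges = bucket_edges(20, 10, 100)
```

With these parameters the second-to-last pair is (90, 100), which is the 10us-wide bucket mentioned in the note.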

Signed-off-by: Gabriele Monaco 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Clark Williams 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Namhyung Kim 
Cc: Thomas Gleixner 
Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace latency: Introduce --min-latency to narrow down into a latency range

2024-12-10T18:16:27+00:00

Values below and above the range go into the first and last (outlier)
buckets.

Without it:

  # perf ftrace latency --use-nsec --use-bpf \
			--bucket-range=200 \
			-T switch_mm_irqs_off -a sleep 2
  #   DURATION     |      COUNT | GRAPH                                   |
       0 -  200 ns |          0 |                                         |
     200 -  400 ns |         44 |                                         |
     400 -  600 ns |        291 | #                                       |
     600 -  800 ns |        506 | ##                                      |
     800 - 1000 ns |        148 |                                         |
    1.00 - 1.20 us |        581 | ##                                      |
    1.20 - 1.40 us |       2199 | ##########                              |
    1.40 - 1.60 us |       1048 | ####                                    |
    1.60 - 1.80 us |       1448 | ######                                  |
    1.80 - 2.00 us |       1091 | #####                                   |
    2.00 - 2.20 us |        517 | ##                                      |
    2.20 - 2.40 us |        318 | #                                       |
    2.40 - 2.60 us |        370 | #                                       |
    2.60 - 2.80 us |        271 | #                                       |
    2.80 - 3.00 us |        150 |                                         |
    3.00 - 3.20 us |         85 |                                         |
    3.20 - 3.40 us |         48 |                                         |
    3.40 - 3.60 us |         40 |                                         |
    3.60 - 3.80 us |         22 |                                         |
    3.80 - 4.00 us |         13 |                                         |
    4.00 - 4.20 us |         14 |                                         |
    4.20 - ...  us |        626 | ##                                      |
  #
  # perf ftrace latency --use-nsec --use-bpf \
			--bucket-range=20 --min-latency=1200 \
			-T switch_mm_irqs_off -a sleep 2
  #   DURATION     |      COUNT | GRAPH                                   |
       0 - 1200 ns |       1243 | #####                                   |
    1.20 - 1.22 us |        141 |                                         |
    1.22 - 1.24 us |        202 |                                         |
    1.24 - 1.26 us |        209 |                                         |
    1.26 - 1.28 us |        219 |                                         |
    1.28 - 1.30 us |        208 |                                         |
    1.30 - 1.32 us |        245 | #                                       |
    1.32 - 1.34 us |        246 | #                                       |
    1.34 - 1.36 us |        224 | #                                       |
    1.36 - 1.38 us |        219 |                                         |
    1.38 - 1.40 us |        206 |                                         |
    1.40 - 1.42 us |        190 |                                         |
    1.42 - 1.44 us |        190 |                                         |
    1.44 - 1.46 us |        146 |                                         |
    1.46 - 1.48 us |        140 |                                         |
    1.48 - 1.50 us |        125 |                                         |
    1.50 - 1.52 us |        115 |                                         |
    1.52 - 1.54 us |        102 |                                         |
    1.54 - 1.56 us |         87 |                                         |
    1.56 - 1.58 us |         90 |                                         |
    1.58 - 1.60 us |         85 |                                         |
    1.60 - ...  us |       5487 | ########################                |
  #

The second run above focuses on the latencies starting at 1.2us, with a
finer-grained range of 20ns.

Both runs were taken on a live system, so they are statistically
interesting but do not narrow down on the same numbers.  A 'perf ftrace
latency record' seems worthwhile, so that all of these views can then
be applied to the same snapshot of latencies.

A --max-latency counterpart should come next, at first limiting the
max-latency to 20 * bucket-size, as we have a fixed buckets array with
20 + 2 entries (the + 2 is for the outliers) and thus would need to
make it larger for higher latencies.
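
The fixed 20 + 2 entry array can be illustrated with a small index function. This is a sketch modelled on the --min-latency listing above, not the code in perf; `linear_bucket` and `NUM_BUCKETS` are names chosen for the example.

```python
NUM_BUCKETS = 22   # 20 regular buckets + 2 outlier buckets, per the text above

def linear_bucket(delta_ns, bucket_range, min_latency):
    """Map a latency to a bucket index using linear bucketing."""
    if delta_ns < min_latency:
        return 0                                   # below-range outliers
    idx = 1 + (delta_ns - min_latency) // bucket_range
    return min(idx, NUM_BUCKETS - 1)               # above-range outliers

# Parameters of the second run: --bucket-range=20 --min-latency=1200
samples = [1199, 1200, 1219, 1220, 1599, 1600, 5000]
indexes = [linear_bucket(s, 20, 1200) for s in samples]
```

Because the array is fixed, everything from min-latency + 20 * bucket-range upwards (1.60us here) collapses into the last bucket, which is why a larger array would be needed for higher latencies.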

We may also need a way to exclude the out-of-range values (first and
last buckets) when drawing the bucket bars.

Co-developed-by: Gabriele Monaco 
Cc: Adrian Hunter 
Cc: Clark Williams 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Namhyung Kim 
Cc: Thomas Gleixner 
Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org
Signed-off-by: Gabriele Monaco 
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace latency: Introduce --bucket-range to ask for linear bucketing

2024-12-10T18:16:01+00:00

In addition to showing the histogram exponentially, using log2() to
figure out the histogram index, allow for showing it linearly.

The preexisting mode, the default:

  # perf ftrace latency --use-nsec --use-bpf \
  			-T switch_mm_irqs_off -a sleep 2
  #   DURATION     |      COUNT | GRAPH                                   |
       0 -    1 ns |          0 |                                         |
       1 -    2 ns |          0 |                                         |
       2 -    4 ns |          0 |                                         |
       4 -    8 ns |          0 |                                         |
       8 -   16 ns |          0 |                                         |
      16 -   32 ns |          0 |                                         |
      32 -   64 ns |          0 |                                         |
      64 -  128 ns |        238 | #                                       |
     128 -  256 ns |       1704 | ##########                              |
     256 -  512 ns |        672 | ###                                     |
     512 - 1024 ns |       4458 | ##########################              |
       1 -    2 us |        677 | ####                                    |
       2 -    4 us |          5 |                                         |
       4 -    8 us |          0 |                                         |
       8 -   16 us |          0 |                                         |
      16 -   32 us |          0 |                                         |
      32 -   64 us |          0 |                                         |
      64 -  128 us |          0 |                                         |
     128 -  256 us |          0 |                                         |
     256 -  512 us |          0 |                                         |
     512 - 1024 us |          0 |                                         |
       1 - ...  ms |          0 |                                         |
  #
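
The exponential indexing of the default mode can be sketched as follows. This is an illustration based on the 22 rows of the listing above, not the perf source; for integers, `bit_length()` gives floor(log2(v)) + 1, and 0 for v == 0.

```python
NUM_BUCKETS = 22   # number of rows in the listing above

def log2_bucket(delta_ns):
    """Exponential bucketing: bucket i >= 1 covers [2**(i-1), 2**i) nanoseconds."""
    return min(delta_ns.bit_length(), NUM_BUCKETS - 1)

samples = [0, 1, 3, 64, 127, 1023, 1024, 2**20 - 1, 2**20, 10**9]
indexes = [log2_bucket(s) for s in samples]
```

Each bucket is twice as wide as the previous one, which covers a large span with few buckets but gives coarse resolution at higher latencies.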

The new histogram mode:

  # perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \
  			-T switch_mm_irqs_off -a sleep 2
  #   DURATION     |      COUNT | GRAPH                                   |
       0 -    1 ns |          0 |                                         |
       1 -  151 ns |        265 | #                                       |
     151 -  301 ns |       1797 | ###########                             |
     301 -  451 ns |        258 | #                                       |
     451 -  601 ns |        289 | #                                       |
     601 -  751 ns |       2049 | #############                           |
     751 -  901 ns |        967 | ######                                  |
     901 - 1051 ns |        513 | ###                                     |
    1.05 - 1.20 us |        114 |                                         |
    1.20 - 1.35 us |        559 | ###                                     |
    1.35 - 1.50 us |        189 | #                                       |
    1.50 - 1.65 us |        137 |                                         |
    1.65 - 1.80 us |         32 |                                         |
    1.80 - 1.95 us |          2 |                                         |
    1.95 - 2.10 us |          0 |                                         |
    2.10 - 2.25 us |          1 |                                         |
    2.25 - 2.40 us |          1 |                                         |
    2.40 - 2.55 us |          0 |                                         |
    2.55 - 2.70 us |          0 |                                         |
    2.70 - 2.85 us |          0 |                                         |
    2.85 - 3.00 us |          1 |                                         |
    3.00 - ...  us |          4 |                                         |
  #

Co-developed-by: Gabriele Monaco 
Cc: Adrian Hunter 
Cc: Clark Williams 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Namhyung Kim 
Cc: Thomas Gleixner 
Link: https://lore.kernel.org/r/20241112181214.1171244-3-acme@kernel.org
Signed-off-by: Gabriele Monaco 
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace profile: Add -s/--sort option

2024-07-31T19:58:18+00:00

The -s/--sort option sorts the output by the given column.

  $ sudo perf ftrace profile -s max sync | head
  # Total (us)   Avg (us)   Max (us)      Count   Function
      6301.811   6301.811   6301.811          1   __do_sys_sync
      6301.328   6301.328   6301.328          1   ksys_sync
      5320.300   1773.433   2858.819          3   iterate_supers
      2755.875     17.012   2610.633        162   sync_fs_one_sb
      2728.351    682.088   2610.413          4   ext4_sync_fs [ext4]
      2603.654   2603.654   2603.654          1   jbd2_log_wait_commit [jbd2]
      4750.615    593.827   2597.427          8   schedule
      2164.986     26.728   2115.673         81   sync_inodes_one_sb
      2143.842     26.467   2115.438         81   sync_inodes_sb
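
The effect of the sort key can be shown on a few rows copied from the output above. This is a sketch only: the `COLUMNS` mapping and `sort_rows` are illustrative names, and descending order is inferred from the listing.

```python
rows = [
    # (total_us, avg_us, max_us, count, function), taken from the listing above
    (5320.300, 1773.433, 2858.819, 3, "iterate_supers"),
    (6301.811, 6301.811, 6301.811, 1, "__do_sys_sync"),
    (4750.615, 593.827, 2597.427, 8, "schedule"),
    (2755.875, 17.012, 2610.633, 162, "sync_fs_one_sb"),
]

# Illustrative key names mapped to tuple positions (an assumption).
COLUMNS = {"total": 0, "avg": 1, "max": 2, "count": 3}

def sort_rows(rows, key):
    """Sort rows by the chosen column, largest value first."""
    return sorted(rows, key=lambda r: r[COLUMNS[key]], reverse=True)

by_max = [r[4] for r in sort_rows(rows, "max")]
by_total = [r[4] for r in sort_rows(rows, "total")]
```

Sorting by max places schedule below sync_fs_one_sb even though its total is larger, matching the order in the listing.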

Reviewed-by: Ian Rogers 
Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Changbin Du 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Cc: Steven Rostedt (VMware) 
Link: https://lore.kernel.org/lkml/20240729004127.238611-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace: Add 'profile' command

2024-07-31T19:58:18+00:00

The 'perf ftrace profile' command collects function execution profiles
using the function-graph tracer so that users can easily see the total,
average and max execution time as well as the number of invocations.

The following is a profile for the perf_event_open syscall.

  $ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \
    perf stat -e cycles -C1 true 2> /dev/null | head
  # Total (us)   Avg (us)   Max (us)      Count   Function
        65.611     65.611     65.611          1   __x64_sys_perf_event_open
        30.527     30.527     30.527          1   anon_inode_getfile
        30.260     30.260     30.260          1   __anon_inode_getfile
        29.700     29.700     29.700          1   alloc_file_pseudo
        17.578     17.578     17.578          1   d_alloc_pseudo
        17.382     17.382     17.382          1   __d_alloc
        16.738     16.738     16.738          1   kmem_cache_alloc_lru
        15.686     15.686     15.686          1   perf_event_alloc
        14.012      7.006     11.264          2   obj_cgroup_charge
  #
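
The per-function columns can be derived from individual call durations as in the sketch below. It is an illustration of the aggregation, not the perf implementation; the sample durations for obj_cgroup_charge are the two values implied by its row above (max 11.264, total 14.012), and `profile` is a hypothetical helper name.

```python
from collections import defaultdict

def profile(samples):
    """Aggregate (function, duration_us) samples into per-function rows."""
    acc = defaultdict(lambda: {"total": 0.0, "max": 0.0, "count": 0})
    for name, dur in samples:
        row = acc[name]
        row["total"] += dur
        row["max"] = max(row["max"], dur)
        row["count"] += 1
    rows = [
        (r["total"], r["total"] / r["count"], r["max"], r["count"], name)
        for name, r in acc.items()
    ]
    return sorted(rows, reverse=True)   # largest total first

samples = [
    ("obj_cgroup_charge", 11.264),
    ("perf_event_alloc", 15.686),
    ("obj_cgroup_charge", 2.748),
]
rows = profile(samples)
```

The average is simply total divided by count, so a function called twice with very different durations shows a max well above its average.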

Reviewed-by: Ian Rogers 
Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Changbin Du 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Cc: Steven Rostedt (VMware) 
Link: https://lore.kernel.org/lkml/20240729004127.238611-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf ftrace: Add 'tail' option to --graph-opts

2024-07-31T19:58:18+00:00

The 'graph-tail' option prints the function name as a comment at the
closing brace.  This is useful when the output of a large function is
interleaved with other functions (possibly from different CPUs).

For example,

  $ sudo perf ftrace -- perf stat true
  ...
   1)               |    get_unused_fd_flags() {
   1)               |      alloc_fd() {
   1)   0.178 us    |        _raw_spin_lock();
   1)   0.187 us    |        expand_files();
   1)   0.169 us    |        _raw_spin_unlock();
   1)   1.211 us    |      }
   1)   1.503 us    |    }

  $ sudo perf ftrace --graph-opts tail -- perf stat true
  ...
   1)               |    get_unused_fd_flags() {
   1)               |      alloc_fd() {
   1)   0.099 us    |        _raw_spin_lock();
   1)   0.083 us    |        expand_files();
   1)   0.081 us    |        _raw_spin_unlock();
   1)   0.601 us    |      } /* alloc_fd */
   1)   0.751 us    |    } /* get_unused_fd_flags */

Reviewed-by: Ian Rogers 
Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Changbin Du 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Cc: Steven Rostedt (VMware) 
Link: https://lore.kernel.org/lkml/20240729004127.238611-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf docs: Fix format of unordered lists

2023-08-16T11:37:49+00:00

Fix the format of unordered lists so they can wrap properly.

Signed-off-by: Changbin Du 
Acked-by: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20230718085242.3090797-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo