`perf` Explainer

perf Introduction

Using the perf tool centers around the following concepts:

  1. events - these are hardware interrupts, hardware counters, or software events that can be tracked
  2. targets - these are the CPUs, processes, or threads to monitor; events are tracked only while a target is executing (tracking is disabled otherwise)
  3. filters - these are predicates on event contexts that control which events are tracked in which contexts

As I understand it, perf can operate in two main modes (corresponding to the perf stat and perf record subcommands):

  1. perf stat - counts events that occur as the target executes and produces a set of global event counts, e.g., the total number of CPU cycles that elapsed while a process was executing (see the example after this list).

    This command introduces very low monitoring overhead, but it does not track event contexts and thus does not support filters.

  2. perf record - counts events that occur as the target executes in a particular execution context (if the context satisfies the filter) and produces a set of per-context event counts. Note that an execution context includes things like:

    • the name of the currently executing function (which will be unknown unless debugging symbols are present)
    • the object file that contains the currently executing function

    Note that perf record cannot record and associate literally every event occurrence, as they happen too frequently. Instead, it samples events (at a user-specified frequency) and records only the sampled events.

    Also note that, by default, perf record tracks the cycles event, which occurs once every CPU cycle. The effect is that it estimates (by sampling) how many CPU cycles were spent executing each function that your target program/thread runs (or that runs on your target CPU).
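
To make perf stat concrete, here is a minimal invocation (./my_program is a placeholder for your own binary; cycles, instructions, and cache-misses are standard event names):

    # Count three common events over the program's whole run and print
    # the global totals when it exits.
    perf stat -e cycles,instructions,cache-misses ./my_program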

For our purposes, we generally want perf record mode, as we want to know what context a particular event was associated with.

perf record

Here is a brief explainer of the perf record command (there are many more options that are not covered here; a combined example follows the option list):

perf record [-e <event|{event...}>] [--filter=<filter>] \
            [-F <freq>] [-c <count>]                    \
            [-g] [--call-graph fp|lbr|dwarf]            \
            [-o|--output <path>]                        \
            [-a] [-C|--cpu CPU...] [-p|--pid PID...]  [-t|--tid TID...] [--] [command]

where:

  • Line 1 options specify which events to track:

    • -e specifies the single event or brace-enclosed, comma-separated {event...} list to track

      If absent, the default event is cycles, i.e., this event occurs every CPU cycle.

    • --filter <filter> specifies the event/context filter predicate; if absent, all tracked events are counted

  • Line 2 options specify how often to sample events (these are mutually exclusive):

    • -F specifies a desired sampling frequency in samples per second
    • -c specifies that 1 out of every count events will be recorded

    If -F and -c are both absent, perf falls back to a default sampling frequency (4000 Hz in recent versions of perf).

  • Line 3 options specify call graph tracking (this extends the execution context to include the stack trace and not just the currently executing function):

    • -g enables call graph tracking
    • --call-graph <unwind-mode> specifies how perf attempts to unwind call stacks:
      • fp - (default) efficient, but based on frame pointers, so it does not work for code compiled with -fomit-frame-pointer
      • lbr - accurate and efficient, but relies on hardware last-branch-record support that is only available on modern CPUs
      • dwarf - accurate but slow, based on DWARF call frame information
  • Line 4 options specify where/how to record results:

    • -o - specifies path to output file where results are saved; if absent, default is perf.data
  • Line 5 options specify targets (in order of granularity):

    • -a - specifies all CPUs as targets (i.e., the entire system is profiled)
    • -C - specifies the named CPUs as targets
    • -p - specifies the named running processes as targets
    • -t - specifies the named threads as targets
    • command - executes command as a new process, makes that process the target, and terminates profiling automatically when the command process exits

    Note: if command is not specified, perf will not terminate until the user presses Ctrl+C.

    To make perf record collect data for a fixed duration for a non-command target (for example, the entire system), you can use the following pattern:

    perf record -a sleep 5
    

    Since -a was passed, the entire system will be profiled; however, since a command was passed, the profiling will terminate when the command process terminates (which, for sleep 5, occurs after 5 seconds). This also works for -C, -p, and -t.

    If instead you did:

    perf record -a
    

    then perf will only terminate when Ctrl+C is pressed.
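
Putting several of these options together, here is a sketch of two fuller invocations (the PID 1234 and the output name my-profile.data are placeholders; the second example assumes a kernel that exposes the sched:sched_switch tracepoint):

    # Sample the default cycles event at 99 Hz with DWARF-based call
    # graphs, profiling an existing process for 10 seconds:
    perf record -F 99 -g --call-graph dwarf -o my-profile.data -p 1234 -- sleep 10

    # Track a scheduler tracepoint system-wide, counting only context
    # switches away from bash (tracepoint filters typically require root):
    perf record -e sched:sched_switch --filter 'prev_comm == "bash"' -a -- sleep 5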

perf report

This tool consumes perf record data and visualizes it. The main options are listed below (see the man page or tutorial for more options), and an example invocation follows the list:

perf report [-i|--input <path>] [--stdio|--tui] [-g]

  • -i specifies an input file; if absent, the default is perf.data
  • --stdio generates a plain-text report on stdout, while --tui presents the data in an interactive terminal interface
  • -g tells the tool to visualize call graph hierarchies
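
For example, to render a text report (with call graphs) from the hypothetical my-profile.data file recorded above:

    # Write a plain-text report, including call graph hierarchies, to a file.
    perf report -i my-profile.data --stdio -g > report.txt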

Additional Notes

  1. Since the perf command talks to low-level hardware counters (and can monitor the entire system), it typically must be run with administrator privileges, so use sudo if necessary.

  2. Note that perf is actually part of the Linux kernel. This means you typically must install a kernel-specific version. To do this on Ubuntu, one does:

    apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
    
  3. Data produced by perf record can go stale over time (for example, if the binaries and libraries it references are later updated or removed, symbol resolution will break). I see two methods to deal with this:

    1. Generate a report using perf report ASAP after running perf record --- this trick avoids the staleness issue and is easy to do.

    2. However, there may be another approach that uses perf archive --- essentially, this tool scans your perf.data file, gathers debugging symbols for all of the libraries that it references, and dumps them into a compressed archive. This archive can then be unpacked into the perf build-id directory, which, by default, is $HOME/.debug (see the sketch after this list).

  4. Running perf record in a Docker container requires running the container with the --privileged flag, which gives the container root-like permissions on the host system --- this means such containers should be run only when performing profiling.
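
To make the perf archive workflow from note 3 concrete, here is a sketch (it assumes a perf.data file in the current directory; perf archive itself prints a hint with the tar command to run):

    # Bundle the debug symbols referenced by perf.data into an archive.
    perf archive perf.data    # produces perf.data.tar.bz2

    # Unpack the archive into the perf build-id cache (possibly on a
    # different machine) so symbol lookups keep working.
    mkdir -p ~/.debug
    tar xvf perf.data.tar.bz2 -C ~/.debug

    # Reports now resolve symbols even if the original binaries changed.
    perf report -i perf.data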

Additional Resources

Of these resources, I have so far found the tutorial most helpful --- but the examples page is a nice quick reference if you know what you're doing.
