Skip to content

Instantly share code, notes, and snippets.

@andrewhodel
Last active January 13, 2025 21:17
Show Gist options
  • Save andrewhodel/ed7625a14eb87404cafd37493849d1ba to your computer and use it in GitHub Desktop.
Save andrewhodel/ed7625a14eb87404cafd37493849d1ba to your computer and use it in GitHub Desktop.
Go profiling with pprof

Importing net/http/pprof adds HTTP endpoints to an existing HTTP or HTTPS server to serve profile files that can be viewed or charted with a command line tool.

import pprof into the go program

import _ "net/http/pprof"

If your application is not already running an http or https server, add "net/http" to the program imports and the following code to your main function:

go http.ListenAndServe(":8550", nil)

open profile files from the http server with pprof

pprof is a tool that runs on any machine with go and fetches the profile files from the server you placed in your program via http.

memory heap:

go tool pprof "https://127.0.0.1:8550/debug/pprof/heap"

goroutine stack:

go tool pprof "https://127.0.0.1:8550/debug/pprof/goroutine"

30-second CPU profile, after calling runtime.SetCPUProfileRate(27) in your program

# enable with this early in func main() {}
# OR THIS DATA WILL NOT EXIST AT THE URL

# SetCPUProfileRate sets the CPU profiling rate to hz samples per second. If hz <= 0, SetCPUProfileRate turns off profiling.
# If the profiler is on, the rate cannot be changed without first turning it off.

runtime.SetCPUProfileRate(27)

go tool pprof "https://127.0.0.1:8550/debug/pprof/profile?seconds=30"

goroutine blocking, after calling runtime.SetBlockProfileRate(1) in your program:

# enable with this early in func main() {}
# OR THIS DATA WILL NOT EXIST AT THE URL

# SetBlockProfileRate controls the fraction of goroutine blocking events that are reported in the blocking profile.
# The profiler aims to sample an average of one blocking event per rate nanoseconds spent blocked.

# To include every blocking event in the profile, pass rate = 1. To turn off profiling entirely, pass rate <= 0.

runtime.SetBlockProfileRate(1)

go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/block"

holders of contended mutexes, after calling runtime.SetMutexProfileFraction(1) in your program:

# enable with this early in func main() {}
# OR THIS DATA WILL NOT EXIST AT THE URL

# SetMutexProfileFraction controls the fraction of mutex contention events that are reported in the mutex profile.
# On average 1/rate events are reported. The previous rate is returned.

# To turn off profiling entirely, pass rate 0. To just read the current rate, pass rate < 0. (For n>1 the details of sampling may change.)

# someone actually wrote, "To just read the current rate, pass rate < 0." in the documentation.  You need a rate greater than 0 to gather the contended mutex data.  Unbelievable.

# if you use a number greater than 1, not all events will be reported because the "fraction" will be less than 1/1.  A value of 1 means report 100% of the events at a 1:1 ratio of time.

# For example, if a caller holds a lock for 1s while 5 other goroutines are waiting for the entire second to acquire the lock, its unlock call stack will report 5s of contention.

runtime.SetMutexProfileFraction(1)

go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/mutex"

With the contended mutex report you will quickly know if there are contended mutexes by the time in the left column.

Showing nodes accounting for 86.11hrs, 99.49% of 86.55hrs total
Dropped 117 nodes (cum <= 0.43hrs)
      flat  flat%   sum%        cum   cum%
  51.76hrs 59.81% 59.81%   51.76hrs 59.81%  runtime.selectgo
  34.35hrs 39.69% 99.49%   34.35hrs 39.69%  sync.(*Cond).Wait

To find the actual contended mutex by line number, download the goroutine stack and look at every running goroutine with curl -o goroutine_stack.txt "https://monitor.xyzbots.com:8550/debug/pprof/goroutine?debug=2" then less goroutine_stack.txt.

Start at the top of the file and search for Lock, you will find something similar to goroutine 1242 [sync.Mutex.Lock, 1039 minutes]:. Near the end of that line there is a time showing how long the goroutine has been waiting. Beneath that line is the function trace with line numbers that allows you find the contended mutex.

The time that the mutex is locked is not available when you get the stack trace with hold ctrl and press \ to exit the running program.

go tool pprof

This is an example with the memory heap.

go tool is a command line tool, once it is open type help for more information.

------------------------------------------------------------------------------

IMPORTANT!

Use -sample_index=alloc_space, the default -sample_index=inuse_space does not show all memory leaks causing out of memory.

go tool pprof -sample_index=alloc_space "https://monitor.xyzbots.com:8550/debug/pprof/heap"

Remember that the GC can be greedy and that you can use swap space in Linux if your program has large data peaks.

cat /etc/fstab

------------------------------------------------------------------------------

This is example output from the memory heap profiling tool in pprof.

go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/heap"
Fetching profile over HTTP from https://monitor.xyzbots.com:8550/debug/pprof/heap
Saved profile in /Users/zip/pprof/pprof.listener.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: listener
Type: inuse_space
Time: Aug 5, 2023 at 7:18pm (CDT)
Entering interactive mode (type "help" for commands, "o" for options)

top - show functions using the most memory

(pprof) top
Showing nodes accounting for 159.41MB, 98.13% of 162.44MB total
Dropped 25 nodes (cum <= 0.81MB)
Showing top 10 nodes out of 33
      flat  flat%   sum%        cum   cum%
  155.61MB 95.80% 95.80%   155.61MB 95.80%  reflect.unsafe_NewArray
    3.29MB  2.03% 97.82%     3.29MB  2.03%  encoding/gob.(*encBuffer).writeByte
    0.50MB  0.31% 98.13%        1MB  0.62%  main.collectors_read_journal_entry
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).Decode
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).DecodeValue
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decOpFor.func2
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func3
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func4
         0     0% 98.13%    74.01MB 45.56%  encoding/gob.(*Decoder).decodeArrayHelper
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decodeMap

tree - show a text rendering of call graph

peek - show callers/callees of functions

(pprof) peek reflect.unsafe_NewArray
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                          155.61MB   100% |   reflect.MakeSlice
  155.61MB 95.80% 95.80%   155.61MB 95.80%                | reflect.unsafe_NewArray
----------------------------------------------------------+-------------
(pprof) peek reflect.MakeSlice
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context 	 	 
----------------------------------------------------------+-------------
                                          155.61MB   100% |   encoding/gob.(*Decoder).decodeSlice
         0     0%     0%   155.61MB 95.80%                | reflect.MakeSlice
                                          155.61MB   100% |   reflect.unsafe_NewArray
----------------------------------------------------------+-------------

go tool trace, data similar to strace's output

trace data with the go tool trace command. To collect a 5-second execution trace:

curl -o trace.out "https://monitor.xyzbots.com:8550/debug/pprof/trace?seconds=5"
go tool trace trace.out

end program and output goroutine stack

Hold ctrl and press \.

downloading the goroutine stack

download the full goroutine stack, similar to the output from a go program that crashes that lists every running goroutine.

curl -o goroutine_stack.txt "https://monitor.xyzbots.com:8550/debug/pprof/goroutine?debug=2"

This is useful for solving the error runtime: goroutine stack exceeds 1000000000-byte limit. Functions must use the go keyword when called from inside the same function.

# this fails with runtime: goroutine stack exceeds 1000000000-byte limit
func a() {
	if (queue_item_exists == true) {
		// process queue item
		a()
	} else {
		time.Sleep(time.Second)
		a()
	}
}

# this works
func a() {
	if (queue_item_exists == true) {
		// process queue item
		go a()
	} else {
		time.Sleep(time.Second)
		go a()
	}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment