Importing net/http/pprof adds HTTP endpoints to an existing HTTP or HTTPS server, serving profile files that can be viewed or charted with a command-line tool.
import _ "net/http/pprof"
If your application is not already running an HTTP or HTTPS server, add "net/http" to the program imports and the following code to your main function:
go http.ListenAndServe(":8550", nil)
pprof is a tool that runs on any machine with Go installed and fetches the profile files over HTTP from the server you embedded in your program.
memory heap:
go tool pprof "https://127.0.0.1:8550/debug/pprof/heap"
goroutine stack:
go tool pprof "https://127.0.0.1:8550/debug/pprof/goroutine"
30-second CPU profile:
# No setup call is needed for this endpoint: the pprof handler starts and stops
# CPU profiling itself, sampling at the default rate of 100 samples per second
# for the requested duration.
# Do not call runtime.SetCPUProfileRate first. Setting a rate turns the profiler
# on ("If the profiler is on, the rate cannot be changed without first turning
# it off."), which conflicts with the handler's own start.
go tool pprof "https://127.0.0.1:8550/debug/pprof/profile?seconds=30"
goroutine blocking, after calling runtime.SetBlockProfileRate(1) in your program:
# enable with this early in func main() {}
# OR THIS DATA WILL NOT EXIST AT THE URL
# SetBlockProfileRate controls the fraction of goroutine blocking events that are reported in the blocking profile.
# The profiler aims to sample an average of one blocking event per rate nanoseconds spent blocked.
# To include every blocking event in the profile, pass rate = 1. To turn off profiling entirely, pass rate <= 0.
runtime.SetBlockProfileRate(1)
go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/block"
holders of contended mutexes, after calling runtime.SetMutexProfileFraction(1) in your program:
# enable with this early in func main() {}
# OR THIS DATA WILL NOT EXIST AT THE URL
# SetMutexProfileFraction controls the fraction of mutex contention events that are reported in the mutex profile.
# On average 1/rate events are reported. The previous rate is returned.
# To turn off profiling entirely, pass rate 0. To just read the current rate, pass rate < 0. (For n>1 the details of sampling may change.)
# Note the confusing doc wording: passing rate < 0 only reads the current rate.
# You need a rate greater than 0 to gather contended mutex data at all.
# With a rate greater than 1, only about 1/rate of events are reported;
# a rate of 1 reports 100% of the events.
# For example, if a caller holds a lock for 1s while 5 other goroutines are waiting for the entire second to acquire the lock, its unlock call stack will report 5s of contention.
runtime.SetMutexProfileFraction(1)
go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/mutex"
With the contended mutex report, the time in the left column tells you quickly whether any mutexes are contended.
Showing nodes accounting for 86.11hrs, 99.49% of 86.55hrs total
Dropped 117 nodes (cum <= 0.43hrs)
      flat  flat%   sum%       cum   cum%
  51.76hrs 59.81% 59.81%  51.76hrs 59.81%  runtime.selectgo
  34.35hrs 39.69% 99.49%  34.35hrs 39.69%  sync.(*Cond).Wait
To find the actual contended mutex by line number, download the full goroutine stack and look at every running goroutine:
curl -o goroutine_stack.txt "https://monitor.xyzbots.com:8550/debug/pprof/goroutine?debug=2"
then open it with less goroutine_stack.txt.
Start at the top of the file and search for Lock; you will find something similar to goroutine 1242 [sync.Mutex.Lock, 1039 minutes]:
Near the end of that line is how long the goroutine has been waiting. Beneath it is the stack trace with line numbers that lets you find the contended mutex.
The time the mutex has been held is not available when you get the stack trace by holding ctrl and pressing \ to exit the running program.
This is an example with the memory heap.
go tool pprof is a command-line tool; once it is open, type help for more information.
------------------------------------------------------------------------------
IMPORTANT!
Use -sample_index=alloc_space; the default -sample_index=inuse_space shows only memory that is still live, so it can hide the allocation patterns that cause out-of-memory failures.
go tool pprof -sample_index=alloc_space "https://monitor.xyzbots.com:8550/debug/pprof/heap"
Remember that the GC can be slow to return memory to the OS, and that you can configure swap space in Linux if your program has large data peaks.
cat /etc/fstab
------------------------------------------------------------------------------
This is example output from the memory heap profiling tool in pprof
.
go tool pprof "https://monitor.xyzbots.com:8550/debug/pprof/heap"
Fetching profile over HTTP from https://monitor.xyzbots.com:8550/debug/pprof/heap
Saved profile in /Users/zip/pprof/pprof.listener.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: listener
Type: inuse_space
Time: Aug 5, 2023 at 7:18pm (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
top - show functions using the most memory
(pprof) top
Showing nodes accounting for 159.41MB, 98.13% of 162.44MB total
Dropped 25 nodes (cum <= 0.81MB)
Showing top 10 nodes out of 33
      flat  flat%   sum%        cum   cum%
  155.61MB 95.80% 95.80%   155.61MB 95.80%  reflect.unsafe_NewArray
    3.29MB  2.03% 97.82%     3.29MB  2.03%  encoding/gob.(*encBuffer).writeByte
    0.50MB  0.31% 98.13%        1MB  0.62%  main.collectors_read_journal_entry
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).Decode
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).DecodeValue
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decOpFor.func2
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func3
         0     0% 98.13%   155.61MB 95.80%  encoding/gob.(*Decoder).decOpFor.func4
         0     0% 98.13%    74.01MB 45.56%  encoding/gob.(*Decoder).decodeArrayHelper
         0     0% 98.13%   156.11MB 96.11%  encoding/gob.(*Decoder).decodeMap
tree - show a text rendering of call graph
peek - show callers/callees of functions
(pprof) peek reflect.unsafe_NewArray
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
155.61MB 100% | reflect.MakeSlice
155.61MB 95.80% 95.80% 155.61MB 95.80% | reflect.unsafe_NewArray
----------------------------------------------------------+-------------
(pprof) peek reflect.MakeSlice
Showing nodes accounting for 162.44MB, 100% of 162.44MB total
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
155.61MB 100% | encoding/gob.(*Decoder).decodeSlice
0 0% 0% 155.61MB 95.80% | reflect.MakeSlice
155.61MB 100% | reflect.unsafe_NewArray
----------------------------------------------------------+-------------
Execution traces are viewed with the go tool trace command. To collect a 5-second execution trace:
curl -o trace.out "https://monitor.xyzbots.com:8550/debug/pprof/trace?seconds=5"
go tool trace trace.out
Holding ctrl and pressing \ sends SIGQUIT, which makes a Go program print every running goroutine and exit, just like the output of a Go program that crashes. To download the same full goroutine stack from a live program without stopping it:
curl -o goroutine_stack.txt "https://monitor.xyzbots.com:8550/debug/pprof/goroutine?debug=2"
This is useful for solving the error runtime: goroutine stack exceeds 1000000000-byte limit. A function that re-invokes itself forever must use the go keyword for that call; a plain recursive call never returns, so every call adds a stack frame until the limit is hit.
// this fails with: runtime: goroutine stack exceeds 1000000000-byte limit
// (every recursive call adds a stack frame that is never popped)
func a() {
	if queue_item_exists {
		// process queue item
		a()
	} else {
		time.Sleep(time.Second)
		a()
	}
}
// this works: go a() starts a fresh goroutine with its own small stack,
// and the current call returns, so the stack never grows
func a() {
	if queue_item_exists {
		// process queue item
		go a()
	} else {
		time.Sleep(time.Second)
		go a()
	}
}