If I run 20 haywire processes using tcmalloc, haywire reaches 6.3 million requests/second.
killall hello_world; for i in `seq 20`; do LD_PRELOAD="./lib/gperftools/.libs/libtcmalloc.so" ./build/hello_world --balancer reuseport & done
perf top
7.42% hello_world [.] http_request_buffer_pin
6.94% hello_world [.] http_request_buffer_reassign_pin
6.63% hello_world [.] http_parser_execute
6.23% libtcmalloc.so.4.3.0 [.] tc_deletearray_nothrow
4.94% hello_world [.] http_request_buffer_locate
3.60% libtcmalloc.so.4.3.0 [.] tc_malloc
2.77% libtcmalloc.so.4.3.0 [.] tc_calloc
2.33% [kernel] [k] native_queued_spin_lock_slowpath
2.31% [kernel] [k] tcp_sendmsg
1.76% [kernel] [k] _raw_spin_lock_bh
1.41% hello_world [.] http_request_on_message_complete
1.34% hello_world [.] set_header
1.22% [kernel] [k] copy_user_enhanced_fast_string
1.20% hello_world [.] create_response_buffer
If I run 1 process with 20 threads, the threads are competing on something somewhere and only reach 3.2 million requests/second.
LD_PRELOAD="./lib/gperftools/.libs/libtcmalloc.so" ./build/hello_world --threads 20 --balancer reuseport
perf top
10.68% hello_world [.] http_parser_execute
6.61% libtcmalloc.so.4.3.0 [.] tc_deletearray_nothrow
5.05% hello_world [.] http_request_buffer_pin
4.70% hello_world [.] http_request_buffer_reassign_pin
4.46% libtcmalloc.so.4.3.0 [.] tc_malloc
3.82% libc-2.21.so [.] 0x00000000001452a0
3.26% hello_world [.] http_request_buffer_locate
2.78% libtcmalloc.so.4.3.0 [.] tc_calloc
2.41% hello_world [.] uv_write2
2.30% libc-2.21.so [.] 0x000000000014d86c
2.13% libc-2.21.so [.] 0x000000000014d6b0
1.77% [kernel] [k] native_queued_spin_lock_slowpath
1.58% libc-2.21.so [.] strlen
1.44% hello_world [.] http_request_on_message_complete
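
To put the two results side by side, here is a small sketch of the arithmetic. It assumes both runs used 20 workers on the same machine with the same load generator, which the numbers above suggest but do not state. The helper names `per_worker` and `speedup` are made up for illustration and are not part of haywire.

```shell
#!/bin/sh
# Throughput figures reported above, in requests/second.
PROC_TOTAL=6300000     # 20 separate processes
THREAD_TOTAL=3200000   # 1 process with 20 threads
WORKERS=20             # assumed equal in both runs

# per_worker TOTAL WORKERS: requests/second handled by each worker.
# (hypothetical helper, for illustration only)
per_worker() {
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%d\n", total / n }'
}

# speedup A B: ratio A/B, rounded to two decimals.
# (hypothetical helper, for illustration only)
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "per process: $(per_worker "$PROC_TOTAL" "$WORKERS") req/s"
echo "per thread:  $(per_worker "$THREAD_TOTAL" "$WORKERS") req/s"
echo "processes vs threads: $(speedup "$PROC_TOTAL" "$THREAD_TOTAL")x"
```

That works out to about 315000 req/s per process against 160000 req/s per thread, so the multi-process setup is roughly 1.97x faster for the same worker count.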