@kellabyte
Last active November 25, 2016 04:49

Revisions

  1. kellabyte revised this gist Nov 25, 2016. 1 changed file with 14 additions and 13 deletions.
    27 changes: 14 additions & 13 deletions haywire_bottleneck.md
    Original file line number Diff line number Diff line change
    @@ -21,19 +21,20 @@ perf top

    Expanding `http_parser_execute`
    ```
    2.68% libtcmalloc_minimal.so.4.3.0 [.] tc_calloc
    0.71% [kernel] [k] __srcu_read_unlock
    0.63% hello_world [.] string_from_int
    0.62% hello_world [.] uv__write
    0.50% [kernel] [k] fsnotify
    0.38% [kernel] [k] __skb_clone
    0.33% [kernel] [k] tcp_ack
    0.26% hello_world [.] http_request_on_url
    0.24% [kernel] [k] __alloc_skb
    0.20% libc-2.21.so [.] 0x000000000014d86c
    0.18% hello_world [.] hw_strcmp
    0.17% [kernel] [k] __inet_lookup_established
    0.14% [kernel] [k] tcp_packet
    6.84% hello_world [.] http_request_buffer_pin
    6.59% hello_world [.] http_parser_execute
    6.21% hello_world [.] http_request_buffer_reassign_pin
    6.08% libtcmalloc_minimal.so.4.3.0 [.] tc_deletearray_nothrow
    4.47% hello_world [.] http_request_buffer_locate
    3.62% libtcmalloc_minimal.so.4.3.0 [.] tc_malloc
    2.26% [kernel] [k] tcp_sendmsg
    1.82% [kernel] [k] _raw_spin_lock_bh
    1.44% hello_world [.] http_request_on_message_complete
    1.21% hello_world [.] create_response_buffer
    1.08% hello_world [.] hw_route_compare_method
    0.93% [kernel] [k] __srcu_read_lock
    0.76% libtcmalloc_minimal.so.4.3.0 [.] tc_realloc
    0.73% hello_world [.] free_http_request
    ```

    If I run 1 process with 20 threads, the threads appear to be contending on something and only reach `3.2 million requests/second`.
  2. kellabyte revised this gist Nov 25, 2016. 1 changed file with 34 additions and 0 deletions.
    34 changes: 34 additions & 0 deletions haywire_bottleneck.md
    Original file line number Diff line number Diff line change
    @@ -19,6 +19,23 @@ perf top
    1.20% hello_world [.] create_response_buffer
    ```

    Expanding `http_parser_execute`
    ```
    2.68% libtcmalloc_minimal.so.4.3.0 [.] tc_calloc
    0.71% [kernel] [k] __srcu_read_unlock
    0.63% hello_world [.] string_from_int
    0.62% hello_world [.] uv__write
    0.50% [kernel] [k] fsnotify
    0.38% [kernel] [k] __skb_clone
    0.33% [kernel] [k] tcp_ack
    0.26% hello_world [.] http_request_on_url
    0.24% [kernel] [k] __alloc_skb
    0.20% libc-2.21.so [.] 0x000000000014d86c
    0.18% hello_world [.] hw_strcmp
    0.17% [kernel] [k] __inet_lookup_established
    0.14% [kernel] [k] tcp_packet
    ```

    If I run 1 process with 20 threads, the threads appear to be contending on something and only reach `3.2 million requests/second`.

    ```
    @@ -39,4 +56,21 @@ perf top
    1.77% [kernel] [k] native_queued_spin_lock_slowpath
    1.58% libc-2.21.so [.] strlen
    1.44% hello_world [.] http_request_on_message_complete
    ```

    Expanding `http_parser_execute`.
    ```
    6.65% libtcmalloc_minimal.so.4.3.0 [.] tc_deletearray_nothrow
    5.28% hello_world [.] http_request_buffer_pin
    4.60% hello_world [.] http_request_buffer_reassign_pin
    4.47% libtcmalloc_minimal.so.4.3.0 [.] tc_malloc
    3.27% libc-2.21.so [.] 0x00000000001452a0
    3.26% hello_world [.] http_request_buffer_locate
    2.71% libtcmalloc_minimal.so.4.3.0 [.] tc_calloc
    2.12% libc-2.21.so [.] 0x000000000014d86c
    2.07% libc-2.21.so [.] 0x000000000014d6b0
    1.95% hello_world [.] uv_write2
    1.53% hello_world [.] http_request_on_message_complete
    1.44% [kernel] [k] tcp_sendmsg
    1.39% hello_world [.] get_cached_request
    ```
  3. kellabyte created this gist Nov 25, 2016.
    42 changes: 42 additions & 0 deletions haywire_bottleneck.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,42 @@
    If I run 20 haywire processes using `tcmalloc`, haywire reaches `6.3 million requests/second`.
    ```
    killall hello_world; for i in `seq 20`; do LD_PRELOAD="./lib/gperftools/.libs/libtcmalloc.so" ./build/hello_world --balancer reuseport & done
    perf top
    7.42% hello_world [.] http_request_buffer_pin
    6.94% hello_world [.] http_request_buffer_reassign_pin
    6.63% hello_world [.] http_parser_execute
    6.23% libtcmalloc.so.4.3.0 [.] tc_deletearray_nothrow
    4.94% hello_world [.] http_request_buffer_locate
    3.60% libtcmalloc.so.4.3.0 [.] tc_malloc
    2.77% libtcmalloc.so.4.3.0 [.] tc_calloc
    2.33% [kernel] [k] native_queued_spin_lock_slowpath
    2.31% [kernel] [k] tcp_sendmsg
    1.76% [kernel] [k] _raw_spin_lock_bh
    1.41% hello_world [.] http_request_on_message_complete
    1.34% hello_world [.] set_header
    1.22% [kernel] [k] copy_user_enhanced_fast_string
    1.20% hello_world [.] create_response_buffer
    ```

    If I run 1 process with 20 threads, the threads appear to be contending on something and only reach `3.2 million requests/second`.

    ```
    LD_PRELOAD="./lib/gperftools/.libs/libtcmalloc.so" ./build/hello_world --threads 20 --balancer reuseport
    perf top
    10.68% hello_world [.] http_parser_execute
    6.61% libtcmalloc.so.4.3.0 [.] tc_deletearray_nothrow
    5.05% hello_world [.] http_request_buffer_pin
    4.70% hello_world [.] http_request_buffer_reassign_pin
    4.46% libtcmalloc.so.4.3.0 [.] tc_malloc
    3.82% libc-2.21.so [.] 0x00000000001452a0
    3.26% hello_world [.] http_request_buffer_locate
    2.78% libtcmalloc.so.4.3.0 [.] tc_calloc
    2.41% hello_world [.] uv_write2
    2.30% libc-2.21.so [.] 0x000000000014d86c
    2.13% libc-2.21.so [.] 0x000000000014d6b0
    1.77% [kernel] [k] native_queued_spin_lock_slowpath
    1.58% libc-2.21.so [.] strlen
    1.44% hello_world [.] http_request_on_message_complete
    ```
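
To put the two throughput figures side by side, here is a small arithmetic sketch in Python. It uses only the numbers quoted above (`6.3 million` and `3.2 million requests/second`, 20 workers in each run). The function and constant names are made up for illustration and are not part of haywire. The even split across workers is an assumption, since the gist reports only aggregate throughput.

```python
# Aggregate throughput figures quoted in the gist (requests/second).
MULTI_PROCESS_RPS = 6_300_000  # 20 processes
MULTI_THREAD_RPS = 3_200_000   # 1 process, 20 threads
WORKERS = 20                   # processes or threads, depending on the run


def per_worker(total_rps: float, workers: int) -> float:
    """Average requests/second per worker, assuming an even split (an assumption)."""
    return total_rps / workers


def relative_throughput(threaded_rps: float, multiprocess_rps: float) -> float:
    """Threaded throughput as a fraction of multi-process throughput."""
    return threaded_rps / multiprocess_rps


per_process = per_worker(MULTI_PROCESS_RPS, WORKERS)
per_thread = per_worker(MULTI_THREAD_RPS, WORKERS)
ratio = relative_throughput(MULTI_THREAD_RPS, MULTI_PROCESS_RPS)

print(f"per process: {per_process:,.0f} req/s")
print(f"per thread:  {per_thread:,.0f} req/s")
print(f"threaded run reaches {ratio:.1%} of the multi-process run")
```

On these figures the threaded run delivers roughly half the throughput of the multi-process run with the same worker count. That is consistent with the threads contending on a shared resource, but the arithmetic alone does not identify which one.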