@gjaldon
Created March 18, 2020 10:31
Troubleshooting rate_limit_pool timeouts
Fix call timeouts in rate_limit_pool
To reproduce:
- clone https://github.com/gjaldon/reproduce_call_timeouts
- run `iex -S mix` and call `Reproduce.Application.test()`
- check the number of messages in the queue with `Reproduce.Application.message_queue_len()`
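For reference, checking a process's mailbox size in Elixir only needs `Process.info/2`. A minimal sketch of what a helper like `message_queue_len/0` could look like — the registered name `:rate_limit_pool` is an assumption, and the repo's actual helper may be implemented differently:

```elixir
defmodule QueueCheck do
  # Sketch only: :rate_limit_pool is an assumed registered name for the pool
  # process; Reproduce.Application.message_queue_len/0 may differ.
  def message_queue_len(name \\ :rate_limit_pool) do
    name
    |> Process.whereis()
    |> Process.info(:message_queue_len)
    # => {:message_queue_len, n} — a steadily growing n means callers are
    # being enqueued faster than handle_call can drain them
  end
end
```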
Takeaways:
- Not an issue with poolboy (the checkout GenServer call has very little overhead) or with Redis
- since Redis is not the bottleneck, the number of Redis connections is not an issue
- even though the handle_call only takes ~2ms to execute, sending 20K calls to the GenServer still leads to timeouts (I got the 2ms figure from running `:timer.tc(fn -> RateLimit.fetch_riot_key(url, config) end)` in prod)
- you can try it with fewer GenServer calls to find the minimum number that triggers timeouts
- this means the rate_limit_pool process(es) simply can't keep up with all the GenServer calls they get at peak traffic (see the back-of-envelope sketch after this list)
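The arithmetic behind that last point: a single GenServer works through its mailbox serially, so 20,000 queued calls at ~2ms each is roughly 40 seconds of work, while `GenServer.call/2` defaults to a 5,000ms timeout — callers near the back of the queue time out even though each individual call is fast. A minimal sketch (not the repro repo's code) that reproduces the effect:

```elixir
defmodule SlowishServer do
  use GenServer

  def start_link(_opts \\ []),
    do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(state), do: {:ok, state}

  def handle_call(:work, _from, state) do
    Process.sleep(2)  # stand-in for the ~2ms RateLimit.fetch_riot_key/2 call
    {:reply, :ok, state}
  end
end

# 20_000 concurrent callers against one server: 20_000 * 2ms ≈ 40s of
# serialized work, far past the 5_000ms default GenServer.call timeout,
# so callers deep in the mailbox exit with {:timeout, ...}.
# {:ok, _} = SlowishServer.start_link()
# for _ <- 1..20_000, do: spawn(fn -> GenServer.call(SlowishServer, :work) end)
```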
To fix the timeout issues:
- lessen the number of HTTP requests - one way to do this is to increase the cache TTL of data in RiotSource (quick and easy fix)
- increase the number of pool processes, so each pool process handles fewer rate_limit messages (quick and easy fix, but we haven't yet observed its toll on our web nodes' resources; see the poolboy sketch after this list)
- lessen the calls to Redis (RateLimit/RateLimitPool) in RiotApi. A quick check found that we make at least 3 calls to RateLimit for every Riot request; if we can bring that down to 1, the rate_limit_pool will handle far fewer GenServer calls (takes longer to do, but is the longer-term fix)
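For the "more pool processes" option, a sketch of what bumping the poolboy sizes could look like — the pool name, worker module, and numbers are assumptions, not the app's real config:

```elixir
# Assumed names and sizes for illustration; adjust to the app's real supervision tree.
children = [
  :poolboy.child_spec(
    :rate_limit_pool,
    [
      name: {:local, :rate_limit_pool},
      worker_module: RateLimit,
      size: 20,         # more permanent workers to spread the GenServer calls
      max_overflow: 10  # extra temporary workers for traffic spikes
    ],
    []                  # worker start args (assumed empty here)
  )
]

# Supervisor.start_link(children, strategy: :one_for_one)

# Each checkout routes a call to one of the pool's workers instead of a single process:
# :poolboy.transaction(:rate_limit_pool, fn worker ->
#   GenServer.call(worker, {:fetch_riot_key, url, config})
# end)
```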