@gjaldon
Created March 18, 2020 10:31
Troubleshooting rate_limit_pool timeouts
Fix call timeouts in rate_limit_pool
To reproduce:
- clone https://github.com/gjaldon/reproduce_call_timeouts
- run `iex -S mix` and call `Reproduce.Application.test()`
- check the number of messages in the queue with `Reproduce.Application.message_queue_len()`
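For reference, checking a process's mailbox size in Elixir only needs `Process.info/2`. A minimal sketch of what a helper like `message_queue_len/0` could look like — the registered name `:rate_limit_pool` is an assumption, and the repo's actual helper may be implemented differently:

```elixir
defmodule QueueCheck do
  # Sketch only: :rate_limit_pool is an assumed registered name for the pool
  # process; Reproduce.Application.message_queue_len/0 may differ.
  def message_queue_len(name \\ :rate_limit_pool) do
    name
    |> Process.whereis()
    |> Process.info(:message_queue_len)
    # => {:message_queue_len, n} — a steadily growing n means callers are
    # being enqueued faster than handle_call can drain them
  end
end
```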
Takeaways:
- Not an issue with poolboy (the checkout GenServer call has very little overhead) or with Redis
- since Redis is not the bottleneck, the number of Redis connections is not an issue
- even though the handle_call only takes ~2ms to execute, sending 20K calls to the GenServer still leads to timeouts (I got the 2ms figure from running `:timer.tc(fn -> RateLimit.fetch_riot_key(url, config) end)` in prod)
- you can try it with fewer GenServer calls to find the minimum number that triggers timeouts
- this means the rate_limit_pool process(es) simply can't keep up with all the GenServer calls they get at peak traffic (see the back-of-envelope sketch after this list)
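The arithmetic behind that last point: a single GenServer works through its mailbox serially, so 20,000 queued calls at ~2ms each is roughly 40 seconds of work, while `GenServer.call/2` defaults to a 5,000ms timeout — callers near the back of the queue time out even though each individual call is fast. A minimal sketch (not the repro repo's code) that reproduces the effect:

```elixir
defmodule SlowishServer do
  use GenServer

  def start_link(_opts \\ []),
    do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(state), do: {:ok, state}

  def handle_call(:work, _from, state) do
    Process.sleep(2)  # stand-in for the ~2ms RateLimit.fetch_riot_key/2 call
    {:reply, :ok, state}
  end
end

# 20_000 concurrent callers against one server: 20_000 * 2ms ≈ 40s of
# serialized work, far past the 5_000ms default GenServer.call timeout,
# so callers deep in the mailbox exit with {:timeout, ...}.
# {:ok, _} = SlowishServer.start_link()
# for _ <- 1..20_000, do: spawn(fn -> GenServer.call(SlowishServer, :work) end)
```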
To fix the timeout issues:
- lessen the number of HTTP requests - one way to do this is to increase the cache TTL of data in RiotSource (quick and easy fix)
- increase the number of pool processes, so each pool process handles fewer rate_limit messages (quick and easy fix, but we haven't yet observed its toll on our web nodes' resources; see the poolboy sketch after this list)
- lessen the calls to Redis (RateLimit/RateLimitPool) in RiotApi. A quick check found that we make at least 3 calls to RateLimit for every Riot request; if we can bring that down to 1, the rate_limit_pool will handle far fewer GenServer calls (takes longer to do, but is the longer-term fix)
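For the "more pool processes" option, a sketch of what bumping the poolboy sizes could look like — the pool name, worker module, and numbers are assumptions, not the app's real config:

```elixir
# Assumed names and sizes for illustration; adjust to the app's real supervision tree.
children = [
  :poolboy.child_spec(
    :rate_limit_pool,
    [
      name: {:local, :rate_limit_pool},
      worker_module: RateLimit,
      size: 20,         # more permanent workers to spread the GenServer calls
      max_overflow: 10  # extra temporary workers for traffic spikes
    ],
    []                  # worker start args (assumed empty here)
  )
]

# Supervisor.start_link(children, strategy: :one_for_one)

# Each checkout routes a call to one of the pool's workers instead of a single process:
# :poolboy.transaction(:rate_limit_pool, fn worker ->
#   GenServer.call(worker, {:fetch_riot_key, url, config})
# end)
```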