Core Idea:
The primary goal of this pattern is to give the `http.HandlerFunc` access to the latest available `[]Company` data without blocking, even while a potentially long-running `crawlCompanies` function fetches new data in the background. It provides a form of non-blocking data exchange between a background worker and a request handler.
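
The building block throughout is a buffered channel of capacity 1 used as a box that always holds the latest snapshot: readers take the value out and immediately put it back, and the writer takes the old value out and puts a fresh one in. Here is a minimal, self-contained illustration of that borrow-and-put-back mechanic (a `[]string` stands in for `[]Company`):

```go
package main

import "fmt"

func main() {
	// A capacity-1 channel acts as a box that always holds the
	// latest value.
	box := make(chan []string, 1)
	box <- nil // start "full" with a placeholder

	// Reader: borrow the current value, then put it straight back
	// so the box is never left empty.
	v := <-box
	box <- v
	fmt.Println("before update:", v) // before update: []

	// Writer: take the slot, then refill it with fresh data.
	<-box
	box <- []string{"fresh"}

	v = <-box
	box <- v
	fmt.Println("after update:", v) // after update: [fresh]
}
```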
How it Works:
- Two Buffered Channels (`ch1`, `ch2`):
  - Two channels (`ch1`, `ch2`) are created, each capable of holding exactly one `[]Company` slice (`make(chan []Company, 1)`). The buffer size of 1 is key.
  - They are initialized by sending `nil` into each (`ch1 <- nil`, `ch2 <- nil`), so both channels start "full" with a placeholder value.
- The Background Crawler (`crawlCompanies`):
  - This function runs in a separate goroutine.
  - It performs the potentially slow task of fetching company and job data.
  - Crucially, at the end, it iterates through the channels provided (`chs`, which are `ch1` and `ch2`):
    - `<-ch`: It receives from the channel, blocking until it can take the value currently in the channel's buffer (initially `nil`, later the previously fetched data). This effectively acquires a "slot" or "lock" on that channel.
    - `ch <- out`: It sends the newly fetched and processed data (`out`) back into the same channel, updating it with the fresh data.
  - It does this for both `ch1` and `ch2`, ensuring both eventually hold the same, latest data set.
- The HTTP Handler (`http.HandlerFunc`):
  - When a request comes in, it first checks the `tick` channel non-blockingly (via a `default:` clause) to potentially trigger a new background crawl if an hour has passed.
  - Then it tries to get the company data using a `select` statement on `ch1` and `ch2` (shown in full in the sketch after this list):
    - `case companies = <-ch1: ch1 <- companies`: It attempts to receive data from `ch1`. If successful (meaning `ch1` currently holds data), it copies the data into the `companies` variable and, critically, immediately sends the same data back into `ch1`.
    - `case companies = <-ch2: ch2 <- companies`: If receiving from `ch1` would block (e.g., because `crawlCompanies` is holding that "slot" while updating it), the `select` can proceed with `ch2` instead, doing the same thing: read the data and immediately put it back into `ch2`.
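
Putting the pieces together, here is a minimal, runnable sketch of the whole pattern. It follows the walkthrough above for `ch1`, `ch2`, `tick`, `crawlCompanies`, and the handler's `select`; the `Company` struct, the `fetchAll` helper, the `/companies` route, and the port are hypothetical stand-ins for the original code's details:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Company is a hypothetical stand-in for the real crawled data.
type Company struct {
	Name string `json:"name"`
}

// fetchAll simulates the slow crawling work.
func fetchAll() []Company {
	time.Sleep(2 * time.Second)
	return []Company{{Name: "Acme"}}
}

// crawlCompanies fetches fresh data, then publishes it into each
// channel: receive to claim the slot, send to refill it.
func crawlCompanies(chs ...chan []Company) {
	out := fetchAll()
	for _, ch := range chs {
		<-ch      // take the old value (or the initial nil)
		ch <- out // replace it with the fresh data
	}
}

func main() {
	ch1 := make(chan []Company, 1)
	ch2 := make(chan []Company, 1)
	ch1 <- nil // both boxes start "full" with a placeholder
	ch2 <- nil

	tick := time.Tick(time.Hour) // signals that a re-crawl is due
	go crawlCompanies(ch1, ch2)  // initial crawl in the background

	http.HandleFunc("/companies", func(w http.ResponseWriter, r *http.Request) {
		// Non-blocking check: if an hour has passed, kick off a
		// new crawl; otherwise fall through immediately.
		select {
		case <-tick:
			go crawlCompanies(ch1, ch2)
		default:
		}

		// Borrow the latest snapshot from whichever channel is
		// currently full, and put it straight back. Before the
		// first crawl finishes this may still be nil.
		var companies []Company
		select {
		case companies = <-ch1:
			ch1 <- companies
		case companies = <-ch2:
			ch2 <- companies
		}

		_ = json.NewEncoder(w).Encode(companies)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Before the first crawl completes, the handler serves the `nil` placeholder; how that case is presented to clients is up to the real implementation.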
Why "Double Channeling"?
- Non-Blocking Reads for the Handler: The handler needs data now to serve the request. If `crawlCompanies` is in the middle of updating `ch1` (it has done `<-ch1` but not yet `ch1 <- out`), the handler's attempt to read from `ch1` would block. But because `ch2` still holds the previous version of the data, the handler can read from `ch2` via the `select` statement (illustrated in the snippet after this list). It gets slightly stale data, but it doesn't block. The immediate send-back (`ch1 <- companies` or `ch2 <- companies`) keeps the data available for the next request or for the crawler to update.
- Decoupling: The handler and the crawler operate mostly independently. The handler reads whatever data is available in either channel, and the crawler updates both channels whenever it has new data.
- Availability: There are two copies of the data (usually two references to the same underlying slice) available, which increases the chances that the handler can grab one without waiting for the crawler.
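
The non-blocking-read behavior is easiest to see in isolation. In the sketch below, a goroutine simulates the crawler mid-update on `ch1` (slot taken, fresh data not yet sent back, with an exaggerated sleep standing in for that window), and a reader still gets the previous snapshot from `ch2`:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch1 := make(chan []string, 1)
	ch2 := make(chan []string, 1)
	ch1 <- []string{"old"}
	ch2 <- []string{"old"}

	// Simulate the crawler mid-update on ch1: it has taken the
	// slot (<-ch1) but has not yet sent the fresh data back. The
	// sleep exaggerates that window so the effect is visible.
	go func() {
		<-ch1
		time.Sleep(100 * time.Millisecond)
		ch1 <- []string{"new"}
	}()

	time.Sleep(10 * time.Millisecond) // let the updater claim ch1's slot

	// The reader is not stuck: select proceeds with ch2, which
	// still holds the previous snapshot.
	var data []string
	select {
	case data = <-ch1:
		ch1 <- data
	case data = <-ch2:
		ch2 <- data
	}
	fmt.Println("served:", data) // served: [old]
}
```

Since the real crawler does its slow fetching before touching the channels, that window is normally brief; the second channel simply guarantees the handler never has to wait out even that gap.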
In essence:
The two channels act like two containers holding the latest results. The handler can quickly borrow the results from either container, use them, and put them right back. The background crawler replaces the contents of both containers whenever it finishes fetching new results. This ensures the handler almost always gets data immediately, even if it's the slightly older version from the container the crawler isn't currently updating.