Core Idea:
The primary goal of this pattern is to give the `http.HandlerFunc` access to the latest available `[]Company` data without blocking, even while a potentially long-running `crawlCompanies` function fetches new data in the background. It provides a form of non-blocking data exchange between a background worker and a request handler.
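
The building block throughout is a buffered channel of capacity 1 used as a box that always holds the latest snapshot: readers take the value out and immediately put it back, and the writer takes the old value out and puts a fresh one in. Here is a minimal, self-contained illustration of that borrow-and-put-back mechanic (a `[]string` stands in for `[]Company`):

```go
package main

import "fmt"

func main() {
	// A capacity-1 channel acts as a box that always holds the
	// latest value.
	box := make(chan []string, 1)
	box <- nil // start "full" with a placeholder

	// Reader: borrow the current value, then put it straight back
	// so the box is never left empty.
	v := <-box
	box <- v
	fmt.Println("before update:", v) // before update: []

	// Writer: take the slot, then refill it with fresh data.
	<-box
	box <- []string{"fresh"}

	v = <-box
	box <- v
	fmt.Println("after update:", v) // after update: [fresh]
}
```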
How it Works:
- Two Buffered Channels (`ch1`, `ch2`):
  - Two channels (`ch1`, `ch2`) are created, each capable of holding exactly one `[]Company` slice (`make(chan []Company, 1)`). The buffer size of 1 is key.
  - They are initialized by sending `nil` into each (`ch1 <- nil`, `ch2 <- nil`), so both channels start "full" with a placeholder value.
- The Background Crawler (`crawlCompanies`):
  - This function runs in a separate goroutine.
  - It performs the potentially slow task of fetching company and job data.
  - Crucially, at the end, it iterates through the channels provided (`chs`, which are `ch1` and `ch2`):
    - `<-ch`: It receives from the channel, blocking until it can take the value currently in the channel's buffer (initially `nil`, later the previously fetched data). This effectively acquires a "slot" or "lock" on that channel.
    - `ch <- out`: It sends the newly fetched and processed data (`out`) back into the same channel, updating it with the fresh data.
  - It does this for both `ch1` and `ch2`, ensuring both eventually hold the same, latest data set.
- The HTTP Handler (`http.HandlerFunc`):
  - When a request comes in, it first checks the `tick` channel non-blockingly (via a `default:` clause) to potentially trigger a new background crawl if an hour has passed.
  - Then it tries to get the company data using a `select` statement on `ch1` and `ch2` (shown in full in the sketch after this list):
    - `case companies = <-ch1: ch1 <- companies`: It attempts to receive data from `ch1`. If successful (meaning `ch1` currently holds data), it copies the data into the `companies` variable and, critically, immediately sends the same data back into `ch1`.
    - `case companies = <-ch2: ch2 <- companies`: If receiving from `ch1` would block (e.g., because `crawlCompanies` is holding that "slot" while updating it), the `select` can proceed with `ch2` instead, doing the same thing: read the data and immediately put it back into `ch2`.
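
Putting the pieces together, here is a minimal, runnable sketch of the whole pattern. It follows the walkthrough above for `ch1`, `ch2`, `tick`, `crawlCompanies`, and the handler's `select`; the `Company` struct, the `fetchAll` helper, the `/companies` route, and the port are hypothetical stand-ins for the original code's details:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Company is a hypothetical stand-in for the real crawled data.
type Company struct {
	Name string `json:"name"`
}

// fetchAll simulates the slow crawling work.
func fetchAll() []Company {
	time.Sleep(2 * time.Second)
	return []Company{{Name: "Acme"}}
}

// crawlCompanies fetches fresh data, then publishes it into each
// channel: receive to claim the slot, send to refill it.
func crawlCompanies(chs ...chan []Company) {
	out := fetchAll()
	for _, ch := range chs {
		<-ch      // take the old value (or the initial nil)
		ch <- out // replace it with the fresh data
	}
}

func main() {
	ch1 := make(chan []Company, 1)
	ch2 := make(chan []Company, 1)
	ch1 <- nil // both boxes start "full" with a placeholder
	ch2 <- nil

	tick := time.Tick(time.Hour) // signals that a re-crawl is due
	go crawlCompanies(ch1, ch2)  // initial crawl in the background

	http.HandleFunc("/companies", func(w http.ResponseWriter, r *http.Request) {
		// Non-blocking check: if an hour has passed, kick off a
		// new crawl; otherwise fall through immediately.
		select {
		case <-tick:
			go crawlCompanies(ch1, ch2)
		default:
		}

		// Borrow the latest snapshot from whichever channel is
		// currently full, and put it straight back. Before the
		// first crawl finishes this may still be nil.
		var companies []Company
		select {
		case companies = <-ch1:
			ch1 <- companies
		case companies = <-ch2:
			ch2 <- companies
		}

		_ = json.NewEncoder(w).Encode(companies)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Before the first crawl completes, the handler serves the `nil` placeholder; how that case is presented to clients is up to the real implementation.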
Why "Double Channeling"?
- Non-Blocking Reads for the Handler: The handler needs data now to serve the request. If `crawlCompanies` is in the middle of updating `ch1` (it has done `<-ch1` but not yet `ch1 <- out`), the handler's attempt to read from `ch1` would block. But because `ch2` still holds the previous version of the data, the handler can read from `ch2` via the `select` statement (illustrated in the snippet after this list). It gets slightly stale data, but it doesn't block. The immediate send-back (`ch1 <- companies` or `ch2 <- companies`) keeps the data available for the next request or for the crawler to update.
- Decoupling: The handler and the crawler operate mostly independently. The handler reads whatever data is available in either channel, and the crawler updates both channels whenever it has new data.
- Availability: There are two copies of the data (usually two references to the same underlying slice) available, which increases the chances that the handler can grab one without waiting for the crawler.
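
The non-blocking-read behavior is easiest to see in isolation. In the sketch below, a goroutine simulates the crawler mid-update on `ch1` (slot taken, fresh data not yet sent back, with an exaggerated sleep standing in for that window), and a reader still gets the previous snapshot from `ch2`:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch1 := make(chan []string, 1)
	ch2 := make(chan []string, 1)
	ch1 <- []string{"old"}
	ch2 <- []string{"old"}

	// Simulate the crawler mid-update on ch1: it has taken the
	// slot (<-ch1) but has not yet sent the fresh data back. The
	// sleep exaggerates that window so the effect is visible.
	go func() {
		<-ch1
		time.Sleep(100 * time.Millisecond)
		ch1 <- []string{"new"}
	}()

	time.Sleep(10 * time.Millisecond) // let the updater claim ch1's slot

	// The reader is not stuck: select proceeds with ch2, which
	// still holds the previous snapshot.
	var data []string
	select {
	case data = <-ch1:
		ch1 <- data
	case data = <-ch2:
		ch2 <- data
	}
	fmt.Println("served:", data) // served: [old]
}
```

Since the real crawler does its slow fetching before touching the channels, that window is normally brief; the second channel simply guarantees the handler never has to wait out even that gap.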
In essence:
The two channels act like two containers holding the latest results. The handler can quickly borrow the results from either container, use them, and put them right back. The background crawler replaces the contents of both containers whenever it finishes fetching new results. This ensures the handler almost always gets data immediately, even if it's the slightly older version from the container the crawler isn't currently updating.