- Make fetching a single page faster
- Fetch multiple pages in parallel
First, measure the relative overhead of the different parts of the page fetching process to understand where time is being spent:
- Measure RPC Time (from RPC start to RPC finish)
- Measure Page Fetch Time (from DB entry to DB exit)
- Subtract Page Fetch Time from RPC Time to obtain the RPC overhead (which includes, e.g., DB connection setup and result serialization)
If RPC overhead dominates, we should move to a more efficient RPC method. Otherwise, we should parallelize reads using the parallel page token API.
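The measurement above can be sketched as follows. This is a minimal timing harness, assuming hypothetical `rpc_fetch_page` and `db_fetch_page` callables that perform the full RPC round trip and the DB-only page fetch respectively:

```python
import time

def measure_fetch(rpc_fetch_page, db_fetch_page, page_token):
    """Compare end-to-end RPC time against DB-only page fetch time."""
    start = time.perf_counter()
    rpc_fetch_page(page_token)          # full round trip: RPC start to RPC finish
    rpc_time = time.perf_counter() - start

    start = time.perf_counter()
    db_fetch_page(page_token)           # DB entry to DB exit only
    page_fetch_time = time.perf_counter() - start

    # Overhead: connection setup, serialization, transport, etc.
    rpc_overhead = rpc_time - page_fetch_time
    return rpc_time, page_fetch_time, rpc_overhead
```

In practice, both measurements should be repeated and averaged to smooth out variance before deciding which optimization to pursue.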
The current RPC uses JSON to transmit an array of object records containing lots of binary data. This has the following issues:
- it requires binary data to be encoded using formats like hexadecimal or base64, which have an overhead of 100% or ~33% respectively;
- arrays of human-readable JSON objects repeat the same field names in every element, which is pure overhead.
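The encoding overhead is easy to verify directly. This sketch measures both costs on a simulated binary payload (the record shape here is made up for illustration):

```python
import base64
import binascii
import json
import os

payload = os.urandom(3000)  # simulated binary record data

# Hex encodes 1 byte as 2 characters -> 100% overhead.
hex_len = len(binascii.hexlify(payload))
# Base64 encodes 3 bytes as 4 characters -> ~33% overhead.
b64_len = len(base64.b64encode(payload))

# Field names repeat in every array element and are pure overhead.
records = [{"transaction_key": "k%06d" % i, "data": "..."} for i in range(100)]
json_len = len(json.dumps(records))
```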
Some possible alternatives:
- gRPC with either FlatBuffers or protobuf;
- Cap'n Proto with its associated RPC mechanism.
To implement this, we will need to add a page index column to the DB for a fixed page size.
This page index will be defined as follows:
- keys are page numbers;
- values are keys in the transaction log column.
To properly build the page index, as transactions get inserted into the log, we will add them to the appropriate page. Once this page index is built, we will be able to offer an API based on page numbers instead of page tokens.
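The page-number API could look like the following sketch, which models the two columns as in-memory dicts (`db` for the transaction log, `page_index` mapping page numbers to the key of the last transaction in each page; all names are hypothetical):

```python
def get_page(db, page_index, page_number, page_size):
    """Return the transactions belonging to a fixed-size page.

    db:         dict mapping transaction key -> transaction record
    page_index: dict mapping page number -> last transaction key in that page
    """
    last_key = page_index[page_number]
    # Transaction keys are timestamp-prefixed and stored in sorted order,
    # so a page is the page_size keys ending at the page's last key.
    keys = [k for k in sorted(db) if k <= last_key][-page_size:]
    return [db[k] for k in keys]
```

A real implementation would use a range scan on the sorted key space rather than sorting the whole log, but the lookup structure is the same.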
Since transaction keys are prefixed by the transaction timestamp and key-value pairs are stored in sorted order, transactions will generally be inserted near the end of the log (with some variance due to clock skew, etc.). Let's first handle this easy case. Assuming all transactions are appended to the end of the log, we will:
- maintain a counter of how many transactions have been appended;
- once the counter reaches the desired page size, reset the counter and insert a new key-value pair into the page index with the key set to `prev_max_page + 1` and the value set to the most recently inserted transaction key.
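The append-only case above can be sketched as a small builder; the page size and class name are illustrative, not part of the actual schema:

```python
PAGE_SIZE = 1000  # hypothetical fixed page size

class PageIndexBuilder:
    """Builds the page index as transactions are appended to the log."""

    def __init__(self):
        self.counter = 0        # transactions appended since the last page boundary
        self.max_page = -1      # highest page number assigned so far
        self.page_index = {}    # page number -> last transaction key in that page

    def on_append(self, txn_key):
        self.counter += 1
        if self.counter == PAGE_SIZE:
            # Page is full: record its last key and reset the counter.
            self.counter = 0
            self.max_page += 1
            self.page_index[self.max_page] = txn_key
```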
However, we need to handle the case where a transaction is not appended to the log but instead inserted somewhere near the end of the log.
To handle this, we will use one or more of the following approaches:
- Flexible Page Sizes: have an initial page size and a larger maximum page size; by allowing pages to grow, they can absorb out-of-order writes;
- Page Shifting: shift the page token of pages that occur after the modified page;
- Page Delay: delay providing access to the N most recent pages to allow for pages to "settle" before access;
- Fallback to Sequential Page Access: if a page grows beyond the maximum page size, instead of doing page shifting, we return the first page size records to the client but indicate that more records are available in the extended page.
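The fallback approach can be sketched as a response builder; the size constants and response field names are assumptions for illustration:

```python
PAGE_SIZE = 1000        # hypothetical initial page size
MAX_PAGE_SIZE = 2000    # hypothetical maximum a page may grow to

def build_page_response(page_records):
    """Return a page, falling back to sequential access within an overflowing page."""
    if len(page_records) <= MAX_PAGE_SIZE:
        # Flexible page sizes: the page absorbed out-of-order writes and
        # stayed within its maximum, so return it whole.
        return {"records": page_records, "has_more_in_page": False}
    # Beyond the maximum, return only the first PAGE_SIZE records and
    # signal that the extended page holds more, to be read sequentially.
    return {"records": page_records[:PAGE_SIZE], "has_more_in_page": True}
```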
Over time, if the DB runs out of space, we can delete the pages at the beginning of the transaction log along with their corresponding page index entries. If a client requests a deleted page, we can detect this case (by the lack of an entry in the page index) and report that their desired page was culled.
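Culled-page detection falls out of the index structure naturally. A minimal sketch, assuming the in-memory `page_index` dict from earlier and made-up error labels:

```python
def fetch_page_or_report_culled(page_index, page_number):
    """Resolve a page number, reporting pages deleted during space reclamation."""
    if page_number in page_index:
        return {"last_txn_key": page_index[page_number]}
    # No entry: either the page was culled from the front of the log,
    # or it was never built (page number beyond the current maximum).
    min_live = min(page_index) if page_index else None
    if min_live is not None and page_number < min_live:
        return {"error": "page_culled", "oldest_available": min_live}
    return {"error": "page_not_found"}
```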