@RubenSomsen · Last active April 14, 2025
SwiftSync - smarter synchronization with hints

SwiftSync

Near-stateless, fully parallelizable validation of the Bitcoin blockchain with hints about which outputs remain unspent. All other inputs/outputs are efficiently crossed off inside a single hash aggregate that only reaches zero if validation was successful and the hints were correct.

Introduction

Validation is at the heart of Bitcoin. Any improvements in validation speed will have a direct impact on the scalability of the system, including everything that is built on top of it. For this reason improving validation performance may be one of the most important things we can do.

The generalized observation of SwiftSync is that if we have two sets (in this case the inputs and outputs), and we want to know the difference between these sets (the UTXO set), validating a given answer to this question (the hints) is significantly easier than calculating the answer itself. The novel part here (to my knowledge) is specifically how this can be applied to sets. It seems likely that this observation can also be useful in other blockchain-related contexts, and perhaps beyond that as well.

Regular validation as it occurs today in Bitcoin Core involves traversing the blockchain sequentially (with some minor context-independent checks done in parallel), adding outputs to the UTXO set, and subsequently looking them up and removing them. Often the UTXO set doesn't fit in memory, slowing things down further with disk access.

SwiftSync changes this completely, without changing the security assumptions. There are no database lookups, memory usage drops to near-zero, and everything can be done in parallel. This would essentially remove memory, disk IO, and single-threaded performance from the list of potential bottlenecks. The remaining bottlenecks are CPU and bandwidth, and the more of them you have, the faster you'll be able to validate.

The statelessness of the protocol may also make it well-suited for memory-constrained scenarios such as validating the blockchain with specialized hardware (FPGAs, ASICs), building zero-knowledge proofs, or simply freeing up more memory for other operations such as batch validation of Schnorr signatures. You could also easily spread the validation workload over multiple devices.

We'll need further benchmarks to determine exactly how much faster SwiftSync is, but the very early proof-of-concept numbers (without any parallelization implemented) are already showing an impressive 5.28x improvement.

Protocol overview

For each output in the chain we require a hint of just a single bit indicating whether or not the output remains unspent at the tip of the chain (<100MB compressed).

Since validation fails if the hints are incorrect (DoS vector), the hints should originate from a reliable source (e.g. your full node software binary).1

We'll be relying on assumevalid to skip certain checks. This is not a requirement, but does result in a more elegant and bandwidth-efficient protocol. The exact consequences of assumevalid, the non-assumevalid version, and how this relates to assumeutxo will be explained later.

We process all inputs and outputs in a completely order-independent fashion (fully parallelizable). Our only required state is a 32-byte hash aggregate (no RAM usage, no random disk reads/writes). The protocol scales directly with available CPU and bandwidth.

For every output:

  • If the hint indicates the output will remain unspent:
    • Write it to the UTXO set (append-only - we won't need this data until SwiftSync is done and normal validation resumes)
  • If the hint indicates the output will get spent:
    • Hash the outpoint (the txid of the creating transaction plus the output index) and add it to the hash aggregate

For every input:

  • Hash the outpoint (contained within the input) and remove it from the hash aggregate

Once we reach the end, the hash aggregate should be at 0, thus proving that the outputs that we wrote to the UTXO set are indeed unspent and every other output was spent exactly once (no double spends).

There are more details below, but this is basically the core of the protocol.
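
To make this concrete, here is a minimal Python sketch of the core loop. It is a sketch only: the names are hypothetical, the hints are modelled as a set of outpoints rather than a compressed one-bit-per-output stream, the UTXO file stores just the outpoint, and the aggregate is taken modulo 2^256.

```python
import hashlib

MODULUS = 2**256  # aggregate arithmetic modulo 2^256 (an assumption of this sketch)

def h(outpoint: bytes, salt: bytes) -> int:
    """Salted hash of a serialized outpoint, interpreted as an integer."""
    return int.from_bytes(hashlib.sha256(salt + outpoint).digest(), "big")

def process_block(created: list[bytes], spent: list[bytes],
                  hint_unspent: set[bytes], salt: bytes,
                  utxo_file: list[bytes]) -> int:
    """Return this block's contribution to the global hash aggregate.

    created: serialized outpoints of the block's new outputs
    spent:   serialized prevouts of its non-coinbase inputs
    hint_unspent: the hints, modelled here as a set of unspent outpoints
    """
    agg = 0
    for outpoint in created:
        if outpoint in hint_unspent:
            utxo_file.append(outpoint)                 # append-only; never read during sync
        else:
            agg = (agg + h(outpoint, salt)) % MODULUS  # will be crossed off by some input
    for outpoint in spent:
        agg = (agg - h(outpoint, salt)) % MODULUS
    return agg

# Blocks can be processed in any order (and in parallel). Once every block has
# been processed:  assert sum(per_block_aggregates) % MODULUS == 0
```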

Details

Regarding the hash aggregate

All we need is the ability to know whether set A (the spent outputs) matches set B (the inputs), regardless of order. The order independence ensures removing an element prior to adding it is not a problem and makes it fully parallelizable.

Math-wise, simply hashing the pre-image and adding/subtracting the result (modular arithmetic) is almost enough - we also need to add a secret salt to ensure the data cannot be externally manipulated (generalized birthday problem).2

Example

Spent outputs: outpoint_A, outpoint_B

Inputs: outpoint_C, outpoint_D

If hash(outpoint_A||salt) + hash(outpoint_B||salt) - hash(outpoint_C||salt) - hash(outpoint_D||salt) == 0 this proves that our list of spent outputs must have been exactly equal to our list of inputs (i.e. (A==C && B==D) || (A==D && B==C)), and the remainder must be equal to the UTXO set.
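
The same check, written out as a toy Python example (the outpoint encodings and salt are made up, and the 2^256 modulus is an assumption of the sketch):

```python
import hashlib

MODULUS = 2**256
SALT = b"per-node secret"

def h(x: bytes) -> int:
    return int.from_bytes(hashlib.sha256(SALT + x).digest(), "big")

spent_outputs = [b"txid1:0", b"txid2:3"]  # outputs the hints say will get spent
inputs        = [b"txid2:3", b"txid1:0"]  # the same outpoints, in a different order

agg = 0
for op in spent_outputs:
    agg = (agg + h(op)) % MODULUS
for op in inputs:
    agg = (agg - h(op)) % MODULUS

assert agg == 0  # the two multisets are identical, regardless of order
```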

Regarding assumevalid

As implemented today, assumevalid is only used to skip script validation. The underlying security argument is that users already trust that their software is correct, so they may as well trust a statement about the validity of a chain. More than just scripts could have been skipped, but most of the remaining checks were either not expensive or were complex to remove, so presumably that's why they were left in. If we made slightly broader use of assumevalid, we could significantly reduce the amount of data required to validate inputs. Let's go through it.

A full entry in the UTXO set contains the following data:

  1. Outpoint
  2. Output script
  3. Coinbase flag
  4. Block height
  5. Amount

The straightforward cases

Without assumevalid we'd need to have all this data available when we check all the inputs. In the assumevalid-version of SwiftSync we only use the outpoints (1). This is convenient since this data is readily available inside of each input. Omitting the output scripts (2) with assumevalid is easy to comprehend - scripts aren't checked anyway - but what about the rest?

Skipping the coinbase flag (3) means we can no longer check whether an output created in a coinbase was spent earlier than allowed - these transactions essentially have an implied relative timelock of 100 blocks. To check such a relative timelock we also need to know the block height (4). Similarly, the nSequence field in a transaction can enable relative timelocks, which we also can't check.

The block height also indirectly helps with something else that only becomes relevant once you start doing validation in parallel: it ensures outputs get created before they get spent rather than after. Note that block height alone can be insufficient when an output is created and spent in the same block. More on this below, as we'll need to deal with it explicitly in the non-assumevalid version.

Another thing we'll discuss later is BIP30, which currently requires full access to the UTXO set. It's perfectly addressable, but for the assumevalid-version this shouldn't be needed.

It should be fairly uncontroversial to include all of the above under assumevalid, considering none of these checks are more impactful than the script checks that are currently being skipped.3

The harder cases

Skipping the amounts (5) is one that's debatable, since without those you're no longer actively checking for inflation bugs. Luckily there is a solution - we can add up the amounts from the UTXO set and check that the total doesn't exceed the total coin issuance. Note this should also include op_return amounts.
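
A rough sketch of what this supply check could look like (the subsidy constants are Bitcoin's actual schedule; how the UTXO and op_return amounts are tallied is assumed to happen during the same pass and is left out):

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings

def max_issuance(tip_height: int) -> int:
    """Upper bound on coins issued up to and including tip_height, in satoshis."""
    total, subsidy, height = 0, 50 * COIN, 0
    while height <= tip_height:
        blocks = min(HALVING_INTERVAL, tip_height + 1 - height)
        total += blocks * subsidy
        subsidy //= 2
        height += HALVING_INTERVAL
    return total

def no_inflation(utxo_amount_total: int, op_return_amount_total: int, tip_height: int) -> bool:
    # Unclaimed subsidy and fees only make the real supply lower, hence the inequality.
    return utxo_amount_total + op_return_amount_total <= max_issuance(tip_height)
```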

One limitation of this is that there is no straightforward way of calculating the coins that miners didn't claim since we're implicitly skipping fee calculations. Theoretically, if someone claimed those unclaimed coins we wouldn't notice. In terms of severity, this is no worse than the worst-case assumevalid scenario of falsely claiming coins through an invalid script. Furthermore, the amount of coins to which this can apply is quite limited.4

There is one final consequence that is a clear downside, though not a major one. Since we don't have the full previous output data when we check the inputs of a block, we also won't be able to generate the so-called undo data - the data that is required to roll back the blockchain in case of a reorg. This means we won't be able to reorg back beyond the assumevalid SwiftSync point. This is the exact same limitation that also occurs when you run Bitcoin Core in pruned mode. There is a practical way to deal with this: simply don't use assumevalid SwiftSync for the last X blocks, where X is the number at which you feel confident no reorgs would occur.

In short, by relying on a minimally extended definition of assumevalid and accepting that deep reorgs won't be possible without a resync (just like pruned nodes), we end up with a very clean and simple protocol. Still, the ability to validate without assumevalid is important as well, so let's consider that next.

Without assumevalid

SwiftSync without assumevalid can fully validate all consensus rules, but we need some additional steps. Rather than just adding and removing hashed outpoints, we need to hash in everything that is required to check an output when it's spent (see the previous 5 point list).

Adding this to the hash aggregate is easy, as the data is available at that moment (when we process the output). But at the time we want to remove it again (when we process the input), all we have is the outpoint - we'll have to re-download the missing data. This is probably somewhere under 15% additional data compared to regular full validation, with a lot of room for efficient encoding.5

We could let users download this data externally, but we could also consider serving it over the peer-to-peer network. As luck may have it, most nodes already have it - the aforementioned undo data that allows nodes to reorg is exactly the data that we need.6 Note that we'll also need the same source that provided us the hints to also provide a hash of the undo data to ensure a random peer doesn't give us invalid data, causing validation to fail.7

Since we now check amounts again, the separate inflation check on the UTXO set is no longer needed, but we do have a new problem to deal with that we alluded to in the previous section.

In-block transaction order

Due to parallelization being the default in SwiftSync, we won't necessarily know whether outputs were created prior to being spent for instances where outputs are created and spent in the same block (in all other cases checking the block height will suffice).

While the existence of the issue is undeniable, the consequences seem surprisingly mild when you think it through,8 but we can certainly address it, and it's a lot easier to reason about security if we simply deal with it.

The most straightforward and practical solution is to use a per-block cache. You can think of it as a per-block mini UTXO set (technically it'd only need enough data to prove the order) that we build up sequentially. We'd only have to look up inputs when their block height matches the current block, though there may be reasons to do more.9 There is some loss of elegance in the sense that we're adding back some state and doing sequential lookups, but this is limited to a per-block context - we're still completely free to process all blocks in parallel.
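
A minimal sketch of such a per-block cache, assuming the creation height of each spent coin is available when we process the input (as it is in the non-assumevalid version); the types and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TxIn:
    prevout: tuple[bytes, int]  # (txid, output index) of the coin being spent
    prev_height: int            # creation height of that coin

@dataclass
class Tx:
    txid: bytes
    inputs: list[TxIn]          # empty for the coinbase transaction
    n_outputs: int

def check_in_block_order(block_height: int, block_txs: list[Tx]) -> bool:
    """Per-block mini 'UTXO set': just enough data to prove in-block ordering."""
    created_this_block: set[tuple[bytes, int]] = set()
    for tx in block_txs:                       # sequential within the block only
        for txin in tx.inputs:
            # Only coins created at this very height need a lookup; everything
            # else is already covered by comparing block heights.
            if txin.prev_height == block_height and txin.prevout not in created_this_block:
                return False                   # spent before it was created in this block
        for vout in range(tx.n_outputs):
            created_this_block.add((tx.txid, vout))
    return True
```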

If we wanted to persist with a cacheless strategy and fully avoid lookups and sequential checks, there are more options,10 but for running a full node on regular hardware the per-block cache seems good enough.

BIP30 validation

For SwiftSync to be equivalent to regular full validation, it needs to somehow respect BIP30 despite not being able to access the UTXO set. BIP30 didn't activate until March 2012, but it is enforced from genesis until the moment BIP34 activated at block height 227931 in March 2013. BIP30 avoids duplicate outputs in the UTXO set by invalidating blocks with unspent outputs that already exist. In theory BIP30 does not fully prevent duplicate transactions (BIP34 does, barring some unaddressed issues), as creating, spending, then re-creating is still allowed, but in practice this never happened.

There are two instances prior to BIP30 activation where the same coinbase output got created twice. The second output overwrote the first, effectively rendering the coins in the first output unspendable. These instances are treated as exceptions in BIP30. More info on this can be found here.

SwiftSync has no concept of a UTXO set, so we cannot check if an output already exists. Luckily, it's possible to devise an alternative method that gives us equivalent results. We can simply keep a cache of every coinbase txid and check it for uniqueness. This only takes up 227931 x 32 bytes = ~7MB of RAM, and we can optimize this quite a bit further or even make it stateless if we'd like.11 Not only is this equivalent to BIP30, it is also a faster drop-in replacement for how we do it today.
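
A minimal sketch of this alternative check (the input list and names are hypothetical; the two known historical duplicates would need to be excepted, just as they are today):

```python
def coinbase_txids_unique(coinbase_txids: list[bytes]) -> bool:
    """coinbase_txids: the coinbase txid of every block from genesis up to the
    BIP34 activation height (227931)."""
    seen: set[bytes] = set()   # ~227931 * 32 bytes ≈ 7 MB of RAM
    for txid in coinbase_txids:
        if txid in seen:
            return False       # a duplicate coinbase would overwrite an existing UTXO
        seen.add(txid)
    return True
```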

There is also a hypothetical scenario where there's a massive reorg to 2013, causing BIP34 to never activate and BIP30 to remain active. Rather than solving for this implausible situation, the more practical solution would be to simply fall back to non-SwiftSync validation if this ever occurred.

Relating this to assumeutxo

Assumeutxo and SwiftSync are completely orthogonal. Assumeutxo starts with a UTXO set that is assumed to be correct and then performs validation in the background to verify this assumption. This latter part is where SwiftSync neatly fits in.

One of the resulting complexities of assumeutxo is that it requires two states: one from the assumeutxo point to the current tip, and one from genesis until the assumeutxo point. Once background validation is finished we are once again left with a single state. Since SwiftSync is near-stateless, using this for background validation could potentially make things simpler.

One important but very neatly resolvable difference is that while we no longer need to build a UTXO set, we do need to validate that the assumeutxo UTXO set we're starting with is equivalent to the UTXO set that would have resulted from the SwiftSync hints. We can achieve this with the same hash aggregate trick.

Every time we process an output that is hinted to be in the UTXO set we just add it into the hash aggregate (a hash of the full UTXO set data - not just the outpoint). Similarly, we hash every entry of the assumeutxo UTXO set and remove them from the hash aggregate. If everything was indeed correct, we should once again end up at 0.
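
A minimal sketch of this equivalence check, reusing the same salted-hash idea (the names, serialization, and 2^256 modulus are assumptions of the sketch):

```python
import hashlib

MODULUS = 2**256

def h(entry: bytes, salt: bytes) -> int:
    return int.from_bytes(hashlib.sha256(salt + entry).digest(), "big")

def snapshot_matches_hints(hinted_unspent_entries, snapshot_entries, salt: bytes) -> bool:
    """Both iterables yield fully serialized UTXO entries (outpoint, script,
    coinbase flag, height, amount), not just outpoints."""
    agg = 0
    for entry in hinted_unspent_entries:  # collected while background-validating with SwiftSync
        agg = (agg + h(entry, salt)) % MODULUS
    for entry in snapshot_entries:        # the assumeutxo snapshot we started from
        agg = (agg - h(entry, salt)) % MODULUS
    return agg == 0
```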

Satisfyingly, the one-bit-per-output hints won't even be required if the non-assumevalid version of SwiftSync is used, since the data you add into the hash aggregate will be the same regardless of whether the output will end up in the UTXO set or not (i.e. outputs - inputs - UTXO == 0). This level of elegance pleases me greatly.

Acknowledgements

Optimization of validation has been an on-going topic with a lot of discussion over the years. Many have gone before me and explored related ideas. Cory Fields' UHS is perhaps the most similar, which in turn is an evolution of Bram Cohen's bitfields. Tadge Dryja's utreexo also has overlap in terms of state reduction, parallelization, and IBD hints related to caching.

Thanks to theStack and 0xb10c for the initial benchmarks and numbers. I'd also like to thank the Core developers who have been actively working on IBD improvements - davidgumberg, l0rinc, and andrewtoth - and whose discussions were helpful, as well as the many others who provided feedback.

Footnotes

  1. What other options do we have for receiving hints? One downside of relying on a value inside your software is that the software can be many months old. Validation can slow down significantly once we get past the point for which we have hints. Another possibility would be to receive non-assumevalid SwiftSync hints from a trusted third party. Worst case, the third party lied to you and validation fails. Importantly, you can never be made to think that an invalid chain is valid. A hybrid is also possible (even to catch up when your node was offline), but then we need a bit more logic to remove UTXOs from the set in between SwiftSync sessions. Finally, the hints could be committed into the chain, but that would be a soft fork.

  2. How does MuHash or XOR stack up against a hash aggregate with secret salt? MuHash is a viable alternative if we wanted to remove the requirement of a secret salt (e.g. to allow state comparisons between clients), but this seems excessive for this use case. XOR won't suffice since elements that appear twice as either an input or output would cancel each other out. There might be additional checks you could do to compensate for this, but the difference would most likely be negligible.

  3. What's the worst that could happen when we ignore timelocks and transaction ordering? An unenforced timelock merely means some money got spent too soon. An out-of-order transaction means coins got spent from an output that didn't exist yet (inflation) and the corresponding output was created later (correcting the inflation) - if this hadn't been corrected, SwiftSync validation would have failed. Crucially, this isn't worse than assumevalid's worst case scenario of spending coins without a valid script.

  4. What if we did want a more precise check for coins that weren't claimed by miners? In theory we could include amount hints for all the inputs of blocks where miners left money on the table, as well as provide hints for which outputs will get spent in those blocks. For these specific instances we would include the amounts in the hash. Miners could grief this by consistently leaving 1 satoshi on the table. A soft fork committing to the fee total or disallowing unclaimed coins would be a more thorough solution. That said, considering the worst-case is still not worse than what assumevalid already enables, simply accepting the limitation seems perfectly reasonable. Disabling assumevalid is the way to go for those who care.

  5. What can we do to compress the required data for the non-assumevalid version? There is plenty of room for efficient encoding and deduplication: there are a lot of repeated patterns (e.g. standard transactions), hashes for which we know the pre-image (e.g. p2sh), and frequently occurring values (e.g. most inputs have a recent block height). Some of the ideas from BIP337 can likely also be repurposed.

  6. Are there other use cases that serving the undo data over the p2p network enables? It could enable pruned nodes to reorg beyond their prune point, provided they kept a hash of the pruned undo data (assumevalid SwiftSync can't do this, unfortunately) and peers are willing to serve undo data for blocks they forked away from. Since the undo data basically enables the ability to fully validate blocks out of context, protocols such as Proof-of-Work Fraud Proofs (low-bandwidth chain validation by selectively validating fork blocks) could also benefit.

  7. Are there other ways of serving the undo data in a DoS resistant manner? The most straightforward approach would be to hash this data into each block and make it part of consensus, but soft forks are difficult. There is also a theoretical peer-to-peer approach of comparing the state of the undo data between peers and finding out which peer is lying if there is a discrepancy (this works as long as one peer is honest), but this is relatively intricate. An example of this (albeit in a different context) is described here.

  8. Let's say we didn't check the in-block order - how could an attacker abuse this? Accepting a chain with incorrect ordering wouldn't have any impact on coin ownership or inflation. The only issue is that regular full nodes reject the alternative order, so it'd be a hard fork. However, this fork wouldn't happen unless the alternative order is present in the most-proof-of-work chain. It's surprisingly hard to think of a scenario where an attacker could effectively abuse this. Here's an attempt: 1.) Overtake the most-proof-of-work chain and mine a block with alternative ordering, 2.) get your victim(s) to perform SwiftSync validation all the way to the tip so they overlook the alternative order (maybe instigated by some corruption bug?), 3.) Get your victim(s) to purchase coins from you on the 'wrong' chain. Eventually the 'good' chain overtakes the 'bad' chain again and everyone is back in consensus. All in all, it doesn't seem like this attack is more effective than just selling coins and then reorging out the transaction. Even if just causing chaos was the goal, it's not clear that a large reorg wouldn't have roughly the same effect.

  9. Could we perhaps save bandwidth with caching by not having to re-download every UTXO? Yes, and you could be pretty smart about it too, but it does add more complexity and makes things more sequential. You could have the cache span over multiple blocks and you could even go as far as to provide hints about which outputs will be spent soon. This is perhaps something to come back to if we feel bandwidth is a limiting factor.

  10. How can we check in-block order without a cache? We could add an additional element to the UTXO set: a canonical transaction number (output numbering works too, and is enticing for other reasons, but more precise than we need here). Now if an output is being spent and the block height is the same, we look at the transaction number to verify that it was created before the transaction we're currently examining. The downside is that this increases the size of the entire UTXO set (~2 bytes per entry). The more interesting approach may be to introduce additional hints for every output that we know will get spent in the same block. Specifically, we hint at the transaction number at which it will get spent (must be higher than the transaction number containing the output - we can just encode the offset) and include it in the hash. Now when we encounter an input for which the block height matches that of the block, we similarly include its transaction number in the hash prior to removing it from the aggregate. Unfortunately it's reasonably common for outputs to get spent in the same block they were created in, so the necessary hint data would probably increase by quite a bit. While caching is the more obvious approach, it would still be interesting to see comparison benchmarks of the cacheless alternative.

  11. The BIP30 check is yet another area with a cache - what can we do to optimize it? We can reduce the data set by truncating the txid. Normally we'd be concerned with collisions, but we can ensure there won't be any collisions by pre-computing the exact bits that need to be kept in order to guarantee a collision-free result for this specific chain (rehashing with a salt works too but is slower). Information theoretically this could get us down to 18 bits per txid, though this would be computationally infeasible. In practice we would certainly be able to get it under 4 bytes (maybe even 3). Now we're down to <1MB of data. We can also go even further and nullify the state and lookups with the help of the hash aggregate. For this we'd supply an ordered list of the truncated hashes. We'd go through them one by one and check that each element is larger than the previous one (proving uniqueness), while simultaneously adding them to the hash aggregate. We remove them again when we encounter them in the block (proving set equivalence if we reach 0).

@RubenSomsen

Feel free to ask questions, I'll do my best to answer them.

Other places where SwiftSync is being discussed:
Bitcoindev mailing list
Delving Bitcoin

@JoseSK999 commented Apr 14, 2025

Hi Ruben, this is super interesting! You briefly mention the overlap with utreexo, especially for the idea of a parallel sync enabled via hints (in utreexo the hints are hardcoded UTXO set accumulators for several block intervals, later fully verified).

Is it correct to think about this as the non-utreexo version of the parallel sync, with even more potential for parallel execution (transaction-wise if assume-valid 2.0, and likely less hint-data size than having a utreexo acc for each block)?

The utreexo one will perform equally well on consumer-grade devices (with less hint-data size), assuming a few dozen parallel intervals, right? Maybe the “verify proof and update accumulator” function would make it a bit slower than just hashing each outpoint twice, adding/subtracting it, etc.

@RubenSomsen

@JoseSK999

Thanks for taking an interest. Sure, I can comment on the comparison with utreexo.

non-utreexo version of the parallel sync, with even more potential for parallel execution

You're right that both utreexo and SwiftSync enable full parallelization, albeit for different reasons. SwiftSync is inherently parallel (it has no concept of order), while with utreexo the state is simply small enough to run multiple sequential threads in parallel.

transaction-wise if assume-valid 2.0

Note you can still do parallel validation at the input/output level without assumevalid provided you add more hint data (see this footnote), though that may not always be worth doing.

The utreexo one will perform equally good on consumer grade devices (with less hint-data size)

The key points, I'd say, are that it's less complex and requires less data. It should be a bit faster CPU-wise as well, but I doubt that part will be significant.

Note that SwiftSync only concerns itself with IBD, while utreexo also has post-IBD advantages (reducing the state size of the UTXO set). One remaining open question is whether we can end up at the utreexo state and continue from there after using SwiftSync. My intuition is that it should be possible, but this hasn't been confirmed yet.
