Finding potential matches for a single log value

Rows of the filter maps are encoded as list of column indices in strictly ascending order. These column indices are 24 bits long according to the current EIP-7745 specs, with the higher 16 bits representing the lowest 16 bits of the log value index of the matching log value entry while the lower 8 bits are calculated from the FNV-1a hash of the log value index and the log value hash itself, providing an additional probabilistic filter strength to the row index assignment that is also calculated from the hash of the position and the log value hash (though this hash changes less frequently as it only depends on the map index, and in the most typical case it is remapped once per epoch, which consists of 1024 maps). Row mapping can also change not only depending on horizontal position but also on row length as if the number of entries in a row reaches a certain threshold, the entry is placed on a higher mapping layer which is also part of the row index mapping hash. On higher layers the row length threshold is higher and also the row mapping changes more frequently. Note that row length thresholds for each mapping layer are independent constants from the ProgressiveList subtree sizes.

In order to find a certain log value hash, the verifier should look into the rows on mapping layer 0, and if the length of the row reaches the given threshold then is should also look into the higher layer rows. Note that the rows mapped to a given log value at low mapping layers can be longer than the length threshold as they might be also used by another log value in a higher layer mapping but those entries are not relevant for the given search. If the rows are fully available in the proof up to the given threshold then the verifier should only check whether the lower 8 bits of the entries match the value based on the FNV-1a hash. If higher layers need to be accessed then the verifier can also assume that entries of the next layer smaller than the last relevant entry of the previous row are irrelevant.

Note that in order to simplify encoding and proving, entries are encoded as uint32 little endian even though with the current constants they are only 24 bits long. In the future the leaf encoding efficiency can be increased by storing the entries in a more tightly packed form (also including the row length which takes up a full uint256 chunk) but the hashing format ensures that each leaf contains 8 entries without the binary encoding of individual entries crossing leaf boundaries, thus simplifying the lookup of and specific row entry list index in the tree.

Pattern matching and final result lookup

Search patterns can specify different log values allowed at different relative positions, and also multiple different values allowed at the same position. Once the verifier determines the potential matches for one value, it can do the pattern matching based on these sets of potentially matching log value indices. In the geth PoC implementation these are implemented in the potentialMatches function of the prover interface. singleProver.potentialMatches returns the potentially matching set for a single log value. matchAnyProver gives the results for a single position of the search pattern (either the address or one of the topics) by running the single provers of each allowed value at the given position, then returning the union of these sets. matchSequenceProver evaluates the matchAnyProver of each search pattern position, shifts them left according to the position (for example a match for topic2 at position 10000 indicates a possible match for the entire log event at position 9998), then returns the intersection of the resulting sets.

Finally, for each potential match in the final intersection, the proof should contain a merkle path for the individual log entry in the corresponding log_entries tree, even if the match is a false positive, which can be determined by looking into the address and topics fields of the proven log entry. Note that in case of a false positive the rest of the log data might not be proven and also only a subset of the address / topics fields might be proven that prove that it is not a real match. If the log entry at the potentially match is entirely missing from the proof or the available fields do not prove that it is not a match and yet the entire log data is not available then the proof is invalid. Also note that a false positive match can point to a position of the log_entries tree that does not contain an entry at all (logs are stored at the first log value index of the log, the address position). In this case the root of the log entry subtree is a zero hash and the proof should be expected to contain a path to this zero hash. It is also possible that it points to a block delimiter, in which case the generalized tree index 15 of the subtree is proven and contains a 2**64-1 value.

Processing partially proven filter rows

Long rows belonging to higher layer mappings of very popular log values are not fully encoded if the potential match sets of other search pattern positions ensure that any result coming from certain regions of the long row would be irrelevant as the final intersection would not contain any matches from that region anyways. For example if address is very popular but topic2 only has a single potential match at position 15220 then a single leaf from the address row containing [14844, 14967, 14988, 15023, 15503, 15755, 15899, 16010] proves that there is no match and most of the long row data does not need to be proven. It is important to know though that the leaf containing the last entry of previous mapping layers also needs to be proven because if that entry is unknown then any position higher than the last proven entry of the lower layer row is a potential match.

In order to avoid dealing with the complex interactions between the proofs of the search pattern positions, the verifier can operate on sets of potential matches that can include continuous ranges as well as individual indices. When processing each individual row, the verifier can iterate on the proven leafs of the row, and whenever there is an unproven gap in the series of ProgressiveList data chunks, it can assume that anything between the last proven entry before the gap and the first proven entry after the gap are potential matches. If union and intersection operations on these sets are calculated then these continuous unproven sections should fall out in the end. If the end result contains part of such a range then the proof is invalid (which does not need to be explicitly checked as the corresponding log_entries will be missing anyways).

Note that the verifier can assume that if any part of a specific row is proven then the count field of the ProgressiveList is always proven. If no part of the row list is proven then the proof might still be valid and the returned potential matches should cover the entire 0..2**16-1 subrange belonging to the given map index (and it should be expected that the results from the other search pattern positions will prove that no match is possible at all in the given map). The available ProgressiveList data chunks should be iterated up to either the actual row length or the row length threshold of the given layer, proven entries should be filtered based on their lower 8 bits, while unproven gaps should be filled with continuous potential match ranges.

If the length of the row is lower than the threshold and the last entry is proven then there is no need to look into higher mapping layers as no further matches are possible. If the length is at least as high as the threshold and the last entry before the threshold is proven then the verifier should progress to the next mapping layer. In either case if the last relevant entry is not proven then it is also not necessary to look into the next layer but any index after the last proven entry, up to the 2**16-1 map subindex range boundary should be considered a potential match.

If the verifier progresses to the next layer, assuming that the last entry of the previous layer row before the threshold was X, it should check whether the first entry in the next layer row that is larger than X is proven. If this is not the first entry of the next layer row then in order to verify this condition the last entry smaller than X should also be proven. In this case it can proceed processing entries from there without adding a continuous potential match range. If this condition is not met then it is possible that there are further matches between X and the first known entry larger than X, and therefore it is necessary to consider this range as potential matches.

zsfelfoldi/proof verification.md

Finding potential matches for a single log value

Pattern matching and final result lookup

Processing partially proven filter rows

Test vectors