Filter maps (EIP-7745)

EIP-7745 defines a sequential log value index that enables correctness and completeness proofs for eth_getLogs responses. However, the server would still have to send the entire log value index for the queried range, resulting in gigabytes of data even for the most basic queries. Therefore, EIP-7745 further defines the concept of filter maps.

Filter maps

A filter map can be visualized as a 2D bitmap.

For every log value exactly a single bit is set within a filter map. To achieve a bounded false positive rate even at high log frequencies, a new map is initialized periodically.

The filter maps are further grouped into epochs, matching the pruning mechanism for the log value index. From the type definitions in EIP-7745:

type FilterRow = ProgressiveByteList

class LogIndexEpoch(Container):
    filter_maps: Vector[Vector[FilterRow, MAPS_PER_EPOCH], MAP_HEIGHT]
    log_entries: Vector[LogEntry, MAPS_PER_EPOCH * VALUES_PER_MAP]
        # LogEntry containers at the first index of each log event,
        # default(LogEntry) otherwise

class LogIndex(Container):
    epochs: Vector[LogIndexEpoch, MAX_EPOCH_HISTORY]
    next_index: uint64  # next log value index to be added
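The nesting above amounts to plain index arithmetic: a global log value index determines its epoch, the map within that epoch, and its position within the map. The following is a minimal sketch; the constants are illustrative assumptions, not the normative EIP-7745 values.

```python
# Illustrative constants (assumptions for this sketch; the normative
# values are defined in EIP-7745).
VALUES_PER_MAP = 2**16
MAPS_PER_EPOCH = 2**7

def locate(log_value_index: int) -> tuple[int, int, int]:
    """Return (epoch_index, map_subindex, value_subindex) for a log value."""
    map_index = log_value_index // VALUES_PER_MAP
    epoch_index = map_index // MAPS_PER_EPOCH
    map_subindex = map_index % MAPS_PER_EPOCH
    value_subindex = log_value_index % VALUES_PER_MAP
    return epoch_index, map_subindex, value_subindex

# The second map of the first epoch starts at log value index VALUES_PER_MAP:
assert locate(VALUES_PER_MAP) == (0, 1, 0)
```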

Filter rows

Within each row of a filter map, one can visualize the full MAP_WIDTH as subdivided into VALUES_PER_MAP groups of 256 columns each (a position within a group thus fits in 8 bits, i.e., 1 byte). For each log value index, the corresponding bit is set in the corresponding group: log value index 0 sets a bit within columns 0 ..< 256, log value index 1 within 256 ..< 512, and so on. Therefore, there cannot be any collisions where the same bit is set for multiple log values.

The log value itself, namely the address or an individual topic, is then used to determine the exact bit position within the group. Conversely, this means that if the bit set in a group differs from the one pertaining to a given query, the corresponding log entry is irrelevant and, hence, does not have to be sent.

Further, the log value index itself is also incorporated into the process to determine the bit position. This avoids repeated false positives when the query bit happens to collide with the bit for a popular event such as the ERC-20 token transfer.

From the EIP-7745 logic:

def get_column_index(log_value_index, log_value):
    log_value_width = MAP_WIDTH // VALUES_PER_MAP                      # constant
    column_hash = fnv_1a(to_binary64(log_value_index) + log_value)     # 64-bit FNV-1A hash
    collision_filter = (
        column_hash // (2**64 // log_value_width) +
        column_hash // (2**32 // log_value_width)
    ) % log_value_width
    return log_value_index % VALUES_PER_MAP * log_value_width + collision_filter

This uses the FNV-1a non-cryptographic hash function, similar to the one used by Python. Further note that the input log_value is additionally hashed with sha2(log_value) before calling get_column_index.

def address_value(address):
    return sha2(address)

def topic_value(topic):
    return sha2(topic)
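The pieces above can be combined into a runnable sketch. The 64-bit FNV-1a constants below are the standard ones; the map constants and the little-endian interpretation of to_binary64 are assumptions made for illustration (the normative values live in EIP-7745).

```python
import hashlib

# Assumed illustrative constants (EIP-7745 defines the normative values).
MAP_WIDTH = 2**24
VALUES_PER_MAP = 2**16

def fnv_1a(data: bytes) -> int:
    """Standard 64-bit FNV-1a over the input bytes."""
    h = 0xcbf29ce484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001b3) % 2**64
    return h

def topic_value(topic: bytes) -> bytes:
    """Log values are the SHA-256 hash of the raw address or topic."""
    return hashlib.sha256(topic).digest()

def get_column_index(log_value_index: int, log_value: bytes) -> int:
    log_value_width = MAP_WIDTH // VALUES_PER_MAP  # 256 columns per group
    # to_binary64 is assumed to be little-endian here.
    column_hash = fnv_1a(log_value_index.to_bytes(8, 'little') + log_value)
    collision_filter = (
        column_hash // (2**64 // log_value_width) +
        column_hash // (2**32 // log_value_width)
    ) % log_value_width
    return log_value_index % VALUES_PER_MAP * log_value_width + collision_filter

value = topic_value(bytes(32))       # e.g. an all-zero topic
column = get_column_index(5, value)
assert 5 * 256 <= column < 6 * 256   # the bit always lands in group 5
```

Note how the group (and therefore the top bits of the result) depends only on the log value index, while the position within the group mixes in the log value via the hash.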

Row mapping

A filter row spanning VALUES_PER_MAP log values still uses 64 KB, which is a lot of data for queries spanning a large number of blocks, especially since rows also have to be sent for all negative search results (to prove completeness). Therefore, each filter map consists of MAP_HEIGHT filter rows; each log event only sets a bit in one of them.

To enable efficient proving, the row assignment should be somewhat stable: if multiple entries match a query, they should all land in the same row. To avoid rows overflowing when a popular event such as the ERC-20 token transfer is assigned to the same row, a maximum row length is also introduced, after which the row assignment changes. The mapping layer index counts how many times the assignment has changed, and the maximum length grows exponentially with every reassignment, up to a cap.

Layer index | Row capacity
0           | MAX_BASE_ROW_LENGTH * LAYER_COMMON_RATIO**0 (= 2^3)
1           | MAX_BASE_ROW_LENGTH * LAYER_COMMON_RATIO**1 (= 2^7)
2+          | MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH (= 2^10)
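This capacity schedule can be reproduced in a few lines. The constants below are the illustrative values implied by the table (MAX_BASE_ROW_LENGTH = 2^3, LAYER_COMMON_RATIO = 2^4, MAPS_PER_EPOCH = 2^7), not necessarily the normative EIP-7745 values.

```python
# Illustrative constants implied by the capacity table (assumptions).
MAX_BASE_ROW_LENGTH = 2**3
LAYER_COMMON_RATIO = 2**4
MAPS_PER_EPOCH = 2**7

def max_row_length(layer_index: int) -> int:
    # Exponential growth per reassignment, capped at MAPS_PER_EPOCH times base.
    layer_factor = min(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    return MAX_BASE_ROW_LENGTH * layer_factor

assert [max_row_length(l) for l in range(4)] == [2**3, 2**7, 2**10, 2**10]
```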

Within a log epoch, it is further ensured that the row assignment is stable across multiple maps. The initial row assignment (mapping layer 0) for a log value is only changed once per log epoch, while the row assignment at higher mapping layers is changed more frequently, making proofs even more efficient. EIP-7745 also contains a diagram that visualizes how the row mapping for a particular log value evolves over time at mapping layers 0, 1, and 2.

def get_row_index(map_index, log_value, layer_index):
    layer_factor = MIN(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    row_frequency = MAPS_PER_EPOCH // layer_factor
    return from_binary32(sha2(
        log_value +
        to_binary32(map_index - map_index % row_frequency) +
        to_binary32(layer_index)
    )[0:4]) % MAP_HEIGHT

Note that the input log_value is additionally hashed with sha2(log_value) before calling get_row_index.
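As a sketch, get_row_index can be made runnable with the standard library; the constants and the little-endian interpretation of to_binary32 are illustrative assumptions here. At mapping layer 0, row_frequency equals MAPS_PER_EPOCH, so the assignment is stable for an entire epoch:

```python
import hashlib

# Illustrative constants (assumptions; see EIP-7745 for the normative values).
MAP_HEIGHT = 2**16
MAPS_PER_EPOCH = 2**7
LAYER_COMMON_RATIO = 2**4

def get_row_index(map_index: int, log_value: bytes, layer_index: int) -> int:
    layer_factor = min(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
    row_frequency = MAPS_PER_EPOCH // layer_factor
    # to_binary32 is assumed little-endian; map_index is rounded down to the
    # start of its row_frequency window so the row stays stable within it.
    digest = hashlib.sha256(
        log_value +
        (map_index - map_index % row_frequency).to_bytes(4, 'little') +
        layer_index.to_bytes(4, 'little')
    ).digest()
    return int.from_bytes(digest[0:4], 'little') % MAP_HEIGHT

value = hashlib.sha256(b'some topic').digest()
# Layer 0: one row assignment for the whole epoch.
assert len({get_row_index(m, value, 0) for m in range(MAPS_PER_EPOCH)}) == 1
# Layer 1 reassigns more often: here, every MAPS_PER_EPOCH // LAYER_COMMON_RATIO
# (= 8) maps, so the first 8 maps still share a row.
assert len({get_row_index(m, value, 1) for m in range(8)}) == 1
```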

Representation in the data structure

Filter maps are sparse, i.e., they only have a very small number of bits set relative to their full capacity. To represent them more efficiently, each filter row is encoded as the list of column indices at which a bit has been set. Given the grouping from get_column_index, this means that a 24-bit column index is computed for each log value, with the top 16 bits indicating the log value's subindex within the map (log_value_index % VALUES_PER_MAP) and the bottom 8 bits indicating the FNV-1a anti-collision hash value. This sequence of 24-bit values is then stored in the log index in place of the bitmask. From EIP-7745:

# Mark a single log value on the filter maps
def add_log_value(log_index, log_value):
    bytes_per_column = log2(MAP_WIDTH) // 8   # assumes that map width is a power of 256

    map_index = log_index.next_index // VALUES_PER_MAP
    epoch_index = map_index // MAPS_PER_EPOCH
    map_subindex = map_index % MAPS_PER_EPOCH
    column_index = get_column_index(log_index.next_index, log_value)

    row = []
    layer_index = 0
    while True:
        layer_factor = MIN(LAYER_COMMON_RATIO ** layer_index, MAPS_PER_EPOCH)
        max_row_length = MAX_BASE_ROW_LENGTH * layer_factor
        row_index = get_row_index(map_index, log_value, layer_index)
        row = log_index.epochs[epoch_index].filter_maps[row_index][map_subindex]
        if len(row) < max_row_length * bytes_per_column:
            break
        layer_index += 1
        max_row_length = MIN(
            max_row_length * LAYER_COMMON_RATIO,
            MAX_BASE_ROW_LENGTH * MAPS_PER_EPOCH
        )
    row.append(to_binary32(column_index)[:bytes_per_column])

    log_index.next_index += 1

This replaces the stub from the log value index section.
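To read the structure back, each 3-byte entry splits into the two components described above. The helper below is a hypothetical sketch, not part of the EIP's pseudocode; little-endian encoding is assumed, matching to_binary32(column_index)[:bytes_per_column] keeping the low bytes.

```python
def decode_row(row: bytes) -> list[tuple[int, int]]:
    """Split each 24-bit entry into (value subindex, anti-collision byte)."""
    entries = []
    for i in range(0, len(row), 3):
        column_index = int.from_bytes(row[i:i + 3], 'little')
        entries.append((column_index >> 8, column_index & 0xff))
    return entries

# A row holding column indices for value subindices 5 and 6:
row = (5 * 256 + 17).to_bytes(3, 'little') + (6 * 256 + 200).to_bytes(3, 'little')
assert decode_row(row) == [(5, 17), (6, 200)]
```

A query then only needs to recompute the anti-collision byte for each candidate subindex and compare; mismatching entries can be discarded without fetching the log entry.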