@AdvaithD
Created April 8, 2026 22:48

Prod-v2 Memetics Agentic Empty Tab Runbook

Date: 2026-04-08

Summary

The memetics "agentic" tab is empty in prod-v2 because the prod-v2 Talos state-service is missing v4 pool mappings for agentic pools that do exist in ClickHouse. Nonprod works because it materializes those v4 pools into Talos state correctly.

This is not a frontend bug.

The strongest current theory is:

  1. Prod-v2 has v4 topic/config drift between state-service API and writer.
  2. Prod-v2 writer either cannot consume the intended v4 live topic path or is consuming the wrong one.
  3. The state-service bootstrap path is likely insufficient for v4 pools because the affected pool exists in v4 ClickHouse views but appears to be absent from prod_hermaus_evm_pools.

Concrete Example Pool

  • Chain: Base (8453)
  • Pool: 0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9
  • Token: 0x5f980dcfc4c0fa3911554cf5ab288ed0eb13dba3

Behavior:

  • Prod-v2 state-service:
    • token-for-pool returns null
    • /stats/v2 returns pool token mapping not found
  • Nonprod state-service:
    • token-for-pool resolves
    • /stats/v2 returns stats with "agentic": true

What Exists In Source Data

The source data exists in both envs for the example pool:

  • evm_v4_pools_latest_view
    • prod-v2: 1 row
    • nonprod: 1 row
  • azura.evm_v4_token_state_view
    • prod-v2: about 9005 rows
    • nonprod: about 9084 rows
    • both envs have the same first timestamp window for this pool

Important:

  • The example pool does not appear to come from prod_hermaus_evm_pools.
  • That matters because state-service bootstrap uses prod_hermaus_evm_pools for its incremental EVM pool bootstrap in bootstrap/pools.rs.

How Nonprod Bootstraps V4

There are two separate bootstrap/materialization paths involved.

Trade Normalizer V4 bootstrap

In bootstrap.rs:

  • bootstrap_pool_metadata() reads evm_v4_pools_last_48hr_view
  • bootstrap_clickhouse_liquidity_state() reads azura.evm_v4_token_state_view
  • then it replays Kafka liquidity-state input

Pool lookup for v4 metadata comes from evm_v4_pools_latest_view in clickhouse.rs and enricher/mod.rs.

State-service bootstrap

In bootstrap/pools.rs, state-service bootstraps pool mappings from:

  • S3 snapshot: backups/evm-pools.arrow
  • ClickHouse incremental table: prod_hermaus_evm_pools
  • live topics handled in direct_source_domains.rs

For v4 live repair/materialization, the relevant handlers are:

  • project_evm_v4_pool
  • project_evm_v4_liquidity_state

Those are driven by:

  • l1.evm.v4.new_pools
  • l1.evm.v4.liquidity_state.v1

Config Diff: Prod-v2 vs Nonprod

Most important live secret differences

Secret names:

  • prod-v2: prod-v2-state-service-secrets
  • nonprod: nonprod-talos-state-service-secrets

Observed diffs:

  • SOURCE_TOPIC_OVERRIDE_EVM_V4_LIQUIDITY_STATE

    • prod-v2: l3.market_stats.evm_v4_pool_liquidity_state.v1
    • nonprod: not set
  • INIT

    • prod-v2: true
    • nonprod: false
  • TALOS_EVM_V4_HOOK_ALLOWLIST

    • same count
    • same hash
    • not the root cause
  • ENABLE_BOOTSTRAP_METADATA

    • both false
  • STATS_BOOTSTRAP_REPLAY_HOURS

    • both 48

Effective runtime difference

Because runtime_config.rs gives topic overrides precedence:

  • nonprod API and writer are both effectively on canonical v4 topics

    • l1.evm.v4.new_pools
    • l1.evm.v4.liquidity_state.v1
  • prod-v2 API is effectively on

    • l1.evm.v4.new_pools
    • l3.market_stats.evm_v4_pool_liquidity_state.v1
  • prod-v2 writer is effectively on

    • l1.evm.v4.new_pools
    • l1.evm.v4.liquidity_state.v1

This API/writer split does not exist in nonprod.
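
The precedence rule above can be illustrated with a minimal shell sketch. It only mirrors the behaviour described in this runbook (an override, when set, replaces the canonical topic). `effective_topic` is a hypothetical helper for illustration, not code from runtime_config.rs.

```shell
# Hypothetical helper: mirrors the described precedence, not the real
# runtime_config.rs logic.
# Usage: effective_topic <override-or-empty> <canonical-topic>
effective_topic() {
  # ${1:-$2} expands to the override when it is non-empty, else the canonical topic.
  printf '%s\n' "${1:-$2}"
}

# prod-v2 API: the override is set, so the l3 topic wins.
effective_topic "l3.market_stats.evm_v4_pool_liquidity_state.v1" "l1.evm.v4.liquidity_state.v1"

# nonprod: the override is not set, so the canonical l1 topic is used.
effective_topic "" "l1.evm.v4.liquidity_state.v1"
```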

Pod/rollout drift

Prod-v2 API pods restarted around 2026-04-08T21:48Z and picked up a newer secret revision.

The prod-v2 writer pod was older, from around 2026-04-08T21:11Z, and did not show the same secret revision annotation.

This means prod-v2 is not just "different from nonprod". It is internally inconsistent right now.

Additional runtime diff

API secondary RocksDB path:

  • prod-v2: /tmp/talos-state-service-secondary/store.rocksdb
  • nonprod: /var/lib/talos-state-service/secondary/store.rocksdb

This is real drift, but it is not the primary explanation for the missing v4 mappings.

Strongest Evidence From Logs

Prod-v2 writer logs show:

  • INIT requested; removed existing RocksDB state before bootstrap
  • bootstrap ran from:
    • backups/evm-pools.arrow
    • prod_hermaus_evm_pools
  • then the writer attempted to use canonical v4 topics
  • previous checks also showed UnknownTopicOrPartition for:
    • l1.evm.v4.new_pools
    • l1.evm.v4.liquidity_state.v1

Prod-v2 API logs show:

  • API started with source_topics including l3.market_stats.evm_v4_pool_liquidity_state.v1

Nonprod API logs show:

  • API started with canonical l1.evm.v4.liquidity_state.v1

Working Theory

The current best explanation is:

  1. Prod-v2 ClickHouse has the v4 pool and liquidity-state source data.
  2. Prod-v2 Talos state-service does not materialize that data into PoolToToken and TokenToPoolsEdge.
  3. The prod-v2 live topic path for v4 is misconfigured or split.
  4. Bootstrap may also be incomplete for v4 because prod_hermaus_evm_pools is not enough for these pools.

Action Plan

Phase 1: Fix config drift first

  1. Decide the intended prod-v2 v4 liquidity-state source topic. Options:

    • canonical l1.evm.v4.liquidity_state.v1
    • derived l3.market_stats.evm_v4_pool_liquidity_state.v1
  2. Make the API and the writer use the same topic. The current prod-v2 split must be removed.

  3. Remove SOURCE_TOPIC_OVERRIDE_EVM_V4_LIQUIDITY_STATE from prod-v2 if canonical l1 is the intended source.

  4. Roll both:

    • deployment/talos-state-service
    • statefulset/talos-state-service-writer
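
After both workloads are rolled, the API and writer topic sets can be compared mechanically. The sketch below assumes the two topic lists have already been copied out of the startup logs as comma-separated strings; the actual log format is not shown in this runbook, and `same_topics` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds when two comma-separated topic lists contain
# the same topics, regardless of order.
same_topics() {
  a="$(printf '%s\n' "$1" | tr ',' '\n' | sort)"
  b="$(printf '%s\n' "$2" | tr ',' '\n' | sort)"
  [ "$a" = "$b" ]
}

# Values taken from the "Effective runtime difference" section of this runbook.
api="l1.evm.v4.new_pools,l3.market_stats.evm_v4_pool_liquidity_state.v1"
writer="l1.evm.v4.new_pools,l1.evm.v4.liquidity_state.v1"

if same_topics "$api" "$writer"; then
  echo "consistent"
else
  echo "split: API and writer differ"
fi
```

With the current prod-v2 values this reports the split; once Phase 1 is complete it should report a consistent set.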

Phase 2: Verify broker/topic reality

If canonical l1 is supposed to be used:

  1. Confirm these topics exist in prod-v2 broker:

    • l1.evm.v4.new_pools
    • l1.evm.v4.liquidity_state.v1
  2. Confirm they are receiving records.

If prod-v2 intentionally moved to l3.market_stats.evm_v4_pool_liquidity_state.v1:

  1. Confirm the writer is updated to consume that topic too.
  2. Do not leave API and writer on different topic sources.
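
The topic-existence check can be sketched as a small filter over a broker topic listing. This is an assumption-laden example: `missing_topics` is a hypothetical helper, and the listing is expected to arrive one topic per line, for example from `kafka-topics.sh --bootstrap-server "$BROKER" --list`, where `$BROKER` stands in for a broker address that this runbook does not specify.

```shell
# Hypothetical helper: reads one topic name per line on stdin and prints each
# expected topic (passed as arguments) that is absent from the listing.
missing_topics() {
  listing="$(cat)"
  for topic in "$@"; do
    # -F: literal match (dots are not wildcards), -x: whole line, -q: quiet.
    printf '%s\n' "$listing" | grep -Fxq -- "$topic" || printf '%s\n' "$topic"
  done
}

# Inline listing for illustration only; in practice pipe the broker listing in.
printf '%s\n' l1.evm.v4.new_pools l3.market_stats.evm_v4_pool_liquidity_state.v1 \
  | missing_topics l1.evm.v4.new_pools l1.evm.v4.liquidity_state.v1
```

An empty result means every expected topic is present. It says nothing about whether the topics are receiving records, which still needs a separate check.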

Phase 3: Rebuild writer state after topic fix

After the topic source is corrected:

  1. Rebuild the writer state. The writer logs indicate that an INIT run removes the existing RocksDB state before bootstrap.
  2. Let state-service bootstrap and replay complete.
  3. Re-check the concrete missing pool mapping.

Phase 4: Decide whether a bootstrap backfill fix is still needed

If the corrected writer still cannot materialize the example pool:

  1. Inspect whether backups/evm-pools.arrow includes that pool.
  2. If not, update the bootstrap source for v4 pool mappings.

Likely fix options:

  • regenerate backups/evm-pools.arrow with v4 pool coverage
  • or add an explicit v4 bootstrap source from:
    • evm_v4_pools_latest_view
    • or another v4-specific pool table/view

Commands To Run

Check live source-topic config

kubectl --context prod-v2 -n talos-v2 logs deploy/talos-state-service --since=2h \
  | rg 'starting talos-state-service runtime|source_topics'

kubectl --context prod-v2 -n talos-v2 logs statefulset/talos-state-service-writer --since=2h \
  | rg 'starting talos-state-service runtime|source_topics|UnknownTopicOrPartition'

kubectl --context nonprod -n talos-v2 logs deploy/talos-state-service --since=24h \
  | rg 'starting talos-state-service runtime|source_topics'

Check the missing mapping directly

curl -sS http://127.0.0.1:18085/v1/read/token-for-pool \
  -H 'content-type: application/json' \
  --data '{"chain_id":8453,"pool_address":"0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9"}'

curl -sS http://127.0.0.1:18085/stats/v2 \
  -H 'content-type: application/json' \
  --data '{"chainId":8453,"pool":"0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9"}'

Verify the market output

curl -sS http://127.0.0.1:18081/api/agentic/1h \
  -H 'content-type: application/json' \
  --data '{"chains":[8453],"sortBy":"volume","sortOrder":"desc","page":1,"limit":10,"filters":{}}'

ClickHouse Queries

Run these in both nonprod and prod-v2.

Does the v4 pool exist?

SELECT count()
FROM evm_v4_pools_latest_view
WHERE chain_id = 8453
  AND lower(toString(pool_id)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

Does the v4 liquidity-state data exist?

SELECT count(), min(timestamp), max(timestamp)
FROM azura.evm_v4_token_state_view
WHERE chain_id = 8453
  AND lower(toString(pool_id)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

Is it missing from the state-service bootstrap incremental table?

SELECT count()
FROM prod_hermaus_evm_pools FINAL
WHERE chain_id = 8453
  AND lower(toString(address)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

Interpretation:

  • first query should be nonzero in both envs
  • second query should be nonzero in both envs
  • third query may be zero, which supports the bootstrap-gap theory
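
The interpretation rules above can be written down as a small decision helper. This is a sketch only: `interpret_counts` is a hypothetical function, and the example counts are illustrative values entered by hand rather than a recorded query result.

```shell
# Hypothetical helper: takes the three counts in query order
# (evm_v4_pools_latest_view, azura.evm_v4_token_state_view, prod_hermaus_evm_pools)
# and prints which reading of the incident they support.
interpret_counts() {
  pool_rows="$1"; state_rows="$2"; bootstrap_rows="$3"
  if [ "$pool_rows" -eq 0 ] || [ "$state_rows" -eq 0 ]; then
    echo "unexpected: v4 source data missing in ClickHouse"
  elif [ "$bootstrap_rows" -eq 0 ]; then
    echo "source data present, bootstrap table empty: supports the bootstrap-gap theory"
  else
    echo "source data and bootstrap row present: does not support the bootstrap-gap theory"
  fi
}

# Illustrative input: pool present, liquidity-state rows present, no bootstrap row.
interpret_counts 1 9005 0
```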

Success Criteria

The incident is fixed when all of these are true in prod-v2:

  1. token-for-pool resolves the example v4 pool.
  2. /stats/v2 no longer returns pool token mapping not found.
  3. /api/agentic/1h returns non-empty results.
  4. The memetics agentic tab repopulates in prod.
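
The first two criteria can be checked from the response bodies of the curl commands in "Check the missing mapping directly". The sketch below assumes the failure bodies are either the literal `null` or contain the text `pool token mapping not found`, as described in this runbook; the exact response shapes are not shown here, so `mapping_resolved` is a hypothetical helper that may need adjusting.

```shell
# Hypothetical helper: succeeds only if a response body looks like a resolved
# mapping. Assumes failures are an empty body, the literal `null`, or a body
# containing "pool token mapping not found".
mapping_resolved() {
  body="$1"
  [ -n "$body" ] || return 1
  [ "$body" != "null" ] || return 1
  case "$body" in
    *"pool token mapping not found"*) return 1 ;;
  esac
  return 0
}

# Stubbed body for illustration; in practice pass the captured curl output.
mapping_resolved 'null' || echo "still unresolved"
```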

Important Caution

Do not assume the generic azura-infra Talos Doppler secret is the one backing state-service.

The live secret under investigation here is:

  • prod-v2-state-service-secrets in namespace talos-v2

Nonprod state-service secrets are created by the Talos Terraform module in secrets.tf. Before editing prod config, confirm the source of truth for the prod-v2 state-service secret and do not patch the wrong Doppler config.
