Date: 2026-04-08
The memetics "agentic" tab is empty in prod-v2 because prod Talos state-service is missing v4 pool mappings for agentic pools that do exist in ClickHouse. Nonprod works because it materializes those v4 pools into Talos state correctly.
This is not a frontend bug.
The strongest current theory is:
- Prod-v2 has v4 topic/config drift between state-service API and writer.
- Prod-v2 writer either cannot consume the intended v4 live topic path or is consuming the wrong one.
- The state-service bootstrap path is likely insufficient for v4 pools because the affected pool exists in v4 ClickHouse views but not in prod_hermaus_evm_pools.
- Chain: Base (8453)
- Pool: 0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9
- Token: 0x5f980dcfc4c0fa3911554cf5ab288ed0eb13dba3
Behavior:
- Prod-v2 state-service:
  - token-for-pool returns null
  - /stats/v2 returns "pool token mapping not found"
- Nonprod state-service:
  - token-for-pool resolves
  - /stats/v2 returns stats with "agentic": true
The source data exists in both envs for the example pool:
- evm_v4_pools_latest_view
  - prod-v2: 1 row
  - nonprod: 1 row
- azura.evm_v4_token_state_view
  - prod-v2: about 9005 rows
  - nonprod: about 9084 rows
  - both envs have the same first timestamp window for this pool
Important:
- The example pool does not appear to come from prod_hermaus_evm_pools.
- That matters because state-service bootstrap uses prod_hermaus_evm_pools for its incremental EVM pool bootstrap in bootstrap/pools.rs.
There are two separate bootstrap/materialization paths involved.
In bootstrap.rs:
- bootstrap_pool_metadata() reads evm_v4_pools_last_48hr_view
- bootstrap_clickhouse_liquidity_state() reads azura.evm_v4_token_state_view
- then it replays Kafka liquidity-state input
Pool lookup for v4 metadata comes from evm_v4_pools_latest_view in clickhouse.rs and enricher/mod.rs.
In bootstrap/pools.rs, state-service bootstraps pool mappings from:
- S3 snapshot: backups/evm-pools.arrow
- ClickHouse incremental table: prod_hermaus_evm_pools
- live topics handled in direct_source_domains.rs
For v4 live repair/materialization, the relevant handlers are:
- project_evm_v4_pool
- project_evm_v4_liquidity_state
Those are driven by:
- l1.evm.v4.new_pools
- l1.evm.v4.liquidity_state.v1
Secret names:
- prod-v2: prod-v2-state-service-secrets
- nonprod: nonprod-talos-state-service-secrets
Observed diffs:
- SOURCE_TOPIC_OVERRIDE_EVM_V4_LIQUIDITY_STATE
  - prod-v2: l3.market_stats.evm_v4_pool_liquidity_state.v1
  - nonprod: not set
- INIT
  - prod-v2: true
  - nonprod: false
- TALOS_EVM_V4_HOOK_ALLOWLIST
  - same count
  - same hash
  - not the root cause
- ENABLE_BOOTSTRAP_METADATA
  - both false
- STATS_BOOTSTRAP_REPLAY_HOURS
  - both 48
Because runtime_config.rs gives topic overrides precedence:
- nonprod API and writer are both effectively on canonical v4 topics
  - l1.evm.v4.new_pools
  - l1.evm.v4.liquidity_state.v1
- prod-v2 API is effectively on
  - l1.evm.v4.new_pools
  - l3.market_stats.evm_v4_pool_liquidity_state.v1
- prod-v2 writer is effectively on
  - l1.evm.v4.new_pools
  - l1.evm.v4.liquidity_state.v1
This API/writer split does not exist in nonprod.
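The precedence rule described above can be sketched in shell. This is only an illustration of the described behavior, not code taken from runtime_config.rs: `effective_topic` is a hypothetical helper, and treating an empty override the same as an unset one is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating "override wins over canonical".
# Assumption: an empty override is treated the same as "not set".
effective_topic() {
  local canonical="$1" override="${2:-}"
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  else
    printf '%s\n' "$canonical"
  fi
}

CANONICAL='l1.evm.v4.liquidity_state.v1'

# nonprod: SOURCE_TOPIC_OVERRIDE_EVM_V4_LIQUIDITY_STATE is not set
effective_topic "$CANONICAL" ''

# prod-v2 API: override value as observed in the prod-v2 secret
effective_topic "$CANONICAL" 'l3.market_stats.evm_v4_pool_liquidity_state.v1'
```

Running the same resolution for the API and the writer, each with the override value its pod actually sees, makes a split like the prod-v2 one visible as two different outputs.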
Prod-v2 API pods restarted around 2026-04-08T21:48Z and picked up a newer secret revision.
Prod-v2 writer pod was older at 2026-04-08T21:11Z and did not show the same secret revision annotation.
This means prod-v2 is not just "different from nonprod". It is internally inconsistent right now.
API secondary RocksDB path:
- prod-v2: /tmp/talos-state-service-secondary/store.rocksdb
- nonprod: /var/lib/talos-state-service/secondary/store.rocksdb
This is real drift, but it is not the primary explanation for the missing v4 mappings.
Prod-v2 writer logs show:
- INIT requested; removed existing RocksDB state before bootstrap
- bootstrap ran from:
  - backups/evm-pools.arrow
  - prod_hermaus_evm_pools
- then the writer attempted to use canonical v4 topics
- previous checks also showed UnknownTopicOrPartition for:
  - l1.evm.v4.new_pools
  - l1.evm.v4.liquidity_state.v1
Prod-v2 API logs show:
- API started with source_topics including l3.market_stats.evm_v4_pool_liquidity_state.v1
Nonprod API logs show:
- API started with canonical l1.evm.v4.liquidity_state.v1
The current best explanation is:
- Prod-v2 ClickHouse has the v4 pool and liquidity-state source data.
- Prod-v2 Talos state-service does not materialize that data into PoolToToken and TokenToPoolsEdge.
- The prod-v2 live topic path for v4 is misconfigured or split.
- Bootstrap may also be incomplete for v4 because prod_hermaus_evm_pools is not enough for these pools.
- Decide the intended prod-v2 v4 liquidity-state source topic. Options:
  - canonical l1.evm.v4.liquidity_state.v1
  - derived l3.market_stats.evm_v4_pool_liquidity_state.v1
- Make API and writer use the same topic. Current prod-v2 split must be removed.
- Remove SOURCE_TOPIC_OVERRIDE_EVM_V4_LIQUIDITY_STATE from prod-v2 if canonical l1 is the intended source.
- Roll both:
  - deployment/talos-state-service
  - statefulset/talos-state-service-writer
If canonical l1 is supposed to be used:
- Confirm these topics exist in prod-v2 broker:
  - l1.evm.v4.new_pools
  - l1.evm.v4.liquidity_state.v1
- Confirm they are receiving records.
If prod-v2 intentionally moved to l3.market_stats.evm_v4_pool_liquidity_state.v1:
- Confirm the writer is updated to consume that topic too.
- Do not leave API and writer on different topic sources.
After the topic source is corrected:
- Rebuild the writer state.
- Let state-service bootstrap and replay complete.
- Re-check the concrete missing pool mapping.
If the corrected writer still cannot materialize the example pool:
- Inspect whether backups/evm-pools.arrow includes that pool.
- If not, update the bootstrap source for v4 pool mappings.
Likely fix options:
- regenerate backups/evm-pools.arrow with v4 pool coverage
- or add an explicit v4 bootstrap source from:
  - evm_v4_pools_latest_view
  - or another v4-specific pool table/view
kubectl --context prod-v2 -n talos-v2 logs deploy/talos-state-service --since=2h \
| rg 'starting talos-state-service runtime|source_topics'
kubectl --context prod-v2 -n talos-v2 logs statefulset/talos-state-service-writer --since=2h \
| rg 'starting talos-state-service runtime|source_topics|UnknownTopicOrPartition'
kubectl --context nonprod -n talos-v2 logs deploy/talos-state-service --since=24h \
| rg 'starting talos-state-service runtime|source_topics'

curl -sS http://127.0.0.1:18085/v1/read/token-for-pool \
-H 'content-type: application/json' \
--data '{"chain_id":8453,"pool_address":"0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9"}'

curl -sS http://127.0.0.1:18085/stats/v2 \
-H 'content-type: application/json' \
--data '{"chainId":8453,"pool":"0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9"}'

curl -sS http://127.0.0.1:18081/api/agentic/1h \
-H 'content-type: application/json' \
--data '{"chains":[8453],"sortBy":"volume","sortOrder":"desc","page":1,"limit":10,"filters":{}}'

Use these in staging and prod.

SELECT count()
FROM evm_v4_pools_latest_view
WHERE chain_id = 8453
AND lower(toString(pool_id)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

SELECT count(), min(timestamp), max(timestamp)
FROM azura.evm_v4_token_state_view
WHERE chain_id = 8453
AND lower(toString(pool_id)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

SELECT count()
FROM prod_hermaus_evm_pools FINAL
WHERE chain_id = 8453
AND lower(toString(address)) = '0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9';

Interpretation:
- first query should be nonzero in both envs
- second query should be nonzero in both envs
- third query may be zero, which supports the bootstrap-gap theory
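The interpretation above can be turned into a small decision helper, sketched here under the assumption that the three counts have already been collected by hand from the queries. `interpret_counts` and its messages are hypothetical, and the sample values in the call are illustrative (the third count of 0 is the case the notes say may occur, not a recorded result).

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps the three query counts to the interpretation above.
# Arguments: rows in evm_v4_pools_latest_view, rows in
# azura.evm_v4_token_state_view, rows in prod_hermaus_evm_pools.
interpret_counts() {
  local pools="$1" state="$2" hermaus="$3"
  if [ "$pools" -eq 0 ] || [ "$state" -eq 0 ]; then
    echo "v4 source data missing in ClickHouse for this pool"
  elif [ "$hermaus" -eq 0 ]; then
    echo "v4 source data present, pool absent from prod_hermaus_evm_pools: supports the bootstrap-gap theory"
  else
    echo "v4 source data present, pool present in prod_hermaus_evm_pools: this pool does not support the bootstrap-gap theory"
  fi
}

# Illustrative values only.
interpret_counts 1 9005 0
```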
The incident is fixed when all of these are true in prod-v2:
- token-for-pool resolves the example v4 pool.
- /stats/v2 no longer returns "pool token mapping not found".
- /api/agentic/1h returns non-empty results.
- The memetics agentic tab repopulates in prod.
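The first two criteria can be checked with a small script, sketched here under assumptions. `mapping_resolved` and `check_token_for_pool` are hypothetical helpers; the exact response body shape is not documented in these notes, so the check only looks for the two failure signals observed above (a null body and the "pool token mapping not found" message).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the response body shows neither of the
# failure signals observed in prod-v2. The response format is an assumption.
mapping_resolved() {
  local body="$1"
  case "$body" in
    ''|null) return 1 ;;
    *'pool token mapping not found'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper around the token-for-pool request from these notes.
# Assumes the state-service API is reachable on 127.0.0.1:18085.
check_token_for_pool() {
  local body
  body="$(curl -sS http://127.0.0.1:18085/v1/read/token-for-pool \
    -H 'content-type: application/json' \
    --data '{"chain_id":8453,"pool_address":"0xec33256bf1ded407a57fd3c1965e7556e42ac14db09bc4e6fef57d5e2eb0b0b9"}')" || return 1
  mapping_resolved "$body"
}

# Offline illustration with the failing body seen in prod-v2.
mapping_resolved 'null' || echo "mapping still missing"
```

With the API reachable on the local port, `check_token_for_pool && echo resolved` gives a quick pass/fail after the writer rebuild. The same `mapping_resolved` check applies to the /stats/v2 response body.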
Do not assume the generic azura-infra Talos Doppler secret is the one backing state-service.
The live secret under investigation here is:
- prod-v2-state-service-secrets in namespace talos-v2
Nonprod state-service secrets are created by the Talos Terraform module in secrets.tf. Before editing prod config, confirm the source of truth for the prod-v2 state-service secret and do not patch the wrong Doppler config.