Map Key Errors in Clojure and Python: A Silent Bug Category

Why This Matters

I've been compiling this document for a couple of weeks, intending to publish it, if only to advertise a problem I've been struggling with. It has bothered me enough that I've run a ton of little experiments to see if I can ameliorate it, even by a little!

Then just now, I was talking with my friend Scott Prugh, who was complaining about something in Python and lamenting that he wanted to do things in C# instead. I asked him why, and it turned out he was complaining about the same problem I've been having!

The problem: LLMs often hallucinate map keys that don't exist. He and I commiserated about the hours we had each spent hunting down a problem that would have been caught right away if the data structure were more like a class or a C struct, where getting or setting a field that doesn't exist generates an immediate error.

This is antithetical to so much of Rich Hickey's philosophy around open maps. This document outlines the problem I've been struggling with for going on a year when vibe coding or using LLMs on coding problems.

It chronicles this category of errors, along with some ongoing experiments to bring a little more rigor to detecting these problems much earlier, or ideally preventing them altogether.

The Problem

Clojure's dynamic map access can silently fail when key names are wrong. This creates a class of bugs that are particularly insidious when working with LLM-generated code, as LLMs often hallucinate map key names.
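
To make the failure mode concrete, here is a minimal sketch of the silent-nil behavior next to a strict lookup that fails immediately. The helper name `get!` and the sample map are hypothetical; neither is part of Clojure or of this project.

```clojure
;; Keyword lookup on a missing key returns nil instead of failing.
(def message {:reply-users ["U123"] :day "2025-09-15"})

(:date message)          ; nil, no error
(count (:date message))  ; 0, the nil keeps flowing downstream

;; Hypothetical strict lookup: throws as soon as the key is absent.
(defn get!
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Missing key " k)
                    {:key k :available (keys m)}))))

(get! message :day)      ; "2025-09-15"
;; (get! message :date)  ; throws ExceptionInfo: Missing key :date
```

Using `contains?` rather than a nil check means a key that is present with a nil value is still treated as present.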

Real-World Examples from This Project

Example 1: Reaction Keys Mismatch (2025-01)

Bug: Missing emojis in message reactions display

Root Cause: Code destructured using wrong keys:

;; WRONG - keys don't exist in data
(for [{:keys [emoji-name reaction-count]} (:reactions message)
      :when (pos? reaction-count)]
  ...)

;; Data actually has :name and :count, not :emoji-name and :reaction-count.  DOH!
{:name "thumbsup" :count 5 :users [...]}

Consequence:

emoji-name → nil
reaction-count → nil
(pos? nil) → NullPointerException
All reactions silently disappeared from UI

Location: src/slack_archive/web/views/core.clj:187
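
The two nil bindings are easy to reproduce at the REPL. This is a minimal sketch using the sample reaction map from this example; the renaming form of destructuring is standard Clojure, and nothing here is taken from the project's actual fix.

```clojure
(def reaction {:name "thumbsup" :count 5 :users ["U1" "U2"]})

;; :keys binds each symbol to the keyword of the same name,
;; so keys that are absent from the data quietly bind to nil.
(let [{:keys [emoji-name reaction-count]} reaction]
  [emoji-name reaction-count])   ; [nil nil]

;; Explicit renaming keeps the descriptive local names
;; while reading the keys the data really has.
(let [{emoji-name :name, reaction-count :count} reaction]
  [emoji-name reaction-count])   ; ["thumbsup" 5]
```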

Example 2: Reply Users Key Format Inconsistency (2025-01)

Bug: Display showed "1 reply from 0 users" (impossible state)

Root Cause: Inconsistent key naming between data sources: snake_case from JSON, not kebab-case from EDN!

;; Code looked for hyphenated key
(:reply-users message)  ;; → nil

;; But top-posts.edn data used underscored key
{:reply_users ["U123"] :reply_users_count 1}

Consequence:

reply-users → nil
(count nil) → 0
Display showed "1 reply from 0 users"

Location: src/slack_archive/web/views/core.clj:212
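
One way to keep this mismatch from reaching view code is to normalize key names once, where data is loaded. The sketch below is an assumption about how that could look: `kebab-key` and `normalize-keys` are hypothetical helpers, and they only handle flat maps with unqualified keys.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: :reply_users -> :reply-users
(defn kebab-key [k]
  (keyword (str/replace (name k) "_" "-")))

;; Hypothetical helper: rewrite every top-level key of a flat map.
(defn normalize-keys [m]
  (into {} (map (fn [[k v]] [(kebab-key k) v])) m))

(def raw {:reply_users ["U123"] :reply_users_count 1})
(def message (normalize-keys raw))

(:reply-users message)          ; ["U123"]
(count (:reply-users message))  ; 1
```

Nested maps would need a recursive walk, which is left out here to keep the sketch short.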

Example 3: Date vs Day Field Confusion (2025-01)

Bug: User profile pages showed zero message counts in both user-only and all-messages modes

Root Cause: Wrong field name used when computing date frequencies -- :date, not :day. DOH!

;; WRONG - :date field doesn't exist in messages
(frequencies (map :date all-messages))  ;; → {nil 703}

;; Canonical schema uses :day field
(frequencies (map :day all-messages))   ;; → {"2025-09-15" 2, "2025-09-22" 5, ...}

Consequence:

(map :date messages) → (nil nil nil ...)
(frequencies ...) → {nil 703} (all messages counted under nil key)
(get date-counts "2025-09-15") → nil (actual date strings returned nil)
Sidebar showed dates but no message counts

Location: src/slack_archive/web/server.clj:349,356

Why It Happened: The LLM saw the db/all-dates function and assumed messages use a :date field to match the function name, but the canonical schema actually uses :day. The function name all-dates was misleading: it returns dates but extracts them from the :day field.
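
A guard at the extraction step would have turned the `{nil 703}` result into an immediate error. The following is a sketch only: `pluck!` is a hypothetical helper and the three sample messages are made up for illustration.

```clojure
;; Hypothetical helper: like (map k coll), but throws on the
;; first item that lacks key k. mapv is eager, so the failure
;; surfaces at the call site rather than later.
(defn pluck!
  [k coll]
  (mapv (fn [item]
          (if (contains? item k)
            (get item k)
            (throw (ex-info (str "Item is missing key " k)
                            {:key k :available (keys item)}))))
        coll))

(def sample-messages [{:day "2025-09-15" :ts "1.0"}
                      {:day "2025-09-15" :ts "2.0"}
                      {:day "2025-09-22" :ts "3.0"}])

(frequencies (map :date sample-messages))    ; {nil 3}
(frequencies (pluck! :day sample-messages))  ; {"2025-09-15" 2, "2025-09-22" 1}
;; (pluck! :date sample-messages)            ; throws ExceptionInfo
```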

Why This Happens

Clojure's Dynamic Nature: (get {:a 1} :b) returns nil without error
LLM Hallucination: LLMs guess key names based on convention, not reality
Multiple Data Sources: Different parts of codebase use different conventions
- Canonical schema: :reply-users (hyphenated)
- Slack API raw: reply_users (underscored)
- Transformed data: :emoji-name (descriptive)
Silent Failures: No compile-time or runtime errors until nil is used

Countermeasures

1. Defensive Key Access (Implemented)

Use or to check multiple possible key names:

;; Handle both hyphenated and underscored variants
(let [reply-users (or (:reply-users message) (:reply_users message))]
  ...)

;; Handle multiple data formats
;; (bind to reaction-count rather than count, to avoid shadowing clojure.core/count)
(let [emoji-name     (or (:emoji-name reaction) (:name reaction))
      reaction-count (or (:reaction-count reaction) (:count reaction))]
  ...)

Pros: Works immediately, handles legacy data
Cons: Verbose, doesn't prevent new errors
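
One caveat with the `or` pattern is that it cannot tell a missing key from a key whose value is `false` or `nil`. A possible variant, sketched here with a hypothetical helper `get-first` and a hypothetical `:pinned` key, checks for key presence and throws when none of the candidates exist.

```clojure
;; Hypothetical helper: value of the first key in ks that is
;; present in m; throws if none of them are present.
(defn get-first
  [m ks]
  (if-let [entry (some #(find m %) ks)]
    (val entry)
    (throw (ex-info "None of the expected keys are present"
                    {:tried ks :available (keys m)}))))

(get-first {:reply_users ["U123"]} [:reply-users :reply_users])  ; ["U123"]

;; A present key with a false value is returned as-is...
(get-first {:pinned false} [:pinned :is_pinned])                 ; false

;; ...whereas `or` falls through to the next candidate and yields nil.
(or (:pinned {:pinned false}) (:is_pinned {:pinned false}))      ; nil
```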

2. Schema Validation (Future Consideration)

Use Malli or Clojure Spec to validate data shapes:

(require '[malli.core :as m])

(def Reaction
  [:map
   [:name string?]
   [:count pos-int?]
   [:users [:vector string?]]])

(defn validate-reactions [reactions]
  (m/validate [:vector Reaction] reactions))

Pros: Catches errors at data boundaries
Cons: Runtime overhead, requires schema maintenance
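
Since Clojure Spec is named as an alternative, here is roughly what the same boundary check could look like with `clojure.spec.alpha`, which ships with Clojure. The spec names (`::reaction` and friends) are hypothetical and mirror the reaction shape used in this document.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)
(s/def ::count pos-int?)
(s/def ::users (s/coll-of string? :kind vector?))

;; :req-un requires the unqualified keys :name, :count and :users.
(s/def ::reaction (s/keys :req-un [::name ::count ::users]))

(s/valid? ::reaction {:name "thumbsup" :count 5 :users ["U1"]})   ; true
(s/valid? ::reaction {:emoji-name "thumbsup" :reaction-count 5})  ; false

;; explain-str reports which required keys are missing.
(println (s/explain-str ::reaction
                        {:emoji-name "thumbsup" :reaction-count 5}))
```

Note that `s/keys` only requires the listed keys; it does not reject extra keys, so a hallucinated key shows up as a missing required key rather than as an unknown one.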

3. Accessor Functions (Recommended for Critical Paths)

Create getter functions with clear error messages:

(defn get-reaction-name
  "Get reaction emoji name, handling both canonical and enriched formats"
  [reaction]
  (or (:emoji-name reaction) 
      (:name reaction)
      (throw (ex-info "Reaction missing name key" {:reaction reaction}))))

(defn get-reply-users
  "Get reply users list, handling both hyphenated and underscored keys"
  [message]
  (or (:reply-users message)
      (:reply_users message)
      []))  ;; Default to empty list

Pros:

Self-documenting
Clear error messages
Single source of truth for key access logic

Cons: More boilerplate

4. defrecord with Protocols (Most Type-Safe)

For critical data structures, use records:

(defrecord Reaction [name count users])

(defprotocol IReaction
  (reaction-name [this])
  (reaction-count [this]))

(extend-protocol IReaction
  Reaction
  (reaction-name [this] (:name this))
  (reaction-count [this] (:count this))
  
  ;; Handle legacy maps
  clojure.lang.IPersistentMap
  (reaction-name [this] (or (:emoji-name this) (:name this)))
  (reaction-count [this] (or (:reaction-count this) (:count this))))

Pros:

Misspelled protocol function names fail at compile time (keyword lookup on a record still returns nil for missing keys)
Protocol dispatch for different types
Clear boundaries

Cons:

More complex
Requires converting between maps and records
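
It is worth seeing how much a record does and does not catch on its own. The sketch below repeats the `Reaction` record so it runs standalone; `reaction-fields` and `strict-reaction` are hypothetical names for a factory that rejects unknown keys at conversion time.

```clojure
(defrecord Reaction [name count users])

(def r (->Reaction "thumbsup" 5 ["U1"]))

(:name r)        ; "thumbsup"
(:emoji-name r)  ; nil, keyword lookup on a record is still open

;; map->Reaction accepts unknown keys and leaves declared fields nil.
(:name (map->Reaction {:emoji-name "thumbsup"}))  ; nil

;; Hypothetical strict factory: reject keys outside the declared fields.
(def reaction-fields #{:name :count :users})

(defn strict-reaction [m]
  (let [unknown (remove reaction-fields (keys m))]
    (when (seq unknown)
      (throw (ex-info "Unknown reaction keys" {:unknown (vec unknown)})))
    (map->Reaction m)))

(strict-reaction {:name "thumbsup" :count 5 :users ["U1"]})
;; (strict-reaction {:emoji-name "thumbsup"})  ; throws ExceptionInfo
```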

5. Comprehensive Testing (Currently Implemented)

Write tests that use actual data formats:

(deftest handles-both-key-formats
  (testing "canonical format"
    (let [reaction {:name "thumbsup" :count 5}]
      (is (= "thumbsup" (get-reaction-name reaction)))))
  
  (testing "enriched format"
    (let [reaction {:emoji-name "thumbsup" :reaction-count 5}]
      (is (= "thumbsup" (get-reaction-name reaction))))))

Pros: Catches regressions, documents expected formats
Cons: Only catches what you test
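
Tests can also guard the fixtures themselves, so that test data does not drift away from the real format. A rough sketch, assuming a canonical key set and a sample message that are both made up here (the real schema may contain more fields):

```clojure
(require '[clojure.test :refer [deftest is testing run-tests]]
         '[clojure.set :as set])

;; Assumed canonical key set, for illustration only.
(def canonical-message-keys #{:day :ts :user :reply-users})

;; Hypothetical fixture; in practice this would be a sample of real data.
(def sample-message {:day "2025-09-15" :ts "1757900000.000100" :user "U123"})

(deftest fixture-uses-only-canonical-keys
  (testing "fixture has no keys outside the canonical schema"
    (is (empty? (set/difference (set (keys sample-message))
                                canonical-message-keys)))))
```

A fixture that carried an extra `:date` key would fail this test instead of quietly masking a wrong lookup.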

Recommendations

Short Term (Current Approach):
- Use defensive (or ...) patterns for known inconsistencies
- Add comprehensive tests for both data formats
- Document expected key formats in docstrings
Medium Term:
- Create accessor functions for frequently accessed nested data
- Add schema validation at data input boundaries (API responses, file loads)
- Use clojure.spec.test.alpha/instrument in development
Long Term (If problem persists):
- Consider defrecord for core data structures
- Implement protocols for polymorphic access
- Add compile-time checking where possible (note that type hints mainly eliminate reflection; they do not validate map keys)
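
The spec instrumentation idea from the medium-term list can be sketched as follows. `instrument` lives in `clojure.spec.test.alpha`; the function `day-of` and the `::message` spec are hypothetical stand-ins, not code from this project.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(s/def ::day string?)
(s/def ::message (s/keys :req-un [::day]))

;; Hypothetical function to instrument.
(defn day-of [message]
  (:day message))

(s/fdef day-of
  :args (s/cat :message ::message))

;; Development only: instrumented calls check their arguments
;; against the :args spec before the function body runs.
(stest/instrument `day-of)

(day-of {:day "2025-09-15"})      ; "2025-09-15"
;; (day-of {:date "2025-09-15"})  ; throws ExceptionInfo (spec violation)
```

Instrumentation adds overhead on every call, which is why it is usually switched on only in development and test runs.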

Detection Strategies

Code Review Checklist

Are map keys accessed directly in multiple places?
Do multiple data sources provide the same logical data?
Are there nil checks immediately after map access?
Could a misspelled key cause silent failure?

REPL Inspection

(require 'clojure.set)

;; Check actual keys in data
(keys message)

;; Find keys that are present but hold nil values
(->> message
     (filter (fn [[_k v]] (nil? v)))
     (into {}))

;; Compare expected vs actual keys; the result is the set of expected keys that are missing
(def expected-keys #{:reply-users :reaction-count})
(def actual-keys (set (keys message)))
(clojure.set/difference expected-keys actual-keys)

Runtime Monitoring

(require 'clojure.set)

;; Add defensive assertions in development
(defn dev-assert-keys [m expected-keys context]
  (when (System/getenv "DEV_MODE")
    (let [actual-keys (set (keys m))
          missing (clojure.set/difference expected-keys actual-keys)]
      (when (seq missing)
        (println "WARNING: Missing keys in" context ":" missing)))))

;; Usage
(dev-assert-keys message #{:reply-users :user :ts} "message data")

Related Issues

LLM hallucination of field names
Inconsistent naming conventions (hyphen vs underscore)
Multiple data transformation layers
Clojure's permissive nil handling

Case Study: Date vs Day Bug - Cost Analysis

Problem Discovery Process

Initial Symptom: "Show all messages show zero messages!!!" - User reported no message counts displaying

Debugging Path:

Initial assumption: Feature not working after server restart
Browser inspection: HTML rendering but no counts showing
REPL inspection: Checked what field messages actually have
Root cause: Used :date field which returns nil, not :day field

Time to Root Cause: ~10 minutes with REPL inspection
Time to Fix: 2 minutes (change :date → :day in two places)
Time to Verify: 3 minutes (tests + browser verification)

Why This Was Hard to Catch

Tests Passed Initially: Test data had BOTH :day and :date fields because we were being defensive
Silent Failure: (frequencies (map :date messages)) → {nil 703} - no error, just wrong data
Function Name Misleading: db/all-dates suggests using :date field, but actually uses :day
LLM Pattern Matching: The LLM saw a "dates" function and assumed a :date key without checking

Impact Assessment

Cost:

Development time: ~15 minutes total
User friction: Feature appeared broken after implementation
Confidence loss: "Why doesn't this work?"

Could Have Been Worse:

If deployed to production: Users would see broken feature
If data had been corrupted: Would need rollback
If tests hadn't caught it: Would have shipped broken code

What Made This Easy to Fix

REPL-driven development: Could immediately check (:day msg) vs (:date msg)
Isolated change: Only two lines needed fixing
Good test coverage: Tests caught the issue once test data was corrected
Clear error boundary: Problem was localized to one function

Long-Term Solution Considerations

Option 1: Better Naming (Low effort, immediate value)

Rename all-dates → all-days to match field name
Add docstring: "Returns sorted list of :day field values"
Pros: Self-documenting, prevents confusion
Cons: Breaking change for existing code

Option 2: Accessor Functions (Medium effort, high value)

(defn message-day
  "Get the day string from a message.
  Returns: String in YYYY-MM-DD format"
  [message]
  (or (:day message)
      (throw (ex-info "Message missing :day field" {:message message}))))

Pros: Single source of truth, clear errors
Cons: More boilerplate, need migration

Option 3: Schema Validation (High effort, highest value)

(def Message
  [:map
   [:day string?]  ;; Required: YYYY-MM-DD format
   [:ts string?]   ;; Required: Slack timestamp
   [:user string?] ;; Required: User ID
   ;; ... more fields
   ])

Pros: Catches errors at boundaries, documents schema
Cons: Runtime overhead, requires Malli/Spec setup

Recommended Immediate Action

Rename function all-dates → all-days (or keep name but add clear docstring)

Add assertion in development mode:

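;; dev-mode? is assumed to be a project helper (for example, one that checks the DEV_MODE environment variable)
(when (dev-mode?)
  (assert (every? :day messages) "Messages missing :day field"))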

Update CLAUDE.md with common field names:

## Message Schema Quick Reference
- `:day` - Date string (YYYY-MM-DD) - NOT :date
- `:ts` - Slack timestamp
- `:user` - User ID string

Cost-Benefit Analysis

Current Approach (Defensive or patterns):

Cost: Low (already implemented)
Benefit: Handles multiple formats
Weakness: Silent failures, no prevention

Accessor Functions:

Cost: Medium (need to write ~10-15 functions)
Benefit: Clear errors, self-documenting
Weakness: Boilerplate, need adoption

Schema Validation:

Cost: High (Malli setup, schema definitions, performance testing)
Benefit: Catches ALL schema errors, prevents new bugs
Weakness: Runtime overhead, requires expertise

Recommendation: Hybrid Approach

Now (0 effort): Update this document with field name reference
Next (Low effort): Add development-mode assertions at data boundaries
Soon (Medium effort): Create accessor functions for most-used fields
Eventually (High effort): Add Malli validation when schema stabilizes

The :date vs :day bug shows that even with good test coverage, map key errors slip through when test data is overly defensive. The fix was trivial once found, but discovery took longer than it should have. Investment in schema validation or accessor functions would pay off if these bugs continue to occur frequently.

