@realgenekim
Last active November 4, 2025 04:45
LLMs have problems with Clojure open maps -- and so does Python

Map Key Errors in Clojure and Python: A Silent Bug Category

Why This Matters

I've been compiling this document for a couple of weeks, intending to publish it, at least to advertise a problem I've been struggling with. It has bothered me enough that I've run a ton of little experiments to see if I can ameliorate it, even by a little!

Then just now, I was talking with my friend Scott Prugh who was just complaining about something in Python, and he was lamenting how he wanted to do things in C#. I asked him why, and he was complaining about the same problem that I've been having!

The problem: LLMs will often hallucinate map keys that don't exist. He and I commiserated about how we both spent hours trying to hunt down a problem that would have been solved if there were a data structure more like a class or a struct in C, where you could generate an immediate error if trying to get or set a value that doesn't actually exist.

This is antithetical to so much of Rich Hickey's philosophy around open maps. This document attempts to outline the problem that I've been struggling with for most of the last year when vibe coding or using LLMs on coding problems.

This document is an attempt to chronicle this category of errors and some ongoing experiments I've been trying to conduct to create a little more rigor in detecting these problems much earlier, or ideally even preventing them.

The Problem

Clojure's dynamic map access can silently fail when key names are wrong. This creates a class of bugs that are particularly insidious when working with LLM-generated code, as LLMs often hallucinate map key names.

Real-World Examples from This Project

Example 1: Reaction Keys Mismatch (2025-01)

Bug: Missing emojis in message reactions display

Root Cause: Code destructured using wrong keys:

;; WRONG - keys don't exist in data
(for [{:keys [emoji-name reaction-count]} (:reactions message)
      :when (pos? reaction-count)]
  ...)

;; Data actually has -- :name, not :emoji-name.  DOH!
{:name "thumbsup" :count 5 :users [...]}

Consequence:

  • emoji-name → nil
  • reaction-count → nil
  • (pos? nil) → NullPointerException
  • All reactions silently disappeared from UI

Location: src/slack_archive/web/views/core.clj:187

Example 2: Reply Users Key Format Inconsistency (2025-01)

Bug: Display showed "1 reply from 0 users" (impossible state)

Root Cause: Inconsistent key naming between data sources: snake_case from JSON, not kebab-case from EDN!

;; Code looked for hyphenated key
(:reply-users message)  ;; → nil

;; But top-posts.edn data used underscored key
{:reply_users ["U123"] :reply_users_count 1}

Consequence:

  • reply-users → nil
  • (count nil) → 0
  • Display showed "1 reply from 0 users"

Location: src/slack_archive/web/views/core.clj:212

Example 3: Date vs Day Field Confusion (2025-01)

Bug: User profile pages showed zero message counts in both user-only and all-messages modes

Root Cause: Wrong field name used when computing date frequencies -- :date, not :day. DOH!

;; WRONG - :date field doesn't exist in messages
(frequencies (map :date all-messages))  ;; → {nil 703}

;; Canonical schema uses :day field
(frequencies (map :day all-messages))   ;; → {"2025-09-15" 2, "2025-09-22" 5, ...}

Consequence:

  • (map :date messages) → (nil nil nil ...)
  • (frequencies ...) → {nil 703} (all messages counted under nil key)
  • (get date-counts "2025-09-15") → nil (actual date strings returned nil)
  • Sidebar showed dates but no message counts

Location: src/slack_archive/web/server.clj:349,356

Why It Happened: LLM saw db/all-dates function and assumed messages use :date field to match the function name, but the canonical schema actually uses :day field. The function name all-dates was misleading - it returns dates but extracts them from :day field.

Why This Happens

  1. Clojure's Dynamic Nature: (get {:a 1} :b) returns nil without error
  2. LLM Hallucination: LLMs guess key names based on convention, not reality
  3. Multiple Data Sources: Different parts of codebase use different conventions
    • Canonical schema: :reply-users (hyphenated)
    • Slack API raw: reply_users (underscored)
    • Transformed data: :emoji-name (descriptive)
  4. Silent Failures: No compile-time or runtime errors until nil is used
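
To make the first and last points concrete, here is a minimal sketch of the silent lookup next to a strict alternative. The helper name `get!` is hypothetical (it is not part of clojure.core or of this project); the sample map is the reaction from Example 1.

```clojure
;; Plain lookups return nil for a missing key, so a typo looks like "no data".
(def reaction {:name "thumbsup" :count 5})

(:emoji-name reaction)        ; nil, no error
(get reaction :emoji-name)    ; nil, no error

;; Hypothetical helper: a lookup that throws when the key is absent.
;; `contains?` is used so that a key present with a nil value still passes.
(defn get!
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Missing key " k)
                    {:key k :available-keys (keys m)}))))

(get! reaction :name)         ; "thumbsup"
;; (get! reaction :emoji-name) throws ExceptionInfo listing the available keys
```

Putting the available keys into the ex-data means the error itself shows what the map really contains, which is usually the first thing to check at the REPL anyway.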

Countermeasures

1. Defensive Key Access (Implemented)

Use or to check multiple possible key names:

;; Handle both hyphenated and underscored variants
(let [reply-users (or (:reply-users message) (:reply_users message))]
  ...)

;; Handle multiple data formats
(let [emoji-name (or (:emoji-name reaction) (:name reaction))
      ;; named reaction-count so the binding does not shadow clojure.core/count
      reaction-count (or (:reaction-count reaction) (:count reaction))]
  ...)

Pros: Works immediately, handles legacy data

Cons: Verbose, doesn't prevent new errors
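
For the hyphen versus underscore case specifically, a related option is to normalize keys once where the data is loaded instead of repeating `or` at each call site. The sketch below is an assumption about how that could look, not code from this project; `kebab-key` and `normalize-keys` are hypothetical names, and only top-level keys are handled.

```clojure
(require '[clojure.string :as str])

(defn kebab-key
  "Converts a keyword such as :reply_users to :reply-users.
  Non-keyword keys are returned unchanged. Any namespace on the keyword is dropped."
  [k]
  (if (keyword? k)
    (keyword (str/replace (name k) "_" "-"))
    k))

(defn normalize-keys
  "Returns m with every top-level keyword key converted to kebab-case.
  Nested maps are not touched in this sketch."
  [m]
  (into {} (map (fn [[k v]] [(kebab-key k) v])) m))

(normalize-keys {:reply_users ["U123"] :reply_users_count 1})
;; => {:reply-users ["U123"], :reply-users-count 1}
```

If a map carries both spellings of the same key, one value overwrites the other, so this only fits data sources that use a single convention each.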

2. Schema Validation (Future Consideration)

Use Malli or Clojure Spec to validate data shapes:

(require '[malli.core :as m])

(def Reaction
  [:map
   [:name string?]
   [:count pos-int?]
   [:users [:vector string?]]])

(defn validate-reactions [reactions]
  (m/validate [:vector Reaction] reactions))

Pros: Catches errors at data boundaries

Cons: Runtime overhead, requires schema maintenance
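
The same check can be sketched with clojure.spec.alpha, which ships with Clojure and needs no extra dependency. This is an illustration under the assumption that the canonical reaction shape is the one shown in Example 1; the spec names are made up for this sketch.

```clojure
(require '[clojure.spec.alpha :as s])

;; Specs for the canonical reaction shape from Example 1.
(s/def ::name string?)
(s/def ::count pos-int?)
(s/def ::users (s/coll-of string? :kind vector?))
(s/def ::reaction (s/keys :req-un [::name ::count ::users]))

(s/valid? ::reaction {:name "thumbsup" :count 5 :users ["U123"]})
;; => true

;; A map built with hallucinated keys fails because :name and :count are missing.
(s/valid? ::reaction {:emoji-name "thumbsup" :reaction-count 5 :users ["U123"]})
;; => false

;; s/explain-data returns a map describing each failure (nil when the value is valid).
(s/explain-data ::reaction {:emoji-name "thumbsup" :reaction-count 5 :users ["U123"]})
```

Note that `s/keys` validates the data, not the code that reads it: it catches a producer that emits `:emoji-name`, but a consumer that asks a valid map for `:emoji-name` still gets nil.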

3. Accessor Functions (Recommended for Critical Paths)

Create getter functions with clear error messages:

(defn get-reaction-name
  "Get reaction emoji name, handling both canonical and enriched formats"
  [reaction]
  (or (:emoji-name reaction) 
      (:name reaction)
      (throw (ex-info "Reaction missing name key" {:reaction reaction}))))

(defn get-reply-users
  "Get reply users list, handling both hyphenated and underscored keys"
  [message]
  (or (:reply-users message)
      (:reply_users message)
      []))  ;; Default to empty list

Pros:

  • Self-documenting
  • Clear error messages
  • Single source of truth for key access logic

Cons: More boilerplate

4. defrecord with Protocols (Most Type-Safe)

For critical data structures, use records:

(defrecord Reaction [name count users])

(defprotocol IReaction
  (reaction-name [this])
  (reaction-count [this]))

(extend-protocol IReaction
  Reaction
  (reaction-name [this] (:name this))
  (reaction-count [this] (:count this))
  
  ;; Handle legacy maps
  clojure.lang.IPersistentMap
  (reaction-name [this] (or (:emoji-name this) (:name this)))
  (reaction-count [this] (or (:reaction-count this) (:count this))))

Pros:

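  • Misspelled protocol function names fail at compile time (unresolved symbol); keyword lookup on a record, such as (:emoji-name r), still returns nil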
  • Protocol dispatch for different types
  • Clear boundaries

Cons:

  • More complex
  • Requires converting between maps and records

5. Comprehensive Testing (Currently Implemented)

Write tests that use actual data formats:

(deftest handles-both-key-formats
  (testing "canonical format"
    (let [reaction {:name "thumbsup" :count 5}]
      (is (= "thumbsup" (get-reaction-name reaction)))))
  
  (testing "enriched format"
    (let [reaction {:emoji-name "thumbsup" :reaction-count 5}]
      (is (= "thumbsup" (get-reaction-name reaction))))))

Pros: Catches regressions, documents expected formats

Cons: Only catches what you test

Recommendations

  1. Short Term (Current Approach):

    • Use defensive (or ...) patterns for known inconsistencies
    • Add comprehensive tests for both data formats
    • Document expected key formats in docstrings
  2. Medium Term:

    • Create accessor functions for frequently accessed nested data
    • Add schema validation at data input boundaries (API responses, file loads)
    • Use clojure.spec.test.alpha/instrument in development
  3. Long Term (If problem persists):

    • Consider defrecord for core data structures
    • Implement protocols for polymorphic access
    • Add type hints and enable *warn-on-reflection* so mistyped field access on records is flagged when the code is compiled

Detection Strategies

Code Review Checklist

  • Are map keys accessed directly in multiple places?
  • Do multiple data sources provide the same logical data?
  • Are there nil checks immediately after map access?
  • Could a misspelled key cause silent failure?

REPL Inspection

;; Check actual keys in data
(keys message)

;; Find nil values that might be wrong keys
(->> message
     (filter (fn [[k v]] (nil? v)))
     (into {}))

;; Compare expected vs actual keys (clojure.set must be loaded first)
(require '[clojure.set])
(def expected-keys #{:reply-users :reaction-count})
(def actual-keys (set (keys message)))
(clojure.set/difference expected-keys actual-keys)

Runtime Monitoring

;; Add defensive assertions in development (uses clojure.set, so load it first)
(require '[clojure.set])
(defn dev-assert-keys [m expected-keys context]
  (when (System/getenv "DEV_MODE")
    (let [actual-keys (set (keys m))
          missing (clojure.set/difference expected-keys actual-keys)]
      (when (seq missing)
        (println "WARNING: Missing keys in" context ":" missing)))))

;; Usage
(dev-assert-keys message #{:reply-users :user :ts} "message data")

Related Issues

  • LLM hallucination of field names
  • Inconsistent naming conventions (hyphen vs underscore)
  • Multiple data transformation layers
  • Clojure's permissive nil handling

See Also

  • test/slack_archive/web/views/core_test.clj - Examples of testing both formats
  • src/slack_archive/web/views/core.clj - Defensive key access patterns
  • src/slack_archive/web/views/top_engagement.clj - Data transformation layer

Case Study: Date vs Day Bug - Cost Analysis

Problem Discovery Process

Initial Symptom: "Show all messages show zero messages!!!" - User reported no message counts displaying

Debugging Path:

  1. Initial assumption: Feature not working after server restart
  2. Browser inspection: HTML rendering but no counts showing
  3. REPL inspection: Checked what field messages actually have
  4. Root cause: Used :date field which returns nil, not :day field

Time to Root Cause: ~10 minutes with REPL inspection

Time to Fix: 2 minutes (change :date → :day in two places)

Time to Verify: 3 minutes (tests + browser verification)

Why This Was Hard to Catch

  1. Tests Passed Initially: Test data had BOTH :day and :date fields because we were being defensive
  2. Silent Failure: (frequencies (map :date messages)) → {nil 703} - no error, just wrong data
  3. Function Name Misleading: db/all-dates suggests using :date field, but actually uses :day
  4. LLM Pattern Matching: The LLM saw a "dates" function and assumed a :date key without checking the data
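
The first point suggests a cheap guard: test the fixtures themselves, so that test data cannot quietly carry keys the real data lacks. The sketch below is one possible shape for that, with assumptions labeled: the canonical key set contains only the three fields named in this document, and the fixture (including its timestamp value) is made up.

```clojure
(require '[clojure.set :as set]
         '[clojure.test :refer [deftest is]])

;; Assumed canonical key set; only these three fields are named in this document.
(def canonical-message-keys #{:day :ts :user})

;; Hypothetical fixture; in the bug above it would also have carried :date.
(def test-messages
  [{:day "2025-09-15" :ts "1757930000.000100" :user "U123"}])

(defn non-canonical-keys
  "Returns the set of keys used in msgs that are not in the canonical set."
  [msgs]
  (set/difference (set (mapcat keys msgs)) canonical-message-keys))

(deftest fixtures-use-only-canonical-keys
  (is (empty? (non-canonical-keys test-messages))))
```

Had the fixtures been checked this way, the extra :date field would have failed this test, and the code reading :date would then have failed its own tests instead of passing.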

Impact Assessment

Cost:

  • Development time: ~15 minutes total
  • User friction: Feature appeared broken after implementation
  • Confidence loss: "Why doesn't this work?"

Could Have Been Worse:

  • If deployed to production: Users would see broken feature
  • If data had been corrupted: Would need rollback
  • If tests hadn't caught it: Would have shipped broken code

What Made This Easy to Fix

  1. REPL-driven development: Could immediately check (:day msg) vs (:date msg)
  2. Isolated change: Only two lines needed fixing
  3. Good test coverage: Tests caught the issue once test data was corrected
  4. Clear error boundary: Problem was localized to one function

Long-Term Solution Considerations

Option 1: Better Naming (Low effort, immediate value)

  • Rename all-dates → all-days to match field name
  • Add docstring: "Returns sorted list of :day field values"
  • Pros: Self-documenting, prevents confusion
  • Cons: Breaking change for existing code

Option 2: Accessor Functions (Medium effort, high value)

(defn message-day
  "Get the day string from a message.
  Returns: String in YYYY-MM-DD format"
  [message]
  (or (:day message)
      (throw (ex-info "Message missing :day field" {:message message}))))
  • Pros: Single source of truth, clear errors
  • Cons: More boilerplate, need migration

Option 3: Schema Validation (High effort, highest value)

(def Message
  [:map
   [:day string?]  ;; Required: YYYY-MM-DD format
   [:ts string?]   ;; Required: Slack timestamp
   [:user string?] ;; Required: User ID
   ;; ... more fields
   ])
  • Pros: Catches errors at boundaries, documents schema
  • Cons: Runtime overhead, requires Malli/Spec setup

Recommended Immediate Action

  1. Rename function all-dates → all-days (or keep name but add clear docstring)
  2. Add assertion in development mode:
    (when (dev-mode?)
      (assert (every? :day messages) "Messages missing :day field"))
  3. Update CLAUDE.md with common field names:
    ## Message Schema Quick Reference
    - `:day` - Date string (YYYY-MM-DD) - NOT :date
    - `:ts` - Slack timestamp
    - `:user` - User ID string

Cost-Benefit Analysis

Current Approach (Defensive or patterns):

  • Cost: Low (already implemented)
  • Benefit: Handles multiple formats
  • Weakness: Silent failures, no prevention

Accessor Functions:

  • Cost: Medium (need to write ~10-15 functions)
  • Benefit: Clear errors, self-documenting
  • Weakness: Boilerplate, need adoption

Schema Validation:

  • Cost: High (Malli setup, schema definitions, performance testing)
  • Benefit: Catches ALL schema errors, prevents new bugs
  • Weakness: Runtime overhead, requires expertise

Recommendation: Hybrid Approach

  1. Now (0 effort): Update this document with field name reference
  2. Next (Low effort): Add development-mode assertions at data boundaries
  3. Soon (Medium effort): Create accessor functions for most-used fields
  4. Eventually (High effort): Add Malli validation when schema stabilizes

The :date vs :day bug shows that even with good test coverage, map key errors slip through when test data is overly defensive. The fix was trivial once found, but discovery took longer than it should have. Investment in schema validation or accessor functions would pay off if these bugs continue to occur frequently.
