You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
You are distilling Swift API documentation for use by an LLM code assistant.
Your goal is "factually lossless" distillation - read the entire document
and preserve all information needed to use the API correctly while removing
verbose explanations.
Save the output (distilled docs) next to the input file, with the file extension
.distilled.v1.md.
PREFACE WITH:
If information is available:
In broad terms, what this library/ package/ module/ etc. does
min system requirements
KEEP (Essential API Information):
Type Definitions
Protocols, structs, classes, enums with their full signatures
Generic constraints
Property wrappers and their requirements
Method/Function Signatures
Complete signatures with parameter names, types, and default values
Return types
throws, async, and @MainActor annotations
Generic parameters and where clauses
Critical Behavior Notes
Non-obvious side effects
Thread safety requirements (e.g., "must be called on main thread")
Timing constraints (e.g., "call before viewDidLoad")
State requirements (e.g., "only valid when authenticated")
Important deprecations
Key Relationships
Protocol conformances
Required associated types
Type aliases (when clarifying)
Composition patterns (e.g., "use with .forEach operator")
Minimal Context
One-line purpose for non-obvious types/methods
Disambiguation when names are similar
Common gotchas or pitfalls
REMOVE (Verbose Content):
Tutorial Content
Step-by-step guides
"Let's build..." narratives
Extended walkthroughs
Code Examples
Keep only if the API usage is truly non-obvious from the signature
If kept, reduce to minimal working example (3-5 lines max)
Prose Explanations
Marketing language ("elegant", "powerful", "ergonomic")
Motivational content
Historical context ("in version X we...")
Philosophy discussions
Redundant Information
Obvious parameter descriptions (e.g., for count: Int don't write "An
integer representing the count")
Whiskey/spirits: concentrating without losing character
Pattern learned: Distillation preserves essential qualities while removing non-essentials.
Psychological Framing Effects
Framing Theory Applied to LLMs
If LLMs are trained on human text, they inherit human cognitive biases, including framing effects.
Compression Frame:
Activates "efficiency" heuristics
Primes "make smaller" goal
Suggests size-based metrics
Implies some loss is acceptable ("lossy compression")
Distillation Frame:
Activates "quality preservation" heuristics
Primes "extract essence" goal
Suggests purity-based metrics
Implies the result should be truer, not just smaller
Real-World Evidence From Our Testing
Current Prompt Uses "Compress/Compression" 81 Times
Count of key terms in current prompts:
"compress/compression/compressed" - 81 mentions
"accuracy/accurate" - 23 mentions
"exact/exactly" - 47 mentions
Ratio: 81 compression mentions vs 70 accuracy mentions
Observation: Even with strong accuracy instructions, models still invented signatures. Could the overwhelming "compression" framing be overriding accuracy messages?
Hypothetical Reframing
Current Opening (V3):
You are compressing Swift API documentation for use by an LLM code assistant.
Your goal is **conceptually lossless** compression...
Problems:
"Compression" is the verb (action)
"Conceptually lossless" is a modifier fighting against the compression frame
It's asking for lossless compression, but compression implies possible loss
Alternate Opening (Distillation):
You are distilling Swift API documentation for use by an LLM code assistant.
Your goal is to extract and preserve all essential API information...
Advantages:
"Distillation" implies preservation of essence
"Extract and preserve" emphasizes keeping, not removing
"Essential API information" focuses on what matters
No implied trade-off between size and quality
Counter-Arguments
Why Terminology Might NOT Matter:
Explicit instructions dominate: The detailed rules override semantic framing
LLMs are literal: They follow instructions more than implied meanings
Technical context: In API documentation, both terms are well-understood
Training data: Models see both terms in similar contexts enough to understand equivalence
Why It Still Might Matter:
Implicit bias accumulation: Small framings add up across the entire prompt
Goal activation: The first sentence sets the mental "mode"
Conflict resolution: When instructions conflict, framing influences which wins
Completion bias: Under uncertainty, framing guides which completion feels "right"
Test Design
To actually measure the effect:
Test Set:
Same GRDB documentation
Condition A (Compression):
Prompt uses "compress/compression" terminology throughout
Condition B (Distillation):
Prompt uses "distill/distillation" terminology throughout
Everything Else:
Identical instructions, same rules, same examples
Metrics:
Number of invented signatures
Use of [NOT IN SOURCE] markers
Syntax errors
Completeness of protocol coverage
Output length (to control for just making things longer)
Hypothesis:
"Distillation" framing will result in:
Fewer invented signatures (primary metric)
More [NOT IN SOURCE] markers
Similar or slightly longer output length
More conservative/accurate documentation
Practical Recommendation
Combined Approach:
Use "distillation" as the primary metaphor, but acknowledge size:
You are distilling Swift API documentation to extract its essential technical content for an LLM code assistant.
**Goal:** Preserve all critical API information with perfect accuracy while removing verbose explanations, marketing language, and tutorials.
**Think of this as:** Distilling whiskey - the result should be more concentrated but must preserve the exact character and qualities of the source. You're removing water, not changing the spirit.
**Not compression:** This is not about achieving a size ratio. A longer, accurate distillation is far better than a shorter, inaccurate compression.
**Success metric:** A developer can trust every signature you document and knows clearly when information is incomplete.
Why this works:
Primary frame: Distillation (quality/purity)
Explicit rejection: "Not compression" directly counters any compression bias
Concrete metaphor: Whiskey distillation is visceral and clear
Success metric: Shifted from "size" to "trust"
Expected Impact
Conservative Estimate:
10-20% reduction in invented signatures
Terminology alone won't fix the problem, but might help at the margins
Optimistic Estimate:
30-50% reduction in invented signatures
Framing effects on LLMs may be stronger than expected
Combined with better instructions, could significantly improve accuracy
Realistic Expectation:
Some improvement, but not a silver bullet
Worth doing as part of a multi-pronged approach
Low cost (just word changes), potential benefit
Other Terminology Options
Alternative Metaphors to Consider:
"Extraction"
Pro: Implies taking only what's there
Con: Sounds mechanical, less quality focus
"Refinement"
Pro: Implies improving quality
Con: Might suggest changing/improving the source
"Crystallization"
Pro: Implies pure, precise form
Con: Too abstract, less common
"Concentration"
Pro: Clear chemical process metaphor
Con: Less distinct from compression
"Purification"
Pro: Strong accuracy/purity connotation
Con: Implies source is "impure" (awkward framing)
"Distillation" appears to be the best alternative - it's:
Common enough to be understood
Scientific/precise in connotation
Implies both reduction AND preservation
Has positive quality associations
V4 Prompt Opening (Proposed)
# Swift API Documentation Distillation Guide
You are distilling Swift API documentation to extract its essential technical content for an LLM code assistant.
**Distillation Philosophy:** Like distilling spirits, you're removing dilution (verbose explanations, marketing, tutorials) while preserving the exact character and essence (all API signatures, critical behaviors, warnings).
**Iron Law:** The distilled result must be perfectly accurate. Every API signature you include must be exact. Every behavior you document must be verifiable in the source.
**Not Compression:** This is not about minimizing size or achieving a compression ratio. A longer, accurate distillation is infinitely better than a shorter, inaccurate one.
**Success Criteria:**1. A developer can trust every signature is real and exact
2. All incomplete information is clearly marked
3. Critical warnings and behaviors are preserved
4. No invented or inferred APIs
**Think of yourself as:** A careful chemist extracting pure essence, not a file compression algorithm optimizing for size.
Conclusion
Does terminology matter? Probably yes, to some degree.
Will it solve the problem alone? No - we need all the other improvements too.
Is it worth changing? Yes - low cost, potential benefit, and it makes the goal clearer.
Recommendation: Use "distillation" as the primary metaphor in V4, explicitly contrast it with compression, and use the whiskey distillation metaphor for clarity.
Bottom line: Terminology is one tool among many, but given LLMs' sensitivity to framing and their training on human text with human cognitive biases, there's good reason to think "distillation" would activate more accuracy-preserving behaviors than "compression."