VALIDATION_SCOPE = "Tested with: GPT-4.5, Claude 4 Opus, Gemini 2.5 Pro, DeepSeek V2"
VIBES provides a structured framework for evaluating and improving the ergonomics of tools and expression languages designed for LLM use. As LLM-driven development becomes mainstream, the cost of poor tool ergonomics compounds through failed attempts, retries, and workarounds.
Core Insight: LLMs and humans need fundamentally different tools. Just as we don't expect humans to write assembly code or CPUs to parse English, we shouldn't force LLMs to use human-optimized interfaces. The most effective approach is building purpose-specific tools for each type of user.
The Framework: VIBES uses a 3-axis qualitative system that embraces LLM strengths (pattern recognition and natural language understanding) rather than computational metrics. It treats models as black boxes, measuring processing friction rather than internal states.
Why It Works: VIBES describes patterns that already exist in well-engineered code. Every principle maps to established wisdom (type safety, functional programming, loose coupling). Future LLMs are likely to understand VIBES readily because they are trained on codebases that embody these principles.
| Axis | States | What It Measures |
|---|---|---|
| Expressive | 🙈 👓 🔍 🔬 | How many valid ways to express ideas |
| Context Flow | 🌀 🧶 🪢 🎀 | How tangled dependencies are |
| Error Surface | 🌊 💧 🧊 💠 | When errors can occur in the lifecycle |
Emoji Logic:
- Expressive: From blindness (🙈) to microscopic precision (🔬)
- Context: From chaotic swirl (🌀) to neat bow (🎀)
- Error: From vast ocean (🌊) to crystallized/frozen (💠)
Notation: <Expressive/Context/Error>, e.g., <🔍🪢💠>
Framework developed through iterative testing of multiple patterns across GPT-4.5, Claude 4 Opus, Gemini 2.5 Pro, and DeepSeek V2. VIBES ratings represent consensus patterns: a pattern receives a rating when at least 3 of the 4 models agree on it.
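The consensus rule can be sketched in a few lines. This is only an illustration of the "3 of 4 models agree" threshold described above; the function name, the model labels, and the use of state names instead of emoji are assumptions, not part of the validation suite.

```python
from collections import Counter
from typing import Optional


def consensus_rating(ratings: dict[str, str], min_agreement: int = 3) -> Optional[str]:
    """Return the rating shared by at least `min_agreement` models, else None."""
    if not ratings:
        return None
    rating, votes = Counter(ratings.values()).most_common(1)[0]
    return rating if votes >= min_agreement else None


# Hypothetical per-model ratings for one pattern on the Context Flow axis.
votes = {
    "model_a": "Pipeline",
    "model_b": "Pipeline",
    "model_c": "Pipeline",
    "model_d": "Coupled",
}
print(consensus_rating(votes))  # Pipeline
```

A `None` result corresponds to a divergence that has to be resolved by hand, for example by documenting both perspectives.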
Critical Distinction:
- VIBES Assessment (Qualitative): LLMs rate patterns based on interaction experience
- Impact Validation (Quantitative): Humans measure retry rates and completion times to verify that ratings correlate with outcomes
Example Divergence: GPT-4o rated Redux components 🧶 (Coupled), Claude rated 🪢 (Pipeline); resolved by documenting both perspectives, since external state management creates coupling even with unidirectional flow.
See calibration/CALIBRATION_CORPUS.md for the complete validation suite with consensus ratings.
Measures how well a system allows expression of valid computations while constraining invalid ones.
Real Impact: GitHub Copilot and similar tools generate more successful completions with APIs supporting multiple natural expressions.
🙈 Noise: Cannot express needed computations. Constraints block valid expressions.
- Example: Stringly-typed API rejecting valid but differently-formatted inputs
👓 Readable: Single rigid path. One way to express each operation.
- Example: add_floats(2.0, 2.0), functional but inflexible
🔍 Structured: Multiple natural ways to express ideas with meaningful constraints.
- Example: Supporting both users.filter(active) and filter(users, active)
🔬 Crystalline: Rich expressiveness with precise semantic guarantees. Multiple aliases for same operation.
- Example: SQL DSL accepting WHERE x > 5, FILTER(x > 5), and x.gt(5), all of which compile to the same AST
- "Many ways" = 6+ different valid syntaxes with identical semantics
Measures dependency structure and traversal constraints.
Real Impact: The Heartbleed vulnerability remained hidden in OpenSSL's complex dependency graph (🧶) for over 2 years, affecting millions of systems.
🌀 Entangled: Circular dependencies with feedback loops. Order changes results.
- Example: Spreadsheet with circular references
🧶 Coupled: Complex dependencies without cycles. Hidden state mutations.
- Example: React components with shared context and effects
- Key distinction: Multiple interacting paths with shared mutable state
- Decision guide: Can you trace a single path? → 🪢. Multiple paths affecting each other? → 🧶
🪢 Pipeline: Linear dependencies, immutable during traversal.
- Example: data |> validate |> transform |> save
🎀 Independent: No dependencies between components. Any access order works.
- Example: (name, age, email), change any without affecting others
Measures when errors can occur in the system lifecycle.
Real Impact: The Therac-25 radiation overdoses (six known accidents, several of them fatal) were caused in part by race conditions (🌊), a class of error that stronger compile-time guarantees (💠) are designed to rule out.
🌊 Ocean: Errors cascade unpredictably. One failure triggers system-wide effects.
- Example: window.APP.state.user = null // Crashes everywhere
💧 Liquid: Errors handled at runtime. Explicit error handling required.
- Example: Result<User, Error> = fetchUser(id)
🧊 Ice: Errors caught at startup/initialization. Fail fast at boundaries.
- Example: Dependency injection validates all requirements at boot
💠 Crystal: Errors ruled out at compile/parse time. Invalid states cannot be constructed.
- Example: divide :: Int -> NonZeroInt -> Int, division by zero impossible
- Rule of thumb: 💠 when invalid states cannot be expressed
Error Progression:
- 💧: if (denominator != 0) result = numerator / denominator
- 🧊: assert(denominator != 0); result = numerator / denominator
- 💠: divide(numerator: Int, denominator: NonZeroInt)
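The last step of the progression can be approximated in Python with a wrapper type. This is a minimal sketch, not a true compile-time guarantee: Python enforces the invariant when the value is constructed, and only a static type checker would flag a plain `int` passed to `divide`. The `NonZeroInt` class below is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonZeroInt:
    """An integer that is guaranteed non-zero once constructed."""

    value: int

    def __post_init__(self) -> None:
        if not isinstance(self.value, int) or self.value == 0:
            raise ValueError("NonZeroInt requires a non-zero integer")


def divide(numerator: int, denominator: NonZeroInt) -> int:
    # No zero check needed here: a NonZeroInt cannot hold zero.
    return numerator // denominator.value


print(divide(10, NonZeroInt(2)))  # 5
```

The check happens once, where the value enters the system, and every function that accepts a `NonZeroInt` can rely on it.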
Expressive Power: Count syntactically different but semantically identical ways to accomplish a task.
- 0 ways → 🙈
- 1 way → 👓
- 2-5 ways → 🔍
- 6+ ways with precise constraints → 🔬
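The counting rule above translates directly into a lookup. In this sketch the function name is hypothetical, state names stand in for the emoji, and one case the rubric leaves open (6+ forms without precise constraints) is treated as Structured, which is an assumption.

```python
def expressive_rating(equivalent_forms: int, precise_constraints: bool = False) -> str:
    """Map a count of equivalent syntaxes to an Expressive Power state name."""
    if equivalent_forms < 0:
        raise ValueError("count cannot be negative")
    if equivalent_forms == 0:
        return "Noise"
    if equivalent_forms == 1:
        return "Readable"
    if equivalent_forms >= 6 and precise_constraints:
        return "Crystalline"
    # 2-5 forms, or 6+ forms without precise constraints (assumption).
    return "Structured"
```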
Context Flow: Trace dependencies between components.
- Circular dependencies → 🌀
- Complex branches with shared state → 🧶
- Single linear path → 🪢
- Independent components → 🎀
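A rough version of this trace can be automated when dependencies are available as a graph. The sketch below is a simplification under stated assumptions: the input format (component name mapped to the set of components it depends on) and the function name are hypothetical, and hidden shared mutable state, which the rubric also counts as coupling, is not visible in such a graph.

```python
def context_flow_rating(deps: dict[str, set[str]]) -> str:
    """Classify a dependency graph (component -> components it depends on)."""
    nodes = set(deps) | {d for targets in deps.values() for d in targets}

    # Depth-first search with three colours to detect cycles.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def has_cycle(node: str) -> bool:
        colour[node] = GREY
        for nxt in deps.get(node, ()):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    if any(colour[n] == WHITE and has_cycle(n) for n in nodes):
        return "Entangled"
    if not any(deps.values()):
        return "Independent"

    # Linear: every node has at most one dependency and at most one dependent.
    dependents = {n: 0 for n in nodes}
    for targets in deps.values():
        for d in targets:
            dependents[d] += 1
    linear = all(len(deps.get(n, ())) <= 1 for n in nodes) and all(
        count <= 1 for count in dependents.values()
    )
    return "Pipeline" if linear else "Coupled"
```

For example, `{"save": {"transform"}, "transform": {"validate"}, "validate": set()}` is classified as a pipeline, while two components that both depend on a third are classified as coupled.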
Error Surface: Identify when failures can occur.
- Cascading runtime failures → 🌊
- Handled runtime errors → 💧
- Startup/initialization failures → 🧊
- Compile-time prevention → 💠 (invalid states cannot be expressed)
Transformation Order: Stabilize Errors First → Untangle Dependencies → Increase Expressiveness (prevents building flexibility on unstable foundations)
Callback Hell → Promise Pipeline (<👓🌀💧> → <👓🪢💧>)
// Before: Nested callbacks with circular deps
getUserData(id, (err, user) => {
  if (err) handleError(err);
  else getUserPosts(user.id, (err, posts) => {
    // More nesting...
  });
});
// After: Linear promise chain
getUserData(id)
  .then(user => getUserPosts(user.id))
  .then(posts => render(posts))
  .catch(handleError);
Global State → Module Pattern (<👓🌀🌊> → <👓🎀💧>)
// Before: Global mutations everywhere
window.APP_STATE = { user: null };
function login(user) { window.APP_STATE.user = user; }
// After: Isolated module with clear boundaries
const UserModule = (() => {
  let state = { user: null };
  return {
    login: (user) => { state.user = user; },
    // Defensive copy; returns null (not {}) when no user is logged in
    getUser: () => (state.user ? { ...state.user } : null)
  };
})();
👓→🔍 (Rigid to Structured)
from typing import List, Sequence
# Before (👓): Single rigid syntax
def process_data(data: List[int]) -> int:
    return sum(data)
# After (🔍): Multiple valid approaches
def process_data(data: Sequence[int]) -> int:
    return sum(data)  # Now accepts list, tuple, or any sequence
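💧→🧊 (Runtime to Initialization)
// Before (💧): Runtime config errors
function getConfig(key: string): string {
  const value = process.env[key];
  if (!value) throw new Error(`Missing ${key}`);
  return value;
}
// After (🧊): Initialization-time validation (a non-null `!` assertion alone would validate nothing)
function requireEnv(key: string): string {
  const value = process.env[key];
  if (!value) throw new Error(`Missing ${key}`);
  return value;
}
const config = {
  apiUrl: requireEnv("API_URL"),
  apiKey: requireEnv("API_KEY"),
} as const;
// Evaluated once at module load, so errors surface at startup, not during request handling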
Not all axes deserve equal weight in every domain:
Interactive Tools (REPLs, CLIs): Prioritize Expressive Power (🔍→🔬)
- Target: <🔬🪢💧>
- Maximum flexibility for experimentation
Infrastructure & Configuration: Prioritize Error Surface (🧊→💠)
- Target: <🔍🎀💠>
- Predictability over flexibility
Data Pipelines: Prioritize Context Flow (🪢→🎀)
- Target: <🔍🪢🧊>
- Clear data flow for debugging
Safety-Critical Systems: Error Surface is non-negotiable
- Target: <👓🎀💠> or <🔍🎀💠>, depending on domain constraints
Priority Decision Rules:
- Human lives at stake → Error Surface (💠) first
- Iteration speed critical → Expressive Power (🔬) first
- Debugging time dominates → Context Flow (🎀) first
- When in doubt → Balance all three at 🔍🪢🧊
Everything Object (<🙈🌀🌊>): Extract modules → Define interfaces → Add type guards
Magic String Soup (<🙈🧶🌊>): Use enums → Add types → Parse once
Global State Mutation (<👓🌀🌊>): Isolate state → Use immutability → Add boundaries
VIBES isn't just for fixing problems; it also guides the journey from functional to exceptional:
API Evolution (<👓🪢💧> → <🔬🪢💠>)
// Good: Basic typed API (functional but limited)
declare function query(table: string, filter: object): Promise<any[]>;
// Great: Type-safe DSL with compile-time validation
const users = await db
  .from(tables.users)
  .where(u => u.age.gt(18))
  .select(u => ({ name: u.name, email: u.email }));
// SQL injection impossible, return type inferred, discoverable API
The Excellence Mindset: Good code works. Great code makes errors impossible while supporting multiple natural expressions.
Starting Point <🙈🌀🌊>:
<config>
<property name="timeout">${ENV_TIMEOUT:-${FALLBACK_TIMEOUT:-30}}</property>
<import file="${CONFIG_DIR}/db.xml" if="${USE_DB}"/>
<!-- Circular imports, runtime explosions, string interpolation hell -->
</config>
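Step 1: Stabilize Errors (🌊→💧) → <🙈🌀💧>:
const raw = process.env.ENV_TIMEOUT || process.env.FALLBACK_TIMEOUT || "30";
const timeout = Number.parseInt(raw, 10);
if (Number.isNaN(timeout)) {
  // parseInt returns NaN instead of throwing, so check explicitly
  console.error(`Config error: invalid timeout "${raw}"`);
} else {
  config.timeout = timeout;
}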
Step 2: Untangle Dependencies (🌀→🪢) → <🙈🪢💧>:
const loadConfig = () => {
  const base = loadBaseConfig();
  const db = shouldUseDb() ? loadDbConfig() : {};
  return { ...base, ...db }; // Linear merge, no cycles
};
Step 3: Increase Expressiveness (🙈→🔍) → <🔍🪢💧>:
// Now supports JSON, YAML, or env vars
const config = await Config
  .from.env()
  .from.file('config.yaml')
  .from.json({ timeout: 30 })
  .load();
Final State <🔍🪢💠>:
import { z } from "zod";
// DatabaseSchema is assumed to be defined elsewhere
const ConfigSchema = z.object({
  timeout: z.number().min(1).max(300).default(30),
  database: z.optional(DatabaseSchema)
});
const config = ConfigSchema.parse(await loadConfig());
// @VIBES: <🔍🪢💠> - Parse-time validation, linear pipeline, flexible sources
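The same parse-once idea can be shown without a schema library. The following is a loose Python analogue of the final state, not a port of the schema above: the `Config` dataclass and `parse_config` function are hypothetical, only the `timeout` field is modelled, and it is restricted to integers for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Config:
    timeout: int = 30


def parse_config(raw: Mapping[str, Any]) -> Config:
    """Validate raw settings once, at the boundary, and return a typed Config."""
    timeout = raw.get("timeout", 30)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int):
        raise ValueError(f"timeout must be an integer, got {timeout!r}")
    if not 1 <= timeout <= 300:
        raise ValueError(f"timeout must be between 1 and 300, got {timeout}")
    return Config(timeout=timeout)
```

Code that receives a `Config` never needs to re-check the range, because an out-of-range value cannot get past `parse_config`.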
Semantic Aliases vs Ambiguity
Good - Semantic Aliases (improves 🔬): Multiple syntaxes, identical semantics
- filter/where, map/select, &&/AND
- Test: Do all forms compile to identical behavior?
Bad - Semantic Ambiguity (degrades toward 🙈): Similar syntax, different behaviors
- JavaScript's == vs ===
- + for both addition and concatenation
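One way to keep aliases from drifting into ambiguity is to bind every alias to a single implementation and assert that in a test. The sketch below assumes a simple registry; `ALIASES`, `_filter`, and `aliases_are_equivalent` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def _filter(items: Iterable[T], predicate: Callable[[T], bool]) -> list[T]:
    return [item for item in items if predicate(item)]


# Every alias points at the same implementation, so semantics cannot drift.
ALIASES: dict[str, Callable] = {
    "filter": _filter,
    "where": _filter,
}


def aliases_are_equivalent(names: list[str]) -> bool:
    """True when all names resolve to one and the same callable."""
    return len({id(ALIASES[name]) for name in names}) == 1
```

This answers the "identical behavior" test structurally: two names that resolve to the same callable cannot behave differently.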
Emoji Token Guidance
Emojis can serve as semantic domain markers in LLM-optimized languages, creating visual type systems that transformers can pattern-match across contexts.
The Core Insight: Instead of parsing strings like "NotFoundException" or "UnauthorizedError", LLMs can recognize that anything prefixed with 🚫 belongs to the error domain. This creates immediate, cross-linguistic pattern recognition.
Example: VIBES-Lang with Semantic Domains
# VIBES-Lang: Emojis define semantic domains, not just individual tokens
# Define semantic domains with emoji prefixes
domains {
  🚫 = Error/Failure states
  ✅ = Success/Completion states
  🔐 = Security/Permission states
  📊 = Observability/Metrics
  ⚡ = Performance/Optimization
  🔄 = Async/Concurrent operations
}
service UserAPI {
  🔍 operation findUser(id: UserId) → User | 🚫 {
    # Any 🚫-prefixed value is an error
    errors: [🚫 NotFound, 🚫 DatabaseTimeout]
    requires: [🔐 Authenticated]
    effects: [📊 metrics.query_count++]
  }
  🗑️ operation deleteUser(id: UserId) → ✅ | 🚫 {
    requires: [🔐 Admin, 📊 AuditLog]
    effects: [
      📊 audit.write("User ${id} deleted"),
      🔄 async.notify(user.contacts)
    ]
  }
}
# Pattern matching on domains
match operation.result {
  ✅ → continue()          # Any success state
  🚫 NotFound → create()   # Specific error
  🚫 → retry()             # Any other error
  🔐 → escalate()          # Any security issue
}
# Performance hints using domain markers
function processUsers(users: List<User>) → ✅ | 🚫 {
  ⚡ parallel: true      # Performance domain hint
  ⚡ cache: 5.minutes    # Another perf directive
  return users
    |> 🔄 map(validate)  # Async domain operation
    |> 🔄 filter(active)
    |> ✅                # Success domain result
}
Why This Works for LLMs:
- Visual Namespacing: 🚫 Unauthorized immediately signals "error domain"
- Cross-Context Patterns: LLMs recognize 🔐 means "security-related" everywhere
- Reduced Ambiguity: No parsing whether "Invalid" is an error or a status
- Compositional: Can combine domains: 🔐🚫 SecurityError
Schema Documentation: Always document domain meanings:
@semantic_domains: {
  🚫: 'Error/Failure states',
  ✅: 'Success/Completion states',
  🔐: 'Security/Permission states',
  📊: 'Observability/Metrics',
  ⚡: 'Performance/Optimization',
  🔄: 'Async/Concurrent operations'
}
Avoid When:
- Semantic meaning unclear or varies culturally: 🙏 (prayer/thanks/please?)
- No clear visual metaphor exists
- Human developers are the primary users
Schema Documentation: Always provide emoji schemas in documentation:
@operators: {➕: 'addition', ➖: 'subtraction', ✖️: 'multiplication', ➗: 'division'}
@actions: {🚀: 'deploy', 🔄: 'update', 🗑️: 'delete', 🔍: 'search'}
A system rated 🎀 (Independent) within one window may become 🧶 (Coupled) when split across multiple interactions:
# Window 1: Module with "independent" functions
def calculate_total(items):
    subtotal = sum(item.price for item in items)
    return apply_discount(subtotal)

def apply_discount(amount):
    # Function appears independent...
    return amount * (1 - DISCOUNT_RATE)

# Window 2: Constants defined elsewhere
DISCOUNT_RATE = 0.1  # LLM can't see this is used above

# Window 3: Another "independent" function
def update_discount(new_rate):
    global DISCOUNT_RATE  # Hidden coupling!
    DISCOUNT_RATE = new_rate
What seems like three independent functions (🎀) is actually a coupled system (🧶) through hidden global state. Design with chunking in mind: make dependencies explicit in function signatures.
VIBES provides vocabulary for patterns LLMs already recognize from their training data. The framework is descriptive of existing engineering excellence, not prescriptive of new behaviors.
Key Insights:
- Better VIBES ratings correlate with higher task completion rates and fewer retries
- Different domains require different axis priorities (safety → 💠, speed → 🔬, debugging → 🎀)
- LLM ergonomics diverge fundamentally from human ergonomics, requiring separate tools
- Cross-model validation ensures broad applicability across transformer architectures
The Three Axes in Practice:
- Expressive Power: From rigid single-syntax APIs to rich semantic aliases
- Context Flow: From circular spaghetti to clean, traceable pipelines
- Error Surface: From runtime explosions to compile-time impossibility
Team Usage: VIBES provides shared vocabulary for improvement, not ammunition for criticism. Focus on why patterns have certain ratings, not just the scores. The spirit matters more than the letter: are we genuinely improving LLM ergonomics or just checking boxes?
Remember: VIBES guides both remediation (fixing <🙈🌀🌊>) and excellence (achieving <🔬🎀💠>).
Multi-Model Systems: When rating systems that combine multiple LLMs, assess the weakest link: a pipeline is only as ergonomic as its least ergonomic component.
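One possible reading of the weakest-link rule is to take the worst state on each axis across all components. The sketch below follows that reading, which is an assumption; another reasonable interpretation would report the single worst component as a whole. State names are used in place of the emoji, and the function name is hypothetical.

```python
# State names ordered from worst to best on each axis, as listed in this document.
AXES = {
    "expressive": ["Noise", "Readable", "Structured", "Crystalline"],
    "context": ["Entangled", "Coupled", "Pipeline", "Independent"],
    "error": ["Ocean", "Liquid", "Ice", "Crystal"],
}


def weakest_link(components: list[dict[str, str]]) -> dict[str, str]:
    """Take the worst state on each axis across all components."""
    if not components:
        raise ValueError("need at least one component rating")
    return {
        axis: min((c[axis] for c in components), key=order.index)
        for axis, order in AXES.items()
    }
```

For instance, combining a component rated Crystalline/Pipeline/Crystal with one rated Structured/Independent/Liquid yields Structured/Pipeline/Liquid.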
Version Migration: VIBES ratings may shift with framework/language updates. Document ratings with version context (e.g., "React 16: 🧶, React 18 with Suspense: 🪢").
For complete pattern corpus, detailed examples, and philosophical foundations, see:
- calibration/CALIBRATION_CORPUS.md - Validation patterns with consensus ratings
- calibration/run_0_analysis.md - Cross-model validation results and analysis