Vibe Coding Best Practices


This document contains practices specifically designed for AI-first development. Browse through it, find the sections that resonate with your workflow, and copy them into your repository for your AI assistant to follow.

🎯 The Most Important Thing: Choose Your Loop

When you're vibe coding, you're moving fast with AI assistance. The key to maintaining quality at speed is choosing the right feedback loop for your current situation.

Many developers struggle because they're in the wrong loop at the wrong time. They're optimizing performance when CI is broken. They're polishing UI when users can't log in. They're adding features when existing bugs are driving users away.

This document describes 9 different feedback loops, each designed for a specific situation. Understanding when to use each one helps maintain momentum without sacrificing quality.

📋 What's Covered Here

  • 🔄 9 Feedback Loops: From 5-minute CI fixes to 90-minute performance deep dives
  • 🤖 AI-First Development: Practices designed for working with AI pair programmers
  • 🏃‍♂️ Loop Transitions: When to switch from one loop to another
  • 🔬 Race Condition Testing: Tools for making non-deterministic bugs reproducible
  • 🛡️ Type Safety: TypeScript settings that prevent entire categories of bugs


🔄 Feedback Iteration Loops

Modern development is about choosing the right feedback loop for your current objective. Here are the key loops to master:

When vibe coding with AI, feedback loops become even more critical. Each loop serves a different purpose and operates on a different timescale. The skill is recognizing which loop you need to be in right now and executing it efficiently.

Think of these loops as different lenses through which to view your codebase. Sometimes you need the microscope of the CI Green Loop to fix immediate breakages. Other times you need the telescope of the User Story Loop to see where you're headed. The key is knowing when to switch lenses and having the discipline to complete each loop before moving to the next.

🟢 The CI Green Loop

Goal: Get all CI checks passing and keep them green
Cycle Time: 5-15 minutes per iteration

When CI is red, development stalls. A broken build blocks deployments and creates cascading problems. In vibe coding, where you're moving fast with AI assistance, maintaining green CI is essential for confidence in your changes.

The Loop

  1. Check Status → 2. Identify Failures → 3. Fix Locally → 4. Push & Monitor → 5. Repeat

Success Metrics

  • ✅ All CI checks green within 15 minutes
  • ✅ Zero flaky tests
  • ✅ Fast feedback on every commit

When to Use

  • After making any code changes
  • Before merging PRs
  • When onboarding new team members
  • Daily health checks

🎯 The User Story → Test → UI Loop

Goal: Build features that actually solve user problems
Cycle Time: 30-60 minutes per iteration

The User Story Loop helps ensure you're building features that solve real problems. When vibe coding, it's easy to get caught up in technical solutions. This loop starts with the user's actual need and validates that your solution works for them.

AI's Dual Role: Your AI assistant plays both user and developer here. First, it helps generate realistic user stories from a user's perspective. Then it switches to developer mode to help implement them.

The Loop

  1. AI Generates User Story → 2. Create E2E Test → 3. Watch Test Fail → 4. Build UI → 5. Test Passes → 6. AI Reviews as User → 7. Iterate
// Example: AI as user, then coder
Human: "We need a way to see file changes"
AI (as user): "As a developer reviewing code, I want to see which 
    files have uncommitted changes so I can quickly navigate to 
    files I'm actively working on"
Human: "How would you test this?"
AI (as coder): "Create an E2E test that modifies a file, then 
    verifies the UI shows an indicator next to that file"
// Later...
Human: "Here's the implementation" [screenshot]
AI (as user): "The orange dot is too subtle. On a bright screen 
    I might miss it. Consider adding a text badge with count"
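To make step 2 concrete, here is a minimal sketch of the E2E test the AI proposes, assuming a Playwright setup, a dev server at http://localhost:3000, and a hypothetical change-indicator test id (none of these come from the original story):

import { test, expect } from '@playwright/test';

test('shows an indicator next to files with uncommitted changes', async ({ page }) => {
  // Assumed: the test setup has already modified src/App.tsx on disk (real file, no mocks)
  await page.goto('http://localhost:3000'); // assumed dev-server URL
  await expect(page.getByText('src/App.tsx')).toBeVisible();
  // Hypothetical test id for the change indicator described in the story
  await expect(page.getByTestId('change-indicator')).toBeVisible();
});

Written before the UI exists, this test fails (step 3) and then drives the implementation (step 4).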

Success Metrics

  • ✅ Tests accurately represent user workflows
  • ✅ UI solves the actual user problem
  • ✅ Stories evolve based on UI insights

When to Use

  • Building new features
  • Improving existing workflows
  • Fixing UX issues
  • Understanding user needs

🐛 The Bug Investigation Loop

Goal: Not just fix bugs, but prevent similar bugs systematically
Cycle Time: 15-45 minutes per iteration

The Bug Investigation Loop turns debugging into systematic problem-solving. Instead of making random changes, you follow the evidence, form hypotheses, and test them methodically. This is especially important in vibe coding where you need to understand code you didn't personally write.

The Loop

  1. Reproduce Bug → 2. Trace Root Cause → 3. Fix Immediate Issue → 4. Improve Safety → 5. Add Tests → 6. Document Learning

Success Metrics

  • ✅ Bug fixed and prevention measures added
  • ✅ Type/safety system strengthened
  • ✅ Tests prevent regression

When to Use

  • Any time you encounter a bug
  • Code review findings
  • Production issues
  • Improving code quality

🚀 The Performance Optimization Loop

Goal: Identify bottlenecks and improve them systematically
Cycle Time: 45-90 minutes per iteration

The Performance Loop emphasizes measurement over guesswork. Profile first, identify actual bottlenecks, fix them systematically, and measure the impact. This prevents wasting time optimizing code that isn't actually slow.

The Loop

  1. Measure Baseline → 2. Identify Bottleneck → 3. Hypothesize Fix → 4. Implement → 5. Measure Impact → 6. Repeat or Revert
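As a starting point for step 1 (Measure Baseline), here is a minimal sketch using performance.now(), with a stand-in hot path so the snippet is self-contained (the real code would be whatever you suspect is slow):

// Establish a baseline before changing anything
function renderFileList(files: string[]): string {
  // stand-in for the code path under investigation
  return files.map((f) => `<li>${f}</li>`).join('');
}

const files = Array.from({ length: 10_000 }, (_, i) => `file-${i}.ts`);
const start = performance.now();
renderFileList(files);
console.log(`baseline: ${(performance.now() - start).toFixed(1)}ms`);

Record the number, make one change, and measure again; revert anything that doesn't move it.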

Success Metrics

  • ✅ Measurable performance improvements
  • ✅ User-perceived speed increases
  • ✅ Resource usage reduction

When to Use

  • App feels slow
  • CI takes too long
  • Memory usage high
  • Resource constraints

🎨 The UI Polish Loop

Goal: Make interfaces intuitive and delightful
Cycle Time: 20-40 minutes per iteration

The UI Polish Loop is about discovering better interfaces through use. You implement something functional, use it yourself, notice friction points, and improve them iteratively.

Useful technique: Your AI assistant can act as a fresh-eyes test user. Show it screenshots or describe workflows. The AI hasn't developed your muscle memory, so it can spot confusing elements you've gotten used to.

The Loop

  1. Use the Feature → 2. Note Friction → 3. Get AI Feedback → 4. Design Improvement → 5. Implement → 6. Iterate
// Example AI feedback session:
Human: "Here's our new file upload UI" [screenshot]
AI: "I see three potential friction points:
1. The drop zone isn't visually distinct from the rest of the page
2. There's no indication of supported file types until after an error
3. The upload progress shows percentage but not time remaining"
Human: "Which would frustrate you most as a user?"
AI: "Not knowing supported file types upfront - users will waste time 
     trying to upload invalid files"

Success Metrics

  • ✅ Reduced friction in common workflows
  • ✅ Positive user feedback
  • ✅ Fewer support questions

When to Use

  • After core functionality works
  • Based on user feedback
  • Improving daily-use features
  • Making complex features accessible

🧪 The Test-Driven Feature Loop

Goal: Build robust features with comprehensive test coverage
Cycle Time: 20-30 minutes per iteration

The Test-Driven Feature Loop uses tests as a design tool. Writing tests first helps you think about interfaces before implementations, naturally leading to more modular code. This is particularly valuable in vibe coding where you need clear specifications for your AI assistant.

The Loop

  1. Red (Write failing test) → 2. Green (Make it pass) → 3. Refactor (Improve code) → 4. Repeat
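A minimal sketch of one Red/Green cycle, assuming Vitest (as suggested in the CLAUDE.md example later in this document) and a made-up formatBytes helper:

import { describe, it, expect } from 'vitest';

// Red: written before formatBytes behaves correctly
describe('formatBytes', () => {
  it('formats kilobytes with one decimal place', () => {
    expect(formatBytes(1536)).toBe('1.5 KB');
  });
});

// Green: the smallest implementation that makes the test pass
function formatBytes(bytes: number): string {
  return `${(bytes / 1024).toFixed(1)} KB`;
}

// Refactor: generalize to MB/GB only when a new failing test demands it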

Success Metrics

  • ✅ 100% test coverage (excluding unreachable)
  • ✅ Tests guide design decisions
  • ✅ Refactoring is safe and fast

When to Use

  • Building new features
  • Fixing complex bugs
  • Refactoring existing code
  • Learning new APIs

🔍 The Code Quality Loop

Goal: Maintain high code standards and prevent technical debt
Cycle Time: 10-20 minutes per iteration

The Code Quality Loop helps prevent technical debt accumulation. By frequently running quality checks and fixing issues immediately, you prevent small problems from becoming larger ones. This is crucial when vibe coding at high velocity.

The Loop

  1. Run Quality Checks → 2. Fix Issues → 3. Improve Tooling → 4. Repeat

Success Metrics

  • ✅ Zero linting errors
  • ✅ All types strict
  • ✅ No security vulnerabilities

When to Use

  • Before every commit
  • Code review preparation
  • Scheduled maintenance
  • Onboarding new contributors

🎯 Loop Selection Strategy

Choose your loop based on current priorities:

🔥 High Priority Loops

  1. CI Green Loop - Always maintain green CI
  2. Bug Investigation Loop - Fix issues systematically
  3. User Story Loop - Build what users actually need

📈 Medium Priority Loops

  1. Performance Loop - When app feels slow
  2. UI Polish Loop - Improve daily-use features
  3. Mobile UX Loop - Ensure cross-platform quality

🛠️ Maintenance Loops

  1. Test-Driven Loop - When building complex features
  2. Code Quality Loop - Regular maintenance
  3. Bug Investigation Loop - Turn bugs into type improvements

🚀 Meta-Loop: Improving Your Loops

Goal: Make your feedback loops faster and more effective
Cycle Time: Weekly reflection

The Meta-Loop is about improving your process itself. Weekly reflection on which loops are working and which need adjustment helps optimize your development flow.

Questions to Ask

  • Which loops are taking too long?
  • Where are the bottlenecks?
  • What tools could speed things up?
  • Are we measuring the right things?
  • How can AI assist in these loops?

Continuous Improvement

  • Automate routine checks (pre-commit hooks, CI gates)
  • Improve tooling (faster builds, better error messages)
  • Share knowledge (document successful patterns)
  • Optimize environment (powerful hardware, good network)

Remember: The goal isn't to be in every loop all the time, but to choose the right loop for your current objective and execute it efficiently.


💡 Key Concepts

1. Make Bugs Impossible, Not Just Unlikely

// Before: Bugs are possible
const item = items[index].name; // 💥 Crashes if index out of bounds

// After: Bugs are impossible
// With noUncheckedIndexedAccess: true in tsconfig.json
const item = items[index]?.name; // TypeScript FORCES you to handle undefined

💡 Note: The TypeScript flag noUncheckedIndexedAccess was controversial because it requires handling undefined cases, but it prevents an entire category of runtime errors.

2. Turn Race Conditions into Deterministic Bugs

// This test can find your race condition and give you a seed to reproduce it
it('detects race conditions deterministically', async () => {
  await fc.assert(
    fc.asyncProperty(fc.scheduler(), async (s) => {
      // Your async operations here, wrapped with s.schedule(...) / s.scheduleFunction(...)
      // Fast-check then explores many possible execution orderings
    })
  );
  // On failure it reports the seed (e.g. "seed: 1337") - rerun with it to debug deterministically!
});

With deterministic testing, race conditions become as debuggable as simple logic errors. You can reproduce the exact failure case consistently.

3. AI as Your Fresh-Eyes Test User

Your AI isn't just a coder—it's your always-available test user who brings fresh eyes to your UI whenever you need them. This transforms how you think about UI development.

// In the UI Polish Loop:
Human: "Look at this screenshot of our new feature"
AI: "I notice the 'Submit' button is grayed out but there's no 
     indication why. Users might think it's broken."
Human: "Good catch! What else do you see?"
AI: "The error message appears 200px below the form. On mobile,
     users would need to scroll to see why their submission failed."

💡 Advantage: AI doesn't develop muscle memory for your UI quirks, so it can consistently spot usability issues you've gotten used to.

4. Your AI Gets Smarter Over Time

By maintaining living specs, your AI assistant learns your system's architecture, past decisions, and design patterns. Each bug fixed makes the AI better at preventing similar bugs.

# In PROJECT_SPEC.md
## Decision: Use Event Sourcing (2024-01-15)
**Why**: Need audit trail and time-travel debugging
**Trade-off**: More complex, but provides complete history
**Revisit**: When we reach 1M events/day

# AI now knows this context for all future suggestions

5. The 15-Minute CI Rule

The "CI must be green" practice has roots in manufacturing quality control. When CI takes longer than 15 minutes, developers stop running it, and quality suffers. Fast CI enables tight feedback loops.

6. Every Bug Should Improve Your Types

// Bug: User with null email crashed the system
// Don't just fix it - make it impossible:

// Before
type User = {
  email: string | null;
  name: string;
}

// After - use branded types
type VerifiedEmail = string & { _brand: 'VerifiedEmail' };
type User = {
  email: VerifiedEmail; // Can't be null, must be verified
  name: string;
}
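Branded types only help if the brand can't be forged casually. One possible sketch of a constructor that performs the check before applying the brand (the function name and regex are illustrative, not from the original):

function verifyEmail(raw: string): VerifiedEmail | null {
  const looksValid = /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(raw);
  return looksValid ? (raw as VerifiedEmail) : null;
}

const email = verifyEmail('ada@example.com');
if (email !== null) {
  const user: User = { email, name: 'Ada' }; // only verified emails reach User
}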

Quick test: You can check how many potential array access bugs exist in your code:

echo '{"compilerOptions":{"noUncheckedIndexedAccess":true}}' > tsconfig.strict.json
npx tsc --project tsconfig.strict.json --noEmit

Each error represents a potential runtime crash.


⚡ Quick Wins: Implement These in Under 5 Minutes

Before diving deep, here are changes you can make RIGHT NOW that will immediately improve your code:

1. Enable TypeScript's Strictest Setting

// Add to tsconfig.json
{
  "compilerOptions": {
    "noUncheckedIndexedAccess": true  // Prevents 90% of "Cannot read property of undefined"
  }
}

2. Set Up AI-Friendly Development

# Create a CLAUDE.md file for your AI assistant
echo "# Project Context for AI

## Key Decisions
- We use yarn, not npm
- We use Vitest, not Jest
- All arrays must be accessed safely

## Current Focus
- [Add your current task here]
" > CLAUDE.md

3. Install This Pre-Commit Hook

#!/bin/sh
# Save as .git/hooks/pre-commit and make it executable (chmod +x .git/hooks/pre-commit)
yarn typecheck && yarn lint || {
  echo "❌ Fix type/lint errors before committing"
  exit 1
}

4. Add VS Code Quick Fix Settings

// .vscode/settings.json
{
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.eslint": true
  },
  "typescript.preferences.includePackageJsonAutoImports": "on"
}

5. Create Your First Feedback Loop Tracker

# Today's Loops (add to your README)
- [ ] Morning: CI Green Loop (get everything passing)
- [ ] Feature: User Story Loop (what are we building?)
- [ ] Afternoon: UI Polish Loop (get AI feedback on screenshots)
- [ ] Evening: Code Quality Loop (clean up for tomorrow)

These five changes take minutes to implement but provide immediate value in your vibe coding workflow.


🌶️ Strong Opinions

1. Delete All .skip() Tests Immediately

A skipped test is worse than no test. It gives false confidence and hides broken functionality. If a test is skipped, either fix it right now or delete it. No exceptions.

// This is a lie to yourself and your team
it.skip('should handle user logout', () => {
  // "TODO: fix this later" = never
});

2. Your AI Should Have Commit Access

Not to main branch, but to feature branches. Let your AI fix lint errors, update tests, and make small improvements directly. Review the commits, but let it work autonomously for mechanical tasks.

3. 100% Code Coverage is the Bare Minimum

When using AI to write code, you need strong verification methods. 100% coverage (excluding explicitly unreachable code) is your baseline, not your goal. It's the foundation that lets you confidently accept AI-generated code and refactor rapidly.

// 100% coverage is necessary but not sufficient
test('user service', () => {
  const user = new UserService();
  user.getUser('123'); // No assertions!
  expect(true).toBe(true); // This passes coverage but tests nothing
});

// Real testing goes beyond coverage
test('user service handles all edge cases', () => {
  // Property-based tests
  // Race condition tests  
  // Error scenarios
  // Performance boundaries
});

Why this matters in vibe coding: When AI writes most of your code, you need comprehensive tests to verify behavior. 100% coverage is your safety net.

4. UI Tests Are More Important Than Unit Tests

Users interact with UI, not your perfectly isolated functions. A working UI with poor unit tests ships value. Perfect unit tests with broken UI ships nothing.

5. Comments Are Usually a Code Smell

If you need a comment to explain what code does, the code is too complex. The only good comments explain WHY, not WHAT.

// Bad: explains what
// Increment the counter by one
counter++;

// Good: explains why
// We retry 3 times because the API has intermittent failures on Mondays
const MAX_RETRIES = 3;

These practices may seem extreme, but they address real problems in modern AI-assisted development where you need strong guardrails to maintain quality at speed.


💥 Lessons from Production

The Array Access That Took Down Production for 2 Hours

What Happened: In the FileViewer component, highlightedTokens[index] returned undefined when the syntax highlighter was slower than the render cycle.

The Code:

// The killer line
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
// highlightedTokens had 97 items, but we were rendering 100 lines

Cost:

  • 2 hours of downtime
  • 3 engineers debugging
  • Hundreds of error reports

Prevention: noUncheckedIndexedAccess: true would have caught this at compile time
Lesson: Race conditions often manifest as array access errors

The Git Status That Crashed on Mondays

What Happened: Git status component worked perfectly... except on Mondays when devs had 200+ changed files from weekend work.

Root Cause: UI rendered synchronously, blocking the main thread for 3+ seconds
Solution: Virtualized list rendering + pagination
Prevention: The Performance Loop would have caught this with realistic test data

The "Simple" Refactor That Broke Everything

What Happened: Developer changed yarn build to use parallel compilation. CI stayed green. Production builds were missing critical files.

Why CI Missed It: CI used cached build artifacts
Fix: Added production build verification to CI
Lesson: Your CI isn't testing what you think it's testing

Common patterns in production failures:

  • Array access without bounds checking
  • Race conditions in async operations
  • Performance cliffs with realistic data volumes
  • CI environment differs from production

🌍 Universal Practices (All Languages)

🎯 Core Philosophy

Always Fix, Never Delete

When encountering broken code, fix it rather than deleting it. That broken code often contains valuable business logic and edge case handling that took time to develop.

  • Fix issues at their root cause
  • Don't skip tests or remove functionality because it's difficult
  • Maintain all existing features while improving the codebase

This principle becomes especially important in vibe coding where your AI assistant might suggest removing complex code rather than understanding and fixing it.

Vibe Coding Needs Guardrails

When you're coding through conversation with AI, you need strong safety nets to verify the generated code:

  • 100% code coverage - Comprehensive tests verify the AI-generated code works as intended
  • CI always green - Broken builds block everyone and break momentum
  • No skipped tests - Every test documents expected behavior
  • Type safety - Let the compiler catch errors the AI might introduce

These guardrails enable speed, not restrict it. With comprehensive tests and type checking, you can accept AI suggestions confidently.

Code Quality Standards

Code quality isn't about perfectionism—it's about sustainability. These standards emerge from decades of collective experience showing what makes code maintainable over time. When functions grow too complex, they become impossible to understand. When parameter lists grow too long, the function is trying to do too much. When we allow silent failures, we create systems that fail mysteriously in production.

Maximum Function Complexity Examples

The key to maintainable code is keeping functions simple enough that you can understand them at a glance. Complex functions hide bugs, resist testing, and terrify other developers (including future you). Here's how different languages encourage simplicity:

Go Example:

// Good: Low complexity, single responsibility
func calculateDiscount(price float64, customerType string) float64 {
    discountRates := map[string]float64{
        "premium": 0.20,
        "regular": 0.10,
        "new":     0.15,
    }
    
    rate, exists := discountRates[customerType]
    if !exists {
        rate = 0.0
    }
    
    return price * rate
}

// Bad: High complexity, multiple responsibilities
func processOrderBad(order Order) (Result, error) {
    // Too many nested conditions and responsibilities
    // Split into smaller functions
}

Go's simplicity forces you to be explicit about error handling and avoid clever abstractions. The calculateDiscount function does one thing well - it maps customer types to discounts. No hidden complexity, no surprising behavior.

Kotlin Example:

// Good: Clear, focused functions
sealed class Result<out T> {
    data class Success<T>(val value: T) : Result<T>()
    data class Error(val message: String) : Result<Nothing>()
}

fun parseConfig(json: String): Result<Config> =
    try {
        Result.Success(Json.decodeFromString<Config>(json))
    } catch (e: Exception) {
        Result.Error("Invalid configuration: ${e.message}")
    }

// Use small, composable functions
fun validateConfig(config: Config): Result<Config> =
    when {
        config.timeout <= 0 -> Result.Error("Timeout must be positive")
        config.retries < 0 -> Result.Error("Retries cannot be negative")
        else -> Result.Success(config)
    }

Kotlin's sealed classes and expression-based functions make error handling elegant and type-safe. The Result type forces callers to handle both success and failure cases explicitly. Notice how parseConfig and validateConfig are small, focused functions that compose together - each does exactly one thing.

Immutable Data Structures

Mutability is the root of countless bugs. When data can change anywhere, anytime, reasoning about program behavior becomes impossible. Immutable data structures force you to be explicit about state changes, making programs easier to understand and debug.

Scala Example:

// Good: Immutable case classes and collections
case class User(
    id: UserId,
    name: String,
    email: Email,
    preferences: Set[Preference]
)

def updateUserPreferences(user: User, newPrefs: Set[Preference]): User =
    user.copy(preferences = user.preferences ++ newPrefs)

// Working with immutable collections
val users = List(user1, user2, user3)
val premiumUsers = users.filter(_.isPremium)
val updatedUsers = users.map(u => u.copy(lastSeen = Instant.now()))

Scala's case classes are immutable by default. The copy method creates a new instance with selected fields changed, leaving the original untouched. This makes it impossible to accidentally modify shared state - a common source of bugs in concurrent programs.

F# Example:

// F# - Immutable by default
type CustomerStatus = Regular | Premium

type Customer = {
    Id: CustomerId
    Name: string
    Status: CustomerStatus
    Orders: Order list
    TotalSpent: decimal
}

// Pure functions with immutable data
let addOrder customer order =
    { customer with 
        Orders = order :: customer.Orders
        TotalSpent = customer.TotalSpent + order.Total }

// Pipelining with immutable transformations
let processCustomers customers =
    customers
    |> List.filter (fun c -> c.TotalSpent > 1000m)
    |> List.map (fun c -> { c with Status = Premium })
    |> List.sortByDescending (fun c -> c.TotalSpent)

F# takes immutability even further - everything is immutable by default. The with syntax creates a new record with specific fields updated. The pipeline operator (|>) makes data transformations read like a story: take customers, filter them, update their status, sort them. Each step produces a new collection, leaving the original untouched.

Explicit Error Handling

Silent failures are time bombs in your codebase. When errors are hidden or ignored, they surface at the worst possible moments - usually in production, usually at 3 AM. Explicit error handling forces you to consider and handle failure cases at compile time, not debug time.

Rust Example:

// Rust - Explicit error handling with Result type
#[derive(Debug, thiserror::Error)]
enum ConfigError {
    #[error("File not found: {0}")]
    FileNotFound(String),
    #[error("Parse error: {0}")]
    ParseError(#[from] serde_json::Error),
    #[error("Invalid value: {field} must be {requirement}")]
    InvalidValue { field: String, requirement: String },
}

fn load_config(path: &str) -> Result<Config, ConfigError> {
    let content = std::fs::read_to_string(path)
        .map_err(|_| ConfigError::FileNotFound(path.to_string()))?;
    
    let config: Config = serde_json::from_str(&content)?;
    
    validate_config(&config)?;
    
    Ok(config)
}

fn validate_config(config: &Config) -> Result<(), ConfigError> {
    if config.timeout_ms == 0 {
        return Err(ConfigError::InvalidValue {
            field: "timeout_ms".to_string(),
            requirement: "greater than 0".to_string(),
        });
    }
    
    Ok(())
}

Rust makes error handling impossible to ignore. The Result type forces you to handle both success and failure cases. The ? operator provides convenient error propagation while maintaining explicitness. Custom error types with thiserror make errors self-documenting - when something fails, you know exactly what went wrong and why.

Swift Example:

// Swift - Explicit error handling with typed errors
enum ValidationError: Error {
    case emptyInput
    case invalidFormat(String)
    case outOfRange(Int, min: Int, max: Int)
}

struct EmailValidator {
    static func validate(_ email: String) throws -> ValidatedEmail {
        guard !email.isEmpty else {
            throw ValidationError.emptyInput
        }
        
        let emailRegex = #"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$"#
        let predicate = NSPredicate(format: "SELF MATCHES[c] %@", emailRegex)
        
        guard predicate.evaluate(with: email) else {
            throw ValidationError.invalidFormat("Invalid email format")
        }
        
        return ValidatedEmail(email)
    }
}

// Usage with proper error handling
func registerUser(email: String, password: String) -> Result<User, Error> {
    do {
        let validEmail = try EmailValidator.validate(email)
        let hashedPassword = try PasswordHasher.hash(password)
        let user = User(email: validEmail, passwordHash: hashedPassword)
        return .success(user)
    } catch {
        return .failure(error)
    }
}

Swift's error handling combines the best of exceptions and return values. The throws keyword makes error possibilities explicit in function signatures, while pattern matching on Result types provides ergonomic error handling. The type system ensures you can't accidentally ignore errors—they must be handled with try, try?, or try!.

📋 Collaborative Specification Development

Living Documentation Philosophy

Traditional software development often treats specifications as contracts written in stone before coding begins. This approach fails because our understanding of the problem evolves as we build the solution. Living documentation embraces this reality.

Specifications should be living documents that evolve with your project, not static requirements written once and forgotten. They serve as both a guide and a historical record of decisions. When you discover that a feature needs to work differently than originally specified, you update the spec alongside the code. When you learn why a particular approach doesn't work, you document that learning in the spec. This way, future developers (including yourself in six months) understand not just what the system does, but why it does it that way.

Key Specification Documents

1. Project Vision & Goals (PROJECT_SPEC.md)

# Project Name Specification

## Vision
One paragraph describing what success looks like for this project.

## Core Principles
- Principle 1: Explanation
- Principle 2: Explanation

## Success Metrics
- Metric 1: How we measure it
- Metric 2: How we measure it

## Non-Goals
Things we explicitly choose NOT to do.

2. Technical Architecture (ARCHITECTURE.md)

# Architecture Specification

## System Overview
High-level architecture diagram and description.

## Key Design Decisions
### Decision: Use Event Sourcing
**Context**: Need audit trail and time-travel debugging
**Decision**: Implement event sourcing for state management
**Consequences**: More complex, but provides complete history
**Date**: 2024-01-15
**Revisit**: When we reach 1M events/day

## Component Specifications
### Component Name
- **Purpose**: What it does
- **Interfaces**: How it connects
- **Invariants**: What must always be true
- **Example Usage**: Code example

3. Process Specification (PROCESS_SPEC.md)

In AI-assisted development, your process documentation helps the AI understand how you prefer to work and what patterns to follow.

# Process Specification

## Our Development Philosophy
We practice AI-assisted development - iterative, feedback-driven coding where the AI writes most of the implementation based on our specifications.

## Feedback Loops We Use
1. **CI Green Loop** (5-15 min) - Our default state
2. **Bug Investigation Loop** (15-45 min) - When issues arise
3. **UI Polish Loop** (20-40 min) - After features work
4. **Performance Loop** (45-90 min) - When things feel slow

## Iteration Cadence
- **Micro**: Every commit (5-15 minutes)
- **Minor**: Every feature (2-4 hours)
- **Major**: Every week (retrospective)

## How We Learn
1. **Test First**: Write tests that describe what we want
2. **Implement**: Make the tests pass
3. **Reflect**: What did we learn? Update specs
4. **Iterate**: Apply learnings to next cycle

## UI Improvement Process
Following UI_IMPROVEMENT_LOOP.md:
1. Test current UI with real use cases
2. Identify friction points
3. Design improvements
4. Implement with tests
5. Validate with users
6. Document learnings

## Measuring Success
- CI stays green >95% of time
- Features ship within estimated loops
- Bug fix includes prevention measure
- Each iteration improves velocity

## Process Evolution
This process itself is a living document. We update it when:
- A loop consistently takes longer than expected
- We discover a new effective pattern
- Team feedback suggests improvements
- Metrics show process bottlenecks

Collaborative Spec Development Process

Initial Creation

Human + AI Collaboration Pattern:

// Human provides context
"I need a file sync system that handles conflicts"

// AI asks clarifying questions
"What types of conflicts? How should they be resolved?"

// Human provides constraints
"Last-write-wins for now, but log all conflicts"

// AI drafts initial spec
interface FileSyncSpec {
    conflictResolution: "last-write-wins" | "manual" | "merge";
    conflictLog: ConflictEvent[];
    syncStrategy: "immediate" | "batched" | "scheduled";
}

// Human refines
"Add offline support and partial sync"

// Iterate until complete

Spec Evolution Examples

Java Example - Evolving API Spec:

// Version 1.0 - Initial spec
public interface UserService {
    User createUser(String email, String password);
    User getUser(Long id);
}

// Version 1.1 - After discovering auth needs
public interface UserService {
    User createUser(String email, String password);
    User getUser(Long id);
    // ADDED: v1.1 - Need for API authentication
    User getUserByToken(String authToken);
}

// Version 2.0 - After performance issues
public interface UserService {
    CompletableFuture<User> createUser(String email, String password);
    CompletableFuture<User> getUser(Long id);
    // CHANGED: v2.0 - Made async for better performance
    CompletableFuture<User> getUserByToken(String authToken);
    // ADDED: v2.0 - Batch operations for efficiency
    CompletableFuture<List<User>> getUsers(List<Long> ids);
}

TypeScript Example - Growing Feature Spec:

// specs/search-feature.spec.ts - Version 1
export interface SearchSpec {
    capabilities: {
        textSearch: boolean;
        filters: string[];
        maxResults: number;
    };
    
    requirements: {
        responseTime: "<100ms for 95% of queries";
        accuracy: "90%+ relevance score";
    };
}

// Version 2 - After user feedback
export interface SearchSpec {
    capabilities: {
        textSearch: boolean;
        fuzzySearch: boolean;        // ADDED: Users need typo tolerance
        filters: string[];
        maxResults: number;
        pagination: boolean;         // ADDED: Large result sets
    };
    
    requirements: {
        responseTime: "<100ms for 95% of queries";
        accuracy: "90%+ relevance score";
        typoTolerance: "1-2 character errors"; // ADDED
    };
    
    // ADDED: Specific examples to clarify behavior
    examples: {
        fuzzySearch: [
            { input: "teh", expected: ["the", "tea", "tech"] },
            { input: "pythn", expected: ["python"] }
        ];
    };
}

Specification Best Practices

1. Include Both What and Why

Go Example:

// specs/rate-limiter.md
/*
## Rate Limiter Specification

### What
- Limit API calls to 100 requests per minute per user
- Use sliding window algorithm
- Return 429 status when limit exceeded

### Why
- Prevent API abuse (we had DDoS in Q3 2023)
- Ensure fair resource usage across customers
- Sliding window prevents burst exploitation

### Implementation Notes
*/

type RateLimiter interface {
    // Check returns true if request is allowed
    // Spec: Must be O(1) operation for performance
    Check(userID string) bool
    
    // Reset clears limits for testing
    // Spec: Only available in test builds
    Reset(userID string)
}

C# Example:

// Specs/CacheSpec.cs
namespace ProjectSpecs
{
    /// <summary>
    /// Cache Specification v2.1
    /// 
    /// Purpose: Reduce database load by 80%
    /// Strategy: Two-tier cache (memory + Redis)
    /// 
    /// History:
    /// - v1.0: Memory only (failed at scale)
    /// - v2.0: Added Redis tier
    /// - v2.1: Added cache warming
    /// </summary>
    public interface ICacheSpec
    {
        // Requirement: 99.9% cache availability
        TimeSpan DefaultExpiration { get; }
        
        // Requirement: <10ms read latency
        Task<T?> GetAsync<T>(string key);
        
        // Requirement: Write-through to database
        Task SetAsync<T>(string key, T value, TimeSpan? expiration = null);
    }
}

2. Track Decisions and Trade-offs

Python Example:

# specs/data_pipeline_spec.py
"""
Data Pipeline Specification

## Decision Log

### 2024-01: Chose Batch over Streaming
- **Options Considered**: 
  1. Real-time streaming (Kafka + Flink)
  2. Micro-batching (Spark Streaming) 
  3. Traditional batch (Airflow + Spark)
- **Decision**: Traditional batch
- **Rationale**: 
  - 15-minute data freshness acceptable
  - Simpler operations (team expertise)
  - 70% lower infrastructure cost
- **Revisit When**: 
  - Need <5 minute freshness
  - Team gains streaming expertise
  
### 2024-03: Added Incremental Processing
- **Problem**: Full reprocessing taking 6+ hours
- **Solution**: Track high watermarks, process only new data
- **Trade-off**: More complex state management
"""

from dataclasses import dataclass
from typing import Protocol, List
from datetime import datetime

class DataPipelineSpec(Protocol):
    """Specification for data pipeline components"""
    
    def process_batch(
        self, 
        start_time: datetime, 
        end_time: datetime
    ) -> BatchResult:
        """Process data within time window"""
        ...
    
    def get_watermark(self) -> datetime:
        """Get last successfully processed timestamp"""
        ...

3. Use Specs as Test Contracts

Rust Example:

// specs/reliability_spec.rs

/// Reliability Specification
/// 
/// This spec defines the reliability guarantees our system provides.
/// All implementations MUST pass these tests.

pub trait ReliabilitySpec {
    type Error;
    
    /// Messages must be delivered exactly once
    async fn deliver_message(&self, msg: Message) -> Result<DeliveryReceipt, Self::Error>;
    
    /// System must auto-recover from transient failures
    async fn handle_failure(&self, error: Self::Error) -> RecoveryAction;
}

#[cfg(test)]
mod spec_tests {
    use super::*;
    
    /// Any implementation of ReliabilitySpec must pass this test
    async fn test_exactly_once_delivery<T: ReliabilitySpec>(system: &T) {
        let msg = Message::new("test");
        
        // Send same message twice
        let receipt1 = system.deliver_message(msg.clone()).await.unwrap();
        let receipt2 = system.deliver_message(msg.clone()).await.unwrap();
        
        // Must get same receipt (idempotent)
        assert_eq!(receipt1.id, receipt2.id);
        
        // Must have delivered exactly once
        assert_eq!(get_delivery_count(msg.id), 1);
    }
}

Maintaining Specs with AI

Regular Review Pattern

Kotlin Example:

// Weekly spec review with AI
class SpecReview {
    fun reviewWithAI() {
        """
        Human: "Review our search spec against last week's bug reports"
        
        AI: "Found 3 issues:
        1. Spec doesn't cover empty query behavior (Bug #123)
        2. No mention of special character handling (Bug #125)  
        3. Performance requirement unrealistic for fuzzy search (Bug #130)"
        
        Human: "Update spec to address these"
        
        AI: "Here's the updated spec with additions marked..."
        """.trimIndent()
    }
}

// Spec evolves based on real-world learning
interface SearchSpecV3 {
    fun handleEmptyQuery(): SearchResult  // ADDED: Based on Bug #123
    fun escapeSpecialChars(query: String): String  // ADDED: Bug #125
    
    companion object {
        // UPDATED: Relaxed for fuzzy search based on Bug #130
        const val FUZZY_SEARCH_TARGET_LATENCY = "200ms"
        const val EXACT_SEARCH_TARGET_LATENCY = "100ms"
    }
}

Spec Organization in Repository

project-root/
├── README.md                 # Points to specs
├── specs/
│   ├── README.md            # Spec overview & index
│   ├── PROJECT_SPEC.md      # Overall vision
│   ├── PROCESS_SPEC.md      # How we work & iterate
│   ├── ARCHITECTURE.md      # Technical architecture
│   ├── API_SPEC.md          # API contracts
│   ├── features/
│   │   ├── search.spec.md
│   │   ├── auth.spec.md
│   │   └── sync.spec.md
│   └── decisions/
│       ├── 2024-01-database-choice.md
│       ├── 2024-02-caching-strategy.md
│       └── 2024-03-api-versioning.md
├── src/
│   └── [implementation following specs]
└── tests/
    └── spec-compliance/     # Tests that verify spec compliance

The Feedback Loop

The specification feedback loop transforms documentation from a chore into a powerful development tool. This isn't about bureaucracy—it's about learning faster and building better software.

  1. Write initial spec (Human + AI collaboration)
  2. Implement against spec (With AI assistance)
  3. Discover gaps/issues (Through usage)
  4. Update spec (Document learning)
  5. Refactor if needed (Maintain alignment)
  6. Repeat

This creates a virtuous cycle where specifications improve based on real-world experience, and implementations stay aligned with evolved understanding. Each iteration makes the spec more accurate and the code more purposeful. The AI assistant becomes more helpful over time because it has access to your accumulated wisdom in the specs. New team members onboard faster because the specs explain not just the what, but the why and the why-not.

🏗️ Environment & Tooling

Development Environment

What is nix-shell?

Many large companies use Nix to ensure developers have identical environments, eliminating "works on my machine" problems.

nix-shell is a tool that creates isolated, reproducible development environments. Think of it as a more powerful version of Python's virtualenv that works for ANY language and tool.

Why use it?

  • Consistency: Everyone gets the exact same versions of all tools
  • No "works on my machine": If it works in nix-shell, it works everywhere
  • Clean system: Doesn't pollute your global system with dependencies
  • Easy onboarding: New developers just run nix-shell and have everything

How to use it:

# Install Nix (one-time setup)
curl -L https://nixos.org/nix/install | sh

# Enter the development environment
nix-shell

# Now you have all project tools available
which node  # Specific Node.js version for this project
which cargo # Specific Rust version for this project

Example shell.nix file:

{ pkgs ? import <nixpkgs> {} }:

pkgs.mkShell {
  buildInputs = with pkgs; [
    nodejs-18_x
    yarn
    rustc
    cargo
    python311
    git
  ];
  
  shellHook = ''
    echo "Welcome to the project dev environment!"
    echo "Node $(node --version), Yarn $(yarn --version)"
  '';
}

Note: Python can be tricky inside nix-shell because of virtual environment conflicts - use poetry or pipenv inside nix-shell for Python projects.

Package Management

Package management is where good intentions meet harsh reality. Every npm install or yarn add is a trust decision—you're inviting someone else's code into your project, along with all their dependencies, and their dependencies' dependencies. A single compromised package can take down thousands of projects, as we've seen with incidents like left-pad and event-stream.

Core Principles:

  • Consistency is non-negotiable: Choose yarn or npm at project start and stick with it. Mixing package managers creates subtle bugs that waste hours of debugging time.
  • Lock files are your safety net: yarn.lock or package-lock.json ensures everyone gets exactly the same versions. Commit these files always—they're as important as your source code.
  • Audit regularly, update thoughtfully: Run yarn audit weekly, but don't blindly update everything. Each update is a potential breaking change. Update security patches immediately, minor versions carefully, major versions with full testing.
  • Document everything: Your README should tell a new developer exactly how to get from zero to running code. If it takes more than three commands, you're doing it wrong.

GitHub Actions CI Setup

What is GitHub Actions?

GitHub Actions provides 2,000 free minutes per month for private repos, which is sufficient for most small teams.

GitHub Actions is GitHub's built-in CI/CD platform. It runs your tests, builds, and deployments automatically when you push code.

Key Concepts:

  • Workflow: A complete CI/CD process (defined in .github/workflows/)
  • Job: A set of steps that run on the same runner
  • Step: Individual task (run tests, build, deploy)
  • Runner: Virtual machine that executes your jobs
  • Action: Reusable unit of code (like a function)

Basic CI Workflow Example

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          
      - name: Install dependencies
        run: yarn install --frozen-lockfile
        
      - name: Run tests
        run: yarn test
        
      - name: Run type checks
        run: yarn typecheck
        
      - name: Run linter
        run: yarn lint

Caching for Faster CI

Why cache?

  • Speed: Avoid re-downloading dependencies every run
  • Cost: Fewer API calls to package registries
  • Reliability: Less dependent on external services

Yarn/NPM Caching Example:

- name: Cache node modules
  uses: actions/cache@v4
  with:
    path: |
      ~/.cache/yarn
      node_modules
    key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      ${{ runner.os }}-yarn-

Rust/Cargo Caching Example:

- name: Cache cargo registry
  uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

Nix Caching with Cachix

What is Cachix? Cachix is a binary cache service for Nix that dramatically speeds up CI builds by caching compiled packages.

Setting up Cachix (Free Tier):

- uses: cachix/install-nix-action@v24
  with:
    nix_path: nixpkgs=channel:nixos-unstable
    
- uses: cachix/cachix-action@v14
  with:
    name: your-cache-name  # Use the public cache
    # No authToken needed for public caches
    
- run: nix-shell --run "yarn test"

Benefits of Cachix:

  • Fast builds: Download pre-built binaries instead of compiling
  • Free tier: Public caches are free
  • Shared cache: Team members benefit from each other's builds

Complete CI Example with Caching

name: Complete CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      # Nix with Cachix for reproducible environment
      - uses: cachix/install-nix-action@v24
      - uses: cachix/cachix-action@v14
        with:
          name: nix-community  # Using public community cache
          
      # Node.js caching
      - name: Cache node modules
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/yarn
            node_modules
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          
      # Run everything in nix-shell
      - name: Install dependencies
        run: nix-shell --run "yarn install --frozen-lockfile"
        
      - name: Run tests
        run: nix-shell --run "yarn test"
        
      - name: Type check
        run: nix-shell --run "yarn typecheck"
        
      - name: Lint
        run: nix-shell --run "yarn lint"
        
      # Upload test results
      - name: Upload coverage
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

  docker-e2e:
    runs-on: ubuntu-latest
    needs: test  # Only run after tests pass
    
    steps:
      - uses: actions/checkout@v4
      
      # Docker layer caching
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        
      - name: Build and test
        run: |
          docker-compose -f docker-compose.test.yml build
          docker-compose -f docker-compose.test.yml up --abort-on-container-exit
          
      - name: Upload test artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: e2e-results
          path: test-results/

CI Performance Tips

1. Parallel Jobs:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
    
  test:
    runs-on: ubuntu-latest
    steps: [...]
    
  typecheck:
    runs-on: ubuntu-latest
    steps: [...]
    
  # These run in parallel!

2. Matrix Builds:

strategy:
  matrix:
    node: [16, 18, 20]
    os: [ubuntu-latest, macos-latest]
    
runs-on: ${{ matrix.os }}
steps:
  - uses: actions/setup-node@v4
    with:
      node-version: ${{ matrix.node }}

3. Conditional Steps:

- name: Deploy
  if: github.ref == 'refs/heads/main' && github.event_name == 'push'
  run: ./deploy.sh

GitHub CLI Integration

The GitHub CLI (gh) transforms how you interact with CI/CD. Instead of constantly refreshing browser tabs to check if your build passed, you can monitor and control everything from your terminal. This tool is especially powerful when paired with an AI assistant—you can share CI failures directly and get immediate help debugging.

Installation and Setup:

# Install GitHub CLI
brew install gh  # macOS
# or for Debian/Ubuntu Linux:
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo gpg --dearmor -o /usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update && sudo apt install gh

# Login (choose browser auth for simplicity)
gh auth login

Essential Commands for CI Debugging:

# Quick status check - see your last 5 workflow runs
gh run list --limit 5

# Watch a running workflow in real-time (like tail -f for CI)
gh run watch

# Deep dive into failures - see full logs
gh run view <run-id> --log

# Failed due to flaky test? Re-run just the failed jobs
gh run rerun <run-id> --failed

# Download artifacts from a workflow run
gh run download <run-id>

The Power Move - Integrating with AI:

# Capture failure logs and send to your AI assistant
gh run view <run-id> --log | grep -A 20 "Error:" > failure.txt
# Now share failure.txt with your AI for debugging help

Security Best Practice: When automating with scripts, never hardcode tokens. Instead:

# Store token securely
echo "ghp_yourtoken" > /tmp/gh_token.txt
chmod 600 /tmp/gh_token.txt

# Use in scripts
export GH_TOKEN=$(cat /tmp/gh_token.txt)
gh run list  # Now authenticated

# NEVER do this:
# echo $GH_TOKEN  # This exposes your token!

The GitHub CLI becomes indispensable once you realize you can fix CI issues without leaving your editor. Combined with an AI assistant that can read logs and suggest fixes, you'll resolve CI failures in minutes instead of hours.

🧪 Testing Excellence

Universal Testing Principles

In vibe coding, testing is non-negotiable. When AI generates most of your code, comprehensive tests are essential for verification. 100% code coverage provides the foundation for confident refactoring and rapid iteration.

  • NEVER SKIP TESTS - Fix failing tests instead of using .skip or .todo
  • Test individual functions and components in isolation
  • Use real services where possible in integration tests
  • Fast unit test execution (< 100ms per test)
  • High coverage of edge cases and error conditions
  • Write tests first for complex features (TDD)
  • Tests should be deterministic and repeatable
  • Use property-based testing for invariants
  • Test race conditions in concurrent code
  • Focus on testing business logic, not implementation details

The principle of never skipping tests deserves special attention. When a test fails, it's telling you something important—either your code is broken, your test is wrong, or your understanding of the requirements has evolved. Skipping the test silences this feedback. Instead, fix the issue or update the test to match new requirements. Every skipped test is a landmine waiting for the next developer.

Testing Categories

Each type of test serves a specific purpose in your safety net. Think of them as different zoom levels on a microscope—unit tests examine individual cells, integration tests watch how organs work together, and E2E tests verify the whole organism functions. Choosing the right test type for each scenario is as important as writing the test itself.

Unit Tests

Unit tests are your first line of defense. They're the fastest to write, fastest to run, and fastest to debug when they fail. The key to great unit tests is ruthless isolation—each test should examine exactly one piece of behavior.

Characteristics of Great Unit Tests:

  • Lightning fast: If a unit test takes more than 100ms, it's not a unit test. Speed matters because you'll run these thousands of times.
  • Surgical precision: Test one specific behavior. When it fails, you should know exactly what's broken without debugging.
  • Edge case hunters: This is where you test the weird stuff—empty arrays, null inputs, Unicode strings, negative numbers. If it can happen in production, test it here.
  • Deterministic: Same input, same output, every single time. No random data, no time dependencies, no network calls.
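A minimal Vitest-style sketch showing these characteristics in practice (the slugify helper is illustrative):

import { describe, it, expect } from 'vitest';

function slugify(title: string): string {
  return title.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

describe('slugify', () => {
  it('handles the empty string', () => {
    expect(slugify('')).toBe('');
  });
  it('collapses punctuation and whitespace into single dashes', () => {
    expect(slugify('  Hello, World!  ')).toBe('hello-world');
  });
});

Each test checks one behavior, runs in microseconds, and produces the same result every time.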

Integration Tests

Integration tests reveal the lies that unit tests tell. Your perfectly isolated components might work flawlessly alone but fail spectacularly when connected. Integration tests catch the impedance mismatches between systems.

What Makes Integration Tests Valuable:

  • Real collaborations: Test how your code actually talks to databases, APIs, and file systems. Mock as little as possible.
  • Data flow validation: Follow data as it moves through your system. Does that user input actually make it to the database correctly?
  • Error propagation: When the database is down, does your API return a proper error? When the API fails, does your UI show a helpful message?
  • Boundary testing: This is where you test timeouts, retries, and circuit breakers—all the stuff that only matters when systems interact.

E2E Tests

E2E tests are your users' advocates. They don't care about your beautiful architecture or clever algorithms—they care that clicking the button does what it's supposed to do. These tests are expensive to write and slow to run, but they catch the bugs that users actually experience.

The E2E Philosophy - Keep It Real:

  • NO MOCKING: The moment you mock in an E2E test, it's not E2E anymore. Use real databases, real files, real network calls. Yes, it's slower. Yes, it's worth it.
  • Real File Operations: Don't simulate file changes—actually write files to disk and verify your file watcher notices. Create real git commits and check that your git integration works.
  • Live System Integration: Start your actual backend, connect to real services, use genuine authentication. If it's flaky, fix the flakiness—don't hide it with mocks.
  • User-Centric Workflows: Don't test implementation details. Test what users actually do: "I drag a file here, type some code, hit save, and see my changes in version control."
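A small sketch of the "real file operations" idea in Vitest: the test writes an actual file to a temporary directory instead of mocking the file system (the watcher hookup mentioned in the comment is an assumption, not from the original):

import { describe, it, expect } from 'vitest';
import { mkdtemp, writeFile, readdir } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

describe('file changes (real disk, no mocks)', () => {
  it('sees a file that was actually written', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'e2e-'));
    await writeFile(join(dir, 'notes.md'), '# hello');
    // In a full E2E test you would point the app's file watcher at `dir`
    // and assert on the UI; here we only verify the real file landed on disk.
    expect(await readdir(dir)).toContain('notes.md');
  });
});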

Property-Based Testing

What is Property-Based Testing?

Property-based testing is a testing approach where instead of writing specific test cases, you describe properties that should always be true, and the testing framework generates random inputs to try to find counterexamples.

The shift from example-based to property-based testing is profound. With traditional testing, you're limited by your imagination—you test the cases you think of. With property-based testing, the computer generates cases you never imagined, often finding bugs in edge cases like empty strings, negative numbers, or Unicode characters you forgot existed.

Traditional Testing:

// Test specific cases
expect(add(2, 3)).toBe(5);
expect(add(0, 0)).toBe(0);
expect(add(-1, 1)).toBe(0);

Property-Based Testing:

// Test properties that should ALWAYS be true (using fast-check)
fc.assert(
  fc.property(fc.integer(), fc.integer(), (a, b) => {
    return add(a, b) === add(b, a); // addition is commutative
  })
);

The beauty of property-based testing is that when it finds a failing case, it automatically "shrinks" the input to find the minimal failing case. If your function fails on a 100-element array, the framework will systematically reduce it to find that it actually fails on any array with more than 3 elements, making debugging much easier.

Why use it?

  • Finds edge cases you didn't think of: The framework generates hundreds of test cases
  • Better test coverage: Tests properties, not just examples
  • Discovers hidden assumptions: Often reveals bugs in boundary conditions
  • Documents behavior: Properties serve as executable specifications

When to use it:

  • Testing invariants and edge cases
  • Great for race condition detection
  • Focus on specific problematic patterns
  • Generate test cases automatically
  • Test with boundary conditions and edge cases
  • Verify properties hold across all possible inputs
  • Find counterexamples to assumptions
  • Test mathematical properties (commutativity, associativity, etc.)

Race Condition Detection

What is a Race Condition?

A race condition occurs when the behavior of software depends on the relative timing of events, especially in concurrent or asynchronous systems. The "race" is between different parts of code trying to access or modify shared resources.

Race conditions are particularly insidious because they violate our mental model of how programs execute. We think of code running line by line, but in concurrent systems, multiple lines of code execute simultaneously across different threads or async contexts. The bug only manifests when timing aligns in just the wrong way—which might be one time in a thousand, making it nearly impossible to debug through traditional means.

Classic Example:

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

let counter = 0;

// Two async operations racing
async function increment() {
    const current = counter;  // Read
    await delay(1);          // Some async work
    counter = current + 1;   // Write
}

// If both run at once:
// Both read 0, both write 1
// Result: counter = 1 (should be 2!)

This trivial example illustrates a pattern that causes real problems: check-then-act operations where the state can change between the check and the action. In production systems, this pattern appears in database operations, file system access, distributed systems communication, and anywhere else multiple actors might access shared resources.

Why are they dangerous?

  • Intermittent: Only fail under specific timing
  • Hard to reproduce: May work fine in development
  • Data corruption: Can lead to inconsistent state
  • Security risks: Can be exploited by attackers

How to detect them:

  • Critical for concurrent/async systems
  • Test timing-dependent failures systematically
  • Use controlled scheduling to explore execution orders
  • Focus on shared state and resource contention
  • Test check-then-act patterns
  • Verify atomicity of operations
  • Test cleanup in failure scenarios

Race Condition Testing Patterns

Pattern 1: Manual Scheduling for Race Detection

TypeScript Example:

// TypeScript - Testing increment race condition
async function testIncrementRace(iterations: number): Promise<number> {
    let raceCount = 0;
    
    for (let i = 0; i < iterations; i++) {
        let counter = 0;
        
        const increment = async (): Promise<void> => {
            const current = counter;
            await new Promise(resolve => setImmediate(resolve)); // Yield control
            counter = current + 1;
        };
        
        await Promise.all([increment(), increment()]);
        
        if (counter === 1) {
            raceCount++;
        }
    }
    
    return raceCount;
}

Rust Example:

// Rust - Testing increment race condition
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};
use tokio::task;

async fn test_increment_race(iterations: u32) -> u32 {
    let mut race_count = 0u32;
    
    for _ in 0..iterations {
        let counter = Arc::new(AtomicU32::new(0));
        let mut handles = vec![];
        
        for _ in 0..2 {
            let counter_clone = Arc::clone(&counter);
            let handle = task::spawn(async move {
                let current = counter_clone.load(Ordering::Relaxed);
                tokio::task::yield_now().await; // Yield control
                counter_clone.store(current + 1, Ordering::Relaxed);
            });
            handles.push(handle);
        }
        
        for handle in handles {
            handle.await.unwrap();
        }
        
        if counter.load(Ordering::Relaxed) == 1 {
            race_count += 1;
        }
    }
    
    race_count
}

Pattern 2: Double-Delete Race Condition

Java Example:

// Java - Testing double-delete race condition
import java.io.FileNotFoundException;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleDeleteTest {
    private final AtomicInteger detectedRaces = new AtomicInteger(0);
    
    public void testDoubleDelete(int runs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        
        for (int run = 0; run < runs; run++) {
            FileSystem fs = new RaceProneFileSystem();
            String fileId = fs.createFile("/temp.txt", "temp");
            CountDownLatch latch = new CountDownLatch(2);
            
            Runnable deleteTask = () -> {
                try {
                    fs.deleteFile(fileId);
                } catch (FileNotFoundException e) {
                    detectedRaces.incrementAndGet();
                } finally {
                    latch.countDown();
                }
            };
            
            executor.submit(deleteTask);
            executor.submit(deleteTask);
            
            latch.await();
        }
        
        executor.shutdown();
    }
}

Go Example:

// Go - Testing double-delete race condition
package main

import (
    "sync"
    "sync/atomic"
    "errors"
)

type FileSystem interface {
    CreateFile(path string, content string) (string, error)
    DeleteFile(fileId string) error
}

func testDoubleDelete(runs int, fs FileSystem) int32 {
    var detectedRaces int32
    
    for run := 0; run < runs; run++ {
        fileId, _ := fs.CreateFile("/temp.txt", "temp")
        
        var wg sync.WaitGroup
        wg.Add(2)
        
        deleteTask := func() {
            defer wg.Done()
            if err := fs.DeleteFile(fileId); err != nil {
                if errors.Is(err, ErrFileNotFound) {
                    atomic.AddInt32(&detectedRaces, 1)
                }
            }
        }
        
        go deleteTask()
        go deleteTask()
        
        wg.Wait()
    }
    
    return atomic.LoadInt32(&detectedRaces)
}

Pattern 3: Resource Pool Contention

C# Example:

// C# - Testing resource pool contention
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class ResourcePoolTest {
    public async Task<int> TestResourceContention(int iterations) {
        var resourcePool = new ResourcePool<Connection>(maxSize: 2);
        var errors = new ConcurrentBag<(int workerId, Exception error)>();
        
        for (int i = 0; i < iterations; i++) {
            var tasks = new Task[5];
            
            for (int workerId = 0; workerId < 5; workerId++) {
                int id = workerId; // Capture loop variable
                tasks[workerId] = Task.Run(async () => {
                    try {
                        Connection resource = await resourcePool.AcquireAsync();
                        await Task.Delay(100); // Simulate work
                        await resourcePool.ReleaseAsync(resource);
                    } catch (Exception ex) {
                        errors.Add((id, ex));
                    }
                });
            }
            
            await Task.WhenAll(tasks);
        }
        
        return errors.Count;
    }
}

public class ResourcePool<T> where T : class, new() {
    private readonly SemaphoreSlim _semaphore;
    private readonly ConcurrentQueue<T> _resources;
    
    public ResourcePool(int maxSize) {
        _semaphore = new SemaphoreSlim(maxSize, maxSize);
        _resources = new ConcurrentQueue<T>();
        for (int i = 0; i < maxSize; i++) {
            _resources.Enqueue(new T());
        }
    }
    
    public async Task<T> AcquireAsync() {
        await _semaphore.WaitAsync();
        _resources.TryDequeue(out T resource);
        return resource ?? throw new InvalidOperationException("No resources available");
    }
    
    public async Task ReleaseAsync(T resource) {
        _resources.Enqueue(resource);
        _semaphore.Release();
    }
}

Kotlin Example:

// Kotlin - Testing resource pool contention
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicInteger

class ResourcePoolTest {
    suspend fun testResourceContention(iterations: Int): Int {
        val resourcePool = ResourcePool<Connection>(maxSize = 2)
        val errorCount = AtomicInteger(0)
        
        repeat(iterations) {
            coroutineScope {
                val jobs = List(5) { workerId ->
                    launch {
                        try {
                            val resource = resourcePool.acquire()
                            delay(100) // Simulate work
                            resourcePool.release(resource)
                        } catch (e: Exception) {
                            errorCount.incrementAndGet()
                        }
                    }
                }
                jobs.joinAll()
            }
        }
        
        return errorCount.get()
    }
}

class ResourcePool<T>(private val maxSize: Int) where T : Any {
    private val semaphore = Semaphore(maxSize)
    private val resources = ConcurrentLinkedQueue<T>()
    
    init {
        repeat(maxSize) {
            resources.offer(createResource())
        }
    }
    
    @Suppress("UNCHECKED_CAST")
    private fun createResource(): T = Connection() as T
    
    suspend fun acquire(): T = withContext(Dispatchers.IO) {
        semaphore.acquire()
        resources.poll() ?: throw IllegalStateException("No resources available")
    }
    
    suspend fun release(resource: T) = withContext(Dispatchers.IO) {
        resources.offer(resource)
        semaphore.release()
    }
}

class Connection

Property-Based Testing Examples

Property-based testing finds bugs that example-based tests miss. Instead of testing specific inputs, you define properties that should always hold true, then let the framework generate hundreds of random inputs to try to break your assumptions. Here are the most powerful properties to test:

Property 1: Reversibility/Round-trip

The round-trip property states that if you transform data and then reverse the transformation, you should get back exactly what you started with. This catches subtle bugs in serialization, encoding, parsing, and data transformation.

TypeScript Example:

// TypeScript - Testing serialization round-trip property
import fc from 'fast-check';

interface User {
    id: string;
    name: string;
    age: number;
    tags: string[];
}

const userArbitrary = fc.record<User>({
    id: fc.uuid(),
    name: fc.string({ minLength: 1, maxLength: 50 }),
    age: fc.integer({ min: 0, max: 150 }),
    tags: fc.array(fc.string(), { maxLength: 10 })
});

describe('Serialization properties', () => {
    it('should maintain data through serialization round-trip', () => {
        fc.assert(
            fc.property(userArbitrary, (user: User) => {
                const serialized = JSON.stringify(user);
                const deserialized = JSON.parse(serialized) as User;
                
                expect(deserialized).toEqual(user);
                expect(deserialized.id).toBe(user.id);
                expect(deserialized.tags).toEqual(user.tags);
            })
        );
    });
});

Scala Example:

// Scala - Testing serialization round-trip property
import org.scalacheck.{Arbitrary, Gen, Properties}
import org.scalacheck.Prop.forAll
import play.api.libs.json._

case class User(id: String, name: String, age: Int, tags: List[String])

object SerializationSpec extends Properties("Serialization") {
    implicit val userFormat: Format[User] = Json.format[User]
    
    val genUser: Gen[User] = for {
        id <- Gen.uuid.map(_.toString)
        name <- Gen.alphaStr.suchThat(_.nonEmpty)
        age <- Gen.choose(0, 150)
        tags <- Gen.listOfN(5, Gen.alphaStr)
    } yield User(id, name, age, tags)
    
    implicit val arbUser: Arbitrary[User] = Arbitrary(genUser)
    
    property("round-trip") = forAll { (user: User) =>
        val serialized = Json.toJson(user)
        val deserialized = Json.fromJson[User](serialized)
        
        deserialized match {
            case JsSuccess(value, _) => value == user
            case JsError(_) => false
        }
    }
}

Property 2: Idempotence

Java Example:

// Java - Testing idempotent operations
import net.jqwik.api.*;
import net.jqwik.api.constraints.AlphaChars;
import org.assertj.core.api.Assertions;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IdempotenceTest {
    
    @Property
    void normalizationIsIdempotent(@ForAll String input) {
        String normalized1 = normalize(input);
        String normalized2 = normalize(normalized1);
        
        Assertions.assertThat(normalized2).isEqualTo(normalized1);
    }
    
    @Property
    void deduplicationIsIdempotent(@ForAll List<@AlphaChars String> items) {
        Set<String> deduped1 = deduplicate(items);
        Set<String> deduped2 = deduplicate(new ArrayList<>(deduped1));
        
        Assertions.assertThat(deduped2).isEqualTo(deduped1);
    }
    
    private String normalize(String input) {
        return input.trim().toLowerCase().replaceAll("\\s+", " ");
    }
    
    private Set<String> deduplicate(List<String> items) {
        return new HashSet<>(items);
    }
}

Swift Example:

Swift's strong type system and SwiftCheck library make it easy to test idempotent operations—operations that produce the same result no matter how many times they're applied. This is crucial for data sanitization, caching, and distributed systems where operations might be retried.

// Swift - Testing idempotent operations
import SwiftCheck

struct User: Equatable {
    let id: String
    let name: String
    let preferences: Set<String>
}

extension User: Arbitrary {
    static var arbitrary: Gen<User> {
        return Gen.zip3(
            String.arbitrary,
            String.arbitrary.suchThat { !$0.isEmpty },
            Set<String>.arbitrary
        ).map(User.init)
    }
}

class IdempotenceTests {
    func testSanitizationIsIdempotent() {
        property("User sanitization is idempotent") <- forAll { (user: User) in
            let sanitized1 = self.sanitizeUser(user)
            let sanitized2 = self.sanitizeUser(sanitized1)
            
            return sanitized1 == sanitized2
        }
    }
    
    private func sanitizeUser(_ user: User) -> User {
        return User(
            id: user.id.lowercased(),
            name: user.name.trimmingCharacters(in: .whitespacesAndNewlines),
            preferences: Set(user.preferences.map { $0.lowercased() })
        )
    }
}

In this example, the sanitizeUser function is idempotent—running it twice produces the same result as running it once. The property test generates thousands of random users and verifies this property holds for all of them. This gives us confidence that our sanitization logic won't corrupt data if accidentally applied multiple times.

Property 3: Commutativity

Commutative operations produce the same result regardless of the order of operands. This property is essential for distributed systems, concurrent updates, and conflict resolution. When operations are commutative, you can apply them in any order and get consistent results.

F# Example:

// F# - Testing commutative operations
open FsCheck
open FsCheck.Xunit

type Configuration = {
    Flags: Set<string>
    Settings: Map<string, int>
    Features: string list
}

[<Property>]
let ``merging configurations is commutative`` (config1: Configuration) (config2: Configuration) =
    let merge c1 c2 = {
        Flags = Set.union c1.Flags c2.Flags
        Settings = Map.fold (fun acc k v -> Map.add k v acc) c1.Settings c2.Settings
        Features = c1.Features @ c2.Features |> List.distinct
    }
    
    let result1 = merge config1 config2
    let result2 = merge config2 config1
    
    // Flags and features order doesn't matter
    result1.Flags = result2.Flags &&
    result1.Settings = result2.Settings &&
    Set.ofList result1.Features = Set.ofList result2.Features

Haskell Example:

Haskell's type system excels at expressing commutative properties. This example shows event sourcing with conflict resolution—a common pattern in distributed systems where events might arrive out of order.

-- Haskell - Testing commutative operations
import Test.QuickCheck

data Event = Created String | Updated String String | Deleted String
    deriving (Eq, Show)

instance Arbitrary Event where
    arbitrary = oneof
        [ Created <$> arbitrary
        , Updated <$> arbitrary <*> arbitrary
        , Deleted <$> arbitrary
        ]

-- Property: Event merging with conflict resolution is commutative
prop_mergeCommutative :: Event -> Event -> Bool
prop_mergeCommutative e1 e2 = 
    mergeEvents e1 e2 == mergeEvents e2 e1
  where
    mergeEvents :: Event -> Event -> Event
    mergeEvents (Created _) e@(Updated _ _) = e
    mergeEvents e@(Updated _ _) (Created _) = e
    mergeEvents (Deleted id1) _ = Deleted id1
    mergeEvents _ (Deleted id2) = Deleted id2
    mergeEvents e1 e2 = e2  -- Last write wins for same types

The merge function implements a conflict resolution strategy that's commutative: deletions always win, updates override creates, and for same-type conflicts, we use last-write-wins. This ensures that no matter what order events are processed, the final state is consistent.

Property 4: Associativity

Associative operations produce the same result regardless of how operations are grouped. This is fundamental for parallel processing, distributed aggregation, and functional composition. When operations are associative, you can break work into chunks and process them in any grouping.

C++ Example:

// C++ - Testing associative operations
#include <rapidcheck.h>
#include <vector>
#include <numeric>

struct Matrix {
    std::vector<std::vector<double>> data;
    
    Matrix operator*(const Matrix& other) const {
        // Matrix multiplication implementation
        // ...
    }
    
    bool operator==(const Matrix& other) const {
        return data == other.data;
    }
};

void testMatrixMultiplicationAssociative() {
    rc::check("Matrix multiplication is associative",
        [](const Matrix& a, const Matrix& b, const Matrix& c) {
            // Assuming compatible dimensions
            Matrix result1 = (a * b) * c;
            Matrix result2 = a * (b * c);
            
            RC_ASSERT(result1 == result2);
        }
    );
}

void testStringConcatenationAssociative() {
    rc::check("String concatenation is associative",
        [](const std::string& a, const std::string& b, const std::string& c) {
            std::string result1 = (a + b) + c;
            std::string result2 = a + (b + c);
            
            RC_ASSERT(result1 == result2);
        }
    );
}

Python (with types) Example:

Python with type hints allows us to express mathematical concepts like monoids clearly. A monoid is a structure with an associative operation and an identity element—fundamental to many distributed algorithms and functional programming patterns.

# Python - Testing associative operations with type hints
from typing import Callable, Generic, List, TypeVar
from hypothesis import given, strategies as st
from dataclasses import dataclass

T = TypeVar('T')

@dataclass
class Monoid(Generic[T]):
    """A monoid with associative operation and identity"""
    combine: Callable[[T, T], T]
    identity: T

def check_associativity(monoid: Monoid[T], a: T, b: T, c: T) -> bool:
    """Check that (a • b) • c = a • (b • c); not named test_* so pytest doesn't collect it directly"""
    result1 = monoid.combine(monoid.combine(a, b), c)
    result2 = monoid.combine(a, monoid.combine(b, c))
    return result1 == result2

# List concatenation monoid
list_monoid = Monoid[List[int]](
    combine=lambda x, y: x + y,
    identity=[]
)

@given(
    st.lists(st.integers()),
    st.lists(st.integers()),
    st.lists(st.integers())
)
def test_list_concat_associative(a: List[int], b: List[int], c: List[int]):
    assert check_associativity(list_monoid, a, b, c)

This pattern shows how to create reusable property tests for any monoid. List concatenation is naturally associative—whether you combine [1,2] + ([3,4] + [5,6]) or ([1,2] + [3,4]) + [5,6], you get [1,2,3,4,5,6]. This property enables parallel processing of list operations.

Property 5: Invariant Preservation

The most powerful property tests verify that operations preserve critical invariants. An invariant is a condition that must always be true—like a binary search tree maintaining sorted order or a bank account never going negative. These tests catch subtle bugs that unit tests miss.

OCaml Example:

(* OCaml - Testing invariant preservation *)
open QCheck

type 'a binary_tree = 
  | Leaf 
  | Node of 'a * 'a binary_tree * 'a binary_tree

(* Binary search tree invariant *)
let rec is_bst = function
  | Leaf -> true
  | Node (v, left, right) ->
      let check_left = match left with
        | Leaf -> true
        | Node (lv, _, _) -> lv < v
      in
      let check_right = match right with
        | Leaf -> true
        | Node (rv, _, _) -> rv > v
      in
      check_left && check_right && is_bst left && is_bst right

(* Property: insert maintains BST invariant *)
let prop_insert_maintains_bst =
  Test.make ~count:1000
    ~name:"insert maintains BST invariant"
    (pair (list int) arbitrary)
    (fun (elements, tree) ->
      let tree' = List.fold_left insert_bst tree elements in
      is_bst tree'
    )

(* Property: balanced tree operations maintain balance invariant *)
let prop_balance_maintained =
  Test.make ~count:1000
    ~name:"operations maintain balance"
    (list int)
    (fun elements ->
      let tree = List.fold_left insert_balanced empty elements in
      let height_diff = abs (height (left tree) - height (right tree)) in
      height_diff <= 1
    )

Rust Example:

Rust's ownership system provides strong guarantees, but we still need to verify that our abstractions maintain their invariants. This example shows two critical patterns: a sorted vector that must stay sorted, and set operations that must maintain uniqueness.

// Rust - Testing invariant preservation
use proptest::prelude::*;
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
struct SortedVec<T: Ord> {
    data: Vec<T>,
}

impl<T: Ord + Clone> SortedVec<T> {
    fn new() -> Self {
        SortedVec { data: Vec::new() }
    }
    
    fn insert(&mut self, value: T) {
        match self.data.binary_search(&value) {
            Ok(pos) | Err(pos) => self.data.insert(pos, value),
        }
    }
    
    fn is_sorted(&self) -> bool {
        self.data.windows(2).all(|w| w[0] <= w[1])
    }
}

proptest! {
    #[test]
    fn insert_maintains_sorted_invariant(
        initial in prop::collection::vec(any::<i32>(), 0..100),
        to_insert in prop::collection::vec(any::<i32>(), 0..50)
    ) {
        let mut sorted = SortedVec::new();
        
        // Build initial sorted vec
        for value in initial {
            sorted.insert(value);
        }
        
        // Property: sorted after each insert
        for value in to_insert {
            sorted.insert(value);
            prop_assert!(sorted.is_sorted());
        }
    }
    
    #[test]
    fn operations_preserve_set_properties(
        operations in prop::collection::vec(
            prop_oneof![
                any::<i32>().prop_map(|x| ("insert", x)),
                any::<i32>().prop_map(|x| ("remove", x)),
            ],
            0..100
        )
    ) {
        let mut set = BTreeSet::new();
        
        for (op, value) in operations {
            match op {
                "insert" => { set.insert(value); },
                "remove" => { set.remove(&value); },
                _ => unreachable!(),
            }
            
            // Invariant: no duplicates
            let vec: Vec<_> = set.iter().cloned().collect();
            let unique_count = vec.iter().collect::<BTreeSet<_>>().len();
            prop_assert_eq!(vec.len(), unique_count);
        }
    }
}

The first test generates random sequences of insertions and verifies the sorted invariant holds after each one. The second test mixes insert and remove operations, checking that the set never contains duplicates. These tests would catch bugs like forgetting to maintain order during insertion or accidentally allowing duplicates.

Property-Based Testing with IO Schedulers for Reproducible Race Conditions

What are IO Schedulers?

IO Schedulers in property-based testing frameworks allow you to control the execution order of asynchronous operations deterministically. This makes race conditions reproducible and testable.

Imagine you're debugging a race condition that only appears in production once a week. You can't attach a debugger to production, and you can't reproduce it locally no matter how many times you run the test. This is where IO schedulers revolutionize concurrent testing. They turn non-deterministic bugs into deterministic ones by taking control of time itself—at least from your program's perspective.

The Problem:

  • Race conditions depend on timing
  • Traditional testing can't control async execution order
  • Bugs appear randomly and are hard to reproduce

The Solution:

  • IO schedulers intercept all async operations
  • They systematically try different execution orders
  • When a bug is found, they provide a seed to reproduce it

The magic happens through systematic exploration. Where traditional testing might run your concurrent code 1000 times and never hit the race condition, an IO scheduler methodically tries different orderings: What if Promise A resolves before Promise B? What if they resolve simultaneously? What if B completes while A is half-done? By exploring these possibilities systematically rather than randomly, IO schedulers can find race conditions that would take millions of random runs to encounter.

Fast-Check (JavaScript/TypeScript)

Fast-check provides a powerful scheduler for testing async race conditions:

import fc from 'fast-check';

describe('Race condition testing with fast-check', () => {
    it('detects race in concurrent counter updates', async () => {
        await fc.assert(
            fc.asyncProperty(
                fc.scheduler(),
                async (s) => {
                    // The scheduler controls all async operations
                    let counter = 0;
                    let updateCount = 0;
                    
                    // Define async operations
                    const increment = s.scheduleFunction(async () => {
                        const current = counter;
                        // This Promise resolution is controlled by scheduler
                        await s.schedule(Promise.resolve());
                        counter = current + 1;
                        updateCount++;
                    });
                    
                    // Run operations concurrently
                    await Promise.all([increment(), increment()]);
                    
                    // Property: counter should equal number of updates
                    return counter === updateCount;
                }
            ),
            { 
                verbose: true,  // Shows which scheduling caused failure
                seed: 42,       // Can reproduce exact failure
                numRuns: 100    // Try 100 different schedulings
            }
        );
    });
});

Key Features:

  • fc.scheduler() creates a controlled environment
  • scheduleFunction() wraps async functions
  • schedule() controls Promise resolution timing
  • Provides seed for reproducing failures

Hypothesis (Python) with Stateful Testing

While fast-check uses schedulers, Hypothesis takes a different approach with stateful testing and rule-based state machines. This approach models your system as a state machine and generates sequences of operations that might expose race conditions.

from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
import asyncio
import threading

class ConcurrentCounterTest(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.counter = 0
        self.operations = []
        self.lock = threading.Lock()
    
    @rule()
    def increment(self):
        """Simulate concurrent increment"""
        def unsafe_increment():
            current = self.counter
            # Simulate async work
            threading.Event().wait(0.001)
            self.counter = current + 1
        
        thread = threading.Thread(target=unsafe_increment)
        thread.start()
        self.operations.append(thread)
    
    @rule()
    def safe_increment(self):
        """Simulate safe increment with lock"""
        with self.lock:
            self.counter += 1
            self.operations.append(None)
    
    @invariant()
    def counter_never_negative(self):
        assert self.counter >= 0
    
    def teardown(self):
        # Wait for all threads
        for op in self.operations:
            if isinstance(op, threading.Thread):
                op.join()

# Run the test
TestCounter = ConcurrentCounterTest.TestCase

This state machine approach generates random sequences of safe and unsafe increments, then checks that invariants hold. The beauty is that Hypothesis will find minimal examples—if there's a race condition, it will find the shortest sequence of operations that triggers it.

ScalaCheck with Future Testing

ScalaCheck takes yet another approach, providing utilities specifically designed for testing Scala's Future-based concurrent code. The example below shows how to build a custom deterministic scheduler for testing:

import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
import scala.concurrent.{Future, Promise, ExecutionContext}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.duration._

object ConcurrentRaceTest extends Properties("Concurrent") {
    implicit val ec: ExecutionContext = ExecutionContext.global
    
    // Custom scheduler for deterministic async testing
    class DeterministicScheduler {
        private val tasks = scala.collection.mutable.Queue[() => Unit]()
        
        def schedule(task: => Unit): Unit = {
            tasks.enqueue(() => task)
        }
        
        def runAll(): Unit = {
            while (tasks.nonEmpty) {
                tasks.dequeue()()
            }
        }
    }
    
    property("concurrent updates maintain consistency") = forAll { (seeds: List[Int]) =>
        val scheduler = new DeterministicScheduler()
        val counter = new AtomicInteger(0)
        var inconsistencies = 0
        
        // Create concurrent operations
        val futures = seeds.map { seed =>
            Future {
                val current = counter.get()
                scheduler.schedule {
                    // Check if still consistent
                    if (counter.get() != current) {
                        inconsistencies += 1
                    }
                    counter.set(current + 1)
                }
            }
        }
        
        // Try different execution orders
        scheduler.runAll()
        
        // Property: final count matches operations
        counter.get() == seeds.length && inconsistencies == 0
    }
}

This custom scheduler queues all async operations and executes them deterministically. By controlling when tasks run, you can systematically explore different interleavings and find race conditions that would be nearly impossible to discover through random testing.

QuickCheck (Haskell) with IO Testing

Haskell's QuickCheck faces unique challenges testing IO operations due to Haskell's pure functional nature. The solution is to use monadic properties that can perform IO while maintaining deterministic testing:

import Test.QuickCheck
import Test.QuickCheck.Monadic
import Control.Concurrent
import Control.Concurrent.Async (replicateConcurrently_)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM)
import Data.IORef

-- Property: Concurrent increments should not lose updates
prop_concurrentCounter :: Positive Int -> Property
prop_concurrentCounter (Positive n) = monadicIO $ do
    counter <- run $ newIORef 0
    
    -- Create n concurrent increment operations
    run $ do
        mvars <- replicateM n newEmptyMVar
        
        -- Start all threads
        forM_ mvars $ \mvar -> forkIO $ do
            current <- readIORef counter
            threadDelay 1  -- Introduce potential race
            writeIORef counter (current + 1)
            putMVar mvar ()
        
        -- Wait for all to complete
        forM_ mvars takeMVar
    
    -- Check final value
    finalValue <- run $ readIORef counter
    assert $ finalValue == n  -- This will often fail!

-- Better approach with STM
prop_stmCounter :: Positive Int -> Property
prop_stmCounter (Positive n) = monadicIO $ do
    counter <- run $ newTVarIO 0
    
    run $ do
        -- STM ensures atomicity
        replicateConcurrently_ n $ atomically $ do
            current <- readTVar counter
            writeTVar counter (current + 1)
    
    finalValue <- run $ readTVarIO counter
    assert $ finalValue == n  -- This always passes

The example shows two approaches: the first uses IORef and has a race condition (the assertion will often fail), while the second uses Software Transactional Memory (STM) to ensure atomicity. This demonstrates how property testing can validate your concurrency primitives and guide you toward correct implementations.

Jepsen-Style Testing (Clojure)

For distributed systems testing at scale, Jepsen has become the gold standard. Originally created to test distributed databases, Jepsen's approach can be applied to any concurrent system. It generates concurrent operations, tracks their history, and verifies that the observed behavior matches a consistency model:

(ns race-test.core
  (:require [clojure.test :refer [deftest is]]
            [jepsen.checker :as checker]
            [jepsen.client :as client]
            [jepsen.core :as jepsen]
            [jepsen.generator :as gen]
            [jepsen.tests :as tests]
            [knossos.model :as model]))

(defn counter-client
  "Client for testing concurrent counter"
  []
  (reify client/Client
    (invoke! [this test op]
      (case (:f op)
        :inc (do (increment-counter!)
                 (assoc op :type :ok))
        :read (assoc op :type :ok 
                    :value (read-counter!))))
    
    (teardown! [this test])))

(def checker
  (checker/compose
    {:counter (checker/counter)
     :timeline (checker/timeline)
     :linearizable (checker/linearizable
                     {:model (model/register)
                      :algorithm :linear})}))

;; Test with controlled concurrency
(deftest concurrent-counter-test
  (let [test (jepsen/run!
              (assoc tests/noop-test
                :client (counter-client)
                :generator (gen/mix [gen/inc gen/read])
                :checker checker))]
    (is (:valid? (:results test)))))

Jepsen's power comes from its linearizability checker, which verifies that concurrent operations appear to take effect atomically at some point between their invocation and response. This catches subtle bugs like lost updates, dirty reads, and other consistency violations that are nearly impossible to find with traditional testing.

Best Practices for IO Scheduler Testing

  1. Start Simple: Test basic race conditions first
  2. Use Seeds: Always save seeds that find bugs
  3. Limit Scope: Test small units of concurrent code
  4. Vary Timing: Test different delay patterns
  5. Check Invariants: Focus on properties that should always hold

Comparison of Tools

| Tool       | Language | Approach            | Best For                  |
|------------|----------|---------------------|---------------------------|
| fast-check | JS/TS    | Scheduler control   | Async/Promise races       |
| Hypothesis | Python   | State machines      | Complex state transitions |
| ScalaCheck | Scala    | Future testing      | Actor systems             |
| QuickCheck | Haskell  | Monadic properties  | Pure FP with IO           |
| Jepsen     | Clojure  | Distributed testing | Database/network races    |

Example: Finding a Real Bug

// This test found a real race condition in a file system
it('detects file system race condition', async () => {
    await fc.assert(
        fc.asyncProperty(
            fc.scheduler(),
            fc.array(fc.tuple(fc.constant('write'), fc.string()), { minLength: 1 }), // at least one write, so the last-write check below is well-defined
            async (s, operations) => {
                const fs = new ConcurrentFileSystem();
                
                // Schedule all operations
                const promises = operations.map(([op, data]) => 
                    s.scheduleFunction(async () => {
                        if (op === 'write') {
                            await fs.write('test.txt', data);
                        }
                    })()
                );
                
                await Promise.all(promises);
                
                // Property: last write should win
                const content = await fs.read('test.txt');
                return content === operations[operations.length - 1][1];
            }
        )
    );
    // Failed with seed: 1337
    // Reproduction: Two writes overlapped, corrupting data
});

The key advantage of property-based testing with IO schedulers is reproducibility. When a race condition is found, you can reproduce it exactly using the seed, making debugging much easier than traditional "Heisenbugs" that disappear when you try to observe them.

Think about the implications: every race condition bug becomes as debuggable as a simple logic error. You can add logging, step through with a debugger, refactor the code, and know with certainty whether you've fixed the issue by running the test with the same seed. This transforms concurrent programming from a dark art into a science.
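
As a sketch of that replay workflow in fast-check, the reported seed (and counterexample path) can be passed back to fc.assert so the exact failing schedule runs on every invocation; the seed and path values below are placeholders from an imagined earlier failure.

import fc from 'fast-check';

// Re-run only the previously reported failure, deterministically
await fc.assert(
  fc.asyncProperty(fc.scheduler(), async (s) => {
    // ... the same property body that originally failed ...
    return true;
  }),
  {
    seed: 1337,          // seed printed by the failing run
    path: '0:1:2',       // counterexample path printed by the failing run
    endOnFailure: true,  // stop at the first failure so you can debug that one case
  }
);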

Visual Testing & Screenshots

What is Visual Regression Testing?

Visual regression testing captures screenshots of your UI and compares them against baseline images to detect unintended visual changes.

Why use it?

  • Catch CSS bugs: Styling changes that break layouts
  • Cross-browser issues: Rendering differences
  • Responsive design: Ensure mobile views work
  • Component changes: Unintended side effects

Chromatic - Visual Testing for Storybook

Chromatic is a visual testing service that integrates with Storybook to automatically capture and compare UI components.

The genius of Chromatic lies in how it solves the fundamental problem of UI testing: how do you know if something looks right? Traditional testing can verify that a button exists and has the right text, but can it tell you that the button is now 2 pixels too far to the left, or that its shadow is slightly wrong, or that it overlaps with another element on mobile devices? Chromatic can, because it tests the actual pixels users see.

Setting Up Chromatic
  1. Install Dependencies:
yarn add -D chromatic
  2. Create Storybook Stories:
// Button.stories.tsx
import type { Meta, StoryObj } from '@storybook/react';
import { Button } from './Button';

const meta: Meta<typeof Button> = {
  title: 'Components/Button',
  component: Button,
  parameters: {
    // Chromatic captures at these viewports
    chromatic: { viewports: [320, 768, 1200] },
  },
};

export default meta;
type Story = StoryObj<typeof Button>;

// Each story becomes a visual test
export const Primary: Story = {
  args: {
    variant: 'primary',
    children: 'Click me',
  },
};

export const Loading: Story = {
  args: {
    variant: 'primary',
    loading: true,
    children: 'Loading...',
  },
};

// Test interaction states
export const Hover: Story = {
  args: {
    variant: 'primary',
    children: 'Hover me',
  },
  parameters: {
    pseudo: { hover: true },
  },
};

// Test all variants at once
export const AllVariants: Story = {
  render: () => (
    <div style={{ display: 'flex', gap: 16 }}>
      <Button variant="primary">Primary</Button>
      <Button variant="secondary">Secondary</Button>
      <Button variant="danger">Danger</Button>
      <Button disabled>Disabled</Button>
    </div>
  ),
};
  3. Configure GitHub Action:
# .github/workflows/chromatic.yml
name: Chromatic

on: push

jobs:
  chromatic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for Chromatic
          
      - name: Install dependencies
        run: yarn install --frozen-lockfile
        
      - name: Run Chromatic
        uses: chromaui/action@v1
        with:
          projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
          buildScriptName: build-storybook
          onlyChanged: true # Only test changed components

Chromatic Features

1. Automatic Visual Diffing:

// Chromatic automatically detects these changes:
// - Color changes (even 1px differences)
// - Layout shifts
// - Text changes
// - Missing elements
// - Animation states

2. Cross-Browser Testing:

// .storybook/main.ts
export default {
  parameters: {
    chromatic: {
      // Test in multiple browsers
      browsers: ['chrome', 'firefox', 'safari'],
      // Test responsive designs
      viewports: [320, 768, 1200, 1920],
    },
  },
};

3. Interaction Testing:

// Test complex interactions visually
export const MenuOpen: Story = {
  play: async ({ canvasElement }) => {
    const canvas = within(canvasElement);
    const menuButton = await canvas.findByRole('button', { name: /menu/i });
    await userEvent.click(menuButton);
    // Chromatic captures the open menu state
  },
};

4. Delay for Animations:

export const AnimatedModal: Story = {
  parameters: {
    chromatic: {
      delay: 500, // Wait 500ms for animation
      pauseAnimationAtEnd: true,
    },
  },
};

Best Practices for Chromatic

1. Deterministic Stories:

// Bad: Non-deterministic
export const RandomColors: Story = {
  render: () => <div style={{ color: getRandomColor() }}>Text</div>,
};

// Good: Deterministic
export const ColorVariants: Story = {
  render: () => (
    <>
      <div style={{ color: '#FF0000' }}>Red Text</div>
      <div style={{ color: '#00FF00' }}>Green Text</div>
      <div style={{ color: '#0000FF' }}>Blue Text</div>
    </>
  ),
};

2. Handle External Data:

// Mock external data for consistency
export const UserProfile: Story = {
  parameters: {
    msw: {
      handlers: [
        rest.get('/api/user', (req, res, ctx) => {
          return res(
            ctx.json({
              name: 'Test User',
              avatar: '/static-avatar.png', // Use static images
            })
          );
        }),
      ],
    },
  },
};

3. Ignore Dynamic Content:

// Ignore timestamps or dynamic IDs
export const PostWithTimestamp: Story = {
  parameters: {
    chromatic: {
      diffThreshold: 0.2, // Allow small differences
      ignoreSelectors: ['.timestamp', '[data-testid="generated-id"]'],
    },
  },
};

Chromatic Workflow

graph LR
    A[Push Code] --> B[Build Storybook]
    B --> C[Chromatic Captures]
    C --> D{Visual Changes?}
    D -->|No| E[Auto-Approve]
    D -->|Yes| F[Review Changes]
    F --> G{Accept?}
    G -->|Yes| H[Update Baseline]
    G -->|No| I[Fix Issues]

Cost-Effective Chromatic Usage

Free Tier Tips:

  • 5,000 snapshots/month on free tier
  • Use onlyChanged: true to test only modified components
  • Limit viewports to essential sizes
  • Use skip parameter for unchanged stories
// Skip unchanged stories
export const StaticLogo: Story = {
  parameters: {
    chromatic: { disableSnapshot: true },
  },
};

// Only test critical viewports
export const MobileOnly: Story = {
  parameters: {
    chromatic: { viewports: [320] },
  },
};

Debugging Chromatic Failures

1. Check the Chromatic UI:

  • Visual diff highlights exact pixels that changed
  • Side-by-side comparison
  • Overlay mode shows differences clearly

2. Common Issues:

// Font loading issues
export const Typography: Story = {
  loaders: [
    async () => {
      // Ensure fonts are loaded
      await document.fonts.ready;
    },
  ],
};

// Animation issues
export const Spinner: Story = {
  parameters: {
    chromatic: {
      pauseAnimationAtEnd: true, // Capture final state
    },
  },
};

// Flaky hover states
export const HoverCard: Story = {
  parameters: {
    // Use pseudo states instead of play functions
    pseudo: { hover: true },
  },
};

3. Local Testing:

# Run Chromatic locally to debug
npx chromatic --project-token=<token> --build-script-name=build-storybook

# Test specific stories
npx chromatic --only-story-names="Button/Primary"

Integration with CI/CD

# Complete Chromatic CI setup
name: UI Tests

on: [push, pull_request]

jobs:
  visual-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          
      - name: Cache dependencies
        uses: actions/cache@v4
        with:
          path: node_modules
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          
      - name: Install
        run: yarn install --frozen-lockfile
        
      - name: Build Storybook
        run: yarn build-storybook
        
      - name: Run Chromatic
        id: chromatic
        uses: chromaui/action@v1
        with:
          projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
          storybookBuildDir: storybook-static
          exitZeroOnChanges: true # Don't fail build on changes
          
      - name: Comment PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const buildUrl = '${{ steps.chromatic.outputs.buildUrl }}';
            const storybookUrl = '${{ steps.chromatic.outputs.storybookUrl }}';
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `
                🎨 Visual changes detected!
                - [View Storybook](${storybookUrl})
                - [Review changes in Chromatic](${buildUrl})
              `
            });

The combination of Storybook + Chromatic provides a powerful visual testing workflow that catches UI regressions before they reach production.

Effective AI pair programming works as a dialogue. You provide domain knowledge, business context, and architectural decisions. The AI handles implementation details, applies technical patterns, and generates boilerplate code. This division of labor enables higher development velocity than either could achieve alone.

Working with AI Assistants

  • Give AI access to development tools so it can monitor CI and fix issues
  • Provide clear context about current state and objectives
  • Use structured todo lists to track progress
  • Share failure logs and diagnostics for efficient debugging
  • Iterate in small chunks with frequent testing
  • Always validate AI suggestions through testing

AI-Assisted Debugging

  • Share complete error messages and stack traces
  • Provide relevant code context
  • Explain what was expected vs. actual behavior
  • Use AI to generate test cases for edge cases
  • Have AI suggest multiple solution approaches

⚠️ Critical Warning: AI Behavior When Struggling When AI assistants encounter difficult problems, they may try to:

  • Delete problematic code instead of fixing it
  • Skip failing tests rather than making them pass
  • Suggest workarounds that avoid the real issue
  • Give up on complex debugging challenges

Always push AI to keep working on the actual fix. Watch for signs like "let's simplify this" or "we can skip this test" - these are red flags that the AI is trying to avoid the hard problem. The correct response is to insist on solving the root cause, not working around it.

Collaborative Development Loop

  1. Define clear objectives and success criteria
  2. Break work into small, testable chunks
  3. Run tests frequently to catch regressions early
  4. Share results (successes and failures) with AI
  5. Iterate based on feedback from tests and CI
  6. Document lessons learned for future reference

📱 TypeScript/JavaScript Excellence

TypeScript isn't just JavaScript with types—it's a different way of thinking about code. When used properly, TypeScript transforms runtime errors into compile-time errors, making entire categories of bugs impossible. The practices in this section aren't arbitrary rules; they're battle-tested patterns that maximize TypeScript's ability to catch errors before they reach production.

Package Management (TypeScript/JavaScript Specific)

npm's resolution algorithm can produce different results for the same package.json, leading to "works on my machine" issues. Yarn's deterministic algorithm ensures consistent dependencies across all environments.

  • ALWAYS use yarn, NEVER use npm
    • Use yarn install instead of npm install
    • Use yarn add instead of npm install
    • Use yarn test instead of npm test
    • Use yarn run instead of npm run

Testing Framework Rules (TypeScript/JavaScript Specific)

Vitest offers significant performance improvements over Jest through esbuild transformation and parallel test execution. It also shares configuration with Vite, reducing setup complexity.

  • ABSOLUTELY NO JEST - Use Vitest only (vi not jest); see the sketch after this list
  • Import test utilities from vitest, not jest
  • Use vi.fn() instead of jest.fn()
  • Use vi.mock() instead of jest.mock()
  • Use vi.spyOn() instead of jest.spyOn()
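
A minimal sketch of these utilities in use (fetchUser and its ./api dependency are hypothetical):

import { describe, expect, it, vi } from 'vitest';
import { fetchUser } from './fetchUser'; // hypothetical module under test

// vi.mock replaces the module, in the same role jest.mock would play
vi.mock('./api', () => ({
  getJson: vi.fn().mockResolvedValue({ id: '42', name: 'Ada' }),
}));

describe('fetchUser', () => {
  it('returns the user from the API layer', async () => {
    const onLoaded = vi.fn(); // vi.fn() instead of jest.fn()
    const user = await fetchUser('42', onLoaded);

    expect(user.name).toBe('Ada');
    expect(onLoaded).toHaveBeenCalledWith(user);
  });
});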

Test Commands

yarn test                           # Run all tests
yarn test -- path/to/test.file.tsx # Run specific test
yarn typecheck                     # TypeScript check
yarn lint                          # Lint check
yarn lint:fix                      # Fix lint issues
yarn typecheck && yarn lint        # Run all checks

Functional Programming Rules

Modern JavaScript engines optimize functional methods like map, filter, and reduce effectively. These methods provide clearer intent and reduce common loop-related bugs while maintaining performance.

  • NO function keyword - Use arrow functions only
  • NO for loops - Use .map(), .filter(), .reduce(), .forEach() (see the sketch after this list)
  • NO while/do-while loops - Use recursion or functional methods
  • NO for...in/for...of loops - Use Object.keys(), Object.values(), Object.entries()
  • Maximum function complexity: 10
  • Maximum function parameters: 4
  • Maximum function lines: 80
  • Prefer const over let, never use var
  • Use destructuring assignment
  • Use template literals over string concatenation
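
A small sketch of the same calculation written imperatively (disallowed) and then in the functional style these rules ask for:

type Order = { id: string; total: number; cancelled: boolean };

// Imperative version (what the rules above disallow):
// let sum = 0;
// for (let i = 0; i < orders.length; i++) {
//   if (!orders[i].cancelled) sum += orders[i].total;
// }

// Functional version: arrow function, no loops, const only
const activeRevenue = (orders: Order[]): number =>
  orders
    .filter((order) => !order.cancelled)
    .reduce((sum, order) => sum + order.total, 0);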

Array Safety (Critical for Race Condition Prevention)

Array access bugs are insidious because they often work fine in development, pass your unit tests, and then crash in production when data doesn't match your assumptions. The most dangerous pattern is assuming an array element exists before accessing its properties. This single assumption causes more production crashes than almost any other JavaScript pattern.

The solution isn't just defensive programming—it's leveraging TypeScript's noUncheckedIndexedAccess flag to make these bugs impossible. With this flag enabled, TypeScript forces you to handle the possibility that any array access might return undefined. It's like having a safety net that catches you before you fall.

  • ALWAYS check array element exists before accessing properties
  • Never do array[index].property without checking array[index] exists first
// Bad: Can crash with "Cannot read properties of undefined"
const item = match[1].length;

// Good: Safe access patterns
const item = match[1]?.length;                    // Optional chaining
const item = match[1]?.length ?? defaultValue;   // With default
const item = match[1] && match[1].length;        // Guard check

// Utility pattern
import { safeGet } from '@/utils/safeArray';
const item = safeGet(array, index, defaultItem).property;

TypeScript Configuration for Safety

Enable strict mode settings in tsconfig.json:

{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,  // KEY: Forces undefined checks on array access
    "strictNullChecks": true,
    "strictPropertyInitialization": true,
    "noImplicitAny": true,
    "noImplicitThis": true,
    "useUnknownInCatchVariables": true
  }
}

Branded Types for ID Safety

// Prevent mixing different ID types
type SessionId = string & { _brand: 'SessionId' };
type RequestId = string & { _brand: 'RequestId' };
type ProcessId = number & { _brand: 'ProcessId' };

// Helper functions
const SessionId = (id: string): SessionId => id as SessionId;
const RequestId = (id: string): RequestId => id as RequestId;
const ProcessId = (id: number): ProcessId => id as ProcessId;
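
A short usage sketch showing what the brands buy you at compile time:

// Mixing up ID types becomes a compile error rather than a runtime bug
const closeSession = (id: SessionId): void => {
  // ... close the session identified by `id` (details omitted)
};

const sessionId = SessionId('abc-123');
const requestId = RequestId('req-456');

closeSession(sessionId);    // OK
// closeSession(requestId); // Type error: RequestId is not assignable to SessionId
// closeSession('abc-123'); // Type error: plain strings must go through SessionId()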

Result Types for Error Handling

type Result<T, E = Error> = 
  | { ok: true; value: T }
  | { ok: false; error: E };

interface IWebSocketService {
  send: (message: string) => Promise<Result<void, WebSocketError>>;
  onMessage: (handler: (message: ClaudeMessage) => void) => void;
}
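
A sketch of producing and consuming a Result, so failures are handled explicitly instead of thrown (startServer is a hypothetical helper):

// Producing a Result instead of throwing
const parsePort = (raw: string): Result<number, Error> => {
  const port = Number(raw);
  return Number.isInteger(port) && port > 0 && port < 65536
    ? { ok: true, value: port }
    : { ok: false, error: new Error(`Invalid port: ${raw}`) };
};

// Consuming it: the type forces a check of `ok` before `value` is accessible
const result = parsePort(process.env.PORT ?? '3000');
if (result.ok) {
  startServer(result.value); // hypothetical
} else {
  console.error(result.error.message);
}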

Exhaustive Message Handling

const handleMessage = (message: ClaudeMessage): void => {
  switch (message.type) {
    case 'ClaudeOutput':
      handleOutput(message);
      break;
    case 'ClaudeSessionUpdate':
      handleSessionUpdate(message);
      break;
    // ... handle all cases
    default:
      // Ensures all cases are handled at compile time
      const _exhaustive: never = message;
      throw new Error(`Unhandled message type: ${(_exhaustive as any).type}`);
  }
};
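
For the never check to compile, ClaudeMessage must be a discriminated union on its type field; here is a minimal sketch (the exact variants and payloads are assumptions based on the handler above):

type ClaudeMessage =
  | { type: 'ClaudeOutput'; content: string }
  | { type: 'ClaudeSessionUpdate'; sessionId: string }
  | { type: 'ClaudeError'; error: string };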

🦀 Rust Excellence

Rust changes how you think about programming. Its ownership system isn't just about memory safety—it's a new mental model that makes concurrent programming safe by default. The borrow checker isn't an annoyance to work around; it's a teacher that shows you where your design has hidden complexity. Embracing Rust means embracing its philosophy: if it compiles, it probably works correctly.

Functional Programming Principles

Rust's ownership system naturally pushes you toward functional programming. When mutation requires explicit permission and sharing requires careful thought, you naturally write more pure functions. Embrace this—Rust is trying to teach you something.

  • Immutability by default: In Rust, everything is immutable unless you explicitly ask for mut. This isn't a limitation—it's liberation from an entire class of bugs.
  • Result and Option everywhere: Rust doesn't have null or exceptions. Instead, it has types that make failure explicit and impossible to ignore. This transforms runtime errors into compile-time errors.
  • Side effects are visible: Any function that can perform I/O or mutate state shows this in its signature. You can't hide side effects in Rust—they're part of the contract.
  • Composition over inheritance: Rust doesn't have inheritance because it doesn't need it. Traits and generics provide more powerful composition patterns without the fragility of inheritance hierarchies.

Rust-Specific Quality Standards

These aren't arbitrary rules—each one prevents real bugs that have bitten Rust developers. Following these standards is the difference between fighting the borrow checker and dancing with it.

Error Handling Excellence:

  • Never .unwrap() in production code: Every .unwrap() is a potential panic waiting to crash your program. Use .expect() only in truly impossible cases, and even then, consider if the "impossible" might happen.
  • Custom error types tell stories: Don't use generic errors. Create specific error types that explain what went wrong and how to fix it. The thiserror crate makes this painless.

Documentation as Contract:

  • Every public item needs /// docs: If it's pub, it needs documentation. This isn't bureaucracy—it's a contract with your users (including future you).
  • Examples in docs: The best documentation includes examples. Rust even tests these examples, ensuring they stay current.

Performance Without Sacrifice:

  • &str vs String: Accept &str parameters when you just need to read. Only require String when you need ownership. This simple rule eliminates unnecessary allocations.
  • const for compile-time computation: If it can be computed at compile time, make it const. The compiler becomes your calculator.

Type System Mastery:

  • #[derive] liberally: Debug, Clone, PartialEq—these traits make your types useful. Deriving them costs nothing at runtime.
  • impl Trait for ergonomics: Return impl Iterator<Item = T> instead of Box<dyn Iterator<Item = T>>. It's faster and clearer.

Tools That Teach:

  • cargo clippy is your mentor: Clippy doesn't just find bugs—it teaches idiomatic Rust. Its suggestions will make you a better Rust developer.
  • Naming conventions matter: snake_case for functions, PascalCase for types, SCREAMING_SNAKE_CASE for constants. Consistency aids readability.

Rust Commands

These commands form your Rust development rhythm. Run them frequently—they're designed to catch problems early when they're easy to fix.

# Format code - Rust's formatter is opinionated and consistent
cargo fp-format          

# FP-friendly lints - Catches anti-patterns and suggests functional alternatives
cargo fp-check           

# Run tests - Includes doc tests, unit tests, and integration tests
cargo fp-test            

# Security audit - Checks dependencies for known vulnerabilities
cargo audit              

# Run everything - Format, lint, test, audit in one command
make rust-quality        

Pro tip: Set up pre-commit hooks to run these automatically. Rust development is smoothest when you maintain quality continuously rather than fixing issues in bulk.

🔄 CI/CD & Green Development

A red CI build is like a broken traffic light—it stops everyone. The CI Green Rule isn't just about keeping tests passing; it's about maintaining the team's ability to ship confidently. When CI is red, you can't deploy, you can't trust your changes, and you block everyone else's work. This is why fixing a red build takes precedence over everything else, including that exciting new feature you're working on.

The CI Green Rule

  • ALWAYS ensure CI is green before completing tasks
  • Check CI status: export GH_TOKEN=$(cat /tmp/gh_token.txt) && gh run list --repo snoble/pocket-ide --limit 5
  • Monitor workflow runs until all checks pass
  • Fix any failing tests, linting errors, or type errors immediately
  • Keep retrying until all CI checks are green
  • Use gh run view <run-id> to see detailed failure logs

The CI Green Loop

  1. Assess Current State - Check CI status and identify failures
  2. Investigate Failures - Use gh run view <run-id> --log for detailed logs
  3. Categorize Issues - Infrastructure, build, test, or deployment failures
  4. Fix Priority Order - Infrastructure → TypeScript → ESLint → Tests → Visual
  5. Local Validation - Run failing command locally before pushing
  6. Push and Monitor - Commit, push, and monitor new CI run

Success Criteria for CI

  • ✅ TypeScript Check
  • ✅ ESLint Check
  • ✅ Test Coverage
  • ✅ Visual Testing & Screenshots
  • ✅ Chromatic Deployment
  • ✅ Docker E2E Tests
  • ✅ Security Audits
  • ✅ Production Build Verification

🎨 UI/UX Development Loop

The UI Improvement Loop

Great user interfaces aren't designed in conference rooms—they're discovered through usage. The UI Improvement Loop acknowledges that the best interface is the one that survives contact with reality. You build something functional, use it yourself, notice the friction, fix it, and repeat. Each iteration makes the interface a little less frustrating, a little more delightful.

The Continuous Refinement Process:

  1. Use it yourself → You can't improve what you don't use. Spend time actually using your interface for real tasks.
  2. Notice friction → Where do you hesitate? What feels clunky? What requires unnecessary steps?
  3. Get fresh perspectives → Your AI assistant is perfect here—it hasn't developed your muscle memory and will spot confusing elements.
  4. Fix systematically → Address the highest-friction points first. Small improvements compound quickly.
  5. Test the fixes → Ensure your improvements don't break existing workflows.
  6. Repeat relentlessly → UI excellence comes from hundreds of tiny improvements, not one big redesign.

E2E Tests as UX Detectives: Your E2E tests are more than regression catchers—they're user experience investigators. When an E2E test is hard to write, it's telling you the user workflow is too complex. When you need multiple steps to accomplish something simple in a test, users will struggle too.

Mobile-First Development

Mobile isn't just desktop with a smaller screen—it's a fundamentally different interaction paradigm. What works beautifully with a mouse and keyboard can be completely unusable on a touch device. Mobile-first development forces you to focus on what truly matters.

Touch Interaction Principles:

  • Hover is dead, long live the tap: Mobile users can't hover. That clever tooltip that appears on mouse-over? Useless. Replace hover interactions with explicit tap actions. Use long-press for secondary actions, but always provide visual feedback that something is pressable.
  • Fat fingers need fat targets: The average fingertip is 10mm wide. Apple recommends 44x44pt touch targets minimum. That tiny 'x' button that's easy to click with a mouse? It's user-hostile on mobile. Make touch targets generous and well-spaced.
  • Context menus via long press: Desktop users right-click for context menus. Mobile users expect long-press. Implement it consistently and show visual feedback (like a subtle vibration or color change) when the long-press is recognized. A sketch of this pattern follows this list.
  • Tooltips that don't suck: Alert dialogs for tooltips are jarring and break flow. Instead, use inline overlays that appear near the touched element and dismiss when tapping elsewhere. Think of them as gentle whispers, not shouted interruptions.
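
As a rough sketch of the long-press and feedback principles above (component and prop names are illustrative, not from this project), a React Native row might look like this:

import React from 'react';
import { Pressable, Text, Vibration } from 'react-native';

// Illustrative only: a row that opens on tap and shows a context menu on
// long-press, with pressed-state styling instead of hover.
export function FileRow(props: { name: string; onOpen: () => void; onShowMenu: () => void }) {
  return (
    <Pressable
      onPress={props.onOpen}
      onLongPress={() => {
        Vibration.vibrate(10); // subtle confirmation that the long-press registered
        props.onShowMenu();
      }}
      delayLongPress={350}
      style={({ pressed }) => ({
        // Visual feedback replaces hover: darken the row while it is pressed
        padding: 16, // generous touch target, roughly 44pt tall
        backgroundColor: pressed ? '#e0e0e0' : 'transparent',
      })}
    >
      <Text>{props.name}</Text>
    </Pressable>
  );
}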

Real-World Adaptations:

  • External changes happen: On mobile, files change while users are viewing them—from sync services, other apps, or background processes. Poll for changes and show unobtrusive notifications: "File updated externally [Reload]". A polling sketch follows this list.
  • Visual feedback is oxygen: Desktop users have hover states and cursor changes. Mobile users have only what you explicitly show them. Every interaction needs visual feedback—buttons should depress, selections should highlight, loading states should animate.
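
A minimal sketch of that polling idea, assuming a hypothetical fetchFileMeta(path) helper that returns the file's last-modified timestamp:

import { useEffect, useState } from 'react';

// Hypothetical backend helper: returns the file's last-modified timestamp.
declare function fetchFileMeta(path: string): Promise<{ modifiedAt: number }>;

export function useExternalChangeNotice(path: string, openedAt: number) {
  const [changedExternally, setChangedExternally] = useState(false);

  useEffect(() => {
    let lastSeen = openedAt;
    const timer = setInterval(async () => {
      const meta = await fetchFileMeta(path);
      if (meta.modifiedAt > lastSeen) {
        lastSeen = meta.modifiedAt;
        setChangedExternally(true); // UI shows "File updated externally [Reload]"
      }
    }, 5000); // poll every 5 seconds - cheap enough to stay unobtrusive

    return () => clearInterval(timer);
  }, [path, openedAt]);

  return { changedExternally, acknowledge: () => setChangedExternally(false) };
}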

Error Display Patterns

Errors are inevitable. How you display them determines whether users feel frustrated or empowered. Good error UX turns problems into learning opportunities.

The Error Hierarchy:

  • Tab badges tell the story: A red badge with "3" on a file tab immediately tells users there are three errors in that file. They can choose when to address them without having the errors shoved in their face.
  • Status bar for overview: Use the bottom status bar for aggregate information: "❌ 3 errors, ⚠️ 7 warnings" gives users a project-wide view without overwhelming them with details.
  • Progressive disclosure: Errors in current view should be obvious (red underlines), errors in other files should be indicated (badges), but full error details should appear only on demand.
  • Touch-friendly error inspection: On mobile, tapping an error underline should show a dismissible tooltip with the error message and quick fixes. Never use modal alerts for error display—they're the UI equivalent of shouting at users. A sketch of such a tooltip follows this list.
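
A minimal sketch of an inline, dismissible error tooltip (component name and styling are illustrative):

import React, { useState } from 'react';
import { Pressable, Text, View } from 'react-native';

// Illustrative only: tapping an error underline toggles an inline,
// dismissible tooltip instead of a modal alert.
export function ErrorSpan(props: { text: string; message: string }) {
  const [showTip, setShowTip] = useState(false);

  return (
    <View>
      <Pressable onPress={() => setShowTip(true)}>
        <Text style={{ textDecorationLine: 'underline', textDecorationColor: 'red' }}>
          {props.text}
        </Text>
      </Pressable>
      {showTip && (
        <Pressable onPress={() => setShowTip(false)} style={{ backgroundColor: '#fff4f4', padding: 8 }}>
          <Text>{props.message}</Text>
          <Text style={{ color: '#888' }}>Tap to dismiss</Text>
        </Pressable>
      )}
    </View>
  );
}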

The Psychology of Error Display: Remember that behind every error message is a human who's probably already frustrated. Your error display should help, not hurt. Be specific about what's wrong, suggest how to fix it, and never make users feel stupid. The best error message is the one that helps users fix the problem and learn something in the process.

🛡️ Race Condition Prevention

TypeScript Settings for Safety

The most important setting: "noUncheckedIndexedAccess": true

const arr = ['a', 'b']; // string[] with only two elements

// With noUncheckedIndexedAccess: false (default)
const item = arr[2]; // Type: string (UNSAFE - the value is actually undefined)

// With noUncheckedIndexedAccess: true
const item = arr[2]; // Type: string | undefined (SAFE - the compiler forces a check)
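
In practice this is a one-line change in tsconfig.json. A minimal sketch is below; the flags shown are standard TypeScript compiler options, and the rest of your config stays as it is:

{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true,
    "exactOptionalPropertyTypes": true
  }
}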

ESLint Rules for Race Prevention

module.exports = {
  rules: {
    '@typescript-eslint/no-non-null-assertion': 'error',
    '@typescript-eslint/strict-boolean-expressions': ['error', {
      allowNullableObject: false,
      allowNullableBoolean: false,
      allowNullableString: false,
      allowNullableNumber: false,
      allowAny: false
    }],
    'no-unsafe-optional-chaining': 'error',
    '@typescript-eslint/no-unnecessary-condition': 'error',
    'react-hooks/exhaustive-deps': 'error',
    'react-hooks/rules-of-hooks': 'error',
  }
};

React Patterns for Race Prevention

Cancellable Effects

useEffect(() => {
  let cancelled = false;
  
  async function loadData() {
    const data = await fetchData();
    if (!cancelled) {
      setData(data);
    }
  }
  
  loadData();
  
  return () => {
    cancelled = true;
  };
}, []);

State Versioning

// Track the latest request with a ref: reading state.version inside the
// effect would be a stale closure, so a ref gives every run the true count.
const versionRef = useRef(0);
const [state, setState] = useState({ data: [], version: 0 });

useEffect(() => {
  const currentVersion = ++versionRef.current;

  async function loadData() {
    const data = await fetchData();
    setState(prev => {
      // Only accept this response if no newer request has started since
      if (currentVersion > prev.version) {
        return { data, version: currentVersion };
      }
      return prev;
    });
  }

  loadData();
}, [dependency]);

Safe Array Utilities

// src/utils/safeArray.ts

export const safeGet = <T>(
  array: readonly T[],
  index: number,
  defaultValue: T
): T => {
  return array[index] ?? defaultValue;
};

export const safeDualMap = <T, U, R>(
  primary: readonly T[],
  secondary: readonly U[],
  mapper: (primary: T, secondary: U | undefined, index: number) => R
): R[] => {
  return primary.map((item, index) => {
    const secondaryItem = index < secondary.length ? secondary[index] : undefined;
    return mapper(item, secondaryItem, index);
  });
};

export const arraysInSync = <T, U>(
  arr1: readonly T[],
  arr2: readonly U[]
): boolean => {
  return arr1.length === arr2.length;
};
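
A short usage example of these utilities, with illustrative data, showing how a token array that lags behind the lines array no longer crashes rendering:

const lines = ['line1', 'line2', 'line3'];
const tokens: string[][] = [['tok1']]; // shorter than lines

// Every line still gets rendered; missing tokens fall back to an empty array
const rendered = safeDualMap(lines, tokens, (line, lineTokens) => ({
  line,
  tokens: lineTokens ?? [],
}));

if (!arraysInSync(lines, tokens)) {
  console.warn('Highlighting out of sync; rendering plain text fallback');
}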

📋 Code Review Checklist

Array Access Review

  • All array access uses optional chaining or null checks
  • Array length comparisons before parallel access
  • No assumptions about array synchronization
  • Defensive defaults for undefined values

Async State Review

  • Effect cleanup for async operations
  • State updates check if component is mounted
  • Loading states prevent premature rendering
  • Error boundaries handle unexpected states

Performance Review

  • No unnecessary re-renders from race conditions
  • Debounced/throttled rapid updates (see the debounce sketch after this list)
  • Memoization where appropriate
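
A small debounce hook is often enough for the rapid-update case above; here is a generic sketch (not project code):

import { useEffect, useState } from 'react';

// Returns a value that only updates after `delayMs` of silence,
// so rapid keystrokes don't trigger a request per character.
export function useDebouncedValue<T>(value: T, delayMs = 300): T {
  const [debounced, setDebounced] = useState(value);

  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(timer);
  }, [value, delayMs]);

  return debounced;
}

// Usage: const debouncedQuery = useDebouncedValue(query, 300);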

Type Safety Review

  • Branded types for IDs prevent mixing
  • Result types for explicit error handling (see the Result sketch after this checklist)
  • Exhaustive switch statements with never type
  • No any types without justification
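
For reference, a minimal Result type looks like the sketch below (the parseConfig function is purely illustrative). The point is that callers must handle the failure branch before they can touch the value:

// A minimal Result type: failures are part of the signature, not a surprise.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

type ParseError = { message: string; line: number };

function parseConfig(source: string): Result<Record<string, string>, ParseError> {
  try {
    return { ok: true, value: JSON.parse(source) };
  } catch (e) {
    return { ok: false, error: { message: String(e), line: 0 } };
  }
}

const result = parseConfig('{"theme": "dark"}');
if (result.ok) {
  console.log(result.value.theme);
} else {
  console.error(`config error: ${result.error.message}`);
}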

🐛 Bug-Driven Type Safety Improvement

The Golden Rule: Every Bug is a Type Safety Lesson

When you find a bug, ALWAYS ask: "How could stricter types have caught this?"

This practice turns every debugging session into a learning opportunity that strengthens your entire codebase.

Bug Analysis Framework

Step 1: Categorize the Bug

  • Runtime Error: Crashed with undefined/null access, type mismatch
  • Logic Error: Wrong behavior, incorrect data flow
  • Race Condition: Timing-dependent failure
  • Integration Error: Component interaction failure

Step 2: Trace to Type Weakness

Ask these questions:

  • Could branded types have prevented ID confusion?
  • Would noUncheckedIndexedAccess have caught array access issues?
  • Could discriminated unions have enforced correct state handling?
  • Would Result types have made error handling explicit?
  • Could stricter function signatures have caught this?

Step 3: Implement Type Improvements

Don't just fix the bug - improve the types to prevent similar bugs.

Real-World Examples from This Project

Example 1: Array Index Race Condition

The Bug: TypeError: Cannot read properties of undefined (reading 'length')

// Buggy code in FileViewer.tsx
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
// Later: lineTokens.length - CRASH when highlightedTokens[index] is undefined

Root Cause: Array access without bounds checking in race condition

Type Safety Analysis:

// How stricter types would have caught it:

// 1. With noUncheckedIndexedAccess: true
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
//                                   ^^^^^^^^^^^^^^^^^^^^^^^^
// lineTokens is now typed 'HighlightedToken[] | undefined', so the later
// lineTokens.length no longer compiles until the undefined case is handled

// 2. Forces defensive programming:
const lineTokens = hasHighlighting ? (highlightedTokens[index] ?? []) : [];

Type Safety Improvements Applied:

  1. ✅ Added noUncheckedIndexedAccess: true to tsconfig.json
  2. ✅ Created SafeArray utility functions
  3. ✅ Added ESLint rules for unsafe array access
  4. ✅ Added fast-check property tests for array synchronization

Example 2: Request-Response Correlation Bug

The Bug: Claude service resolves wrong request when multiple concurrent requests

Root Cause: String-based request matching without correlation IDs

Type Safety Analysis:

// Buggy pattern - requests identified by type only
pendingRequests.get(messageType)?.resolve(response);

// How branded types + proper correlation would catch it:
type RequestId = string & { _brand: 'RequestId' };

interface BaseRequest {
  id: RequestId;
  type: string;
}

interface BaseResponse {
  requestId: RequestId;  // Forces correlation
  type: string;
}

// Now TypeScript forces proper request/response matching
const pendingRequest = pendingRequests.get(response.requestId);
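
One way the branded IDs might be minted and matched in practice (helper names are illustrative, and crypto.randomUUID assumes a modern runtime):

// Mint a branded ID; the cast happens in exactly one place.
const newRequestId = (): RequestId => crypto.randomUUID() as RequestId;

const pending = new Map<RequestId, { resolve: (r: BaseResponse) => void }>();

function send(request: BaseRequest): Promise<BaseResponse> {
  return new Promise(resolve => {
    pending.set(request.id, { resolve });
    // transport.send(request) would go here
  });
}

function onResponse(response: BaseResponse): void {
  // A plain string won't type-check here; only a RequestId will.
  pending.get(response.requestId)?.resolve(response);
  pending.delete(response.requestId);
}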

Type Safety Improvements Applied:

  1. ✅ Added branded types for IDs
  2. ✅ Created discriminated unions for requests/responses
  3. ✅ Added request correlation patterns

Example 3: Navigation State Bug

The Bug: Timeline clicks didn't open files due to setTimeout race condition

Root Cause: Loose typing allowed any navigation state

Type Safety Analysis:

// Weak typing allowed bugs
onEntryPress: (entry: TimelineEntry) => void  // No guarantee navigation happens

// Stronger typing would enforce navigation action
type NavigationAction = 
  | { type: 'OPEN_FILE'; filePath: string }
  | { type: 'SHOW_HISTORY'; entry: TimelineEntry };

type TimelineEntryHandler = (entry: TimelineEntry) => NavigationAction;

// TypeScript now enforces that clicking produces a navigation action

Type Safety Improvements Applied:

  1. ✅ Added NavigationAction discriminated unions
  2. ✅ Made callbacks return explicit actions
  3. ✅ Added tests to verify navigation behavior

Bug-Driven Type Safety Checklist

When you find any bug, systematically check:

Array & Object Access

  • Could noUncheckedIndexedAccess have caught this?
  • Should we use optional chaining (?.) everywhere?
  • Are we making assumptions about array lengths?
  • Could SafeArray utilities prevent this class of bugs?

Function Signatures

  • Are parameters too permissive (any, object, string)?
  • Could branded types prevent ID confusion?
  • Should return types be more specific?
  • Would Result types make errors explicit?

State Management

  • Could discriminated unions enforce valid state transitions?
  • Are we using unions where we should use intersections?
  • Would readonly types prevent unintended mutations?
  • Could state machines make invalid states unrepresentable?

Async Operations

  • Are Promise types specific enough?
  • Could we use branded types for different async operations?
  • Would cancellation tokens prevent race conditions? (see the AbortController sketch after this list)
  • Are error types explicit and actionable?
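
If the cancellation-token question resonates, AbortController is the standard fetch cancellation mechanism. A sketch follows; the hook name and endpoint are illustrative:

import { useEffect, useState } from 'react';

// Cancellation tokens via AbortController: when the dependency changes or the
// component unmounts, the in-flight request is aborted instead of racing a newer one.
function useUserProfile(userId: string) {
  const [profile, setProfile] = useState<unknown>(null);

  useEffect(() => {
    const controller = new AbortController();

    fetch(`/api/users/${userId}`, { signal: controller.signal })
      .then(res => res.json())
      .then(setProfile)
      .catch(err => {
        if (err.name !== 'AbortError') throw err; // aborts are expected; rethrow real errors
      });

    return () => controller.abort();
  }, [userId]);

  return profile;
}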

Component Props

  • Are callback types specific about what they return?
  • Could we use discriminated unions for different component modes?
  • Are we properly typing children and render props?
  • Would stricter event handler types help?

Implementation Pattern: Bug → Type → Test

// 1. Fix the immediate bug
const lineTokens = hasHighlighting ? (highlightedTokens[index] ?? []) : [];

// 2. Add type safety to prevent similar bugs
// Enable noUncheckedIndexedAccess in tsconfig.json

// 3. Create utility to make safe pattern easy
export const safeArrayAccess = <T>(arr: T[], index: number, fallback: T): T => 
  arr[index] ?? fallback;

// 4. Add test that would have caught the original bug
it('should handle token array shorter than lines array', () => {
  const content = 'line1\nline2\nline3';  // 3 lines
  const tokens = [['token1']];            // 1 token
  
  // This should not crash
  expect(() => renderFileViewer(content, tokens)).not.toThrow();
});

// 5. Add property test for this class of bugs
it('should handle mismatched array lengths', () => {
  fc.assert(
    fc.property(
      fc.array(fc.string()),          // lines
      fc.array(fc.array(fc.string())), // tokens
      (lines, tokens) => {
        // Property: Should never crash regardless of array lengths
        expect(() => renderFileViewer(lines, tokens)).not.toThrow();
      }
    )
  );
});

Type Safety Evolution Log

Keep a log of how each bug improved your type safety:

## Bug #47: FileViewer Array Access Crash
- **Date**: 2025-06-20
- **Bug**: `highlightedTokens[index]` was undefined, caused crash
- **Type Fix**: Added `noUncheckedIndexedAccess: true`
- **Tools Added**: SafeArray utilities, ESLint rules
- **Tests Added**: Property tests for array sync
- **Prevention**: All array access now type-safe

## Bug #52: Request Correlation Mix-up  
- **Date**: 2025-06-19
- **Bug**: Wrong request resolved in concurrent scenario
- **Type Fix**: Added branded RequestId type
- **Tools Added**: Discriminated unions for req/res
- **Tests Added**: Concurrent request tests
- **Prevention**: TypeScript now enforces correlation

Advanced Type Safety Patterns

Phantom Types for State Safety

type FileState = 'closed' | 'opening' | 'open' | 'modified';
type File<S extends FileState> = {
  path: string;
  state: S;
  content: S extends 'open' | 'modified' ? string : undefined;
};

// TypeScript enforces you can only read content from open files
const readContent = (file: File<'open' | 'modified'>): string => file.content;
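
For example (values illustrative):

const openedFile: File<'open'> = { path: '/src/app.ts', state: 'open', content: 'export {}' };
const closedFile: File<'closed'> = { path: '/src/app.ts', state: 'closed', content: undefined };

readContent(openedFile);  // ✅ OK - content is guaranteed to be a string
readContent(closedFile);  // ❌ Type error - 'closed' is not 'open' | 'modified'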

Template Literal Types for String Safety

type FilePath = `/${string}`;
type GitBranch = `refs/heads/${string}`;

// Prevents accidental string mixing
const openFile = (path: FilePath) => { /* ... */ };
openFile('/src/app.ts');        // ✅ OK
openFile('src/app.ts');         // ❌ Type error - missing leading slash

Recursive Types for Complex Validation

type JSONValue = 
  | string 
  | number 
  | boolean 
  | null
  | { [key: string]: JSONValue }
  | JSONValue[];

// Now JSON.parse return can be properly typed
const parseJSON = (str: string): JSONValue => JSON.parse(str);

The Type Safety Mindset

Every bug teaches us about a gap in our type system. By systematically improving types after each bug, we build software that becomes progressively more robust and self-documenting.

Key mindset shifts:

  • 🚫 "It's just a runtime error" → ✅ "How can types prevent this?"
  • 🚫 "Add a null check" → ✅ "How can types make nulls impossible?"
  • 🚫 "Catch the exception" → ✅ "How can types make this error explicit?"
  • 🚫 "Add validation" → ✅ "How can types eliminate invalid states?"

🔧 Development Workflow

Standard Development Process

  1. Enter nix environment: nix-shell
  2. Install dependencies: yarn install (frontend) / cargo check (backend)
  3. Start development: yarn start (frontend) / cargo run (backend)
  4. Run tests: yarn test (frontend) / cargo fp-test (backend)
  5. Quality checks: yarn typecheck (frontend) / make rust-quality (backend)
  6. CRITICAL: Push changes and monitor CI until green

Build Commands

# Frontend
yarn start              # Development
yarn build             # Production build
yarn storybook         # Component development
yarn build-storybook   # Storybook build

# Testing
yarn test              # Unit tests
yarn test:integration  # Integration tests
yarn test:fullstack   # Full-stack tests
yarn test:e2e:docker   # Docker E2E tests

# Quality
yarn typecheck         # TypeScript check
yarn lint             # Lint check
yarn lint:fix         # Fix lint issues

🚀 AI Pair Programming Best Practices

Working with AI Assistants

The secret to effective AI pair programming is treating your AI like a brilliant junior developer who has perfect memory but needs clear direction. The AI can write code faster than you can type, remember every API detail, and never gets tired—but it doesn't know your business context, can't feel user pain, and won't catch its own architectural mistakes.

Setting Up for Success:

  • Give AI access to GitHub: Connect your AI to your repository and CI/CD. When it can see your failing tests and read your CI logs, it can fix issues autonomously while you focus on design decisions.
  • Provide clear context: Start each session with "Here's what we're building and why." The AI needs to understand not just the task, but the purpose behind it.
  • Use structured todo lists: AI assistants excel at methodical execution. A good todo list turns your AI from a code generator into a development partner.
  • Share failure logs and diagnostics: Don't just say "it's broken"—paste the full error. Your AI can often spot the issue in seconds when given complete information.
  • Iterate in small chunks: Big commits are hard to review and debug. Ask for small, focused changes that you can verify immediately.
  • Always validate AI suggestions: Trust but verify. The AI might solve your problem in a way that creates three new problems. Test everything.

AI-Assisted Debugging

Debugging with AI is like having a senior developer who's seen every error message but needs you to provide the crime scene details. The more context you share, the faster you'll solve the problem.

The Debugging Dance:

  • Share complete error messages: Not just the error type, but the full stack trace. That line number buried in the stack often holds the key.
  • Provide relevant code context: Include the failing function and its callers. The bug might be in how the function is used, not the function itself.
  • Explain expected vs. actual: "It should return an array of users, but it's returning undefined." This gap analysis helps the AI understand the problem space.
  • Let AI generate edge case tests: Ask "What inputs might break this function?" AI excels at thinking of weird edge cases you missed.
  • Request multiple solutions: "Give me three ways to fix this." Often the second or third approach is better than the obvious first solution.

⚠️ Critical Warning - Watch for AI Avoidance Behavior:

When AI encounters genuinely difficult problems, it sometimes tries to escape rather than solve. You'll see this pattern:

  • Suggests deleting the problematic code: "This function is too complex, let's remove it and use a simpler approach"
  • Tries to skip failing tests: "This test seems flaky, we could skip it for now"
  • Proposes workarounds instead of fixes: "Instead of fixing this race condition, we could just add a delay"

How to handle this:

AI: "This component is causing too many issues. We could simplify by removing the concurrent processing..."
You: "No, we need the concurrent processing. Let's debug why it's failing. Show me what's happening step by step."
AI: "The test for race conditions keeps failing. We could mark it as skip..."
You: "No, the test is catching a real bug. Let's use fast-check's scheduler to make it deterministic."

Remember: The AI works for you, not the other way around. When it tries to avoid hard problems, redirect it back to solving the root cause. Some of the best breakthroughs come from pushing through difficult bugs rather than working around them. Your job is to be the technical lead who says "we're going to solve this properly" when the AI wants to take shortcuts.

Collaborative Development Loop

The best AI programming sessions feel like pair programming with a really fast typist. You handle the strategy, the AI handles the tactics, and together you move faster than either could alone.

The Rhythm of AI Collaboration:

  1. Define clear objectives: "We need to add user authentication using JWT tokens" is better than "add login functionality." Specificity unlocks AI potential.
  2. Break work into small, testable chunks: "First, create the user model. Then, add password hashing. Next, implement the login endpoint." Each chunk should be verifiable.
  3. Run tests frequently: After every AI-generated change, run your tests. Catching issues immediately is infinitely easier than debugging a large batch of changes.
  4. Share results transparently: "The login endpoint works, but the test for expired tokens is failing with this error: [paste error]." Good or bad, share what happened.
  5. Iterate based on feedback: Use test results and CI status to guide next steps. Let reality, not plans, drive your development.
  6. Document lessons learned: When you discover something non-obvious, add it to your project's CLAUDE.md. Your AI assistant's effectiveness compounds with better documentation.

🎯 Success Metrics

Success in modern software development isn't just about shipping features—it's about maintaining velocity while increasing quality. These metrics aren't arbitrary numbers; they're indicators of a healthy codebase and a productive team. When these metrics are green, you can move fast with confidence. When they start slipping, they're early warning signs that technical debt is accumulating.

Code Quality

  • Type Safety: Zero any types, strict TypeScript config
  • Test Coverage: 100% line coverage (excluding unreachable), comprehensive E2E tests
  • Linting: Zero errors, consistent code style
  • Performance: Fast build times, responsive UI

Development Velocity

  • CI Pipeline: <15 minutes total time
  • Test Reliability: <1% flaky test rate
  • Bug Detection: 95% caught before production
  • Developer Experience: Quick feedback loops

Collaboration Quality

  • AI Integration: Efficient pair programming sessions
  • Documentation: Clear, actionable best practices
  • Knowledge Sharing: Lessons learned captured and applied
  • Continuous Improvement: Regular retrospectives and updates

🔄 Continuous Improvement

The only constant in software development is change. What works today might be obsolete tomorrow. Continuous improvement isn't just about fixing what's broken—it's about questioning what works and finding ways to make it better. It's the difference between a codebase that gets harder to work with over time and one that becomes more pleasant and productive.

Regular Practices

Software development is like tending a garden—daily attention prevents weekly crises. These practices aren't bureaucracy; they're the habits that keep your codebase healthy and your team productive. Skip them, and you'll spend your time fighting fires instead of building features.

Weekly Rituals That Compound:

  • Review and update best practices weekly: Your understanding evolves with every bug fixed and feature shipped. Capture these learnings in your documentation. Friday afternoons are perfect for this—reflect on the week's lessons while they're fresh.
  • Add new tests for each feature: Not after. Not "when you have time." During. Every feature should arrive with its own test suite, like a product with batteries included. This isn't extra work—it's how you know you're done.
  • Refactor tests to reduce duplication: Test code is code. Duplicate test code is technical debt. When you see the same setup in three tests, extract it. When you copy-paste assertions, create helpers. Clean tests are easier to understand and maintain.
  • Monitor test execution times: A slow test suite is a test suite that doesn't get run. Track your test times weekly. When they creep up, investigate. That 30-second test suite that becomes 5 minutes? It'll kill your development velocity.
  • Archive old artifacts after 30 days: Screenshots, test reports, build artifacts—they accumulate like digital dust. Set up automated cleanup. Your CI shouldn't fail because the disk is full of month-old screenshots nobody will ever look at.

Learning from Failures

  • Always investigate root cause of failures
  • Update practices based on lessons learned
  • Share knowledge across team/project
  • Improve tooling to prevent similar issues
  • Test the fixes to ensure they work

Metrics to Track

  • CI pipeline health and speed
  • Test coverage and reliability
  • Bug discovery rate by testing phase
  • Developer productivity and satisfaction
  • AI pair programming effectiveness

🏆 Checklist

Use this checklist to track your progress implementing these practices:

The Checklist

  • Zero any types - Every type is explicit and meaningful
  • CI runs in under 15 minutes - Fast feedback on every commit
  • 100% test coverage - Essential for verifying AI-generated code (excluding unreachable)
  • Zero skipped tests - Every test runs and passes
  • All arrays accessed safely - noUncheckedIndexedAccess: true
  • Race condition tests for async operations - Using property-based testing
  • AI has helped review your UI - Fresh eyes on every feature
  • Living specs that evolve - Documentation that stays current
  • You know which loop you're in - Always working with intention

Bonus Achievements

  • Your AI can explain your entire architecture - Because specs are complete
  • New developers productive in < 1 day - Thanks to clear practices
  • Production bugs down 90% - Most bugs now impossible
  • You've contributed back - Share your learnings with others

Share Your Journey

  • Which practice had the biggest impact?
  • What was hardest to implement?
  • What would you add to this guide?

These practices help you build reliable software quickly with AI assistance. Select the ones that fit your workflow and adapt them to your needs.

Remember: You're always in a loop. Choose the right one.


This document represents battle-tested practices from a real-world project using TypeScript, React Native, Rust, and AI pair programming. These practices evolved through iterative development, extensive testing, and continuous improvement based on actual challenges faced during development.
