This document contains practices specifically designed for AI-first development. Browse through it, find the sections that resonate with your workflow, and copy them into your repository for your AI assistant to follow.
When you're vibe coding, you're moving fast with AI assistance. The key to maintaining quality at speed is choosing the right feedback loop for your current situation.
Many developers struggle because they're in the wrong loop at the wrong time. They're optimizing performance when CI is broken. They're polishing UI when users can't log in. They're adding features when existing bugs are driving users away.
This document describes 9 different feedback loops, each designed for a specific situation. Understanding when to use each one helps maintain momentum without sacrificing quality.
- 🔄 9 Feedback Loops: From 5-minute CI fixes to 90-minute performance deep dives
- 🤖 AI-First Development: Practices designed for working with AI pair programmers
- 🏃‍♂️ Loop Transitions: When to switch from one loop to another
- 🔬 Race Condition Testing: Tools for making non-deterministic bugs reproducible
- 🛡️ Type Safety: TypeScript settings that prevent entire categories of bugs
Jump Based on Your Current Situation:
- Build is red? → CI Green Loop (5-15 min)
- Users complaining? → Bug Investigation Loop (15-45 min)
- Feature request? → User Story Loop (30-60 min)
- App feels slow? → Performance Loop (45-90 min)
- Looking for general improvements? → Browse the practices that fit your workflow
- 🔄 Feedback Iteration Loops - Core development patterns
- 💡 Key Concepts - AI as test user, reproducible testing
- ⚡ Quick Wins - Simple improvements you can make today
- 🔄 CI/CD & Green Development
- 🚀 AI Pair Programming Best Practices
- 🎯 Success Metrics
- 🔄 Continuous Improvement
Modern development is about choosing the right feedback loop for your current objective. Here are the key loops to master:
When vibe coding with AI, feedback loops become even more critical. Each loop serves a different purpose and operates on a different timescale. The skill is recognizing which loop you need to be in right now and executing it efficiently.
Think of these loops as different lenses through which to view your codebase. Sometimes you need the microscope of the CI Green Loop to fix immediate breakages. Other times you need the telescope of the User Story Loop to see where you're headed. The key is knowing when to switch lenses and having the discipline to complete each loop before moving to the next.
Goal: Get all CI checks passing and keep them green
Cycle Time: 5-15 minutes per iteration
When CI is red, development stalls. A broken build blocks deployments and creates cascading problems. In vibe coding, where you're moving fast with AI assistance, maintaining green CI is essential for confidence in your changes.
1. Check Status → 2. Identify Failures → 3. Fix Locally → 4. Push & Monitor → 5. Repeat
- ✅ All CI checks green within 15 minutes
- ✅ Zero flaky tests
- ✅ Fast feedback on every commit
- After making any code changes
- Before merging PRs
- When onboarding new team members
- Daily health checks
Goal: Build features that actually solve user problems
Cycle Time: 30-60 minutes per iteration
The User Story Loop helps ensure you're building features that solve real problems. When vibe coding, it's easy to get caught up in technical solutions. This loop starts with the user's actual need and validates that your solution works for them.
AI's Dual Role: Your AI assistant plays both user and developer here. First, it helps generate realistic user stories from a user's perspective. Then it switches to developer mode to help implement them.
1. AI Generates User Story → 2. Create E2E Test → 3. Watch Test Fail → 4. Build UI → 5. Test Passes → 6. AI Reviews as User → 7. Iterate
// Example: AI as user, then coder
Human: "We need a way to see file changes"
AI (as user): "As a developer reviewing code, I want to see which
files have uncommitted changes so I can quickly navigate to
files I'm actively working on"
Human: "How would you test this?"
AI (as coder): "Create an E2E test that modifies a file, then
verifies the UI shows an indicator next to that file"
// Later...
Human: "Here's the implementation" [screenshot]
AI (as user): "The orange dot is too subtle. On a bright screen
I might miss it. Consider adding a text badge with count"
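A minimal sketch of the E2E test described in that exchange, assuming a Playwright setup; the URL and the data-testid selectors are placeholders for whatever your app actually renders:
import { test, expect } from '@playwright/test';
import { appendFile } from 'node:fs/promises';
test('shows an indicator next to files with uncommitted changes', async ({ page }) => {
  // Make a real change in the working tree - no mocking
  await appendFile('src/example.ts', '// uncommitted change\n');
  await page.goto('http://localhost:3000');
  // Placeholder selectors - adapt to your file tree's markup
  const row = page.getByTestId('file-row').filter({ hasText: 'example.ts' });
  await expect(row.getByTestId('dirty-indicator')).toBeVisible();
});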
- ✅ Tests accurately represent user workflows
- ✅ UI solves the actual user problem
- ✅ Stories evolve based on UI insights
- Building new features
- Improving existing workflows
- Fixing UX issues
- Understanding user needs
Goal: Not just fix bugs, but prevent similar bugs systematically
Cycle Time: 15-45 minutes per iteration
The Bug Investigation Loop turns debugging into systematic problem-solving. Instead of making random changes, you follow the evidence, form hypotheses, and test them methodically. This is especially important in vibe coding where you need to understand code you didn't personally write.
1. Reproduce Bug → 2. Trace Root Cause → 3. Fix Immediate Issue → 4. Improve Safety → 5. Add Tests → 6. Document Learning
- ✅ Bug fixed and prevention measures added
- ✅ Type/safety system strengthened
- ✅ Tests prevent regression
- Any time you encounter a bug
- Code review findings
- Production issues
- Improving code quality
Goal: Identify bottlenecks and improve them systematically
Cycle Time: 45-90 minutes per iteration
The Performance Loop emphasizes measurement over guesswork. Profile first, identify actual bottlenecks, fix them systematically, and measure the impact. This prevents wasting time optimizing code that isn't actually slow.
1. Measure Baseline → 2. Identify Bottleneck → 3. Hypothesize Fix → 4. Implement → 5. Measure Impact → 6. Repeat or Revert
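A simple way to capture the baseline in step 1 is to time the suspect operation over many runs and record the median and p95; a sketch assuming a hypothetical renderFileTree operation and fixture:
// Baseline measurement - run before and after each optimization attempt
const samples: number[] = [];
for (let i = 0; i < 20; i++) {
  const start = performance.now();
  renderFileTree(largeRepoFixture); // hypothetical operation under test
  samples.push(performance.now() - start);
}
samples.sort((a, b) => a - b);
console.log(`median: ${samples[Math.floor(samples.length / 2)]?.toFixed(1)}ms`);
console.log(`p95:    ${samples[Math.floor(samples.length * 0.95)]?.toFixed(1)}ms`);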
- ✅ Measurable performance improvements
- ✅ User-perceived speed increases
- ✅ Resource usage reduction
- App feels slow
- CI takes too long
- Memory usage high
- Resource constraints
Goal: Make interfaces intuitive and delightful
Cycle Time: 20-40 minutes per iteration
The UI Polish Loop is about discovering better interfaces through use. You implement something functional, use it yourself, notice friction points, and improve them iteratively.
Useful technique: Your AI assistant can act as a fresh-eyes test user. Show it screenshots or describe workflows. The AI hasn't developed your muscle memory, so it can spot confusing elements you've gotten used to.
1. Use the Feature → 2. Note Friction → 3. Get AI Feedback → 4. Design Improvement → 5. Implement → 6. Iterate
// Example AI feedback session:
Human: "Here's our new file upload UI" [screenshot]
AI: "I see three potential friction points:
1. The drop zone isn't visually distinct from the rest of the page
2. There's no indication of supported file types until after an error
3. The upload progress shows percentage but not time remaining"
Human: "Which would frustrate you most as a user?"
AI: "Not knowing supported file types upfront - users will waste time
trying to upload invalid files"
- ✅ Reduced friction in common workflows
- ✅ Positive user feedback
- ✅ Fewer support questions
- After core functionality works
- Based on user feedback
- Improving daily-use features
- Making complex features accessible
Goal: Build robust features with comprehensive test coverage
Cycle Time: 20-30 minutes per iteration
The Test-Driven Feature Loop uses tests as a design tool. Writing tests first helps you think about interfaces before implementations, naturally leading to more modular code. This is particularly valuable in vibe coding where you need clear specifications for your AI assistant.
1. Red (Write failing test) → 2. Green (Make it pass) → 3. Refactor (Improve code) → 4. Repeat
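A compact red-green example in Vitest, assuming a hypothetical slugify helper; the test is written first and fails until the implementation below it exists:
// slugify.test.ts - red: written before the implementation exists
import { describe, it, expect } from 'vitest';
import { slugify } from './slugify';
describe('slugify', () => {
  it('lowercases and replaces spaces with dashes', () => {
    expect(slugify('Hello World')).toBe('hello-world');
  });
  it('drops characters that are not URL-safe', () => {
    expect(slugify('Rust & Go!')).toBe('rust-go');
  });
});
// slugify.ts - green: the minimal implementation; refactor once the tests pass
export const slugify = (input: string): string =>
  input.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');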
- ✅ 100% test coverage (excluding unreachable)
- ✅ Tests guide design decisions
- ✅ Refactoring is safe and fast
- Building new features
- Fixing complex bugs
- Refactoring existing code
- Learning new APIs
Goal: Maintain high code standards and prevent technical debt
Cycle Time: 10-20 minutes per iteration
The Code Quality Loop helps prevent technical debt accumulation. By frequently running quality checks and fixing issues immediately, you prevent small problems from becoming larger ones. This is crucial when vibe coding at high velocity.
1. Run Quality Checks → 2. Fix Issues → 3. Improve Tooling → 4. Repeat
- ✅ Zero linting errors
- ✅ All types strict
- ✅ No security vulnerabilities
- Before every commit
- Code review preparation
- Scheduled maintenance
- Onboarding new contributors
Choose your loop based on current priorities:
- CI Green Loop - Always maintain green CI
- Bug Investigation Loop - Fix issues systematically
- User Story Loop - Build what users actually need
- Performance Loop - When app feels slow
- UI Polish Loop - Improve daily-use features
- Mobile UX Loop - Ensure cross-platform quality
- Test-Driven Loop - When building complex features
- Code Quality Loop - Regular maintenance
- Bug Investigation Loop - Turn bugs into type improvements
Goal: Make your feedback loops faster and more effective
Cycle Time: Weekly reflection
The Meta-Loop is about improving your process itself. Weekly reflection on which loops are working and which need adjustment helps optimize your development flow.
- Which loops are taking too long?
- Where are the bottlenecks?
- What tools could speed things up?
- Are we measuring the right things?
- How can AI assist in these loops?
- Automate routine checks (pre-commit hooks, CI gates)
- Improve tooling (faster builds, better error messages)
- Share knowledge (document successful patterns)
- Optimize environment (powerful hardware, good network)
Remember: The goal isn't to be in every loop all the time, but to choose the right loop for your current objective and execute it efficiently.
// Before: Bugs are possible
const item = items[index].name; // 💥 Crashes if index out of bounds
// After: Bugs are impossible
// With noUncheckedIndexedAccess: true in tsconfig.json
const item = items[index]?.name; // TypeScript FORCES you to handle undefined
💡 Note: The TypeScript flag noUncheckedIndexedAccess was controversial because it requires handling undefined cases, but it prevents an entire category of runtime errors.
// This test WILL find your race condition and give you a seed to reproduce it
it('detects race conditions deterministically', async () => {
await fc.assert(
fc.asyncProperty(fc.scheduler(), async (s) => {
// Your async operations here
// Fast-check will try EVERY possible execution order
})
);
// Output: "Failed with seed: 1337" - now you can debug deterministically!
});
With deterministic testing, race conditions become as debuggable as simple logic errors. You can reproduce the exact failure case consistently.
Your AI isn't just a coder—it's your always-available test user who brings fresh eyes to your UI whenever you need them. This transforms how you think about UI development.
// In the UI Polish Loop:
Human: "Look at this screenshot of our new feature"
AI: "I notice the 'Submit' button is grayed out but there's no
indication why. Users might think it's broken."
Human: "Good catch! What else do you see?"
AI: "The error message appears 200px below the form. On mobile,
users would need to scroll to see why their submission failed."
💡 Advantage: AI doesn't develop muscle memory for your UI quirks, so it can consistently spot usability issues you've gotten used to.
By maintaining living specs, your AI assistant learns your system's architecture, past decisions, and design patterns. Each bug fixed makes the AI better at preventing similar bugs.
# In PROJECT_SPEC.md
## Decision: Use Event Sourcing (2024-01-15)
**Why**: Need audit trail and time-travel debugging
**Trade-off**: More complex, but provides complete history
**Revisit**: When we reach 1M events/day
# AI now knows this context for all future suggestions
The "CI must be green" practice has roots in manufacturing quality control. When CI takes longer than 15 minutes, developers stop running it, and quality suffers. Fast CI enables tight feedback loops.
// Bug: User with null email crashed the system
// Don't just fix it - make it impossible:
// Before
type User = {
email: string | null;
name: string;
}
// After - use branded types
type VerifiedEmail = string & { _brand: 'VerifiedEmail' };
type User = {
email: VerifiedEmail; // Can't be null, must be verified
name: string;
}
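A branded type only helps if there is a single, validating way to construct it; a minimal sketch of such a constructor (the regex and error handling are illustrative, not a complete email validator):
type VerifiedEmail = string & { _brand: 'VerifiedEmail' };
// The only way to obtain a VerifiedEmail is through this check
function verifyEmail(input: string): VerifiedEmail | null {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input) ? (input as VerifiedEmail) : null;
}
const email = verifyEmail('ada@example.com');
if (email === null) {
  throw new Error('Invalid email'); // handle failure before a User can exist
}
const user: User = { email, name: 'Ada' };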
Quick test: You can check how many potential array access bugs exist in your code:
echo '{"compilerOptions":{"noUncheckedIndexedAccess":true}}' > tsconfig.strict.json
npx tsc --project tsconfig.strict.json --noEmit
Each error represents a potential runtime crash.
Before diving deep, here are changes you can make RIGHT NOW that will immediately improve your code:
// Add to tsconfig.json
{
"compilerOptions": {
"noUncheckedIndexedAccess": true // Prevents 90% of "Cannot read property of undefined"
}
}
# Create a CLAUDE.md file for your AI assistant
echo "# Project Context for AI
## Key Decisions
- We use yarn, not npm
- We use Vitest, not Jest
- All arrays must be accessed safely
## Current Focus
- [Add your current task here]
" > CLAUDE.md
# Save as .git/hooks/pre-commit
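# Make the hook executable: chmod +x .git/hooks/pre-commit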
#!/bin/sh
yarn typecheck && yarn lint || {
echo "❌ Fix type/lint errors before committing"
exit 1
}
// .vscode/settings.json
{
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll.eslint": true
},
"typescript.preferences.includePackageJsonAutoImports": "on"
}
# Today's Loops (add to your README)
- [ ] Morning: CI Green Loop (get everything passing)
- [ ] Feature: User Story Loop (what are we building?)
- [ ] Afternoon: UI Polish Loop (get AI feedback on screenshots)
- [ ] Evening: Code Quality Loop (clean up for tomorrow)
These five changes take minutes to implement but provide immediate value in your vibe coding workflow.
A skipped test is worse than no test. It gives false confidence and hides broken functionality. If a test is skipped, either fix it right now or delete it. No exceptions.
// This is a lie to yourself and your team
it.skip('should handle user logout', () => {
// "TODO: fix this later" = never
});
Let your AI commit directly to feature branches, not to the main branch. Let it fix lint errors, update tests, and make small improvements on its own. Review the commits, but let it work autonomously for mechanical tasks.
When using AI to write code, you need strong verification methods. 100% coverage (excluding explicitly unreachable code) is your baseline, not your goal. It's the foundation that lets you confidently accept AI-generated code and refactor rapidly.
// 100% coverage is necessary but not sufficient
test('user service', () => {
const user = new UserService();
user.getUser('123'); // No assertions!
expect(true).toBe(true); // This passes coverage but tests nothing
});
// Real testing goes beyond coverage
test('user service handles all edge cases', () => {
// Property-based tests
// Race condition tests
// Error scenarios
// Performance boundaries
});
Why this matters in vibe coding: When AI writes most of your code, you need comprehensive tests to verify behavior. 100% coverage is your safety net.
Users interact with the UI, not your perfectly isolated functions. A working UI with poor unit tests ships value; perfect unit tests with a broken UI ship nothing.
If you need a comment to explain what code does, the code is too complex. The only good comments explain WHY, not WHAT.
// Bad: explains what
// Increment the counter by one
counter++;
// Good: explains why
// We retry 3 times because the API has intermittent failures on Mondays
const MAX_RETRIES = 3;
These practices may seem extreme, but they address real problems in modern AI-assisted development where you need strong guardrails to maintain quality at speed.
What Happened: In the FileViewer component, highlightedTokens[index] returned undefined when the syntax highlighter was slower than the render cycle.
The Code:
// The killer line
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
// highlightedTokens had 97 items, but we were rendering 100 lines
Cost:
- 2 hours of downtime
- 3 engineers debugging
- Hundreds of error reports
Prevention: noUncheckedIndexedAccess: true would have caught this at compile time
Lesson: Race conditions often manifest as array access errors
What Happened: Git status component worked perfectly... except on Mondays when devs had 200+ changed files from weekend work.
Root Cause: UI rendered synchronously, blocking the main thread for 3+ seconds
Solution: Virtualized list rendering + pagination
Prevention: The Performance Loop would have caught this with realistic test data
What Happened: A developer changed yarn build to use parallel compilation. CI stayed green. Production builds were missing critical files.
Why CI Missed It: CI used cached build artifacts
Fix: Added production build verification to CI
Lesson: Your CI isn't testing what you think it's testing
Common patterns in production failures:
- Array access without bounds checking
- Race conditions in async operations
- Performance cliffs with realistic data volumes
- CI environment differs from production
When encountering broken code, fix it rather than deleting it. That broken code often contains valuable business logic and edge case handling that took time to develop.
- Fix issues at their root cause
- Don't skip tests or remove functionality because it's difficult
- Maintain all existing features while improving the codebase
This principle becomes especially important in vibe coding where your AI assistant might suggest removing complex code rather than understanding and fixing it.
When you're coding through conversation with AI, you need strong safety nets to verify the generated code:
- 100% code coverage - Comprehensive tests verify the AI-generated code works as intended
- CI always green - Broken builds block everyone and break momentum
- No skipped tests - Every test documents expected behavior
- Type safety - Let the compiler catch errors the AI might introduce
These guardrails enable speed, not restrict it. With comprehensive tests and type checking, you can accept AI suggestions confidently.
Code quality isn't about perfectionism—it's about sustainability. These standards emerge from decades of collective experience showing what makes code maintainable over time. When functions grow too complex, they become impossible to understand. When parameter lists grow too long, the function is trying to do too much. When we allow silent failures, we create systems that fail mysteriously in production.
The key to maintainable code is keeping functions simple enough that you can understand them at a glance. Complex functions hide bugs, resist testing, and terrify other developers (including future you). Here's how different languages encourage simplicity:
Go Example:
// Good: Low complexity, single responsibility
func calculateDiscount(price float64, customerType string) float64 {
discountRates := map[string]float64{
"premium": 0.20,
"regular": 0.10,
"new": 0.15,
}
rate, exists := discountRates[customerType]
if !exists {
rate = 0.0
}
return price * rate
}
// Bad: High complexity, multiple responsibilities
func processOrderBad(order Order) (Result, error) {
// Too many nested conditions and responsibilities
// Split into smaller functions
}
Go's simplicity forces you to be explicit about error handling and avoid clever abstractions. The calculateDiscount function does one thing well - it maps customer types to discounts. No hidden complexity, no surprising behavior.
Kotlin Example:
// Good: Clear, focused functions
sealed class Result<out T> {
data class Success<T>(val value: T) : Result<T>()
data class Error(val message: String) : Result<Nothing>()
}
fun parseConfig(json: String): Result<Config> =
try {
Result.Success(Json.decodeFromString<Config>(json))
} catch (e: Exception) {
Result.Error("Invalid configuration: ${e.message}")
}
// Use small, composable functions
fun validateConfig(config: Config): Result<Config> =
when {
config.timeout <= 0 -> Result.Error("Timeout must be positive")
config.retries < 0 -> Result.Error("Retries cannot be negative")
else -> Result.Success(config)
}
Kotlin's sealed classes and expression-based functions make error handling elegant and type-safe. The Result type forces callers to handle both success and failure cases explicitly. Notice how parseConfig and validateConfig are small, focused functions that compose together - each does exactly one thing.
Mutability is the root of countless bugs. When data can change anywhere, anytime, reasoning about program behavior becomes impossible. Immutable data structures force you to be explicit about state changes, making programs easier to understand and debug.
Scala Example:
// Good: Immutable case classes and collections
case class User(
  id: UserId,
  name: String,
  email: Email,
  preferences: Set[Preference],
  isPremium: Boolean,       // used by the collection examples below
  lastSeen: Instant
)
def updateUserPreferences(user: User, newPrefs: Set[Preference]): User =
user.copy(preferences = user.preferences ++ newPrefs)
// Working with immutable collections
val users = List(user1, user2, user3)
val premiumUsers = users.filter(_.isPremium)
val updatedUsers = users.map(u => u.copy(lastSeen = Instant.now()))
Scala's case classes are immutable by default. The copy method creates a new instance with selected fields changed, leaving the original untouched. This makes it impossible to accidentally modify shared state - a common source of bugs in concurrent programs.
F# Example:
// F# - Immutable by default
type CustomerStatus = Standard | Premium

type Customer = {
    Id: CustomerId
    Name: string
    Orders: Order list
    TotalSpent: decimal
    Status: CustomerStatus  // used by processCustomers below
}
// Pure functions with immutable data
let addOrder customer order =
{ customer with
Orders = order :: customer.Orders
TotalSpent = customer.TotalSpent + order.Total }
// Pipelining with immutable transformations
let processCustomers customers =
customers
|> List.filter (fun c -> c.TotalSpent > 1000m)
|> List.map (fun c -> { c with Status = Premium })
|> List.sortByDescending (fun c -> c.TotalSpent)
F# takes immutability even further - everything is immutable by default. The with syntax creates a new record with specific fields updated. The pipeline operator (|>) makes data transformations read like a story: take customers, filter them, update their status, sort them. Each step produces a new collection, leaving the original untouched.
Silent failures are time bombs in your codebase. When errors are hidden or ignored, they surface at the worst possible moments - usually in production, usually at 3 AM. Explicit error handling forces you to consider and handle failure cases at compile time, not debug time.
Rust Example:
// Rust - Explicit error handling with Result type
#[derive(Debug, thiserror::Error)]
enum ConfigError {
#[error("File not found: {0}")]
FileNotFound(String),
#[error("Parse error: {0}")]
ParseError(#[from] serde_json::Error),
#[error("Invalid value: {field} must be {requirement}")]
InvalidValue { field: String, requirement: String },
}
fn load_config(path: &str) -> Result<Config, ConfigError> {
let content = std::fs::read_to_string(path)
.map_err(|_| ConfigError::FileNotFound(path.to_string()))?;
let config: Config = serde_json::from_str(&content)?;
validate_config(&config)?;
Ok(config)
}
fn validate_config(config: &Config) -> Result<(), ConfigError> {
if config.timeout_ms == 0 {
return Err(ConfigError::InvalidValue {
field: "timeout_ms".to_string(),
requirement: "greater than 0".to_string(),
});
}
Ok(())
}
Rust makes error handling impossible to ignore. The Result type forces you to handle both success and failure cases. The ? operator provides convenient error propagation while maintaining explicitness. Custom error types with thiserror make errors self-documenting - when something fails, you know exactly what went wrong and why.
Swift Example:
// Swift - Explicit error handling with typed errors
enum ValidationError: Error {
case emptyInput
case invalidFormat(String)
case outOfRange(Int, min: Int, max: Int)
}
struct EmailValidator {
static func validate(_ email: String) throws -> ValidatedEmail {
guard !email.isEmpty else {
throw ValidationError.emptyInput
}
let emailRegex = #"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$"#
let predicate = NSPredicate(format: "SELF MATCHES[c] %@", emailRegex)
guard predicate.evaluate(with: email) else {
throw ValidationError.invalidFormat("Invalid email format")
}
return ValidatedEmail(email)
}
}
// Usage with proper error handling
func registerUser(email: String, password: String) -> Result<User, Error> {
do {
let validEmail = try EmailValidator.validate(email)
let hashedPassword = try PasswordHasher.hash(password)
let user = User(email: validEmail, passwordHash: hashedPassword)
return .success(user)
} catch {
return .failure(error)
}
}
Swift's error handling combines the best of exceptions and return values. The throws keyword makes error possibilities explicit in function signatures, while pattern matching on Result types provides ergonomic error handling. The type system ensures you can't accidentally ignore errors—they must be handled with try, try?, or try!.
Traditional software development often treats specifications as contracts written in stone before coding begins. This approach fails because our understanding of the problem evolves as we build the solution. Living documentation embraces this reality.
Specifications should be living documents that evolve with your project, not static requirements written once and forgotten. They serve as both a guide and a historical record of decisions. When you discover that a feature needs to work differently than originally specified, you update the spec alongside the code. When you learn why a particular approach doesn't work, you document that learning in the spec. This way, future developers (including yourself in six months) understand not just what the system does, but why it does it that way.
# Project Name Specification
## Vision
One paragraph describing what success looks like for this project.
## Core Principles
- Principle 1: Explanation
- Principle 2: Explanation
## Success Metrics
- Metric 1: How we measure it
- Metric 2: How we measure it
## Non-Goals
Things we explicitly choose NOT to do.
# Architecture Specification
## System Overview
High-level architecture diagram and description.
## Key Design Decisions
### Decision: Use Event Sourcing
**Context**: Need audit trail and time-travel debugging
**Decision**: Implement event sourcing for state management
**Consequences**: More complex, but provides complete history
**Date**: 2024-01-15
**Revisit**: When we reach 1M events/day
## Component Specifications
### Component Name
- **Purpose**: What it does
- **Interfaces**: How it connects
- **Invariants**: What must always be true
- **Example Usage**: Code example
In AI-assisted development, your process documentation helps the AI understand how you prefer to work and what patterns to follow.
# Process Specification
## Our Development Philosophy
We practice AI-assisted development - iterative, feedback-driven coding where the AI writes most of the implementation based on our specifications.
## Feedback Loops We Use
1. **CI Green Loop** (5-15 min) - Our default state
2. **Bug Investigation Loop** (15-45 min) - When issues arise
3. **UI Polish Loop** (20-40 min) - After features work
4. **Performance Loop** (45-90 min) - When things feel slow
## Iteration Cadence
- **Micro**: Every commit (5-15 minutes)
- **Minor**: Every feature (2-4 hours)
- **Major**: Every week (retrospective)
## How We Learn
1. **Test First**: Write tests that describe what we want
2. **Implement**: Make the tests pass
3. **Reflect**: What did we learn? Update specs
4. **Iterate**: Apply learnings to next cycle
## UI Improvement Process
Following UI_IMPROVEMENT_LOOP.md:
1. Test current UI with real use cases
2. Identify friction points
3. Design improvements
4. Implement with tests
5. Validate with users
6. Document learnings
## Measuring Success
- CI stays green >95% of time
- Features ship within estimated loops
- Bug fix includes prevention measure
- Each iteration improves velocity
## Process Evolution
This process itself is a living document. We update it when:
- A loop consistently takes longer than expected
- We discover a new effective pattern
- Team feedback suggests improvements
- Metrics show process bottlenecks
Human + AI Collaboration Pattern:
// Human provides context
"I need a file sync system that handles conflicts"
// AI asks clarifying questions
"What types of conflicts? How should they be resolved?"
// Human provides constraints
"Last-write-wins for now, but log all conflicts"
// AI drafts initial spec
interface FileSyncSpec {
conflictResolution: "last-write-wins" | "manual" | "merge";
conflictLog: ConflictEvent[];
syncStrategy: "immediate" | "batched" | "scheduled";
}
// Human refines
"Add offline support and partial sync"
// Iterate until complete
Java Example - Evolving API Spec:
// Version 1.0 - Initial spec
public interface UserService {
User createUser(String email, String password);
User getUser(Long id);
}
// Version 1.1 - After discovering auth needs
public interface UserService {
User createUser(String email, String password);
User getUser(Long id);
// ADDED: v1.1 - Need for API authentication
User getUserByToken(String authToken);
}
// Version 2.0 - After performance issues
public interface UserService {
CompletableFuture<User> createUser(String email, String password);
CompletableFuture<User> getUser(Long id);
// CHANGED: v2.0 - Made async for better performance
CompletableFuture<User> getUserByToken(String authToken);
// ADDED: v2.0 - Batch operations for efficiency
CompletableFuture<List<User>> getUsers(List<Long> ids);
}
TypeScript Example - Growing Feature Spec:
// specs/search-feature.spec.ts - Version 1
export interface SearchSpec {
capabilities: {
textSearch: boolean;
filters: string[];
maxResults: number;
};
requirements: {
responseTime: "<100ms for 95% of queries";
accuracy: "90%+ relevance score";
};
}
// Version 2 - After user feedback
export interface SearchSpec {
capabilities: {
textSearch: boolean;
fuzzySearch: boolean; // ADDED: Users need typo tolerance
filters: string[];
maxResults: number;
pagination: boolean; // ADDED: Large result sets
};
requirements: {
responseTime: "<100ms for 95% of queries";
accuracy: "90%+ relevance score";
typoTolerance: "1-2 character errors"; // ADDED
};
// ADDED: Specific examples to clarify behavior
examples: {
fuzzySearch: [
{ input: "teh", expected: ["the", "tea", "tech"] },
{ input: "pythn", expected: ["python"] }
];
};
}
Go Example:
// specs/rate-limiter.md
/*
## Rate Limiter Specification
### What
- Limit API calls to 100 requests per minute per user
- Use sliding window algorithm
- Return 429 status when limit exceeded
### Why
- Prevent API abuse (we had DDoS in Q3 2023)
- Ensure fair resource usage across customers
- Sliding window prevents burst exploitation
### Implementation Notes
*/
type RateLimiter interface {
// Check returns true if request is allowed
// Spec: Must be O(1) operation for performance
Check(userID string) bool
// Reset clears limits for testing
// Spec: Only available in test builds
Reset(userID string)
}
C# Example:
// Specs/CacheSpec.cs
namespace ProjectSpecs
{
/// <summary>
/// Cache Specification v2.1
///
/// Purpose: Reduce database load by 80%
/// Strategy: Two-tier cache (memory + Redis)
///
/// History:
/// - v1.0: Memory only (failed at scale)
/// - v2.0: Added Redis tier
/// - v2.1: Added cache warming
/// </summary>
public interface ICacheSpec
{
// Requirement: 99.9% cache availability
TimeSpan DefaultExpiration { get; }
// Requirement: <10ms read latency
Task<T?> GetAsync<T>(string key);
// Requirement: Write-through to database
Task SetAsync<T>(string key, T value, TimeSpan? expiration = null);
}
}
Python Example:
# specs/data_pipeline_spec.py
"""
Data Pipeline Specification
## Decision Log
### 2024-01: Chose Batch over Streaming
- **Options Considered**:
1. Real-time streaming (Kafka + Flink)
2. Micro-batching (Spark Streaming)
3. Traditional batch (Airflow + Spark)
- **Decision**: Traditional batch
- **Rationale**:
- 15-minute data freshness acceptable
- Simpler operations (team expertise)
- 70% lower infrastructure cost
- **Revisit When**:
- Need <5 minute freshness
- Team gains streaming expertise
### 2024-03: Added Incremental Processing
- **Problem**: Full reprocessing taking 6+ hours
- **Solution**: Track high watermarks, process only new data
- **Trade-off**: More complex state management
"""
from dataclasses import dataclass
from typing import Protocol, List
from datetime import datetime
class DataPipelineSpec(Protocol):
"""Specification for data pipeline components"""
def process_batch(
self,
start_time: datetime,
end_time: datetime
) -> BatchResult:
"""Process data within time window"""
...
def get_watermark(self) -> datetime:
"""Get last successfully processed timestamp"""
...
Rust Example:
// specs/reliability_spec.rs
/// Reliability Specification
///
/// This spec defines the reliability guarantees our system provides.
/// All implementations MUST pass these tests.
pub trait ReliabilitySpec {
type Error;
/// Messages must be delivered exactly once
async fn deliver_message(&self, msg: Message) -> Result<DeliveryReceipt, Self::Error>;
/// System must auto-recover from transient failures
async fn handle_failure(&self, error: Self::Error) -> RecoveryAction;
}
#[cfg(test)]
mod spec_tests {
use super::*;
/// Any implementation of ReliabilitySpec must pass this test
async fn test_exactly_once_delivery<T: ReliabilitySpec>(system: &T) {
let msg = Message::new("test");
// Send same message twice
let receipt1 = system.deliver_message(msg.clone()).await.unwrap();
let receipt2 = system.deliver_message(msg.clone()).await.unwrap();
// Must get same receipt (idempotent)
assert_eq!(receipt1.id, receipt2.id);
// Must have delivered exactly once
assert_eq!(get_delivery_count(msg.id), 1);
}
}
Kotlin Example:
// Weekly spec review with AI
class SpecReview {
fun reviewWithAI() {
"""
Human: "Review our search spec against last week's bug reports"
AI: "Found 3 issues:
1. Spec doesn't cover empty query behavior (Bug #123)
2. No mention of special character handling (Bug #125)
3. Performance requirement unrealistic for fuzzy search (Bug #130)"
Human: "Update spec to address these"
AI: "Here's the updated spec with additions marked..."
""".trimIndent()
}
}
// Spec evolves based on real-world learning
interface SearchSpecV3 {
fun handleEmptyQuery(): SearchResult // ADDED: Based on Bug #123
fun escapeSpecialChars(query: String): String // ADDED: Bug #125
companion object {
// UPDATED: Relaxed for fuzzy search based on Bug #130
const val FUZZY_SEARCH_TARGET_LATENCY = "200ms"
const val EXACT_SEARCH_TARGET_LATENCY = "100ms"
}
}
project-root/
├── README.md # Points to specs
├── specs/
│ ├── README.md # Spec overview & index
│ ├── PROJECT_SPEC.md # Overall vision
│ ├── PROCESS_SPEC.md # How we work & iterate
│ ├── ARCHITECTURE.md # Technical architecture
│ ├── API_SPEC.md # API contracts
│ ├── features/
│ │ ├── search.spec.md
│ │ ├── auth.spec.md
│ │ └── sync.spec.md
│ └── decisions/
│ ├── 2024-01-database-choice.md
│ ├── 2024-02-caching-strategy.md
│ └── 2024-03-api-versioning.md
├── src/
│ └── [implementation following specs]
└── tests/
└── spec-compliance/ # Tests that verify spec compliance
The specification feedback loop transforms documentation from a chore into a powerful development tool. This isn't about bureaucracy—it's about learning faster and building better software.
- Write initial spec (Human + AI collaboration)
- Implement against spec (With AI assistance)
- Discover gaps/issues (Through usage)
- Update spec (Document learning)
- Refactor if needed (Maintain alignment)
- Repeat
This creates a virtuous cycle where specifications improve based on real-world experience, and implementations stay aligned with evolved understanding. Each iteration makes the spec more accurate and the code more purposeful. The AI assistant becomes more helpful over time because it has access to your accumulated wisdom in the specs. New team members onboard faster because the specs explain not just the what, but the why and the why-not.
Many large companies use Nix to ensure developers have identical environments, eliminating "works on my machine" problems.
nix-shell is a tool that creates isolated, reproducible development environments. Think of it as a more powerful version of Python's virtualenv that works for ANY language and tool.
Why use it?
- Consistency: Everyone gets the exact same versions of all tools
- No "works on my machine": If it works in nix-shell, it works everywhere
- Clean system: Doesn't pollute your global system with dependencies
- Easy onboarding: New developers just run nix-shell and have everything
How to use it:
# Install Nix (one-time setup)
curl -L https://nixos.org/nix/install | sh
# Enter the development environment
nix-shell
# Now you have all project tools available
which node # Specific Node.js version for this project
which cargo # Specific Rust version for this project
Example shell.nix file:
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
buildInputs = with pkgs; [
nodejs-18_x
yarn
rustc
cargo
python311
git
];
shellHook = ''
echo "Welcome to the project dev environment!"
echo "Node $(node --version), Yarn $(yarn --version)"
'';
}
Note: Python can be tricky under Nix because of virtual environment conflicts - use poetry or pipenv inside nix-shell for Python projects.
Package management is where good intentions meet harsh reality. Every npm install or yarn add is a trust decision—you're inviting someone else's code into your project, along with all their dependencies, and their dependencies' dependencies. A single compromised package can take down thousands of projects, as we've seen with incidents like left-pad and event-stream.
Core Principles:
- Consistency is non-negotiable: Choose yarn or npm at project start and stick with it. Mixing package managers creates subtle bugs that waste hours of debugging time.
- Lock files are your safety net: yarn.lock or package-lock.json ensures everyone gets exactly the same versions. Commit these files always—they're as important as your source code.
- Audit regularly, update thoughtfully: Run yarn audit weekly, but don't blindly update everything. Each update is a potential breaking change. Update security patches immediately, minor versions carefully, major versions with full testing.
- Document everything: Your README should tell a new developer exactly how to get from zero to running code. If it takes more than three commands, you're doing it wrong.
GitHub Actions provides 2,000 free minutes per month for private repos, which is sufficient for most small teams.
GitHub Actions is GitHub's built-in CI/CD platform. It runs your tests, builds, and deployments automatically when you push code.
Key Concepts:
- Workflow: A complete CI/CD process (defined in .github/workflows/)
- Job: A set of steps that run on the same runner
- Step: Individual task (run tests, build, deploy)
- Runner: Virtual machine that executes your jobs
- Action: Reusable unit of code (like a function)
# .github/workflows/ci.yml
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '18'
- name: Install dependencies
run: yarn install --frozen-lockfile
- name: Run tests
run: yarn test
- name: Run type checks
run: yarn typecheck
- name: Run linter
run: yarn lint
Why cache?
- Speed: Avoid re-downloading dependencies every run
- Cost: Fewer API calls to package registries
- Reliability: Less dependent on external services
Yarn/NPM Caching Example:
- name: Cache node modules
uses: actions/cache@v4
with:
path: |
~/.cache/yarn
node_modules
key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
restore-keys: |
${{ runner.os }}-yarn-
Rust/Cargo Caching Example:
- name: Cache cargo registry
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
~/.cargo/git
target
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
What is Cachix? Cachix is a binary cache service for Nix that dramatically speeds up CI builds by caching compiled packages.
Setting up Cachix (Free Tier):
- uses: cachix/install-nix-action@v24
with:
nix_path: nixpkgs=channel:nixos-unstable
- uses: cachix/cachix-action@v14
with:
name: your-cache-name # Use the public cache
# No authToken needed for public caches
- run: nix-shell --run "yarn test"
Benefits of Cachix:
- Fast builds: Download pre-built binaries instead of compiling
- Free tier: Public caches are free
- Shared cache: Team members benefit from each other's builds
name: Complete CI
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# Nix with Cachix for reproducible environment
- uses: cachix/install-nix-action@v24
- uses: cachix/cachix-action@v14
with:
name: nix-community # Using public community cache
# Node.js caching
- name: Cache node modules
uses: actions/cache@v4
with:
path: |
~/.cache/yarn
node_modules
key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
# Run everything in nix-shell
- name: Install dependencies
run: nix-shell --run "yarn install --frozen-lockfile"
- name: Run tests
run: nix-shell --run "yarn test"
- name: Type check
run: nix-shell --run "yarn typecheck"
- name: Lint
run: nix-shell --run "yarn lint"
# Upload test results
- name: Upload coverage
if: always()
uses: actions/upload-artifact@v4
with:
name: coverage
path: coverage/
docker-e2e:
runs-on: ubuntu-latest
needs: test # Only run after tests pass
steps:
- uses: actions/checkout@v4
# Docker layer caching
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and test
run: |
docker-compose -f docker-compose.test.yml build
docker-compose -f docker-compose.test.yml up --abort-on-container-exit
- name: Upload test artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: e2e-results
path: test-results/
1. Parallel Jobs:
jobs:
lint:
runs-on: ubuntu-latest
steps: [...]
test:
runs-on: ubuntu-latest
steps: [...]
typecheck:
runs-on: ubuntu-latest
steps: [...]
# These run in parallel!
2. Matrix Builds:
strategy:
matrix:
node: [16, 18, 20]
os: [ubuntu-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
3. Conditional Steps:
- name: Deploy
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: ./deploy.sh
The GitHub CLI (gh) transforms how you interact with CI/CD. Instead of constantly refreshing browser tabs to check if your build passed, you can monitor and control everything from your terminal. This tool is especially powerful when paired with an AI assistant—you can share CI failures directly and get immediate help debugging.
Installation and Setup:
# Install GitHub CLI
brew install gh # macOS
# or for Linux:
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo gpg --dearmor -o /usr/share/keyrings/githubcli-archive-keyring.gpg
# Login (choose browser auth for simplicity)
gh auth login
Essential Commands for CI Debugging:
# Quick status check - see your last 5 workflow runs
gh run list --limit 5
# Watch a running workflow in real-time (like tail -f for CI)
gh run watch
# Deep dive into failures - see full logs
gh run view <run-id> --log
# Failed due to flaky test? Re-run just the failed jobs
gh run rerun <run-id> --failed
# Download artifacts from a workflow run
gh run download <run-id>
The Power Move - Integrating with AI:
# Capture failure logs and send to your AI assistant
gh run view <run-id> --log | grep -A 20 "Error:" > failure.txt
# Now share failure.txt with your AI for debugging help
Security Best Practice: When automating with scripts, never hardcode tokens. Instead:
# Store token securely
echo "ghp_yourtoken" > /tmp/gh_token.txt
chmod 600 /tmp/gh_token.txt
# Use in scripts
export GH_TOKEN=$(cat /tmp/gh_token.txt)
gh run list # Now authenticated
# NEVER do this:
# echo $GH_TOKEN # This exposes your token!
The GitHub CLI becomes indispensable once you realize you can fix CI issues without leaving your editor. Combined with an AI assistant that can read logs and suggest fixes, you'll resolve CI failures in minutes instead of hours.
In vibe coding, testing is non-negotiable. When AI generates most of your code, comprehensive tests are essential for verification. 100% code coverage provides the foundation for confident refactoring and rapid iteration.
- NEVER SKIP TESTS - Fix failing tests instead of using .skip or .todo
- Use real services where possible in integration tests
- Fast unit test execution (< 100ms per test)
- High coverage of edge cases and error conditions
- Write tests first for complex features (TDD)
- Tests should be deterministic and repeatable
- Use property-based testing for invariants
- Test race conditions in concurrent code
- Focus on testing business logic, not implementation details
The principle of never skipping tests deserves special attention. When a test fails, it's telling you something important—either your code is broken, your test is wrong, or your understanding of the requirements has evolved. Skipping the test silences this feedback. Instead, fix the issue or update the test to match new requirements. Every skipped test is a landmine waiting for the next developer.
Each type of test serves a specific purpose in your safety net. Think of them as different zoom levels on a microscope—unit tests examine individual cells, integration tests watch how organs work together, and E2E tests verify the whole organism functions. Choosing the right test type for each scenario is as important as writing the test itself.
Unit tests are your first line of defense. They're the fastest to write, fastest to run, and fastest to debug when they fail. The key to great unit tests is ruthless isolation—each test should examine exactly one piece of behavior.
Characteristics of Great Unit Tests:
- Lightning fast: If a unit test takes more than 100ms, it's not a unit test. Speed matters because you'll run these thousands of times.
- Surgical precision: Test one specific behavior. When it fails, you should know exactly what's broken without debugging.
- Edge case hunters: This is where you test the weird stuff—empty arrays, null inputs, Unicode strings, negative numbers. If it can happen in production, test it here.
- Deterministic: Same input, same output, every single time. No random data, no time dependencies, no network calls.
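A short Vitest sketch of these characteristics in practice, assuming a hypothetical pure parsePageRange helper (the reversed-range behavior asserted below is an assumption made for illustration):
import { describe, it, expect } from 'vitest';
import { parsePageRange } from './parsePageRange'; // hypothetical pure function
describe('parsePageRange', () => {
  it('expands ranges and single pages', () => {
    expect(parsePageRange('1-3,5')).toEqual([1, 2, 3, 5]);
  });
  it('handles the weird stuff: empty input, whitespace, reversed ranges', () => {
    expect(parsePageRange('')).toEqual([]);
    expect(parsePageRange(' 2 ')).toEqual([2]);
    expect(parsePageRange('5-3')).toEqual([]); // assumed behavior: reversed range yields nothing
  });
});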
Integration tests reveal the lies that unit tests tell. Your perfectly isolated components might work flawlessly alone but fail spectacularly when connected. Integration tests catch the impedance mismatches between systems.
What Makes Integration Tests Valuable:
- Real collaborations: Test how your code actually talks to databases, APIs, and file systems. Mock as little as possible.
- Data flow validation: Follow data as it moves through your system. Does that user input actually make it to the database correctly?
- Error propagation: When the database is down, does your API return a proper error? When the API fails, does your UI show a helpful message?
- Boundary testing: This is where you test timeouts, retries, and circuit breakers—all the stuff that only matters when systems interact.
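A sketch of an integration test that exercises the real file system instead of mocks, assuming a hypothetical loadConfig function that reads and validates a JSON file (its exact failure behavior is an assumption here):
import { describe, it, expect } from 'vitest';
import { mkdtemp, writeFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { loadConfig } from './loadConfig'; // hypothetical: reads and validates a JSON config file
describe('loadConfig (integration)', () => {
  it('reads a real file and fails loudly on malformed JSON', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'cfg-'));
    try {
      await writeFile(join(dir, 'config.json'), '{ "timeout": 5000 }');
      await expect(loadConfig(join(dir, 'config.json'))).resolves.toEqual({ timeout: 5000 });
      await writeFile(join(dir, 'broken.json'), '{ not json');
      await expect(loadConfig(join(dir, 'broken.json'))).rejects.toThrow();
    } finally {
      await rm(dir, { recursive: true, force: true }); // clean up the temp directory
    }
  });
});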
E2E tests are your users' advocates. They don't care about your beautiful architecture or clever algorithms—they care that clicking the button does what it's supposed to do. These tests are expensive to write and slow to run, but they catch the bugs that users actually experience.
The E2E Philosophy - Keep It Real:
- NO MOCKING: The moment you mock in an E2E test, it's not E2E anymore. Use real databases, real files, real network calls. Yes, it's slower. Yes, it's worth it.
- Real File Operations: Don't simulate file changes—actually write files to disk and verify your file watcher notices. Create real git commits and check that your git integration works.
- Live System Integration: Start your actual backend, connect to real services, use genuine authentication. If it's flaky, fix the flakiness—don't hide it with mocks.
- User-Centric Workflows: Don't test implementation details. Test what users actually do: "I drag a file here, type some code, hit save, and see my changes in version control."
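In that spirit, an E2E sketch that writes a real file and runs a real git command rather than mocking either; Playwright is assumed, and the route and text assertion are placeholders for your own app:
import { test, expect } from '@playwright/test';
import { writeFile } from 'node:fs/promises';
import { execSync } from 'node:child_process';
test('a saved file appears as a pending change in the version control view', async ({ page }) => {
  // Real file write and real git - no mocks
  await writeFile('notes/todo.md', '- ship the E2E suite\n');
  execSync('git add notes/todo.md');
  await page.goto('http://localhost:3000/changes'); // placeholder route
  await expect(page.getByText('notes/todo.md')).toBeVisible();
});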
Property-based testing is a testing approach where instead of writing specific test cases, you describe properties that should always be true, and the testing framework generates random inputs to try to find counterexamples.
The shift from example-based to property-based testing is profound. With traditional testing, you're limited by your imagination—you test the cases you think of. With property-based testing, the computer generates cases you never imagined, often finding bugs in edge cases like empty strings, negative numbers, or Unicode characters you forgot existed.
Traditional Testing:
// Test specific cases
expect(add(2, 3)).toBe(5);
expect(add(0, 0)).toBe(0);
expect(add(-1, 1)).toBe(0);
Property-Based Testing:
// Test properties that should ALWAYS be true (fast-check syntax)
fc.assert(
  fc.property(fc.integer(), fc.integer(), (a, b) => {
    return add(a, b) === add(b, a);
  })
);
The beauty of property-based testing is that when it finds a failing case, it automatically "shrinks" the input to find the minimal failing case. If your function fails on a 100-element array, the framework will systematically reduce it to find that it actually fails on any array with more than 3 elements, making debugging much easier.
Why use it?
- Finds edge cases you didn't think of: The framework generates hundreds of test cases
- Better test coverage: Tests properties, not just examples
- Discovers hidden assumptions: Often reveals bugs in boundary conditions
- Documents behavior: Properties serve as executable specifications
When to use it:
- Testing invariants and edge cases
- Great for race condition detection
- Focus on specific problematic patterns
- Generate test cases automatically
- Test with boundary conditions and edge cases
- Verify properties hold across all possible inputs
- Find counterexamples to assumptions
- Test mathematical properties (commutativity, associativity, etc.)
A race condition occurs when the behavior of software depends on the relative timing of events, especially in concurrent or asynchronous systems. The "race" is between different parts of code trying to access or modify shared resources.
Race conditions are particularly insidious because they violate our mental model of how programs execute. We think of code running line by line, but in concurrent systems, multiple lines of code execute simultaneously across different threads or async contexts. The bug only manifests when timing aligns in just the wrong way—which might be one time in a thousand, making it nearly impossible to debug through traditional means.
Classic Example:
let counter = 0;
// Two async operations racing
async function increment() {
const current = counter; // Read
await delay(1); // Some async work
counter = current + 1; // Write
}
// If both run at once:
// Both read 0, both write 1
// Result: counter = 1 (should be 2!)
This trivial example illustrates a pattern that causes real problems: check-then-act operations where the state can change between the check and the action. In production systems, this pattern appears in database operations, file system access, distributed systems communication, and anywhere else multiple actors might access shared resources.
Why are they dangerous?
- Intermittent: Only fail under specific timing
- Hard to reproduce: May work fine in development
- Data corruption: Can lead to inconsistent state
- Security risks: Can be exploited by attackers
How to detect them:
- Critical for concurrent/async systems
- Test timing-dependent failures systematically
- Use controlled scheduling to explore execution orders
- Focus on shared state and resource contention
- Test check-then-act patterns
- Verify atomicity of operations
- Test cleanup in failure scenarios
TypeScript Example:
// TypeScript - Testing increment race condition
async function testIncrementRace(iterations: number): Promise<number> {
let raceCount = 0;
for (let i = 0; i < iterations; i++) {
let counter = 0;
const increment = async (): Promise<void> => {
const current = counter;
await new Promise(resolve => setImmediate(resolve)); // Yield control
counter = current + 1;
};
await Promise.all([increment(), increment()]);
if (counter === 1) {
raceCount++;
}
}
return raceCount;
}
Rust Example:
// Rust - Testing increment race condition
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};
use tokio::task;
async fn test_increment_race(iterations: u32) -> u32 {
let mut race_count = 0u32;
for _ in 0..iterations {
let counter = Arc::new(AtomicU32::new(0));
let mut handles = vec![];
for _ in 0..2 {
let counter_clone = Arc::clone(&counter);
let handle = task::spawn(async move {
let current = counter_clone.load(Ordering::Relaxed);
tokio::task::yield_now().await; // Yield control
counter_clone.store(current + 1, Ordering::Relaxed);
});
handles.push(handle);
}
for handle in handles {
handle.await.unwrap();
}
if counter.load(Ordering::Relaxed) == 1 {
race_count += 1;
}
}
race_count
}
Java Example:
// Java - Testing double-delete race condition
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
public class DoubleDeleteTest {
private final AtomicInteger detectedRaces = new AtomicInteger(0);
public void testDoubleDelete(int runs) throws InterruptedException {
ExecutorService executor = Executors.newFixedThreadPool(2);
for (int run = 0; run < runs; run++) {
FileSystem fs = new RaceProneFileSystem();
String fileId = fs.createFile("/temp.txt", "temp");
CountDownLatch latch = new CountDownLatch(2);
Runnable deleteTask = () -> {
try {
fs.deleteFile(fileId);
} catch (FileNotFoundException e) {
detectedRaces.incrementAndGet();
} finally {
latch.countDown();
}
};
executor.submit(deleteTask);
executor.submit(deleteTask);
latch.await();
}
executor.shutdown();
}
}
Go Example:
// Go - Testing double-delete race condition
package main
import (
"sync"
"sync/atomic"
"errors"
)
type FileSystem interface {
CreateFile(path string, content string) (string, error)
DeleteFile(fileId string) error
}
func testDoubleDelete(runs int, fs FileSystem) int32 {
var detectedRaces int32
for run := 0; run < runs; run++ {
fileId, _ := fs.CreateFile("/temp.txt", "temp")
var wg sync.WaitGroup
wg.Add(2)
deleteTask := func() {
defer wg.Done()
if err := fs.DeleteFile(fileId); err != nil {
if errors.Is(err, ErrFileNotFound) {
atomic.AddInt32(&detectedRaces, 1)
}
}
}
go deleteTask()
go deleteTask()
wg.Wait()
}
return atomic.LoadInt32(&detectedRaces)
}
C# Example:
// C# - Testing resource pool contention
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
public class ResourcePoolTest {
public async Task<int> TestResourceContention(int iterations) {
var resourcePool = new ResourcePool<Connection>(maxSize: 2);
var errors = new ConcurrentBag<(int workerId, Exception error)>();
for (int i = 0; i < iterations; i++) {
var tasks = new Task[5];
for (int workerId = 0; workerId < 5; workerId++) {
int id = workerId; // Capture loop variable
tasks[workerId] = Task.Run(async () => {
try {
Connection resource = await resourcePool.AcquireAsync();
await Task.Delay(100); // Simulate work
await resourcePool.ReleaseAsync(resource);
} catch (Exception ex) {
errors.Add((id, ex));
}
});
}
await Task.WhenAll(tasks);
}
return errors.Count;
}
}
public class ResourcePool<T> where T : class, new() {
private readonly SemaphoreSlim _semaphore;
private readonly ConcurrentQueue<T> _resources;
public ResourcePool(int maxSize) {
_semaphore = new SemaphoreSlim(maxSize, maxSize);
_resources = new ConcurrentQueue<T>();
for (int i = 0; i < maxSize; i++) {
_resources.Enqueue(new T());
}
}
public async Task<T> AcquireAsync() {
await _semaphore.WaitAsync();
_resources.TryDequeue(out T resource);
return resource ?? throw new InvalidOperationException("No resources available");
}
public async Task ReleaseAsync(T resource) {
_resources.Enqueue(resource);
_semaphore.Release();
}
}
Kotlin Example:
// Kotlin - Testing resource pool contention
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicInteger
class ResourcePoolTest {
suspend fun testResourceContention(iterations: Int): Int {
val resourcePool = ResourcePool<Connection>(maxSize = 2)
val errorCount = AtomicInteger(0)
repeat(iterations) {
coroutineScope {
val jobs = List(5) { workerId ->
launch {
try {
val resource = resourcePool.acquire()
delay(100) // Simulate work
resourcePool.release(resource)
} catch (e: Exception) {
errorCount.incrementAndGet()
}
}
}
jobs.joinAll()
}
}
return errorCount.get()
}
}
class ResourcePool<T>(private val maxSize: Int) where T : Any {
private val semaphore = Semaphore(maxSize)
private val resources = ConcurrentLinkedQueue<T>()
init {
repeat(maxSize) {
resources.offer(createResource())
}
}
@Suppress("UNCHECKED_CAST")
private fun createResource(): T = Connection() as T
suspend fun acquire(): T = withContext(Dispatchers.IO) {
semaphore.acquire()
resources.poll() ?: throw IllegalStateException("No resources available")
}
suspend fun release(resource: T) = withContext(Dispatchers.IO) {
resources.offer(resource)
semaphore.release()
}
}
class Connection
Property-based testing finds bugs that example-based tests miss. Instead of testing specific inputs, you define properties that should always hold true, then let the framework generate hundreds of random inputs to try to break your assumptions. Here are the most powerful properties to test:
The round-trip property states that if you transform data and then reverse the transformation, you should get back exactly what you started with. This catches subtle bugs in serialization, encoding, parsing, and data transformation.
TypeScript Example:
// TypeScript - Testing serialization round-trip property
import fc from 'fast-check';
interface User {
id: string;
name: string;
age: number;
tags: string[];
}
const userArbitrary = fc.record<User>({
id: fc.uuid(),
name: fc.string({ minLength: 1, maxLength: 50 }),
age: fc.integer({ min: 0, max: 150 }),
tags: fc.array(fc.string(), { maxLength: 10 })
});
describe('Serialization properties', () => {
it('should maintain data through serialization round-trip', () => {
fc.assert(
fc.property(userArbitrary, (user: User) => {
const serialized = JSON.stringify(user);
const deserialized = JSON.parse(serialized) as User;
expect(deserialized).toEqual(user);
expect(deserialized.id).toBe(user.id);
expect(deserialized.tags).toEqual(user.tags);
})
);
});
});
Scala Example:
// Scala - Testing serialization round-trip property
import org.scalacheck.{Arbitrary, Gen, Properties}
import org.scalacheck.Prop.forAll
import play.api.libs.json._
case class User(id: String, name: String, age: Int, tags: List[String])
object SerializationSpec extends Properties("Serialization") {
implicit val userFormat: Format[User] = Json.format[User]
val genUser: Gen[User] = for {
id <- Gen.uuid.map(_.toString)
name <- Gen.alphaStr.suchThat(_.nonEmpty)
age <- Gen.choose(0, 150)
tags <- Gen.listOfN(5, Gen.alphaStr)
} yield User(id, name, age, tags)
implicit val arbUser: Arbitrary[User] = Arbitrary(genUser)
property("round-trip") = forAll { (user: User) =>
val serialized = Json.toJson(user)
val deserialized = Json.fromJson[User](serialized)
deserialized match {
case JsSuccess(value, _) => value == user
case JsError(_) => false
}
}
}
The idempotence property states that applying an operation twice produces the same result as applying it once—critical for data normalization, deduplication, and anything that might be retried.
Java Example:
// Java - Testing idempotent operations
import net.jqwik.api.*;
import net.jqwik.api.constraints.AlphaChars;
import org.assertj.core.api.Assertions;
import java.util.*;
class IdempotenceTest {
@Property
void normalizationIsIdempotent(@ForAll String input) {
String normalized1 = normalize(input);
String normalized2 = normalize(normalized1);
Assertions.assertThat(normalized2).isEqualTo(normalized1);
}
@Property
void deduplicationIsIdempotent(@ForAll List<@AlphaChars String> items) {
Set<String> deduped1 = deduplicate(items);
Set<String> deduped2 = deduplicate(new ArrayList<>(deduped1));
Assertions.assertThat(deduped2).isEqualTo(deduped1);
}
private String normalize(String input) {
return input.trim().toLowerCase().replaceAll("\\s+", " ");
}
private Set<String> deduplicate(List<String> items) {
return new HashSet<>(items);
}
}
Swift Example:
Swift's strong type system and SwiftCheck library make it easy to test idempotent operations—operations that produce the same result no matter how many times they're applied. This is crucial for data sanitization, caching, and distributed systems where operations might be retried.
// Swift - Testing idempotent operations
import SwiftCheck
struct User: Equatable {
let id: String
let name: String
let preferences: Set<String>
}
extension User: Arbitrary {
static var arbitrary: Gen<User> {
return Gen.zip3(
String.arbitrary,
String.arbitrary.suchThat { !$0.isEmpty },
Set<String>.arbitrary
).map(User.init)
}
}
class IdempotenceTests {
func testSanitizationIsIdempotent() {
property("User sanitization is idempotent") <- forAll { (user: User) in
let sanitized1 = self.sanitizeUser(user)
let sanitized2 = self.sanitizeUser(sanitized1)
return sanitized1 == sanitized2
}
}
private func sanitizeUser(_ user: User) -> User {
return User(
id: user.id.lowercased(),
name: user.name.trimmingCharacters(in: .whitespacesAndNewlines),
preferences: Set(user.preferences.map { $0.lowercased() })
)
}
}
In this example, the `sanitizeUser` function is idempotent—running it twice produces the same result as running it once. The property test generates thousands of random users and verifies this property holds for all of them. This gives us confidence that our sanitization logic won't corrupt data if accidentally applied multiple times.
Commutative operations produce the same result regardless of the order of operands. This property is essential for distributed systems, concurrent updates, and conflict resolution. When operations are commutative, you can apply them in any order and get consistent results.
F# Example:
// F# - Testing commutative operations
open FsCheck
open FsCheck.Xunit
type Configuration = {
Flags: Set<string>
Settings: Map<string, int>
Features: string list
}
[<Property>]
let ``merging configurations is commutative`` (config1: Configuration) (config2: Configuration) =
let merge c1 c2 = {
    Flags = Set.union c1.Flags c2.Flags
    // Resolve conflicting keys deterministically (max value) so argument order doesn't matter
    Settings = Map.fold (fun acc k v ->
        let resolved = match Map.tryFind k acc with Some existing -> max existing v | None -> v
        Map.add k resolved acc) c1.Settings c2.Settings
    Features = c1.Features @ c2.Features |> List.distinct
}
let result1 = merge config1 config2
let result2 = merge config2 config1
// Flags and features order doesn't matter
result1.Flags = result2.Flags &&
result1.Settings = result2.Settings &&
Set.ofList result1.Features = Set.ofList result2.Features
Haskell Example:
Haskell's type system excels at expressing commutative properties. This example shows event sourcing with conflict resolution—a common pattern in distributed systems where events might arrive out of order.
-- Haskell - Testing commutative operations
import Test.QuickCheck
data Event = Created String | Updated String String | Deleted String
deriving (Eq, Ord, Show)
instance Arbitrary Event where
arbitrary = oneof
[ Created <$> arbitrary
, Updated <$> arbitrary <*> arbitrary
, Deleted <$> arbitrary
]
-- Property: Event merging with conflict resolution is commutative
prop_mergeCommutative :: Event -> Event -> Bool
prop_mergeCommutative e1 e2 =
mergeEvents e1 e2 == mergeEvents e2 e1
where
mergeEvents :: Event -> Event -> Event
mergeEvents (Deleted id1) (Deleted id2) = Deleted (max id1 id2) -- deterministic tie-break
mergeEvents (Deleted id1) _ = Deleted id1                       -- deletions always win
mergeEvents _ (Deleted id2) = Deleted id2
mergeEvents (Created _) e@(Updated _ _) = e                     -- updates override creates
mergeEvents e@(Updated _ _) (Created _) = e
mergeEvents e1 e2 = max e1 e2 -- same-type conflicts: deterministic pick, independent of order
The merge function implements a conflict resolution strategy that's commutative: deletions always win, updates override creates, and same-type conflicts are resolved with a deterministic tie-break (rather than last-write-wins, which would depend on argument order). This ensures that no matter what order events are processed, the final state is consistent.
Associative operations produce the same result regardless of how operations are grouped. This is fundamental for parallel processing, distributed aggregation, and functional composition. When operations are associative, you can break work into chunks and process them in any grouping.
C++ Example:
// C++ - Testing associative operations
#include <rapidcheck.h>
#include <vector>
#include <numeric>
struct Matrix {
std::vector<std::vector<double>> data;
Matrix operator*(const Matrix& other) const {
// Matrix multiplication implementation
// ...
}
bool operator==(const Matrix& other) const {
return data == other.data;
}
};
void testMatrixMultiplicationAssociative() {
rc::check("Matrix multiplication is associative",
[](const Matrix& a, const Matrix& b, const Matrix& c) {
// Assuming compatible dimensions
Matrix result1 = (a * b) * c;
Matrix result2 = a * (b * c);
RC_ASSERT(result1 == result2);
}
);
}
void testStringConcatenationAssociative() {
rc::check("String concatenation is associative",
[](const std::string& a, const std::string& b, const std::string& c) {
std::string result1 = (a + b) + c;
std::string result2 = a + (b + c);
RC_ASSERT(result1 == result2);
}
);
}
Python (with types) Example:
Python with type hints allows us to express mathematical concepts like monoids clearly. A monoid is a structure with an associative operation and an identity element—fundamental to many distributed algorithms and functional programming patterns.
# Python - Testing associative operations with type hints
from typing import List, TypeVar, Callable, Generic
from hypothesis import given, strategies as st
from dataclasses import dataclass
T = TypeVar('T')
@dataclass
class Monoid(Generic[T]):
    """A monoid with an associative operation and an identity element"""
    combine: Callable[[T, T], T]
    identity: T
def test_associativity(monoid: Monoid[T], a: T, b: T, c: T) -> bool:
"""Test that (a • b) • c = a • (b • c)"""
result1 = monoid.combine(monoid.combine(a, b), c)
result2 = monoid.combine(a, monoid.combine(b, c))
return result1 == result2
# List concatenation monoid
list_monoid = Monoid[List[int]](
combine=lambda x, y: x + y,
identity=[]
)
@given(
st.lists(st.integers()),
st.lists(st.integers()),
st.lists(st.integers())
)
def test_list_concat_associative(a: List[int], b: List[int], c: List[int]):
assert test_associativity(list_monoid, a, b, c)
This pattern shows how to create reusable property tests for any monoid. List concatenation is naturally associative—whether you compute `[1,2] + ([3,4] + [5,6])` or `([1,2] + [3,4]) + [5,6]`, you get `[1,2,3,4,5,6]`. This property enables parallel processing of list operations.
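Associativity is also what makes chunked or parallel folding safe: you can reduce independent chunks and then combine the partial results. A minimal TypeScript sketch of that idea (the chunking helper and sample data are illustrative, not from the original):
// Associative combine: folding chunk-by-chunk must agree with a single sequential fold
type Combine<T> = (a: T, b: T) => T;

const chunk = <T>(items: readonly T[], size: number): T[][] =>
  Array.from({ length: Math.ceil(items.length / size) }, (_, i) =>
    items.slice(i * size, (i + 1) * size)
  );

const concat: Combine<number[]> = (a, b) => [...a, ...b];
const lists: number[][] = [[1, 2], [3, 4], [5, 6], [7]];

// Fold everything in one pass...
const sequential = lists.reduce(concat, [] as number[]);

// ...or fold each chunk independently (could run in parallel) and combine the partials
const chunked = chunk(lists, 2)
  .map((part) => part.reduce(concat, [] as number[]))
  .reduce(concat, [] as number[]);

console.log(JSON.stringify(sequential) === JSON.stringify(chunked)); // true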
The most powerful property tests verify that operations preserve critical invariants. An invariant is a condition that must always be true—like a binary search tree maintaining sorted order or a bank account never going negative. These tests catch subtle bugs that unit tests miss.
OCaml Example:
(* OCaml - Testing invariant preservation *)
open QCheck
type 'a binary_tree =
| Leaf
| Node of 'a * 'a binary_tree * 'a binary_tree
(* Binary search tree invariant *)
let rec is_bst = function
| Leaf -> true
| Node (v, left, right) ->
let check_left = match left with
| Leaf -> true
| Node (lv, _, _) -> lv < v
in
let check_right = match right with
| Leaf -> true
| Node (rv, _, _) -> rv > v
in
check_left && check_right && is_bst left && is_bst right
(* Property: insert maintains BST invariant *)
let prop_insert_maintains_bst =
Test.make ~count:1000
~name:"insert maintains BST invariant"
(pair (list int) arbitrary)
(fun (elements, tree) ->
let tree' = List.fold_left insert_bst tree elements in
is_bst tree'
)
(* Property: balanced tree operations maintain balance invariant *)
let prop_balance_maintained =
Test.make ~count:1000
~name:"operations maintain balance"
(list int)
(fun elements ->
let tree = List.fold_left insert_balanced empty elements in
let height_diff = abs (height (left tree) - height (right tree)) in
height_diff <= 1
)
Rust Example:
Rust's ownership system provides strong guarantees, but we still need to verify that our abstractions maintain their invariants. This example shows two critical patterns: a sorted vector that must stay sorted, and set operations that must maintain uniqueness.
// Rust - Testing invariant preservation
use proptest::prelude::*;
use std::collections::BTreeSet;
#[derive(Debug, Clone, PartialEq)]
struct SortedVec<T: Ord> {
data: Vec<T>,
}
impl<T: Ord + Clone> SortedVec<T> {
fn new() -> Self {
SortedVec { data: Vec::new() }
}
fn insert(&mut self, value: T) {
match self.data.binary_search(&value) {
Ok(pos) | Err(pos) => self.data.insert(pos, value),
}
}
fn is_sorted(&self) -> bool {
self.data.windows(2).all(|w| w[0] <= w[1])
}
}
proptest! {
#[test]
fn insert_maintains_sorted_invariant(
initial in prop::collection::vec(any::<i32>(), 0..100),
to_insert in prop::collection::vec(any::<i32>(), 0..50)
) {
let mut sorted = SortedVec::new();
// Build initial sorted vec
for value in initial {
sorted.insert(value);
}
// Property: sorted after each insert
for value in to_insert {
sorted.insert(value);
prop_assert!(sorted.is_sorted());
}
}
#[test]
fn operations_preserve_set_properties(
operations in prop::collection::vec(
prop_oneof![
any::<i32>().prop_map(|x| ("insert", x)),
any::<i32>().prop_map(|x| ("remove", x)),
],
0..100
)
) {
let mut set = BTreeSet::new();
for (op, value) in operations {
match op {
"insert" => { set.insert(value); },
"remove" => { set.remove(&value); },
_ => unreachable!(),
}
// Invariant: no duplicates
let vec: Vec<_> = set.iter().cloned().collect();
let unique_count = vec.iter().collect::<BTreeSet<_>>().len();
prop_assert_eq!(vec.len(), unique_count);
}
}
}
The first test generates random sequences of insertions and verifies the sorted invariant holds after each one. The second test mixes insert and remove operations, checking that the set never contains duplicates. These tests would catch bugs like forgetting to maintain order during insertion or accidentally allowing duplicates.
IO Schedulers in property-based testing frameworks allow you to control the execution order of asynchronous operations deterministically. This makes race conditions reproducible and testable.
Imagine you're debugging a race condition that only appears in production once a week. You can't attach a debugger to production, and you can't reproduce it locally no matter how many times you run the test. This is where IO schedulers revolutionize concurrent testing. They turn non-deterministic bugs into deterministic ones by taking control of time itself—at least from your program's perspective.
The Problem:
- Race conditions depend on timing
- Traditional testing can't control async execution order
- Bugs appear randomly and are hard to reproduce
The Solution:
- IO schedulers intercept all async operations
- They systematically try different execution orders
- When a bug is found, they provide a seed to reproduce it
The magic happens through systematic exploration. Where traditional testing might run your concurrent code 1000 times and never hit the race condition, an IO scheduler methodically tries different orderings: What if Promise A resolves before Promise B? What if they resolve simultaneously? What if B completes while A is half-done? By exploring these possibilities systematically rather than randomly, IO schedulers can find race conditions that would take millions of random runs to encounter.
Fast-check provides a powerful scheduler for testing async race conditions:
import fc from 'fast-check';
describe('Race condition testing with fast-check', () => {
it('detects race in concurrent counter updates', async () => {
await fc.assert(
fc.asyncProperty(
fc.scheduler(),
async (s) => {
// The scheduler controls all async operations
let counter = 0;
let updateCount = 0;
// Define async operations
const increment = s.scheduleFunction(async () => {
const current = counter;
// This Promise resolution is controlled by scheduler
await s.schedule(Promise.resolve());
counter = current + 1;
updateCount++;
});
// Run operations concurrently; waitAll() lets the scheduler release them in its chosen order
const running = Promise.all([increment(), increment()]);
await s.waitAll();
await running;
// Property: counter should equal number of updates
return counter === updateCount;
}
),
{
verbose: true, // Shows which scheduling caused failure
seed: 42, // Can reproduce exact failure
numRuns: 100 // Try 100 different schedulings
}
);
});
});
Key Features:
- `fc.scheduler()` creates a controlled environment
- `scheduleFunction()` wraps async functions
- `schedule()` controls Promise resolution timing
- Provides a seed for reproducing failures
While fast-check uses schedulers, Hypothesis takes a different approach with stateful testing and rule-based state machines. This approach models your system as a state machine and generates sequences of operations that might expose race conditions.
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
import asyncio
import threading
class ConcurrentCounterTest(RuleBasedStateMachine):
def __init__(self):
super().__init__()
self.counter = 0
self.operations = []
self.lock = threading.Lock()
@rule()
def increment(self):
"""Simulate concurrent increment"""
def unsafe_increment():
current = self.counter
# Simulate async work
threading.Event().wait(0.001)
self.counter = current + 1
thread = threading.Thread(target=unsafe_increment)
thread.start()
self.operations.append(thread)
@rule()
def safe_increment(self):
"""Simulate safe increment with lock"""
with self.lock:
self.counter += 1
self.operations.append(None)
@invariant()
def counter_never_negative(self):
assert self.counter >= 0
def teardown(self):
# Wait for all threads
for op in self.operations:
if isinstance(op, threading.Thread):
op.join()
# Run the test
TestCounter = ConcurrentCounterTest.TestCase
This state machine approach generates random sequences of safe and unsafe increments, then checks that invariants hold. The beauty is that Hypothesis will find minimal examples—if there's a race condition, it will find the shortest sequence of operations that triggers it.
ScalaCheck takes yet another approach, providing utilities specifically designed for testing Scala's Future-based concurrent code. The example below shows how to build a custom deterministic scheduler for testing:
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
import scala.concurrent.{Await, Future, Promise, ExecutionContext}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.duration._
object ConcurrentRaceTest extends Properties("Concurrent") {
implicit val ec: ExecutionContext = ExecutionContext.global
// Custom scheduler for deterministic async testing
class DeterministicScheduler {
private val tasks = scala.collection.mutable.Queue[() => Unit]()
def schedule(task: => Unit): Unit = {
tasks.enqueue(() => task)
}
def runAll(): Unit = {
while (tasks.nonEmpty) {
tasks.dequeue()()
}
}
}
property("concurrent updates maintain consistency") = forAll { (seeds: List[Int]) =>
val scheduler = new DeterministicScheduler()
val counter = new AtomicInteger(0)
var inconsistencies = 0
// Create concurrent operations
val futures = seeds.map { seed =>
Future {
val current = counter.get()
scheduler.schedule {
// Check if still consistent
if (counter.get() != current) {
inconsistencies += 1
}
counter.set(current + 1)
}
}
}
// Wait until every future has enqueued its task, then try different execution orders
Await.ready(Future.sequence(futures), 1.second)
scheduler.runAll()
// Property: final count matches operations
counter.get() == seeds.length && inconsistencies == 0
}
}
This custom scheduler queues all async operations and executes them deterministically. By controlling when tasks run, you can systematically explore different interleavings and find race conditions that would be nearly impossible to discover through random testing.
Haskell's QuickCheck faces unique challenges testing IO operations due to Haskell's pure functional nature. The solution is to use monadic properties that can perform IO while maintaining deterministic testing:
import Test.QuickCheck
import Test.QuickCheck.Monadic
import Control.Monad (forM_, replicateM)
import Control.Concurrent
import Control.Concurrent.Async (replicateConcurrently_)
import Control.Concurrent.STM
import Data.IORef
-- Property: Concurrent increments should not lose updates
prop_concurrentCounter :: Positive Int -> Property
prop_concurrentCounter (Positive n) = monadicIO $ do
counter <- run $ newIORef 0
-- Create n concurrent increment operations
run $ do
mvars <- replicateM n newEmptyMVar
-- Start all threads
forM_ mvars $ \mvar -> forkIO $ do
current <- readIORef counter
threadDelay 1 -- Introduce potential race
writeIORef counter (current + 1)
putMVar mvar ()
-- Wait for all to complete
forM_ mvars takeMVar
-- Check final value
finalValue <- run $ readIORef counter
assert $ finalValue == n -- This will often fail!
-- Better approach with STM
prop_stmCounter :: Positive Int -> Property
prop_stmCounter (Positive n) = monadicIO $ do
counter <- run $ newTVarIO 0
run $ do
-- STM ensures atomicity
replicateConcurrently_ n $ atomically $ do
current <- readTVar counter
writeTVar counter (current + 1)
finalValue <- run $ readTVarIO counter
assert $ finalValue == n -- This always passes
The example shows two approaches: the first uses IORef and has a race condition (the assertion will often fail), while the second uses Software Transactional Memory (STM) to ensure atomicity. This demonstrates how property testing can validate your concurrency primitives and guide you toward correct implementations.
For distributed systems testing at scale, Jepsen has become the gold standard. Originally created to test distributed databases, Jepsen's approach can be applied to any concurrent system. It generates concurrent operations, tracks their history, and verifies that the observed behavior matches a consistency model:
(ns race-test.core
(:require [clojure.test :refer [deftest is]]
[jepsen.checker :as checker]
[jepsen.client :as client]
[jepsen.core :as jepsen]
[jepsen.generator :as gen]
[jepsen.tests :as tests]
[knossos.model :as model]))
(defn counter-client
"Client for testing concurrent counter"
[]
(reify client/Client
(invoke! [this test op]
(case (:f op)
:inc (do (increment-counter!)
(assoc op :type :ok))
:read (assoc op :type :ok
:value (read-counter!))))
(teardown! [this test])))
(def checker
(checker/compose
{:counter (checker/counter)
:timeline (checker/timeline)
:linearizable (checker/linearizable
{:model (model/register)
:algorithm :linear})}))
;; Test with controlled concurrency
(deftest concurrent-counter-test
(let [test (jepsen/run!
(assoc tests/noop-test
:client (counter-client)
:generator (gen/mix [gen/inc gen/read])
:checker checker))]
(is (:valid? (:results test)))))
Jepsen's power comes from its linearizability checker, which verifies that concurrent operations appear to take effect atomically at some point between their invocation and response. This catches subtle bugs like lost updates, dirty reads, and other consistency violations that are nearly impossible to find with traditional testing.
- Start Simple: Test basic race conditions first
- Use Seeds: Always save seeds that find bugs
- Limit Scope: Test small units of concurrent code
- Vary Timing: Test different delay patterns
- Check Invariants: Focus on properties that should always hold
Tool | Language | Approach | Best For |
---|---|---|---|
fast-check | JS/TS | Scheduler control | Async/Promise races |
Hypothesis | Python | State machines | Complex state transitions |
ScalaCheck | Scala | Future testing | Actor systems |
QuickCheck | Haskell | Monadic properties | Pure FP with IO |
Jepsen | Clojure | Distributed testing | Database/network races |
// This test found a real race condition in a file system
it('detects file system race condition', async () => {
await fc.assert(
fc.asyncProperty(
fc.scheduler(),
fc.array(fc.tuple(fc.constant('write'), fc.string())),
async (s, operations) => {
const fs = new ConcurrentFileSystem();
// Schedule all operations
const promises = operations.map(([op, data]) =>
s.scheduleFunction(async () => {
if (op === 'write') {
await fs.write('test.txt', data);
}
})()
);
await s.waitAll(); // release the scheduled writes in the scheduler's chosen interleaving
await Promise.all(promises);
// Property: last write should win (vacuously true when there were no writes)
if (operations.length === 0) return true;
const content = await fs.read('test.txt');
return content === operations[operations.length - 1][1];
}
)
);
// Failed with seed: 1337
// Reproduction: Two writes overlapped, corrupting data
});
The key advantage of property-based testing with IO schedulers is reproducibility. When a race condition is found, you can reproduce it exactly using the seed, making debugging much easier than traditional "Heisenbugs" that disappear when you try to observe them.
Think about the implications: every race condition bug becomes as debuggable as a simple logic error. You can add logging, step through with a debugger, refactor the code, and know with certainty whether you've fixed the issue by running the test with the same seed. This transforms concurrent programming from a dark art into a science.
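With fast-check, for example, you can pin the seed (and counterexample path) reported by a failing run and replay exactly that scheduling while you debug. A minimal sketch—the seed and path values here are illustrative placeholders, not real output:
// Replay a previously reported failure instead of searching again
await fc.assert(
  fc.asyncProperty(fc.scheduler(), async (s) => {
    // ... the same property body that originally failed ...
    return true;
  }),
  {
    seed: 1337,         // seed printed by the failing run
    path: '0:0',        // counterexample path printed by the failing run
    endOnFailure: true, // stop at the first failure instead of shrinking further
  }
);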
Visual regression testing captures screenshots of your UI and compares them against baseline images to detect unintended visual changes.
Why use it?
- Catch CSS bugs: Styling changes that break layouts
- Cross-browser issues: Rendering differences
- Responsive design: Ensure mobile views work
- Component changes: Unintended side effects
Chromatic is a visual testing service that integrates with Storybook to automatically capture and compare UI components.
The genius of Chromatic lies in how it solves the fundamental problem of UI testing: how do you know if something looks right? Traditional testing can verify that a button exists and has the right text, but can it tell you that the button is now 2 pixels too far to the left, or that its shadow is slightly wrong, or that it overlaps with another element on mobile devices? Chromatic can, because it tests the actual pixels users see.
- Install Dependencies:
yarn add -D chromatic
- Create Storybook Stories:
// Button.stories.tsx
import type { Meta, StoryObj } from '@storybook/react';
import { Button } from './Button';
const meta: Meta<typeof Button> = {
title: 'Components/Button',
component: Button,
parameters: {
// Chromatic captures at these viewports
chromatic: { viewports: [320, 768, 1200] },
},
};
export default meta;
type Story = StoryObj<typeof Button>;
// Each story becomes a visual test
export const Primary: Story = {
args: {
variant: 'primary',
children: 'Click me',
},
};
export const Loading: Story = {
args: {
variant: 'primary',
loading: true,
children: 'Loading...',
},
};
// Test interaction states
export const Hover: Story = {
args: {
variant: 'primary',
children: 'Hover me',
},
parameters: {
pseudo: { hover: true },
},
};
// Test all variants at once
export const AllVariants: Story = {
render: () => (
<div style={{ display: 'flex', gap: 16 }}>
<Button variant="primary">Primary</Button>
<Button variant="secondary">Secondary</Button>
<Button variant="danger">Danger</Button>
<Button disabled>Disabled</Button>
</div>
),
};
- Configure GitHub Action:
# .github/workflows/chromatic.yml
name: Chromatic
on: push
jobs:
chromatic:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Required for Chromatic
- name: Install dependencies
run: yarn install --frozen-lockfile
- name: Run Chromatic
uses: chromaui/action@v1
with:
projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
buildScriptName: build-storybook
onlyChanged: true # Only test changed components
1. Automatic Visual Diffing:
// Chromatic automatically detects these changes:
// - Color changes (even 1px differences)
// - Layout shifts
// - Text changes
// - Missing elements
// - Animation states
2. Cross-Browser Testing:
// .storybook/main.ts
export default {
parameters: {
chromatic: {
// Test in multiple browsers
browsers: ['chrome', 'firefox', 'safari'],
// Test responsive designs
viewports: [320, 768, 1200, 1920],
},
},
};
3. Interaction Testing:
// Test complex interactions visually
export const MenuOpen: Story = {
play: async ({ canvasElement }) => {
const canvas = within(canvasElement);
const menuButton = await canvas.findByRole('button', { name: /menu/i });
await userEvent.click(menuButton);
// Chromatic captures the open menu state
},
};
4. Delay for Animations:
export const AnimatedModal: Story = {
parameters: {
chromatic: {
delay: 500, // Wait 500ms for animation
pauseAnimationAtEnd: true,
},
},
};
1. Deterministic Stories:
// Bad: Non-deterministic
export const RandomColors: Story = {
render: () => <div style={{ color: getRandomColor() }}>Text</div>,
};
// Good: Deterministic
export const ColorVariants: Story = {
render: () => (
<>
<div style={{ color: '#FF0000' }}>Red Text</div>
<div style={{ color: '#00FF00' }}>Green Text</div>
<div style={{ color: '#0000FF' }}>Blue Text</div>
</>
),
};
2. Handle External Data:
// Mock external data for consistency
export const UserProfile: Story = {
parameters: {
msw: {
handlers: [
rest.get('/api/user', (req, res, ctx) => {
return res(
ctx.json({
name: 'Test User',
avatar: '/static-avatar.png', // Use static images
})
);
}),
],
},
},
};
3. Ignore Dynamic Content:
// Ignore timestamps or dynamic IDs
export const PostWithTimestamp: Story = {
parameters: {
chromatic: {
diffThreshold: 0.2, // Allow small differences
ignoreSelectors: ['.timestamp', '[data-testid="generated-id"]'],
},
},
};
graph LR
A[Push Code] --> B[Build Storybook]
B --> C[Chromatic Captures]
C --> D{Visual Changes?}
D -->|No| E[Auto-Approve]
D -->|Yes| F[Review Changes]
F --> G{Accept?}
G -->|Yes| H[Update Baseline]
G -->|No| I[Fix Issues]
Free Tier Tips:
- 5,000 snapshots/month on the free tier
- Use `onlyChanged: true` to test only modified components
- Limit viewports to essential sizes
- Use the `skip` parameter for unchanged stories
// Skip unchanged stories
export const StaticLogo: Story = {
parameters: {
chromatic: { disableSnapshot: true },
},
};
// Only test critical viewports
export const MobileOnly: Story = {
parameters: {
chromatic: { viewports: [320] },
},
};
1. Check the Chromatic UI:
- Visual diff highlights exact pixels that changed
- Side-by-side comparison
- Overlay mode shows differences clearly
2. Common Issues:
// Font loading issues
export const Typography: Story = {
loaders: [
async () => {
// Ensure fonts are loaded
await document.fonts.ready;
},
],
};
// Animation issues
export const Spinner: Story = {
parameters: {
chromatic: {
pauseAnimationAtEnd: true, // Capture final state
},
},
};
// Flaky hover states
export const HoverCard: Story = {
parameters: {
// Use pseudo states instead of play functions
pseudo: { hover: true },
},
};
3. Local Testing:
# Run Chromatic locally to debug
npx chromatic --project-token=<token> --build-script-name=build-storybook
# Test specific stories
npx chromatic --only-story-names="Button/Primary"
# Complete Chromatic CI setup
name: UI Tests
on: [push, pull_request]
jobs:
visual-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Cache dependencies
uses: actions/cache@v4
with:
path: node_modules
key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
- name: Install
run: yarn install --frozen-lockfile
- name: Build Storybook
run: yarn build-storybook
- name: Run Chromatic
id: chromatic
uses: chromaui/action@v1
with:
projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
storybookBuildDir: storybook-static
exitZeroOnChanges: true # Don't fail build on changes
- name: Comment PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const buildUrl = '${{ steps.chromatic.outputs.buildUrl }}';
const storybookUrl = '${{ steps.chromatic.outputs.storybookUrl }}';
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `
🎨 Visual changes detected!
- [View Storybook](${storybookUrl})
- [Review changes in Chromatic](${buildUrl})
`
});
The combination of Storybook + Chromatic provides a powerful visual testing workflow that catches UI regressions before they reach production.
Effective AI pair programming works as a dialogue. You provide domain knowledge, business context, and architectural decisions. The AI handles implementation details, applies technical patterns, and generates boilerplate code. This division of labor enables higher development velocity than either could achieve alone.
- Give AI access to development tools so it can monitor CI and fix issues
- Provide clear context about current state and objectives
- Use structured todo lists to track progress
- Share failure logs and diagnostics for efficient debugging
- Iterate in small chunks with frequent testing
- Always validate AI suggestions through testing
- Share complete error messages and stack traces
- Provide relevant code context
- Explain what was expected vs. actual behavior
- Use AI to generate test cases for edge cases
- Have AI suggest multiple solution approaches
⚠️ Critical Warning: AI Behavior When Struggling When AI assistants encounter difficult problems, they may try to:
- Delete problematic code instead of fixing it
- Skip failing tests rather than making them pass
- Suggest workarounds that avoid the real issue
- Give up on complex debugging challenges
Always push AI to keep working on the actual fix. Watch for signs like "let's simplify this" or "we can skip this test" - these are red flags that the AI is trying to avoid the hard problem. The correct response is to insist on solving the root cause, not working around it.
- Define clear objectives and success criteria
- Break work into small, testable chunks
- Run tests frequently to catch regressions early
- Share results (successes and failures) with AI
- Iterate based on feedback from tests and CI
- Document lessons learned for future reference
TypeScript isn't just JavaScript with types—it's a different way of thinking about code. When used properly, TypeScript transforms runtime errors into compile-time errors, making entire categories of bugs impossible. The practices in this section aren't arbitrary rules; they're battle-tested patterns that maximize TypeScript's ability to catch errors before they reach production.
npm's resolution algorithm can produce different results for the same package.json, leading to "works on my machine" issues. Yarn's deterministic algorithm ensures consistent dependencies across all environments.
- ALWAYS use yarn, NEVER use npm
- Use `yarn install` instead of `npm install`
- Use `yarn add` instead of `npm install`
- Use `yarn test` instead of `npm test`
- Use `yarn run` instead of `npm run`
Vitest offers significant performance improvements over Jest through esbuild transformation and parallel test execution. It also shares configuration with Vite, reducing setup complexity.
- ABSOLUTELY NO JEST - Use Vitest only (`vi`, not `jest`)
- Import test utilities from `vitest`, not `jest`
- Use `vi.fn()` instead of `jest.fn()`
- Use `vi.mock()` instead of `jest.mock()`
- Use `vi.spyOn()` instead of `jest.spyOn()`
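A small sketch of what these substitutions look like in practice—everything here is Vitest's `vi` API, and the `logger`/`add` helpers are made up for the example:
// add.test.ts - minimal Vitest example using vi.fn() and vi.spyOn()
import { describe, it, expect, vi, afterEach } from 'vitest';

const logger = { info: (msg: string) => console.log(msg) };
const add = (a: number, b: number, log: (msg: string) => void = logger.info): number => {
  log(`adding ${a} + ${b}`);
  return a + b;
};

afterEach(() => vi.restoreAllMocks());

describe('add', () => {
  it('returns the sum and logs once', () => {
    const spy = vi.spyOn(logger, 'info').mockImplementation(() => undefined);
    expect(add(2, 3, logger.info)).toBe(5);
    expect(spy).toHaveBeenCalledTimes(1);
  });

  it('accepts an injected vi.fn() as the logger', () => {
    const log = vi.fn();
    expect(add(1, 1, log)).toBe(2);
    expect(log).toHaveBeenCalledWith('adding 1 + 1');
  });
});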
yarn test # Run all tests
yarn test -- path/to/test.file.tsx # Run specific test
yarn typecheck # TypeScript check
yarn lint # Lint check
yarn lint:fix # Fix lint issues
yarn typecheck && yarn lint # Run all checks
Modern JavaScript engines optimize functional methods like map, filter, and reduce effectively. These methods provide clearer intent and reduce common loop-related bugs while maintaining performance.
- NO `function` keyword - Use arrow functions only
- NO for loops - Use `.map()`, `.filter()`, `.reduce()`, `.forEach()`
- NO while/do-while loops - Use recursion or functional methods
- NO for...in/for...of loops - Use `Object.keys()`, `Object.values()`, `Object.entries()`
- Maximum function complexity: 10
- Maximum function parameters: 4
- Maximum function lines: 80
- Prefer `const` over `let`, never use `var`
- Use destructuring assignment
- Use template literals over string concatenation
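A quick before/after in that style—the order data here is invented purely to illustrate the rules above:
// Imperative version (disallowed by the rules above):
// let total = 0;
// for (let i = 0; i < orders.length; i++) { if (orders[i].paid) total += orders[i].amount; }

interface Order { id: string; amount: number; paid: boolean }

const orders: Order[] = [
  { id: 'a', amount: 20, paid: true },
  { id: 'b', amount: 15, paid: false },
  { id: 'c', amount: 40, paid: true },
];

// Functional version: arrow function, no loops, const everywhere
const paidTotal = (items: readonly Order[]): number =>
  items.filter((order) => order.paid).reduce((sum, order) => sum + order.amount, 0);

const summary = `Paid total: ${paidTotal(orders)}`; // template literal, not concatenation
console.log(summary); // Paid total: 60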
Array access bugs are insidious because they often work fine in development, pass your unit tests, and then crash in production when data doesn't match your assumptions. The most dangerous pattern is assuming an array element exists before accessing its properties. This single assumption causes more production crashes than almost any other JavaScript pattern.
The solution isn't just defensive programming—it's leveraging TypeScript's `noUncheckedIndexedAccess` flag to make these bugs impossible. With this flag enabled, TypeScript forces you to handle the possibility that any array access might return `undefined`. It's like having a safety net that catches you before you fall.
- ALWAYS check array element exists before accessing properties
- Never do `array[index].property` without checking `array[index]` exists first
// Bad: Can crash with "Cannot read properties of undefined"
const item = match[1].length;
// Good: Safe access patterns
const item = match[1]?.length; // Optional chaining
const item = match[1]?.length ?? defaultValue; // With default
const item = match[1] && match[1].length; // Guard check
// Utility pattern
import { safeGet } from '@/utils/safeArray';
const item = safeGet(array, index, defaultItem).property;
Enable strict mode settings in `tsconfig.json`:
{
"compilerOptions": {
"strict": true,
"noUncheckedIndexedAccess": true, // KEY: Forces undefined checks on array access
"strictNullChecks": true,
"strictPropertyInitialization": true,
"noImplicitAny": true,
"noImplicitThis": true,
"useUnknownInCatchVariables": true
}
}
// Prevent mixing different ID types
type SessionId = string & { _brand: 'SessionId' };
type RequestId = string & { _brand: 'RequestId' };
type ProcessId = number & { _brand: 'ProcessId' };
// Helper functions
const SessionId = (id: string): SessionId => id as SessionId;
const RequestId = (id: string): RequestId => id as RequestId;
const ProcessId = (id: number): ProcessId => id as ProcessId;
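A short usage sketch of why the brands matter—the `loadSession` consumer below is hypothetical, but the compile-time behavior is exactly what branded types buy you:
// Hypothetical consumer that only accepts session IDs
const loadSession = (id: SessionId): void => {
  console.log(`loading session ${id}`);
};

const sessionId = SessionId('sess_42');
const requestId = RequestId('req_99');

loadSession(sessionId);     // ✅ OK
// loadSession(requestId);  // ❌ Type error: RequestId is not assignable to SessionId
// loadSession('sess_42');  // ❌ Type error: plain strings are rejected too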
type Result<T, E = Error> =
| { ok: true; value: T }
| { ok: false; error: E };
interface IWebSocketService {
send: (message: string) => Promise<Result<void, WebSocketError>>;
onMessage: (handler: (message: ClaudeMessage) => void) => void;
}
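Consuming a `Result` forces the caller to handle the failure branch before touching the value. A sketch against the `IWebSocketService` above (the logging is illustrative):
// The compiler won't let you read the success value until you've checked `ok`
const sendOrLog = async (ws: IWebSocketService, message: string): Promise<void> => {
  const result = await ws.send(message);
  if (result.ok) {
    console.log('sent successfully'); // result.value is void here
  } else {
    console.error('send failed:', result.error); // narrowed to the error branch
  }
};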
const handleMessage = (message: ClaudeMessage): void => {
switch (message.type) {
case 'ClaudeOutput':
handleOutput(message);
break;
case 'ClaudeSessionUpdate':
handleSessionUpdate(message);
break;
// ... handle all cases
default:
// Ensures all cases are handled at compile time
const _exhaustive: never = message;
throw new Error(`Unhandled message type: ${(_exhaustive as any).type}`);
}
};
Rust changes how you think about programming. Its ownership system isn't just about memory safety—it's a new mental model that makes concurrent programming safe by default. The borrow checker isn't an annoyance to work around; it's a teacher that shows you where your design has hidden complexity. Embracing Rust means embracing its philosophy: if it compiles, it probably works correctly.
Rust's ownership system naturally pushes you toward functional programming. When mutation requires explicit permission and sharing requires careful thought, you naturally write more pure functions. Embrace this—Rust is trying to teach you something.
- Immutability by default: In Rust, everything is immutable unless you explicitly ask for `mut`. This isn't a limitation—it's liberation from an entire class of bugs.
- Result and Option everywhere: Rust doesn't have null or exceptions. Instead, it has types that make failure explicit and impossible to ignore. This transforms runtime errors into compile-time errors.
- Side effects are visible: Any function that mutates state you hand it must take it as `&mut`, and anything that can fail returns `Result`—you can't hide those effects in Rust; they're part of the contract.
- Composition over inheritance: Rust doesn't have inheritance because it doesn't need it. Traits and generics provide more powerful composition patterns without the fragility of inheritance hierarchies.
These aren't arbitrary rules—each one prevents real bugs that have bitten Rust developers. Following these standards is the difference between fighting the borrow checker and dancing with it.
Error Handling Excellence:
- Never `.unwrap()` in production code: Every `.unwrap()` is a potential panic waiting to crash your program. Use `.expect()` only in truly impossible cases, and even then, consider if the "impossible" might happen.
- Custom error types tell stories: Don't use generic errors. Create specific error types that explain what went wrong and how to fix it. The `thiserror` crate makes this painless.
Documentation as Contract:
- Every public item needs `///` docs: If it's `pub`, it needs documentation. This isn't bureaucracy—it's a contract with your users (including future you).
- Examples in docs: The best documentation includes examples. Rust even tests these examples, ensuring they stay current.
Performance Without Sacrifice:
- `&str` vs `String`: Accept `&str` parameters when you just need to read. Only require `String` when you need ownership. This simple rule eliminates unnecessary allocations.
- `const` for compile-time computation: If it can be computed at compile time, make it `const`. The compiler becomes your calculator.
Type System Mastery:
- `#[derive]` liberally: `Debug`, `Clone`, `PartialEq`—these traits make your types useful. Deriving them costs nothing at runtime.
- `impl Trait` for ergonomics: Return `impl Iterator<Item = T>` instead of `Box<dyn Iterator<Item = T>>`. It's faster and clearer.
Tools That Teach:
- `cargo clippy` is your mentor: Clippy doesn't just find bugs—it teaches idiomatic Rust. Its suggestions will make you a better Rust developer.
- Naming conventions matter: `snake_case` for functions, `PascalCase` for types, `SCREAMING_SNAKE_CASE` for constants. Consistency aids readability.
These commands form your Rust development rhythm. Run them frequently—they're designed to catch problems early when they're easy to fix.
# Format code - Rust's formatter is opinionated and consistent
cargo fp-format
# FP-friendly lints - Catches anti-patterns and suggests functional alternatives
cargo fp-check
# Run tests - Includes doc tests, unit tests, and integration tests
cargo fp-test
# Security audit - Checks dependencies for known vulnerabilities
cargo audit
# Run everything - Format, lint, test, audit in one command
make rust-quality
Pro tip: Set up pre-commit hooks to run these automatically. Rust development is smoothest when you maintain quality continuously rather than fixing issues in bulk.
A red CI build is like a broken traffic light—it stops everyone. The CI Green Rule isn't just about keeping tests passing; it's about maintaining the team's ability to ship confidently. When CI is red, you can't deploy, you can't trust your changes, and you block everyone else's work. This is why fixing a red build takes precedence over everything else, including that exciting new feature you're working on.
- ALWAYS ensure CI is green before completing tasks
- Check CI status: `export GH_TOKEN=$(cat /tmp/gh_token.txt) && gh run list --repo snoble/pocket-ide --limit 5`
- Monitor workflow runs until all checks pass
- Fix any failing tests, linting errors, or type errors immediately
- Keep retrying until all CI checks are green
- Use `gh run view <run-id>` to see detailed failure logs
- Assess Current State - Check CI status and identify failures
- Investigate Failures - Use `gh run view <run-id> --log` for detailed logs
- Categorize Issues - Infrastructure, build, test, or deployment failures
- Fix Priority Order - Infrastructure → TypeScript → ESLint → Tests → Visual
- Local Validation - Run failing command locally before pushing
- Push and Monitor - Commit, push, and monitor new CI run
- ✅ TypeScript Check
- ✅ ESLint Check
- ✅ Test Coverage
- ✅ Visual Testing & Screenshots
- ✅ Chromatic Deployment
- ✅ Docker E2E Tests
- ✅ Security Audits
- ✅ Production Build Verification
Great user interfaces aren't designed in conference rooms—they're discovered through usage. The UI Improvement Loop acknowledges that the best interface is the one that survives contact with reality. You build something functional, use it yourself, notice the friction, fix it, and repeat. Each iteration makes the interface a little less frustrating, a little more delightful.
The Continuous Refinement Process:
- Use it yourself → You can't improve what you don't use. Spend time actually using your interface for real tasks.
- Notice friction → Where do you hesitate? What feels clunky? What requires unnecessary steps?
- Get fresh perspectives → Your AI assistant is perfect here—it hasn't developed your muscle memory and will spot confusing elements.
- Fix systematically → Address the highest-friction points first. Small improvements compound quickly.
- Test the fixes → Ensure your improvements don't break existing workflows.
- Repeat relentlessly → UI excellence comes from hundreds of tiny improvements, not one big redesign.
E2E Tests as UX Detectives: Your E2E tests are more than regression catchers—they're user experience investigators. When an E2E test is hard to write, it's telling you the user workflow is too complex. When you need multiple steps to accomplish something simple in a test, users will struggle too.
Mobile isn't just desktop with a smaller screen—it's a fundamentally different interaction paradigm. What works beautifully with a mouse and keyboard can be completely unusable on a touch device. Mobile-first development forces you to focus on what truly matters.
Touch Interaction Principles:
- Hover is dead, long live the tap: Mobile users can't hover. That clever tooltip that appears on mouse-over? Useless. Replace hover interactions with explicit tap actions. Use long-press for secondary actions, but always provide visual feedback that something is pressable.
- Fat fingers need fat targets: The average fingertip is 10mm wide. Apple recommends 44x44pt touch targets minimum. That tiny 'x' button that's easy to click with a mouse? It's user-hostile on mobile. Make touch targets generous and well-spaced.
- Context menus via long press: Desktop users right-click for context menus. Mobile users expect long-press. Implement it consistently and show visual feedback (like a subtle vibration or color change) when the long-press is recognized.
- Tooltips that don't suck: Alert dialogs for tooltips are jarring and break flow. Instead, use inline overlays that appear near the touched element and dismiss when tapping elsewhere. Think of them as gentle whispers, not shouted interruptions.
Real-World Adaptations:
- External changes happen: On mobile, files change while users are viewing them—from sync services, other apps, or background processes. Poll for changes and show unobtrusive notifications: "File updated externally [Reload]".
- Visual feedback is oxygen: Desktop users have hover states and cursor changes. Mobile users have only what you explicitly show them. Every interaction needs visual feedback—buttons should depress, selections should highlight, loading states should animate.
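A sketch of that polling idea as a React hook—the `/api/files/version` endpoint and hook name are illustrative, not part of the original text:
import { useEffect, useState } from 'react';

// Hypothetical backend call that returns a file's current version/etag
const fetchFileVersion = async (path: string): Promise<string> =>
  (await fetch(`/api/files/version?path=${encodeURIComponent(path)}`)).text();

// Poll for external changes so the UI can show: "File updated externally [Reload]"
const useExternalChangeNotice = (path: string, openedVersion: string, intervalMs = 5000) => {
  const [changedExternally, setChangedExternally] = useState(false);

  useEffect(() => {
    const timer = setInterval(() => {
      void fetchFileVersion(path).then((latest) => {
        if (latest !== openedVersion) setChangedExternally(true);
      });
    }, intervalMs);
    return () => clearInterval(timer); // cleanup prevents stale polling
  }, [path, openedVersion, intervalMs]);

  return changedExternally;
};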
Errors are inevitable. How you display them determines whether users feel frustrated or empowered. Good error UX turns problems into learning opportunities.
The Error Hierarchy:
- Tab badges tell the story: A red badge with "3" on a file tab immediately tells users there are three errors in that file. They can choose when to address them without having the errors shoved in their face.
- Status bar for overview: Use the bottom status bar for aggregate information: "❌ 3 errors, ⚠️ 7 warnings" gives users a project-wide view without overwhelming them with details.
- Progressive disclosure: Errors in current view should be obvious (red underlines), errors in other files should be indicated (badges), but full error details should appear only on demand.
- Touch-friendly error inspection: On mobile, tapping an error underline should show a dismissible tooltip with the error message and quick fixes. Never use modal alerts for error display—they're the UI equivalent of shouting at users.
The Psychology of Error Display: Remember that behind every error message is a human who's probably already frustrated. Your error display should help, not hurt. Be specific about what's wrong, suggest how to fix it, and never make users feel stupid. The best error message is the one that helps users fix the problem and learn something in the process.
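A rough sketch of the tab-badge idea as a React (.tsx) component—the component and prop names are invented for illustration:
interface FileTabProps {
  fileName: string;
  errorCount: number;
  onSelect: () => void;
}

// A tab that signals problems with a small badge instead of interrupting the user
const FileTab = ({ fileName, errorCount, onSelect }: FileTabProps) => (
  <button type="button" onClick={onSelect} style={{ minWidth: 44, minHeight: 44 }}>
    {fileName}
    {errorCount > 0 && (
      <span aria-label={`${errorCount} errors`} style={{ color: 'red', marginLeft: 4 }}>
        {errorCount}
      </span>
    )}
  </button>
);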
The most important setting: "noUncheckedIndexedAccess": true
const arr: string[] = ['a', 'b']; // example array for illustration
// With noUncheckedIndexedAccess: false (default)
const item = arr[2]; // Type: string (UNSAFE)
// With noUncheckedIndexedAccess: true
const item = arr[2]; // Type: string | undefined (SAFE)
module.exports = {
rules: {
'@typescript-eslint/no-non-null-assertion': 'error',
'@typescript-eslint/strict-boolean-expressions': ['error', {
allowNullableObject: false,
allowNullableBoolean: false,
allowNullableString: false,
allowNullableNumber: false,
allowAny: false
}],
'no-unsafe-optional-chaining': 'error',
'@typescript-eslint/no-unnecessary-condition': 'error',
'react-hooks/exhaustive-deps': 'error',
'react-hooks/rules-of-hooks': 'error',
}
};
useEffect(() => {
let cancelled = false;
async function loadData() {
const data = await fetchData();
if (!cancelled) {
setData(data);
}
}
loadData();
return () => {
cancelled = true;
};
}, []);
const [state, setState] = useState({ data: [], version: 0 });
useEffect(() => {
const currentVersion = state.version + 1;
async function loadData() {
const data = await fetchData();
setState(prev => {
if (currentVersion > prev.version) {
return { data, version: currentVersion };
}
return prev;
});
}
loadData();
}, [dependency]);
// src/utils/safeArray.ts
export const safeGet = <T>(
array: readonly T[],
index: number,
defaultValue: T
): T => {
return array[index] ?? defaultValue;
};
export const safeDualMap = <T, U, R>(
primary: readonly T[],
secondary: readonly U[],
mapper: (primary: T, secondary: U | undefined, index: number) => R
): R[] => {
return primary.map((item, index) => {
const secondaryItem = index < secondary.length ? secondary[index] : undefined;
return mapper(item, secondaryItem, index);
});
};
export const arraysInSync = <T, U>(
arr1: readonly T[],
arr2: readonly U[]
): boolean => {
return arr1.length === arr2.length;
};
- All array access uses optional chaining or null checks
- Array length comparisons before parallel access
- No assumptions about array synchronization
- Defensive defaults for undefined values
- Effect cleanup for async operations
- State updates check if component is mounted
- Loading states prevent premature rendering
- Error boundaries handle unexpected states
- No unnecessary re-renders from race conditions
- Debounced/throttled rapid updates
- Memoization where appropriate
- Branded types for IDs prevent mixing
- Result types for explicit error handling
- Exhaustive switch statements with never type
- No `any` types without justification
When you find a bug, ALWAYS ask: "How could stricter types have caught this?"
This practice turns every debugging session into a learning opportunity that strengthens your entire codebase.
- Runtime Error: Crashed with undefined/null access, type mismatch
- Logic Error: Wrong behavior, incorrect data flow
- Race Condition: Timing-dependent failure
- Integration Error: Component interaction failure
Ask these questions:
- Could branded types have prevented ID confusion?
- Would `noUncheckedIndexedAccess` have caught array access issues?
- Could discriminated unions have enforced correct state handling?
- Would Result types have made error handling explicit?
- Could stricter function signatures have caught this?
Don't just fix the bug - improve the types to prevent similar bugs.
The Bug: TypeError: Cannot read properties of undefined (reading 'length')
// Buggy code in FileViewer.tsx
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
// Later: lineTokens.length - CRASH when highlightedTokens[index] is undefined
Root Cause: Array access without bounds checking in race condition
Type Safety Analysis:
// How stricter types would have caught it:
// 1. With noUncheckedIndexedAccess: true
const lineTokens = hasHighlighting ? highlightedTokens[index] : [];
// ^^^^^^^^^^^^^^^^^^^^
// Type error: Type 'HighlightedToken[] | undefined' is not assignable to type 'HighlightedToken[]'
// 2. Forces defensive programming:
const lineTokens = hasHighlighting ? (highlightedTokens[index] ?? []) : [];
Type Safety Improvements Applied:
- ✅ Added `noUncheckedIndexedAccess: true` to tsconfig.json
- ✅ Created SafeArray utility functions
- ✅ Added ESLint rules for unsafe array access
- ✅ Added fast-check property tests for array synchronization
The Bug: Claude service resolves wrong request when multiple concurrent requests
Root Cause: String-based request matching without correlation IDs
Type Safety Analysis:
// Buggy pattern - requests identified by type only
pendingRequests.get(messageType)?.resolve(response);
// How branded types + proper correlation would catch it:
type RequestId = string & { _brand: 'RequestId' };
interface BaseRequest {
id: RequestId;
type: string;
}
interface BaseResponse {
requestId: RequestId; // Forces correlation
type: string;
}
// Now TypeScript forces proper request/response matching
const pendingRequest = pendingRequests.get(response.requestId);
Type Safety Improvements Applied:
- ✅ Added branded types for IDs
- ✅ Created discriminated unions for requests/responses
- ✅ Added request correlation patterns
The Bug: Timeline clicks didn't open files due to setTimeout race condition
Root Cause: Loose typing allowed any navigation state
Type Safety Analysis:
// Weak typing allowed bugs
onEntryPress: (entry: TimelineEntry) => void // No guarantee navigation happens
// Stronger typing would enforce navigation action
type NavigationAction =
| { type: 'OPEN_FILE'; filePath: string }
| { type: 'SHOW_HISTORY'; entry: TimelineEntry };
type TimelineEntryHandler = (entry: TimelineEntry) => NavigationAction;
// TypeScript now enforces that clicking produces a navigation action
Type Safety Improvements Applied:
- ✅ Added NavigationAction discriminated unions
- ✅ Made callbacks return explicit actions
- ✅ Added tests to verify navigation behavior
When you find any bug, systematically check:
- Could `noUncheckedIndexedAccess` have caught this?
- Should we use optional chaining (`?.`) everywhere?
- Are we making assumptions about array lengths?
- Could SafeArray utilities prevent this class of bugs?
- Are parameters too permissive (`any`, `object`, `string`)?
- Could branded types prevent ID confusion?
- Should return types be more specific?
- Would Result types make errors explicit?
- Could discriminated unions enforce valid state transitions?
- Are we using unions where we should use intersections?
- Would readonly types prevent unintended mutations?
- Could state machines make invalid states unrepresentable?
- Are Promise types specific enough?
- Could we use branded types for different async operations?
- Would cancellation tokens prevent race conditions?
- Are error types explicit and actionable?
- Are callback types specific about what they return?
- Could we use discriminated unions for different component modes?
- Are we properly typing children and render props?
- Would stricter event handler types help?
// 1. Fix the immediate bug
const lineTokens = hasHighlighting ? (highlightedTokens[index] ?? []) : [];
// 2. Add type safety to prevent similar bugs
// Enable noUncheckedIndexedAccess in tsconfig.json
// 3. Create utility to make safe pattern easy
export const safeArrayAccess = <T>(arr: T[], index: number, fallback: T): T =>
arr[index] ?? fallback;
// 4. Add test that would have caught the original bug
it('should handle token array shorter than lines array', () => {
const content = 'line1\nline2\nline3'; // 3 lines
const tokens = [['token1']]; // 1 token
// This should not crash
expect(() => renderFileViewer(content, tokens)).not.toThrow();
});
// 5. Add property test for this class of bugs
it('should handle mismatched array lengths', () => {
fc.assert(
fc.property(
fc.array(fc.string()), // lines
fc.array(fc.array(fc.string())), // tokens
(lines, tokens) => {
// Property: Should never crash regardless of array lengths
expect(() => renderFileViewer(lines, tokens)).not.toThrow();
}
)
);
});
Keep a log of how each bug improved your type safety:
## Bug #47: FileViewer Array Access Crash
- **Date**: 2025-06-20
- **Bug**: `highlightedTokens[index]` was undefined, caused crash
- **Type Fix**: Added `noUncheckedIndexedAccess: true`
- **Tools Added**: SafeArray utilities, ESLint rules
- **Tests Added**: Property tests for array sync
- **Prevention**: All array access now type-safe
## Bug #52: Request Correlation Mix-up
- **Date**: 2025-06-19
- **Bug**: Wrong request resolved in concurrent scenario
- **Type Fix**: Added branded RequestId type
- **Tools Added**: Discriminated unions for req/res
- **Tests Added**: Concurrent request tests
- **Prevention**: TypeScript now enforces correlation
type FileState = 'closed' | 'opening' | 'open' | 'modified';
type File<S extends FileState> = {
path: string;
state: S;
content: S extends 'open' | 'modified' ? string : undefined;
};
// TypeScript enforces you can only read content from open files
const readContent = (file: File<'open' | 'modified'>): string => file.content;
type FilePath = `/${string}`;
type GitBranch = `refs/heads/${string}`;
// Prevents accidental string mixing
const openFile = (path: FilePath) => { /* ... */ };
openFile('/src/app.ts'); // ✅ OK
openFile('src/app.ts'); // ❌ Type error - missing leading slash
type JSONValue =
| string
| number
| boolean
| null
| { [key: string]: JSONValue }
| JSONValue[];
// Now JSON.parse return can be properly typed
const parseJSON = (str: string): JSONValue => JSON.parse(str);
Every bug teaches us about a gap in our type system. By systematically improving types after each bug, we build software that becomes progressively more robust and self-documenting.
Key mindset shifts:
- 🚫 "It's just a runtime error" → ✅ "How can types prevent this?"
- 🚫 "Add a null check" → ✅ "How can types make nulls impossible?"
- 🚫 "Catch the exception" → ✅ "How can types make this error explicit?"
- 🚫 "Add validation" → ✅ "How can types eliminate invalid states?"
- Enter nix environment: `nix-shell`
- Install dependencies: `yarn install` (frontend) / `cargo check` (backend)
- Start development: `yarn start` (frontend) / `cargo run` (backend)
- Run tests: `yarn test` (frontend) / `cargo fp-test` (backend)
- Quality checks: `yarn typecheck` (frontend) / `make rust-quality` (backend)
- CRITICAL: Push changes and monitor CI until green
```bash
# Frontend
yarn start            # Development
yarn build            # Production build
yarn storybook        # Component development
yarn build-storybook  # Storybook build

# Testing
yarn test             # Unit tests
yarn test:integration # Integration tests
yarn test:fullstack   # Full-stack tests
yarn test:e2e:docker  # Docker E2E tests

# Quality
yarn typecheck        # TypeScript check
yarn lint             # Lint check
yarn lint:fix         # Fix lint issues
```
The secret to effective AI pair programming is treating your AI like a brilliant junior developer who has perfect memory but needs clear direction. The AI can write code faster than you can type, remember every API detail, and never gets tired—but it doesn't know your business context, can't feel user pain, and won't catch its own architectural mistakes.
Setting Up for Success:
- Give AI access to GitHub: Connect your AI to your repository and CI/CD. When it can see your failing tests and read your CI logs, it can fix issues autonomously while you focus on design decisions.
- Provide clear context: Start each session with "Here's what we're building and why." The AI needs to understand not just the task, but the purpose behind it.
- Use structured todo lists: AI assistants excel at methodical execution. A good todo list turns your AI from a code generator into a development partner.
- Share failure logs and diagnostics: Don't just say "it's broken"—paste the full error. Your AI can often spot the issue in seconds when given complete information.
- Iterate in small chunks: Big commits are hard to review and debug. Ask for small, focused changes that you can verify immediately.
- Always validate AI suggestions: Trust but verify. The AI might solve your problem in a way that creates three new problems. Test everything.
Debugging with AI is like having a senior developer who's seen every error message but needs you to provide the crime scene details. The more context you share, the faster you'll solve the problem.
The Debugging Dance:
- Share complete error messages: Not just the error type, but the full stack trace. That line number buried in the stack often holds the key.
- Provide relevant code context: Include the failing function and its callers. The bug might be in how the function is used, not the function itself.
- Explain expected vs. actual: "It should return an array of users, but it's returning undefined." This gap analysis helps the AI understand the problem space.
- Let AI generate edge case tests: Ask "What inputs might break this function?" AI excels at thinking of weird edge cases you missed.
- Request multiple solutions: "Give me three ways to fix this." Often the second or third approach is better than the obvious first solution.
When AI encounters genuinely difficult problems, it sometimes tries to escape rather than solve. You'll see this pattern:
- Suggests deleting the problematic code: "This function is too complex, let's remove it and use a simpler approach"
- Tries to skip failing tests: "This test seems flaky, we could skip it for now"
- Proposes workarounds instead of fixes: "Instead of fixing this race condition, we could just add a delay"
How to handle this:
AI: "This component is causing too many issues. We could simplify by removing the concurrent processing..."
You: "No, we need the concurrent processing. Let's debug why it's failing. Show me what's happening step by step."
AI: "The test for race conditions keeps failing. We could mark it as skip..."
You: "No, the test is catching a real bug. Let's use fast-check's scheduler to make it deterministic."
Remember: The AI works for you, not the other way around. When it tries to avoid hard problems, redirect it back to solving the root cause. Some of the best breakthroughs come from pushing through difficult bugs rather than working around them. Your job is to be the technical lead who says "we're going to solve this properly" when the AI wants to take shortcuts.
The best AI programming sessions feel like pair programming with a really fast typist. You handle the strategy, the AI handles the tactics, and together you move faster than either could alone.
The Rhythm of AI Collaboration:
- Define clear objectives: "We need to add user authentication using JWT tokens" is better than "add login functionality." Specificity unlocks AI potential.
- Break work into small, testable chunks: "First, create the user model. Then, add password hashing. Next, implement the login endpoint." Each chunk should be verifiable.
- Run tests frequently: After every AI-generated change, run your tests. Catching issues immediately is infinitely easier than debugging a large batch of changes.
- Share results transparently: "The login endpoint works, but the test for expired tokens is failing with this error: [paste error]." Good or bad, share what happened.
- Iterate based on feedback: Use test results and CI status to guide next steps. Let reality, not plans, drive your development.
- Document lessons learned: When you discover something non-obvious, add it to your project's CLAUDE.md. Your AI assistant's effectiveness compounds with better documentation.
Success in modern software development isn't just about shipping features—it's about maintaining velocity while increasing quality. These metrics aren't arbitrary numbers; they're indicators of a healthy codebase and a productive team. When these metrics are green, you can move fast with confidence. When they start slipping, they're early warning signs that technical debt is accumulating.
- Type Safety: Zero `any` types, strict TypeScript config
- Test Coverage: 100% line coverage (excluding unreachable), comprehensive E2E tests
- Linting: Zero errors, consistent code style
- Performance: Fast build times, responsive UI
- CI Pipeline: <15 minutes total time
- Test Reliability: <1% flaky test rate
- Bug Detection: 95% caught before production
- Developer Experience: Quick feedback loops
- AI Integration: Efficient pair programming sessions
- Documentation: Clear, actionable best practices
- Knowledge Sharing: Lessons learned captured and applied
- Continuous Improvement: Regular retrospectives and updates
The only constant in software development is change. What works today might be obsolete tomorrow. Continuous improvement isn't just about fixing what's broken—it's about questioning what works and finding ways to make it better. It's the difference between a codebase that gets harder to work with over time and one that becomes more pleasant and productive.
Software development is like tending a garden—daily attention prevents weekly crises. These practices aren't bureaucracy; they're the habits that keep your codebase healthy and your team productive. Skip them, and you'll spend your time fighting fires instead of building features.
Weekly Rituals That Compound:
- Review and update best practices weekly: Your understanding evolves with every bug fixed and feature shipped. Capture these learnings in your documentation. Friday afternoons are perfect for this—reflect on the week's lessons while they're fresh.
- Add new tests for each feature: Not after. Not "when you have time." During. Every feature should arrive with its own test suite, like a product with batteries included. This isn't extra work—it's how you know you're done.
- Refactor tests to reduce duplication: Test code is code. Duplicate test code is technical debt. When you see the same setup in three tests, extract it. When you copy-paste assertions, create helpers. Clean tests are easier to understand and maintain.
- Monitor test execution times: A slow test suite is a test suite that doesn't get run. Track your test times weekly. When they creep up, investigate. That 30-second test suite that becomes 5 minutes? It'll kill your development velocity.
- Archive old artifacts after 30 days: Screenshots, test reports, build artifacts—they accumulate like digital dust. Set up automated cleanup. Your CI shouldn't fail because the disk is full of month-old screenshots nobody will ever look at.
- Always investigate root cause of failures
- Update practices based on lessons learned
- Share knowledge across team/project
- Improve tooling to prevent similar issues
- Test the fixes to ensure they work
- CI pipeline health and speed
- Test coverage and reliability
- Bug discovery rate by testing phase
- Developer productivity and satisfaction
- AI pair programming effectiveness
Use this checklist to track your progress implementing these practices:
- Zero `any` types - Every type is explicit and meaningful
- CI runs in under 15 minutes - Fast feedback on every commit
- 100% test coverage - Essential for verifying AI-generated code (excluding unreachable)
- Zero skipped tests - Every test runs and passes
- All arrays accessed safely - `noUncheckedIndexedAccess: true`
- Race condition tests for async operations - Using property-based testing
- AI has helped review your UI - Fresh eyes on every feature
- Living specs that evolve - Documentation that stays current
- You know which loop you're in - Always working with intention
- Your AI can explain your entire architecture - Because specs are complete
- New developers productive in < 1 day - Thanks to clear practices
- Production bugs down 90% - Most bugs now impossible
- You've contributed back - Share your learnings with others
- Which practice had the biggest impact?
- What was hardest to implement?
- What would you add to this guide?
These practices help you build reliable software quickly with AI assistance. Select the ones that fit your workflow and adapt them to your needs.
Remember: You're always in a loop. Choose the right one.
This document represents battle-tested practices from a real-world project using TypeScript, React Native, Rust, and AI pair programming. These practices evolved through iterative development, extensive testing, and continuous improvement based on actual challenges faced during development.