
@lmmx
Created October 26, 2025 10:49
Node outlines by Claude Sonnet 4.5 of Test Doubles, ch. 13 of "Software Engineering at Google" https://abseil.io/resources/swe-book/html/ch13.html

Test Doubles Overview

  • Stand-ins for real implementations in tests (like stunt doubles)
  • Three techniques: faking, stubbing, interaction testing
  • Google learned the hard way: overusing mocking frameworks produces high-maintenance tests that rarely find bugs
  • Practices vary widely across Google teams

Making Code Testable

  • Testability requires upfront investment (harder to retrofit later)
  • Use dependency injection to create "seams" for test doubles
  • Mocking frameworks reduce boilerplate but come with major caveats

Prefer Real Implementations

  • First choice: use real implementations (same as production)
  • "Classical testing" vs "mockist testing" - Google found classical scales better
  • Real implementations give confidence; test doubles isolate but don't prove correctness
  • Use real implementations when: fast, deterministic, simple dependencies
  • Trade-offs to consider: execution time, determinism/flakiness, dependency construction complexity

Fakes: The Best Test Double

  • Lightweight implementation behaving like real thing (e.g., in-memory database)
  • Single fake can radically improve testing experience across organization
  • Must maintain fidelity to API contracts (same inputs → same outputs)
  • Fakes need their own tests (contract tests against real implementation)
  • Team owning real implementation should own the fake
  • If no fake exists: ask owners to create one, write your own wrapper, or use real implementation

Stubbing: Use Sparingly

  • Quick way to hardcode return values inline
  • Dangers: tests become unclear (extra code obscures intent), brittle (leaks implementation details), less effective (no fidelity guarantee, can't store state)
  • Appropriate use: when you need specific return value to reach certain state, and each stub directly relates to assertions
  • Still prefer fakes/real implementations even when stubbing seems appropriate

Interaction Testing: Avoid When Possible

  • Validates function calls without executing them
  • Problems: can't prove system works (only that functions were called), exposes implementation details ("change-detector tests")
  • Appropriate when: can't do state testing (no real implementation/fake), or call count/order matters (e.g., caching)
  • Best practices: only for state-changing functions (sendEmail, not getUser), avoid overspecification (use any() for irrelevant args)
  • Not a replacement for state testing - supplement with integration tests

Key Principles

  • Prefer real implementations > fakes > stubbing > interaction testing
  • Test behavior through state, not through validating internal calls
  • No exact answers - engineer judgment and trade-offs required
  • Eventually need larger-scope tests to exercise real dependencies
  • Test Doubles
    • Introduction

      • Unit tests critical but difficult for complex code
      • Example: testing function that hits external server + database
      • Test double definition: object/function standing in for real implementation (like stunt double)
      • Avoid term "mocking" (ambiguous)
      • Uses: substituting simpler implementations (in-memory DB), validating system details, triggering rare error conditions
      • Enable small tests despite production code needing multiple processes/machines
      • Test doubles more lightweight than real implementations
      • Complications and trade-offs introduced
      • Google's experience: benefits when used properly, negative impact when misused
      • Historical lesson: danger of overusing mocking frameworks
        • Initially seemed perfect for every case
        • Easy to write focused, isolated tests
        • Years later: high maintenance cost, rarely found bugs
        • Pendulum swinging back toward realistic tests
      • Practice varies widely across teams at Google
        • Inconsistent knowledge
        • Inertia in existing codebases
        • Short-term ease vs long-term consequences
    • The Impact of Test Doubles on Software Development

      • Basic concepts foundation for best practices

      • Testability

        • Code is testable if written to allow unit tests

        • Seam: makes code testable by allowing test doubles

        • Enables using different dependencies in tests vs production

        • Dependency injection

          • Common technique for introducing seams
          • Classes receive dependencies as parameters instead of instantiating directly
          • Enables substitution in tests
          • Example: PaymentProcessor constructor accepts CreditCardService
          • Production passes real implementation, tests pass test double
          • Automated DI frameworks reduce boilerplate (Guice, Dagger at Google)
          • Dynamic languages (Python, JavaScript) can replace functions/methods
          • DI less important in dynamic languages
        • Testability requires upfront investment

          • Critical early in codebase lifetime
          • Later = more difficult to apply
          • Code without testing in mind needs refactoring/rewriting before adding tests
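The constructor-injection seam above (the chapter's PaymentProcessor/CreditCardService example) can be sketched in Python; the method names (`make_payment`, `charge`) and the fake class are illustrative assumptions, not from the book:

```python
class CreditCardService:
    """Production implementation; charge() would call a payment server."""

    def charge(self, card_number: str, amount: int) -> bool:
        raise NotImplementedError("network call in production")


class PaymentProcessor:
    def __init__(self, service: CreditCardService):
        # The constructor parameter is the "seam": production code passes
        # the real CreditCardService, tests pass a test double.
        self._service = service

    def make_payment(self, card_number: str, amount: int) -> bool:
        if amount <= 0:
            return False
        return self._service.charge(card_number, amount)


class FakeCreditCardService(CreditCardService):
    """Test double: records charges in memory instead of hitting a server."""

    def __init__(self):
        self.charges = []

    def charge(self, card_number, amount):
        self.charges.append((card_number, amount))
        return True


processor = PaymentProcessor(FakeCreditCardService())
assert processor.make_payment("4111000011110000", 100) is True
```

A DI framework (Guice, Dagger) automates this wiring, but the seam itself is just the parameter.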
      • Applicability

        • Mocking frameworks
          • Software library for creating test doubles within tests
          • Creates "mock" with inline-specified behavior
          • Reduces boilerplate vs defining new classes
          • Example: Mockito for Java
          • Available for most major languages
          • Google uses: Mockito (Java), googlemock (C++), unittest.mock (Python)
          • Significant caveats: overuse makes codebase harder to maintain
          • Problems covered later in chapter
      • Techniques for Using Test Doubles

        • Three primary techniques

        • Brief intro for quick overview

        • Detailed discussion later

        • Engineer awareness of distinctions helps choose appropriate technique

        • Faking

          • Lightweight API implementation
          • Behaves like real implementation
          • Not suitable for production
          • Example: in-memory database
          • Often ideal technique when test double needed
          • May not exist for needed object
          • Writing one challenging: must ensure similar behavior now and future
        • Stubbing

          • Giving behavior to function with no behavior on its own
          • Specify exact return values
          • Example: when(...).thenReturn(...) in Mockito
          • Typically done through mocking frameworks
          • Reduces boilerplate
          • Quick and simple but has limitations (discussed later)
        • Interaction testing

          • Validate how function is called without calling implementation
          • Test fails if function not called correctly (not at all, too many times, wrong args)
          • Example: verify(...) in Mockito
          • Sometimes called "mocking" (avoid this term - confusing)
          • Typically done through mocking frameworks
          • Reduces boilerplate for tracking calls and arguments
          • Useful in certain situations but avoid when possible
          • Overuse causes brittle tests
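The chapter's Mockito idioms map directly onto Python's `unittest.mock` (which the chapter lists as Google's Python mocking framework); a minimal sketch, with an assumed `CreditCardService` class for illustration:

```python
from unittest import mock


class CreditCardService:
    def charge(self, amount: int) -> bool:
        raise NotImplementedError("real implementation calls a server")


service = mock.create_autospec(CreditCardService, instance=True)

# Stubbing: specify an exact return value inline
# (~ when(...).thenReturn(...) in Mockito).
service.charge.return_value = True
assert service.charge(100) is True

# Interaction testing: validate how the function was called, without ever
# running the real implementation (~ verify(...) in Mockito).
service.charge.assert_called_once_with(100)
```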
    • Real Implementations

      • Prefer Realism Over Isolation

        • First choice: use real implementations (same as production)
        • Higher fidelity when executing code as in production
        • Preference developed over time at Google
        • Saw overuse of mocking frameworks pollute tests
          • Repetitive code
          • Out of sync with real implementation
          • Made refactoring difficult
        • Known as "classical testing"
        • Contrast: "mockist testing" prefers mocking frameworks
        • Google found mockist testing difficult to scale
        • Requires strict design guidelines
        • Most Google engineers write code suitable for classical testing
        • Real implementations make system under test more realistic
        • All code in real implementations executed in test
        • Test doubles isolate system under test from dependencies
        • Prefer realistic tests for confidence
        • If unit tests rely too much on test doubles: need integration tests or manual verification
        • Extra tasks slow development, allow bugs to slip through
        • Replacing all dependencies arbitrarily isolates implementation
        • Good test should be independent of implementation
        • Should test API, not implementation structure
        • Test failing from bug in real implementation is good
        • Indicates code won't work in production
        • Bug can cause cascade of test failures
        • Good developer tools (CI) make tracking failures easy
      • When Should You Use a Real Implementation?

        • Preferred if fast, deterministic, simple dependencies

        • Use for value objects (money, date, address, collections)

        • For complex code: often not feasible

        • No exact answer - trade-offs to consider

        • Execution time

          • Unit tests should be fast
          • Want quick feedback during development
          • Want quick finish in CI
          • Test double useful when real implementation slow
          • No exact threshold for "too slow"
            • 1ms added per test: not slow
            • 10ms, 100ms, 1s, etc: depends on context
          • Depends on productivity loss, number of tests using implementation
          • 1s extra reasonable for 5 tests, not for 500
          • Borderline: simpler to use real implementation until too slow
          • Then update to test doubles
          • Parallelization helps reduce execution time
          • Google infrastructure: trivial to split tests across servers
          • Increases CPU cost, large developer time savings
          • Trade-off: real implementation increases build times
          • Must build real implementation + all dependencies
          • Scalable build systems (Bazel) help with caching
        • Determinism

          • Deterministic: for given version, test always same outcome (always pass or always fail)
          • Nondeterministic: outcome can change even if system under test unchanged
          • Nondeterminism leads to flakiness
          • Occasional failures even with no changes
          • Flakiness harms test suite health
          • Developers distrust results, ignore failures
          • If rare flakiness: might not warrant response
          • If frequent: replace real implementation with test double
          • Real implementation more complex than test double
          • Increases nondeterminism likelihood
          • Example: multithreading can cause occasional failures
          • Output differs based on thread execution order
          • Common cause: code not hermetic
          • Dependencies on external services outside test control
          • Example: reading web page can fail (server overloaded, content changes)
          • Use test double instead
          • If not feasible: hermetic server instance (life cycle controlled by test)
          • Hermetic instances discussed in next chapter
          • Another example: code relying on system clock
          • Output differs based on current time
          • Test double can hardcode specific time
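The clock example above can be made concrete: passing the time source through a seam lets the test hardcode a specific time, removing the nondeterminism. The `greeting` function is a hypothetical stand-in for any clock-dependent code:

```python
import datetime


def greeting(now_fn=datetime.datetime.now):
    """Depends on the clock only through the now_fn seam."""
    hour = now_fn().hour
    return "Good morning" if hour < 12 else "Good afternoon"


# The test double hardcodes a specific time, so the test is deterministic:
# it produces the same outcome no matter when it runs.
assert greeting(lambda: datetime.datetime(2025, 1, 1, 9, 0)) == "Good morning"
assert greeting(lambda: datetime.datetime(2025, 1, 1, 15, 0)) == "Good afternoon"
```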
        • Dependency construction

          • Real implementation: must construct all dependencies
          • Entire dependency tree: object + its dependencies + their dependencies, etc.
          • Test double often has no dependencies
          • Much simpler to construct
          • Extreme example: new Foo(new A(new B(new C()), new D()), new E(), ..., new Z())
          • Time-consuming to determine construction
          • Tests need constant maintenance when constructors change
          • Tempting to use test double (trivial construction)
          • Example: @Mock Foo mockFoo;
          • Creating test double simpler but significant benefits to real implementation
          • Significant downsides to overusing test doubles
          • Trade-off needed
          • Ideal solution: use same object construction as production
          • Factory method or automated dependency injection
          • Object construction needs flexibility for test doubles
          • Can't hardcode production implementations
    • Faking

      • If real implementation not feasible: fake often best option

      • Fake preferred over other techniques

      • Behaves similarly to real implementation

      • System under test can't tell difference

      • Example: fake file system with in-memory storage
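A minimal in-memory fake, modeled on the chapter's database example (the API shape is an assumption). It keeps fidelity to the contract discussed later — save errors on duplicate IDs — and fails fast on paths it doesn't support:

```python
class ItemExistsError(Exception):
    pass


class FakeDatabase:
    """In-memory fake: behaves like the real database from the system
    under test's perspective, but stores rows in a dict, not on disk."""

    def __init__(self):
        self._rows = {}

    def save(self, item_id, item):
        if item_id in self._rows:
            # Same contract as the real implementation: duplicate IDs error.
            raise ItemExistsError(item_id)
        self._rows[item_id] = item

    def get(self, item_id):
        return self._rows.get(item_id)

    def run_query(self, query):
        # Fail fast: this fake doesn't implement raw queries, and raising
        # here tells the test author the fake isn't appropriate for that use.
        raise NotImplementedError("raw queries not supported by this fake")


db = FakeDatabase()
db.save(1, "widget")
assert db.get(1) == "widget"
```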

      • Why Are Fakes Important?

        • Powerful testing tool
        • Execute quickly
        • Effectively test code without real implementation drawbacks
        • Single fake can radically improve testing experience
        • Many fakes = enormous boost to engineering velocity
        • Where fakes are rare: slower velocity
        • Engineers struggle with real implementations (slow, flaky tests)
        • Or resort to stubbing/interaction testing (unclear, brittle, less effective)
      • When Should Fakes Be Written?

        • Requires more effort and domain experience
        • Must behave similarly to real implementation
        • Requires maintenance when real implementation changes
        • Team owning real implementation should write and maintain fake
        • Trade-off: productivity improvements vs costs of writing/maintaining
        • Few users: might not be worth it
        • Hundreds of users: obvious productivity improvement
        • Create fake only at root of code not feasible for tests
        • Example: if database can't be used, fake the database API itself
        • Not each class calling database API
        • Maintaining fake burdensome if duplicated across languages
        • Solution: single fake service implementation
        • Client libraries send requests to fake service
        • More heavyweight (cross-process communication)
        • Reasonable trade-off if tests still execute quickly
      • The Fidelity of Fakes

        • Most important concept: fidelity
        • How closely fake behavior matches real implementation
        • If behavior doesn't match: test not useful
        • Test might pass but code path might not work in real implementation
        • Perfect fidelity not always feasible
        • Fake necessary because real implementation unsuitable
        • Example: fake database doesn't store on hard drive (uses memory)
        • Primarily: maintain fidelity to API contracts
        • For any input: same output and state changes as real implementation
        • Example: database.save(itemId) saves when ID doesn't exist, errors when exists
        • Fake must conform to same behavior
        • Think of perfect fidelity from test's perspective
        • Example: hashing API fake doesn't need exact same hash values
        • Tests care about unique hash for given input, not specific value
        • If API contract doesn't guarantee specific values: fake still conforming
        • Other examples where perfect fidelity not useful: latency, resource consumption
        • Can't use fake if explicitly testing these constraints (performance tests)
        • Resort to other mechanisms (real implementation)
        • Fake might not need 100% functionality
        • Especially behavior not needed by most tests (rare error handling)
        • Best to fail fast: raise error if unsupported code path executed
        • Communicates fake not appropriate in this situation
      • Fakes Should Be Tested

        • Fake must have own tests
        • Ensures conformance to API of real implementation
        • Without tests: behavior can diverge as real implementation evolves
        • One approach: contract tests
        • Write tests against API's public interface
        • Run tests against both real implementation and fake
        • Tests against real implementation slower
        • Downside minimized: only run by fake owners
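The contract-test idea can be sketched as a single test body run against both implementations. Here `RealDatabase` is a placeholder (in a real codebase it would talk to an actual database, which is why only the fake's owners run that slower leg):

```python
class FakeDatabase:
    """In-memory fake of the database API."""

    def __init__(self):
        self._rows = {}

    def save(self, item_id, item):
        if item_id in self._rows:
            raise KeyError(item_id)
        self._rows[item_id] = item

    def get(self, item_id):
        return self._rows.get(item_id)


class RealDatabase(FakeDatabase):
    """Placeholder for the real implementation in this sketch only."""


def check_save_contract(db):
    """Contract test: written against the public API, so the same body
    runs against both the real implementation and the fake."""
    db.save(1, "a")
    assert db.get(1) == "a"
    try:
        db.save(1, "b")
    except KeyError:
        return  # duplicate IDs must error, per the API contract
    raise AssertionError("save() accepted a duplicate ID")


for impl in (FakeDatabase, RealDatabase):
    check_save_contract(impl())
```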
      • What to Do If a Fake Is Not Available

        • First: ask API owners to create one
        • Might not be familiar with fakes concept
        • Might not realize benefits
        • If owners unwilling/unable: write your own
        • One way: wrap all API calls in single class
        • Create fake version not talking to API
        • Simpler than faking entire API
        • Often need only subset of API behavior
        • At Google: some teams contributed fake to API owners
        • Allowed other teams to benefit
        • Finally: settle on real implementation (deal with trade-offs)
        • Or resort to other test double techniques (deal with their trade-offs)
        • Think of fake as optimization
        • If tests too slow with real implementation: create fake for speed
        • If speedup doesn't outweigh creation/maintenance work: stick with real implementation
    • Stubbing

      • Way for test to hardcode behavior for function with no behavior

      • Often quick and easy to replace real implementation

      • Example: simulating credit card server response

      • Easy to apply: tempting to use when real implementation not trivial

      • Overuse causes major productivity losses for maintenance

      • The Dangers of Overusing Stubbing

        • Tests become unclear

          • Stubbing involves writing extra code to define behavior
          • Extra code detracts from test intent
          • Difficult to understand if unfamiliar with implementation
          • Key sign stubbing inappropriate: mentally stepping through system under test
          • To understand why functions are stubbed
        • Tests become brittle

          • Stubbing leaks implementation details into test
          • When implementation changes: update tests
          • Ideally: test changes only if user-facing behavior changes
          • Should be unaffected by implementation changes
        • Tests become less effective

          • No way to ensure stubbed function behaves like real implementation
          • Example: when(stubCalculator.add(1, 2)).thenReturn(3)
          • Hardcodes part of contract
          • Poor choice if system under test depends on real contract
          • Forced to duplicate contract details
          • No guarantee contract is correct (no fidelity guarantee)
          • No way to store state with stubbing
          • Difficult to test certain aspects
          • Example: database.save(item) then database.get(item.id())
          • Real implementation/fake: both access internal state
          • Stubbing: no way to do this
          • Example of overuse: test with many when() statements
          • Example of refactored test: shorter, no implementation details exposed
          • No special setup needed: credit card server knows how to behave
          • Don't want test talking to external server
          • Fake credit card server more suitable
          • If fake unavailable: real implementation with hermetic server
          • Increases execution time
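The "no way to store state" danger can be shown directly: with a stub, `save()` and `get()` share no state, so the test must hardcode the very answer it later asserts. A sketch using `unittest.mock`, with an assumed `Database` API:

```python
from unittest import mock


class Database:
    def save(self, item): ...
    def get(self, item_id): ...


stub_db = mock.create_autospec(Database, instance=True)

# The test duplicates a contract detail it cannot guarantee is correct...
stub_db.get.return_value = "widget"

stub_db.save("widget")  # has no effect on what get() returns

# ...so this assertion passes even if save() is completely broken.
assert stub_db.get(1) == "widget"
```

A real implementation or a fake would access shared internal state here, making the save-then-get round trip a meaningful test.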
      • When Is Stubbing Appropriate?

        • Not catch-all replacement for real implementation
        • Appropriate when needing function to return specific value
        • Gets system under test into certain state
        • Example: requiring non-empty list of transactions
        • Function behavior defined inline
        • Can simulate wide variety of return values or errors
        • Might not be possible to trigger from real implementation/fake
        • Each stubbed function should have direct relationship with test assertions
        • Purpose should be clear
        • Test typically should stub small number of functions
        • Many stubbed functions: less clear tests
        • Can be sign of stubbing overuse
        • Or system under test too complex (should refactor)
        • Even when appropriate: real implementations/fakes still preferred
        • Don't expose implementation details
        • Give more correctness guarantees
        • Stubbing reasonable as long as usage constrained
        • Tests shouldn't become overly complex
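The appropriate case above — stubbing a specific return value to reach a state the assertion needs — might look like this, with `TransactionStore` and `has_activity` as hypothetical names:

```python
from unittest import mock


class TransactionStore:
    def recent_transactions(self):
        raise NotImplementedError


def has_activity(store) -> bool:
    """System under test: the behavior we actually want to assert on."""
    return len(store.recent_transactions()) > 0


# Appropriate stubbing: the single stub exists only to put the system
# under test into the required state (a non-empty transaction list),
# and it relates directly to the assertion below.
stub_store = mock.create_autospec(TransactionStore, instance=True)
stub_store.recent_transactions.return_value = ["t1", "t2"]

assert has_activity(stub_store) is True
```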
    • Interaction Testing

      • Validate how function is called without calling implementation

      • Mocking frameworks make interaction testing easy

      • Important to perform only when necessary

      • Keeps tests useful, readable, resilient to change

      • Prefer State Testing Over Interaction Testing

        • State testing preferred over interaction testing
        • State testing: call system under test, validate correct return value or state change
        • Example: sorting numbers, validating sorted result
        • Doesn't matter which algorithm used
        • Interaction testing example: can't determine numbers actually sorted
        • Test doubles don't know how to sort
        • Only tells you system under test tried to sort
        • At Google: emphasizing state testing more scalable
        • Reduces test brittleness
        • Easier to change and maintain code over time
        • Primary issue: can't tell system under test working properly
        • Only validates certain functions called as expected
        • Requires assumption about code behavior
        • Example: "If database.save(item) called, assume item saved"
        • State testing validates this assumption
        • Actually saves and queries to validate existence
        • Another downside: utilizes implementation details
        • To validate function called: expose that system under test calls function
        • Similar to stubbing: extra code makes tests brittle
        • Leaks implementation details into tests
        • Some Google engineers call these "change-detector tests"
        • Fail in response to any production code change
        • Even if behavior unchanged
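The sorting example above, sketched both ways: the state test validates the actual result, while the interaction test passes even when the double returns garbage:

```python
from unittest import mock


def sort_numbers(numbers, sorter=sorted):
    """System under test; the sorter seam exists only for this demo."""
    return sorter(numbers)


# State test: validates the return value, regardless of which algorithm
# (or library call) produced it.
assert sort_numbers([3, 1, 2]) == [1, 2, 3]

# Interaction test: only checks that a sorter was invoked. It passes even
# though this double returns an UNSORTED list -- it cannot tell us the
# numbers were actually sorted.
mock_sorter = mock.Mock(return_value=[3, 1, 2])
sort_numbers([3, 1, 2], sorter=mock_sorter)
mock_sorter.assert_called_once_with([3, 1, 2])
```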
      • When Is Interaction Testing Appropriate?

        • Some cases warrant interaction testing:
        • Cannot perform state testing: unable to use real implementation or fake
        • Real implementation too slow, no fake exists
        • Fallback: interaction testing to validate certain functions called
        • Not ideal but provides basic confidence
        • Differences in number/order of calls would cause undesired behavior
        • Interaction testing useful: difficult to validate with state testing
        • Example: caching feature should reduce database calls
        • Verify database not accessed more than expected
        • Mockito example: verify(databaseReader, atMostOnce()).selectRecords()
        • Interaction testing not complete replacement for state testing
        • If can't perform state testing in unit test: supplement with larger-scoped tests
        • Larger-scope tests perform state testing
        • Example: unit test validates database usage via interaction testing
        • Add integration test performing state testing against real database
        • Larger-scope testing important for risk mitigation
        • Discussed in next chapter
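The caching case — where call count matters and state testing can't easily see it — translates from the chapter's Mockito `verify(databaseReader, atMostOnce()).selectRecords()` into `unittest.mock` roughly as follows (the `CachingReader` class is an assumed sketch):

```python
from unittest import mock


class DatabaseReader:
    def select_records(self):
        raise NotImplementedError


class CachingReader:
    """Caches the first read so repeated calls hit the database only once."""

    def __init__(self, reader):
        self._reader = reader
        self._cache = None

    def records(self):
        if self._cache is None:
            self._cache = self._reader.select_records()
        return self._cache


db = mock.create_autospec(DatabaseReader, instance=True)
db.select_records.return_value = ["r1"]

cache = CachingReader(db)
cache.records()
cache.records()

# Interaction test: the caching behavior is exactly about call count,
# which the returned value alone cannot reveal.
db.select_records.assert_called_once()
```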
      • Best Practices for Interaction Testing

        • Following practices reduce impact of downsides

        • Prefer interaction testing only for state-changing functions

          • System under test calls dependency function: falls into two categories
          • State-changing: observable side effects (sendEmail, saveRecord, logAccess)
          • Non-state-changing: returns value, no side effects (getUser, findResults, readFile)
          • In general: perform interaction testing only for state-changing functions
          • Non-state-changing interaction testing usually redundant
          • System under test uses return value for other work you can assert
          • Interaction itself not important for correctness (no side effects)
          • Makes test brittle: update test when interaction pattern changes
          • Less readable: additional assertions obscure important assertions
          • State-changing interactions represent useful work changing state
          • Example: testing both types
          • addPermission() state-changing: reasonable to test interaction
          • getPermission() non-state-changing: not needed
          • Clue: getPermission() was already stubbed earlier in the test, so also verifying the call adds nothing
        • Avoid overspecification in interaction tests

          • Test behaviors rather than methods (from Unit Testing chapter)
          • Test method should verify one behavior
          • Not multiple behaviors in single test
          • Apply same principle to interaction testing
          • Avoid overspecifying which functions and arguments validated
          • Leads to clear, concise tests
          • Tests resilient to changes outside test scope
          • Fewer tests fail if function call changed
          • Example of overspecification: test validates user name in greeting
          • Test fails if unrelated behavior changed
          • Validates all setText() arguments
          • Fails if setIcon() not called (incidental behavior)
          • Example of well-specified tests: behaviors split into separate tests
          • Each test validates minimum necessary for correctness
          • Uses eq() for relevant arguments, any() for others
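The well-specified version of the chapter's greeting example, in `unittest.mock` terms, uses `mock.ANY` where Mockito uses `any()`; the `UserPrompt` API shape here is an assumption:

```python
from unittest import mock


class UserPrompt:
    def set_text(self, *args): ...
    def set_icon(self, icon): ...


def show_greeting(prompt, name):
    """System under test (illustrative)."""
    prompt.set_text(name, "Hello!", "Close")
    prompt.set_icon("smiley")  # incidental behavior, not what we're testing


prompt = mock.create_autospec(UserPrompt, instance=True)
show_greeting(prompt, "Fake User")

# Well-specified: assert only the argument this behavior cares about (the
# user's name) and use mock.ANY for the rest, so the test won't fail when
# unrelated text or the icon changes.
prompt.set_text.assert_called_once_with("Fake User", mock.ANY, mock.ANY)
```

An overspecified test would instead pin every `set_text` argument and also verify `set_icon`, failing on any incidental change.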
    • Conclusion

      • Test doubles crucial to engineering velocity
      • Help comprehensively test code
      • Ensure tests run fast
      • Misuse: major drain on productivity
      • Can lead to unclear, brittle, less effective tests
      • Important for engineers to understand best practices
      • Often no exact answer: real implementation vs test double
      • Or which test double technique to use
      • Engineer might need trade-offs for their use case
      • Test doubles great for working around difficult dependencies
      • To maximize confidence: still want to exercise dependencies in tests
      • Next chapter: larger-scope testing
      • Uses dependencies regardless of suitability for unit tests
      • Even if slow or nondeterministic
    • TL;DRs

      • A real implementation should be preferred over a test double
      • A fake is often the ideal solution if a real implementation can't be used in a test
      • Overuse of stubbing leads to tests that are unclear and brittle
      • Interaction testing should be avoided when possible: it leads to tests that are brittle because it exposes implementation details of the system under test