Prompts for auditing code quality and test health. Copy the prompt that fits the moment and paste it in after a PR, at the end of the week, or once a month.
Review the code we just changed:
1. Did we introduce or worsen duplication?
2. Did we make any file harder to delete (more coupled)?
3. Did we push a decision further from where it's used?
4. Did we make a test harder to write than the code it tests?
5. Did we name anything we'll regret in a month? (e.g. did we leak UI labels into data models, or data model names into the UI?)
Only suggest extraction when it makes the call site easier to read, not when it just moves complexity somewhere else. Don't suggest abstractions unless a pattern appears 3+ times. If you flag something, tell me whether to fix it now or leave it for later, and why.
Look at everything we shipped this week as a whole, not as individual PRs:
1. Did the same pattern appear 3+ times across different PRs? (extraction candidate)
2. Did any file get touched in 3+ separate PRs? (it's either a god file or a missing abstraction)
3. Are any files pulling away from the pack in size? (approaching or already past 400 lines)
4. Did we solve the same problem differently in two places? (inconsistency, pick one and align)
5. Did we add a new dependency, util, or pattern without checking if we already had one that does the same thing?
For each finding: name the specific files, propose the refactor, estimate whether it's a 15-minute fix or needs a dedicated PR, and rank by how much future pain it prevents vs the effort to do it now.
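Question 2 can be checked mechanically before you paste the prompt. A minimal sketch, assuming each merged PR can be exported as a list of changed file paths; the `PullRequest` shape, the `hotFiles` name, and the sample data are hypothetical, not part of any tool:

```typescript
// Hypothetical shape: one entry per merged PR, with the paths it changed.
interface PullRequest {
  id: string;
  files: string[];
}

interface HotFile {
  file: string;
  prCount: number;
}

// Returns files changed in at least `minPrs` distinct PRs, most-touched first.
function hotFiles(prs: PullRequest[], minPrs: number = 3): HotFile[] {
  const counts = new Map<string, number>();
  for (const pr of prs) {
    // A Set ensures a path listed twice in one PR still counts once.
    for (const file of new Set(pr.files)) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }
  return Array.from(counts, ([file, prCount]) => ({ file, prCount }))
    .filter((entry) => entry.prCount >= minPrs)
    .sort((a, b) => b.prCount - a.prCount || a.file.localeCompare(b.file));
}

// Made-up data for illustration.
const thisWeek: PullRequest[] = [
  { id: "pr-a", files: ["src/api.ts", "src/Form.tsx"] },
  { id: "pr-b", files: ["src/api.ts", "src/Table.tsx"] },
  { id: "pr-c", files: ["src/api.ts", "src/Form.tsx"] },
];

console.log(hotFiles(thisWeek)); // only src/api.ts was touched by three PRs
```

The count only tells you where to look. Whether the file is a god file or a missing abstraction is still a judgment call for the review.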
Step back from the code and look at the shape of the codebase:
1. Where would a new engineer get confused or need to ask someone? (onboarding friction = design friction)
2. Are there files over 400 lines? What's growing and why?
3. Are there components that started simple but now have 5+ props controlling behavior? (god component emerging)
4. Are we copy-pasting between features instead of composing? (look at recent feature PRs -- how much was new vs copied from another feature?)
5. If we had to swap out a dependency (React Query, Tailwind, shadcn), how many files would we touch? (coupling to libraries should have a boundary)
6. Are there parts of the codebase where the naming or structure no longer matches what the feature actually does? (the code evolved but the names didn't)
Produce a ranked backlog: each item gets a title, the files involved, the specific refactor, and a t-shirt size (S/M/L). Group into "do this sprint," "do next sprint," and "track but don't act yet."
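If the backlog is going into a tracker, it helps to ask for it in a fixed shape. A sketch of one possible shape, built from the fields named above; the type names and the sample items are assumptions, and the items are assumed to arrive already ranked, so grouping keeps their order:

```typescript
type Size = "S" | "M" | "L";
type Bucket = "do this sprint" | "do next sprint" | "track but don't act yet";

interface BacklogItem {
  title: string;
  files: string[];
  refactor: string;
  size: Size;
  bucket: Bucket;
}

const bucketOrder: Bucket[] = ["do this sprint", "do next sprint", "track but don't act yet"];

// Groups ranked items by bucket. `filter` preserves input order,
// so the ranking inside each bucket is left untouched.
function groupBacklog(items: BacklogItem[]): Array<{ bucket: Bucket; items: BacklogItem[] }> {
  return bucketOrder.map((bucket) => ({
    bucket,
    items: items.filter((item) => item.bucket === bucket),
  }));
}

// Made-up items for illustration.
const ranked: BacklogItem[] = [
  { title: "Split settings form", files: ["src/SettingsForm.tsx"], refactor: "Extract one component per section", size: "M", bucket: "do this sprint" },
  { title: "Rename legacy cart module", files: ["src/cart/legacy.ts"], refactor: "Rename to match current behavior", size: "S", bucket: "track but don't act yet" },
  { title: "Wrap data-fetching library", files: ["src/hooks/useOrders.ts"], refactor: "Route calls through one boundary module", size: "L", bucket: "do this sprint" },
];

for (const group of groupBacklog(ranked)) {
  console.log(group.bucket, group.items.map((item) => `${item.title} (${item.size})`));
}
```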
Review the tests we just wrote:
1. Do the tests break if we refactor without changing behavior? (testing implementation details)
2. Are we mocking something we could use for real? Every mock is a place the test diverges from reality.
3. Do the tests read like user actions and outcomes, or like internal function calls? (click button, see result -- not "expect setOpenIndex to be called with 1")
4. Did we skip testing an error path or edge case because it was annoying to set up? (that's the one that'll bite you)
5. Are any tests making real API calls? All network requests must be mocked. A test that hits a real endpoint is a test that will fail at 2am for reasons that have nothing to do with your code.
Tests should resemble the way the software is used. If a test requires more setup than the code it tests, the code's design is telling you something.
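Questions 2 and 5 pull in opposite directions: mock the network, but mock as little else as possible. One way to get both is to pass the HTTP call in as a parameter, so the test replaces only the network edge and everything else runs for real. A minimal sketch, assuming a hypothetical `loadUserName` function and `/api/users/` endpoint; `HttpGet` is a hand-rolled type, not a library API:

```typescript
// The only seam the test replaces: something that performs a GET
// and resolves with the status plus the parsed body.
type HttpGet = (url: string) => Promise<{ ok: boolean; status: number; body: unknown }>;

// Code under test (hypothetical). It never reaches for a network client itself.
async function loadUserName(get: HttpGet, id: number): Promise<string> {
  const res = await get(`/api/users/${id}`);
  if (!res.ok) {
    throw new Error(`Could not load user ${id} (HTTP ${res.status})`);
  }
  const name = (res.body as { name?: unknown } | null)?.name;
  if (typeof name !== "string") {
    throw new Error(`User ${id} has no name`);
  }
  return name;
}

// Fakes standing in for the network. No real endpoint is hit.
const okGet: HttpGet = async () => ({ ok: true, status: 200, body: { name: "Ada" } });
const failingGet: HttpGet = async () => ({ ok: false, status: 500, body: null });

// Exercises the happy path and the error path that is annoying to set up by hand.
async function demo(): Promise<string[]> {
  const seen: string[] = [];
  seen.push(await loadUserName(okGet, 7));
  try {
    await loadUserName(failingGet, 7);
  } catch (err) {
    seen.push((err as Error).message);
  }
  return seen;
}

demo().then((seen) => console.log(seen));
```

Because the fake is a plain function, the error path costs one extra line of setup, which makes it harder to justify skipping.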
Look at the test suite as a whole:
1. Are there tests that keep breaking when we ship unrelated features? (brittle, probably testing implementation details)
2. Are we testing the same thing at multiple levels? (unit test + integration test + e2e all covering the exact same happy path is waste)
3. Did we add a feature this week with no tests? Flag it -- don't backfill everything, but know where the gaps are.
4. Are there tests that never fail? They might not be testing anything real.
1. Run coverage. Not to hit a number, but to find untested use cases. Look at uncovered lines and ask "what user action would hit this?" If the answer is "a critical one," write that test.
2. Are our integration tests actually integrating, or are they unit tests with extra steps? (rendering a component but mocking every child and every hook is a unit test in disguise)
3. What's our most critical user flow? Could a new engineer break it and CI would catch it before merge?
Reference for writing React tests. Not a prompt -- just keep it handy.
- Query by *ByRole with name, not by test ID or class name
- getBy for existence, queryBy only for non-existence, findBy for async
- userEvent over fireEvent
- Never put side effects inside waitFor
- One assertion per waitFor callback
- Never shallow-render
- The Wrong Abstraction -- Sandi Metz
- AHA Programming -- Kent C. Dodds
- Goodbye, Clean Code -- Dan Abramov
- Google Engineering Practices
- Is High Quality Software Worth the Cost? -- Martin Fowler
- Write tests. Not too many. Mostly integration. -- Kent C. Dodds
- Testing Implementation Details -- Kent C. Dodds
- Common Mistakes with React Testing Library -- Kent C. Dodds
- How to Know What to Test -- Kent C. Dodds