@typpo
Created April 4, 2026 22:13

Codex PR Review Toolkit skill: delegated PR review with specialist subagents

Install this gist as a Codex skill by recreating this directory layout under ~/.codex/skills/pr-review-toolkit/:

  • pr-review-toolkit-SKILL.md -> SKILL.md
  • pr-review-toolkit-agents-openai.yaml -> agents/openai.yaml
  • pr-review-toolkit-references-code-reviewer.md -> references/code-reviewer.md
  • pr-review-toolkit-references-code-simplifier.md -> references/code-simplifier.md
  • pr-review-toolkit-references-comment-analyzer.md -> references/comment-analyzer.md
  • pr-review-toolkit-references-delegation-workflow.md -> references/delegation-workflow.md
  • pr-review-toolkit-references-pr-test-analyzer.md -> references/pr-test-analyzer.md
  • pr-review-toolkit-references-silent-failure-hunter.md -> references/silent-failure-hunter.md
  • pr-review-toolkit-references-type-design-analyzer.md -> references/type-design-analyzer.md
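
The mapping above can be applied with a short script. This is a minimal sketch, assuming the gist files were downloaded into one flat directory; the `FILE_MAP` constant and the `install_skill` helper are hypothetical names, not part of the gist.

```python
import shutil
from pathlib import Path

# Gist filename -> path inside the skill directory, copied from the list above.
FILE_MAP = {
    "pr-review-toolkit-SKILL.md": "SKILL.md",
    "pr-review-toolkit-agents-openai.yaml": "agents/openai.yaml",
    "pr-review-toolkit-references-code-reviewer.md": "references/code-reviewer.md",
    "pr-review-toolkit-references-code-simplifier.md": "references/code-simplifier.md",
    "pr-review-toolkit-references-comment-analyzer.md": "references/comment-analyzer.md",
    "pr-review-toolkit-references-delegation-workflow.md": "references/delegation-workflow.md",
    "pr-review-toolkit-references-pr-test-analyzer.md": "references/pr-test-analyzer.md",
    "pr-review-toolkit-references-silent-failure-hunter.md": "references/silent-failure-hunter.md",
    "pr-review-toolkit-references-type-design-analyzer.md": "references/type-design-analyzer.md",
}


def install_skill(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy each downloaded gist file to its place under dest_dir."""
    written = []
    for gist_name, relative_path in FILE_MAP.items():
        source = src_dir / gist_name
        if not source.is_file():
            # Fail loudly instead of installing a partial skill.
            raise FileNotFoundError(f"missing gist file: {source}")
        target = dest_dir / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        written.append(target)
    return written
```

A call such as `install_skill(Path("downloads"), Path.home() / ".codex/skills/pr-review-toolkit")` would then recreate the layout, where `downloads` is a placeholder for wherever the gist files were saved.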

This skill implements a delegated PR review workflow for Codex: one coordinator plus specialist subagents for code, comments, tests, errors, types, and simplification.

interface:
  display_name: "PR Review Toolkit"
  short_description: "Delegated PR review with specialist subagents"
  default_prompt: "Use $pr-review-toolkit to run a delegated review of my current changes before I open a pull request."

Code Reviewer

Use this pass for the general review that should happen on every PR.

Review Goal

Check the diff against project-specific rules and look for high-signal bugs, regressions, and maintainability issues. Read CLAUDE.md, AGENTS.md, README, nearby tests, and local patterns before flagging style problems.

What To Look For

  • Violations of explicit project rules
  • Likely correctness bugs
  • Security and data handling problems
  • Concurrency or lifecycle issues
  • Accessibility regressions when UI changes
  • Missing or obviously inadequate tests for risky behavior
  • Large consistency problems with surrounding code

Filtering Rules

  • Prefer issues with confidence >= 0.80.
  • Do not surface low-value nits unless the project explicitly requires them.
  • Do not report pre-existing issues outside the review scope unless they block understanding the diff.
  • Favor concrete evidence over speculative concerns.

Output Shape

For each finding provide:

  • Severity: Critical or Important
  • Confidence score
  • File and line reference
  • The broken rule, bug, or risk
  • A concrete fix or safer direction

If nothing crosses the confidence threshold, say so and summarize residual risk briefly.
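
The filtering rule and output shape can be sketched as a small data model. This is an illustrative sketch only: the `Finding` class, its field names, and the sort order (Critical first, then higher confidence) are assumptions, while the 0.80 threshold and the two severity levels come from the text above.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # from the filtering rules above


@dataclass(frozen=True)
class Finding:
    severity: str      # "Critical" or "Important"
    confidence: float  # 0.0 to 1.0
    location: str      # file and line, e.g. "src/app.py:42"
    problem: str       # the broken rule, bug, or risk
    fix: str           # a concrete fix or safer direction


def filter_findings(findings: list[Finding]) -> list[Finding]:
    """Keep findings at or above the threshold, Critical first, then by confidence."""
    kept = [f for f in findings if f.confidence >= CONFIDENCE_THRESHOLD]
    return sorted(kept, key=lambda f: (f.severity != "Critical", -f.confidence))


def summarize(findings: list[Finding]) -> str:
    """Render the kept findings, or say explicitly that none crossed the bar."""
    kept = filter_findings(findings)
    if not kept:
        return "No findings crossed the confidence threshold."
    return "\n".join(
        f"[{f.severity} {f.confidence:.2f}] {f.location}: {f.problem} Fix: {f.fix}"
        for f in kept
    )
```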

Code Simplifier

Use this pass after the main review when the user asks for simplification, refactoring, or a polish pass.

Operating Modes

  • Review mode: identify simplification opportunities without editing code
  • Apply mode: edit code only when the user explicitly asks for the simplifications to be implemented

Review Goal

Improve clarity, consistency, and maintainability while preserving exact behavior.

What To Look For

  • Unnecessary nesting or control-flow indirection
  • Redundant abstractions or helper layers
  • Dense expressions that obscure intent
  • Names that hide the real responsibility
  • Logic split awkwardly across too many tiny units
  • Comments that exist only because the code is hard to read

Guardrails

  • Preserve all externally visible behavior unless the user asks otherwise.
  • Prefer explicit code over clever compression.
  • Do not trade clarity for fewer lines.
  • Avoid refactors that mix unrelated concerns.
  • Follow established local patterns instead of introducing a new style in one file.

Output Shape

In review mode, report:

  • High-value Simplifications
  • Optional Cleanups
  • Keep As-Is cases where extra refactoring would overcomplicate the code

In apply mode, make the smallest clear refactor that improves readability and then summarize what changed.
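
As an illustration of "unnecessary nesting" and of the behavior-preserving guardrail, here is a hypothetical before/after pair. The shipping rule and the function names are invented for the example; the point is only that both versions return the same result for every input.

```python
from typing import Optional


def shipping_cost_nested(order: Optional[dict]) -> float:
    # Before: three levels of nesting hide the single interesting rule.
    if order is not None:
        if order.get("items"):
            if order.get("total", 0) >= 50:
                return 0.0
            else:
                return 4.99
        else:
            return 0.0
    else:
        return 0.0


def shipping_cost_flat(order: Optional[dict]) -> float:
    # After: guard clauses handle the empty cases first; behavior is unchanged.
    if order is None or not order.get("items"):
        return 0.0
    if order.get("total", 0) >= 50:
        return 0.0
    return 4.99
```

The flat version is not shorter for its own sake; it puts the one real decision (the free-shipping cutoff) where a reader can see it.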

Comment Analyzer

Use this pass when comments, docstrings, inline documentation, or README-style explanations changed, or when the user explicitly asks for comment review.

Review Goal

Protect the codebase from comment rot. Every comment should be accurate, durable, and worth keeping.

What To Check

  • Factual accuracy against the code as written
  • Missing context for non-obvious behavior
  • Incorrect parameter, return, or type descriptions
  • Claims about edge cases, complexity, or guarantees that the code does not uphold
  • Comments that merely restate obvious code
  • TODO or FIXME notes that are stale or already resolved
  • Transitional or temporary wording that will age badly

Judgment Rules

  • Prefer comments that explain why, assumptions, or surprising behavior.
  • Be skeptical of comments that enumerate, step by step, what the code does when the code is already clear.
  • Flag misleading comments as higher severity than missing comments.
  • Recommend deletion when a comment adds no durable value.

Output Shape

Use these buckets:

  • Critical Issues: inaccurate or misleading comments
  • Improvement Opportunities: comments that need clarification or added context
  • Recommended Removals: comments that should be deleted
  • Positive Findings: comments worth keeping as examples

Do not edit the comment directly unless the user later asks for fixes.
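
A small, invented example of what this pass would flag. The first function carries a comment that contradicts the code (a Critical Issue under the buckets above); the second carries a comment that explains why. The caller behavior mentioned in that comment is an assumption made up for the illustration.

```python
def average_with_misleading_comment(values: list[float]) -> float:
    # Returns 0 for an empty list.
    # (Inaccurate: an empty list raises ZeroDivisionError below.)
    return sum(values) / len(values)


def average(values: list[float]) -> float:
    # An empty input returns 0.0 instead of raising because callers (assumed
    # here) display the result directly and treat "no data" as zero.
    if not values:
        return 0.0
    return sum(values) / len(values)
```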

Delegation Workflow

Use this file when coordinating a full PR review through specialist subagents.

Coordinator Responsibilities

  1. Determine one concrete scope for every specialist:
    • repo or working directory
    • diff source: PR diff, staged diff, unstaged diff, or explicit files
    • requested aspects
  2. Launch one specialist subagent per aspect.
  3. Wait for results only after all independent launches are in flight.
  4. Merge, deduplicate, and prioritize findings.
  5. Present one combined report.

Before launching, check whether the spawn_agent tool is available in the current environment. If it is available, use it. Do not skip delegation just because a single-agent review would also work.

Preferred Launch Pattern

Use explorer subagents for read-only aspect analysis. They are well-suited to bounded codebase questions.

Prompt template:

Use $pr-review-toolkit at /absolute/path/to/pr-review-toolkit to review the current changes in /absolute/path/to/repo.

Aspect: <code|comments|tests|errors|types|simplify>
Scope: <PR diff | staged diff | unstaged diff | explicit files>
Files or directories to inspect: <paths if narrowed>
Reference: /absolute/path/to/pr-review-toolkit/references/<matching-file>.md
Mode: review-only

Instructions:
- Follow the referenced aspect guide exactly.
- Review only the requested scope.
- Report only concrete findings that matter.
- Include file and line references for every actionable finding.
- If nothing material is wrong, say so explicitly.
- Do not edit files.
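
One way to fill in the template programmatically is sketched below. The `build_prompt` helper and the "(not narrowed)" placeholder for an empty path list are assumptions; the template text itself is copied from above.

```python
PROMPT_TEMPLATE = """\
Use $pr-review-toolkit at {skill_dir} to review the current changes in {repo_dir}.

Aspect: {aspect}
Scope: {scope}
Files or directories to inspect: {paths}
Reference: {skill_dir}/references/{reference_file}
Mode: review-only

Instructions:
- Follow the referenced aspect guide exactly.
- Review only the requested scope.
- Report only concrete findings that matter.
- Include file and line references for every actionable finding.
- If nothing material is wrong, say so explicitly.
- Do not edit files.
"""


def build_prompt(skill_dir: str, repo_dir: str, aspect: str, scope: str,
                 reference_file: str, paths: tuple = ()) -> str:
    """Render one specialist prompt; all directories should be absolute paths."""
    return PROMPT_TEMPLATE.format(
        skill_dir=skill_dir,
        repo_dir=repo_dir,
        aspect=aspect,
        scope=scope,
        # Assumption: a fixed placeholder when the scope was not narrowed.
        paths=", ".join(paths) if paths else "(not narrowed)",
        reference_file=reference_file,
    )
```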

Parallelism

Spawn independent aspects in parallel:

  • code
  • comments
  • tests
  • errors
  • types
  • simplify

Usually code plus any explicitly requested applicable aspects is enough. Skip obviously irrelevant aspects instead of launching low-signal reviews.
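
The launch-then-wait ordering can be sketched with a thread pool. This is not how the host's spawn_agent tool works internally; `launch` is a hypothetical callable standing in for whatever starts a subagent and returns its report.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_specialists(aspects: list[str], launch: Callable[[str], str]) -> dict[str, str]:
    """Start every aspect review before waiting on any result."""
    if not aspects:
        return {}
    with ThreadPoolExecutor(max_workers=len(aspects)) as pool:
        # Submit everything first so all launches are in flight together.
        futures = {aspect: pool.submit(launch, aspect) for aspect in aspects}
        # Only now block on results, in the order the aspects were given.
        return {aspect: future.result() for aspect, future in futures.items()}
```

If any single launch raises, `future.result()` re-raises that error here, so a failed specialist is not silently dropped from the merged report.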

Aggregation Rules

  • Merge duplicate findings under the strongest explanation.
  • Preserve the specialist label in the final report.
  • Order by severity first, then by confidence.
  • Surface scope assumptions and skipped aspects after findings.
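
The first three aggregation rules can be sketched as follows. Several details are assumptions made for the example: two findings count as duplicates when they share a location, the "strongest" one is the most severe and then the most confident, and the `Finding` fields and `SEVERITY_RANK` table are hypothetical.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"Critical": 0, "Important": 1, "Suggestion": 2}


@dataclass(frozen=True)
class Finding:
    specialist: str   # aspect label, e.g. "code" or "errors"
    severity: str     # a key of SEVERITY_RANK
    confidence: float
    location: str     # file and line, e.g. "src/app.py:42"
    explanation: str


def _priority(finding: Finding) -> tuple:
    # Lower tuples sort first: more severe, then more confident.
    return (SEVERITY_RANK[finding.severity], -finding.confidence)


def aggregate(findings: list[Finding]) -> list[tuple]:
    """Return (strongest finding, specialist labels) pairs, ordered for the report."""
    groups: dict[str, list[Finding]] = {}
    for finding in findings:
        groups.setdefault(finding.location, []).append(finding)
    merged = []
    for group in groups.values():
        strongest = min(group, key=_priority)
        # Keep every specialist label that reported this location.
        labels = sorted({f.specialist for f in group})
        merged.append((strongest, labels))
    return sorted(merged, key=lambda pair: _priority(pair[0]))
```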

Fallback

If spawn_agent is unavailable, run the same prompts locally in sequence and state that the review used the fallback single-agent path.

PR Test Analyzer

Use this pass when behavior changed, tests changed, or the user asks whether coverage is good enough.

Review Goal

Evaluate behavioral coverage, not raw line coverage. Focus on tests that would catch real regressions.

What To Look For

  • Untested critical branches in the changed code
  • Missing negative cases for validation and permission logic
  • Missing coverage for retries, async flows, concurrency, or partial failure modes
  • Missing edge cases around boundaries, empty states, and malformed input
  • Tests coupled too tightly to implementation details
  • Assertions that are too weak to catch the intended regression

Rating Guidance

Rate each missing test or weakness from 1-10:

  • 9-10: likely production breakage, data loss, security exposure, or major outage
  • 7-8: important user-facing or business-logic regression
  • 5-6: meaningful edge case or resilience improvement
  • 1-4: optional coverage improvement

Output Shape

Use these buckets:

  • Critical Gaps: rated 8-10
  • Important Improvements: rated 5-7
  • Test Quality Issues: brittle or implementation-bound tests
  • Positive Observations: strong coverage already present

For every recommendation, explain the failure it would catch.
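
The rating-to-bucket mapping above can be written down directly. The `bucket_for_rating` helper is hypothetical, and returning `None` for ratings of 1-4 is an assumption, since the guide names no bucket for optional improvements.

```python
from typing import Optional


def bucket_for_rating(rating: int) -> Optional[str]:
    """Map a 1-10 gap rating to its output bucket."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating >= 8:
        return "Critical Gaps"
    if rating >= 5:
        return "Important Improvements"
    # Assumption: ratings of 1-4 are optional and get no bucket of their own.
    return None
```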

Silent Failure Hunter

Use this pass when the diff changes error handling, retries, fallbacks, logging, remote calls, storage boundaries, or any code that can fail at runtime.

Review Goal

Find errors that could be swallowed, hidden, misreported, or converted into confusing fallback behavior.

What To Check

  • Empty or overly broad catch blocks
  • Logging without enough context to debug later
  • Fallbacks that hide the real problem from the user
  • Returning default values after failures without surfacing the error
  • Optional chaining or null-coalescing that silently skips necessary work
  • Retry flows that exhaust attempts without a clear final error
  • Errors that should propagate but are swallowed locally

Severity Guidance

  • Critical: silent failures, swallowed exceptions, broad catch blocks hiding unrelated faults
  • High: weak user feedback, unjustified fallback behavior, missing or misleading logging
  • Medium: incomplete context or imprecise handling that is still recoverable

Output Shape

For each finding include:

  • Location
  • Severity
  • What can fail and how the current code hides or blurs it
  • User or operator impact
  • A specific remediation path

If you can name concrete hidden failure modes, list them.
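
An invented example of the pattern this pass targets, using only the standard library. The first loader would be reported as Critical (a broad catch that returns a default); the second shows one possible remediation. Function names and the log message wording are illustrative.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_settings_silent(text: str) -> dict:
    # Anti-pattern: the broad except hides malformed input and any unrelated
    # bug, and the caller cannot tell real defaults from a failure.
    try:
        return json.loads(text)
    except Exception:
        return {}


def load_settings(text: str, source: str) -> dict:
    # Catch only the expected failure, log it with context, and propagate.
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        logger.error("invalid settings JSON in %s at line %d: %s",
                     source, error.lineno, error.msg)
        raise ValueError(f"settings file {source} is not valid JSON") from error
```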

Type Design Analyzer

Use this pass when the diff introduces or changes types, schemas, interfaces, DTOs, domain models, or state machines.

Review Goal

Judge whether the type design expresses and enforces the right invariants without adding unnecessary complexity.

Analysis Frame

For each important type, identify:

  • The invariants the type appears to require
  • Where invalid states can still slip through
  • Whether the public surface is too wide
  • Whether construction and mutation points enforce the same rules
  • Whether compile-time structure communicates the rules clearly

Ratings

Rate each dimension from 1-10 and justify briefly:

  • Encapsulation
  • Invariant Expression
  • Invariant Usefulness
  • Invariant Enforcement

Common Problems

  • Illegal states remain representable
  • Validation happens only in comments or calling code
  • Mutable internals leak through the API
  • One type is carrying unrelated responsibilities
  • Runtime checks exist in only some mutation paths

Output Shape

For each important type use:

  • Type
  • Invariants Identified
  • Ratings
  • Strengths
  • Concerns
  • Recommended Improvements

Keep suggestions pragmatic. Favor meaningful guarantees over theoretical purity.
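
A short, hypothetical type that would score well on the dimensions above: the invariant is stated once, enforced at the single construction point, and cannot be bypassed by mutation. `DateRange` and its fields are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateRange:
    """Invariant: start_day <= end_day."""
    start_day: int
    end_day: int

    def __post_init__(self) -> None:
        # Every instance passes through this check, so an inverted range
        # is not representable.
        if self.start_day > self.end_day:
            raise ValueError(
                f"start_day {self.start_day} is after end_day {self.end_day}"
            )

    def extended(self, days: int) -> "DateRange":
        # "Mutation" builds a new value, so it goes through the same check.
        return DateRange(self.start_day, self.end_day + days)
```

Because the class is frozen, assigning to `start_day` after construction raises an error, which keeps construction and mutation paths under the same rule.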

name: pr-review-toolkit
description: Delegated pull request review workflow for Codex that mirrors Anthropic's pr-review-toolkit by launching specialized subagents for general code quality, comment accuracy, test coverage, silent failures and error handling, type design, and post-review simplification. Use when reviewing the current branch, an open GitHub PR, staged or unstaged changes, or an explicit diff. Trigger on requests like "review this PR", "check test gaps", "look for silent failures", "review the docs/comments", "analyze new types", or "suggest simplifications before merge".

PR Review Toolkit

Run a structured pull request review as several focused passes over the same change set. Default to delegated review: one coordinator spawns specialized subagents, each reviews a single aspect, then the coordinator merges their findings. Do not edit code unless the user explicitly asks to apply fixes or simplifications after the review.

Quick Start

  1. Determine the review scope.
  2. Select the requested review aspects.
  3. Launch one specialist subagent per requested aspect.
  4. Merge findings into one severity-ordered report.

Determine Scope

Prefer explicit user scope first. Otherwise inspect local changes:

git status --short
git diff --name-only --cached
git diff --name-only

If the branch has an open GitHub PR and gh is available, prefer the PR diff:

gh pr view --json number,title,baseRefName,headRefName
gh pr diff --name-only

If gh is unavailable, unauthenticated, or no PR exists, fall back to local git diff immediately and state the assumption. Review the actual changed hunks, not just filenames. Keep the scope as narrow as correctness allows.
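
The fallback order can be sketched as one function. It uses only the commands shown above; the `detect_scope` helper, the injectable `run` parameter, and the choice to prefer staged over unstaged changes are assumptions made for the example. It returns file names only, so the actual hunks still need to be read afterwards.

```python
import shutil
import subprocess


def run_command(args: list[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def detect_scope(run=run_command, gh_available=None):
    """Return (scope label, changed files, assumption to state or None)."""
    if gh_available is None:
        gh_available = shutil.which("gh") is not None
    if gh_available:
        try:
            files = run(["gh", "pr", "diff", "--name-only"]).splitlines()
        except subprocess.CalledProcessError:
            # Covers both "no PR for this branch" and "not authenticated".
            assumption = "gh could not produce a PR diff; using local git diff"
        else:
            if files:
                return "PR diff", files, None
            assumption = "PR diff was empty; using local git diff"
    else:
        assumption = "gh is not installed; using local git diff"
    staged = run(["git", "diff", "--name-only", "--cached"]).splitlines()
    if staged:
        return "staged diff", staged, assumption
    unstaged = run(["git", "diff", "--name-only"]).splitlines()
    return "unstaged diff", unstaged, assumption
```

The third element of the result is the assumption the coordinator should state in its report when it did not review a PR diff.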

Supported Review Aspects

Map the user request onto these aspect keys:

  • code: general code review against project rules and likely bugs
  • comments: accuracy and usefulness of comments and docs
  • tests: behavioral coverage, regression resistance, and missing cases
  • errors: silent failures, swallowed errors, fallback behavior, and logging
  • types: type design, invariants, model boundaries, and illegal states
  • simplify: refactoring or simplification opportunities that preserve behavior
  • all: all applicable aspects

Run code on every review unless the user explicitly narrows scope. Run the other aspects when relevant files changed, matching patterns appear in the diff, or the user requested them directly.

Treat simplify as opt-in for edits. In a normal review, report simplification opportunities in the suggestions section. Only modify code when the user explicitly asks to apply them.
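
A minimal sketch of the selection rule, under stated assumptions: the `select_aspects` helper and its `narrowed` flag are hypothetical, and `all` simply expands to every aspect key, leaving the "applicable" judgment (which depends on the diff) to the coordinator.

```python
ASPECTS = ("code", "comments", "tests", "errors", "types", "simplify")


def select_aspects(requested: list[str], narrowed: bool = False) -> list[str]:
    """Expand "all" and add "code" unless the user explicitly narrowed scope."""
    unknown = set(requested) - set(ASPECTS) - {"all"}
    if unknown:
        raise ValueError(f"unknown aspect keys: {sorted(unknown)}")
    if "all" in requested:
        return list(ASPECTS)
    chosen = set(requested)
    if not narrowed:
        # The general code pass runs on every review by default.
        chosen.add("code")
    # Return in the canonical order used by the list above.
    return [aspect for aspect in ASPECTS if aspect in chosen]
```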

Delegation Model

Default to spawning specialist subagents when the host exposes the spawn_agent tool. This is the primary execution path, not a nice-to-have. The coordinator owns:

  • defining the exact review scope
  • selecting aspects
  • launching specialists in parallel when independent
  • deduplicating overlapping findings
  • producing the final merged review

Use one subagent per aspect. Keep each subagent read-only unless the user explicitly asked for fixes.

If spawn_agent is available, do not silently choose local review. Launch the specialists. Fall back to local review only when spawn_agent is unavailable or the user explicitly forbids delegation.

Prefer these launch settings:

  • agent_type: "explorer" for read-only aspect reviews
  • fork_context: false when you can restate scope explicitly
  • fork_context: true only if the current thread contains essential scope details that are hard to restate
  • reasoning_effort: "medium" by default

Read references/delegation-workflow.md before launching the first specialist.
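
As a rough sketch, the preferred settings can be gathered into one dictionary. The key names and values come from the list above; how the host's spawn_agent tool actually accepts them is not specified here, so the dictionary shape and the `launch_settings` helper are assumptions.

```python
def launch_settings(scope_is_restatable: bool = True,
                    reasoning_effort: str = "medium") -> dict:
    """Collect the preferred launch settings for one read-only specialist."""
    return {
        "agent_type": "explorer",
        # Fork the thread context only when the scope is hard to restate.
        "fork_context": not scope_is_restatable,
        "reasoning_effort": reasoning_effort,
    }
```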

Specialist Passes

Read only the reference files needed for the current review:

  • references/delegation-workflow.md
  • references/code-reviewer.md
  • references/comment-analyzer.md
  • references/pr-test-analyzer.md
  • references/silent-failure-hunter.md
  • references/type-design-analyzer.md
  • references/code-simplifier.md

Each pass should inspect the same diff through a different lens. Avoid duplicate findings unless a second pass adds materially different reasoning.

If delegation is available, spawn all independent aspect reviews before waiting on results. Only wait when aggregation is the next critical step. If delegation is unavailable, run the same aspect passes locally and say that you fell back to single-agent review.

Output Contract

Follow review mode strictly:

  • Lead with findings, ordered by severity.
  • Include file and line references for every actionable issue.
  • Explain why the issue matters and what change would fix it.
  • Call out assumptions or scope gaps.
  • Include positive observations only after findings.
  • If no findings exist, say so explicitly and mention residual risk or unreviewed areas.

Use this report structure, with findings ordered by severity:

  1. Critical Issues
  2. Important Issues
  3. Suggestions
  4. Positive Observations
  5. Recommended Next Steps

When the host supports inline review comments, emit one inline comment per finding in addition to the summary.
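
The section order can be enforced mechanically. This is a sketch only: the `render_report` helper, the plain-text layout, and the exact "No findings." wording are assumptions, and a real report would still need the residual-risk note written by the coordinator.

```python
SECTIONS = [
    "Critical Issues",
    "Important Issues",
    "Suggestions",
    "Positive Observations",
    "Recommended Next Steps",
]
FINDING_SECTIONS = SECTIONS[:3]


def render_report(items: dict[str, list[str]]) -> str:
    """Render non-empty sections in contract order, findings first."""
    lines = []
    if not any(items.get(section) for section in FINDING_SECTIONS):
        # The contract asks for an explicit statement when nothing was found.
        lines.append("No findings.")
    for section in SECTIONS:
        entries = items.get(section, [])
        if entries:
            lines.append(f"{section}:")
            lines.extend(f"  - {entry}" for entry in entries)
    return "\n".join(lines)
```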

Delegation Rules

For every specialist subagent:

  • pass the exact repo or directory under review
  • pass the exact aspect name
  • pass whether to inspect PR diff, staged diff, unstaged diff, or explicit files
  • point the subagent at the matching reference file
  • require file and line references for every actionable finding
  • require review-only mode unless the user asked for code changes

Use these aspect-to-reference mappings:

  • code -> references/code-reviewer.md
  • comments -> references/comment-analyzer.md
  • tests -> references/pr-test-analyzer.md
  • errors -> references/silent-failure-hunter.md
  • types -> references/type-design-analyzer.md
  • simplify -> references/code-simplifier.md
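
The mapping above translates directly into a lookup table. The `REFERENCE_FILES` name and the `reference_path` helper are hypothetical; the entries are copied from the list.

```python
from pathlib import PurePosixPath

REFERENCE_FILES = {
    "code": "references/code-reviewer.md",
    "comments": "references/comment-analyzer.md",
    "tests": "references/pr-test-analyzer.md",
    "errors": "references/silent-failure-hunter.md",
    "types": "references/type-design-analyzer.md",
    "simplify": "references/code-simplifier.md",
}


def reference_path(skill_dir: str, aspect: str) -> str:
    """Resolve the reference file for an aspect under the skill directory."""
    try:
        relative = REFERENCE_FILES[aspect]
    except KeyError:
        raise ValueError(f"unknown aspect: {aspect!r}") from None
    return str(PurePosixPath(skill_dir) / relative)
```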

If an aspect is clearly not applicable, skip it and note why in the final summary rather than forcing a low-value review.

Trigger Examples

  • Use $pr-review-toolkit to review this branch before I open a PR.
  • Use $pr-review-toolkit to check tests and error handling for my current diff.
  • Use $pr-review-toolkit to review new types in this PR.
  • Use $pr-review-toolkit to suggest simplifications after the review.