Purpose: This checklist is optimized for AI assistants (like Cursor) to perform automated PR reviews. It separates automatable checks from those requiring human judgment, provides specific patterns to detect, and includes commands to run.
- Run all AUTOMATED checks first and report findings with severity levels
- Flag items for HUMAN REVIEW with context and relevant code snippets
- Use the structured output format at the end of this document
- Prioritize by severity: CRITICAL > HIGH > MEDIUM > LOW
- 🔴 CRITICAL: Must be fixed before merge (breaking changes, test failures, security issues)
- 🟠 HIGH: Should be fixed (missing tests, incomplete docs, API violations)
- 🟡 MEDIUM: Recommended fixes (style issues, minor inefficiencies)
- 🔵 LOW: Suggestions (code clarity, optional improvements)
- 🟣 HUMAN: Requires human judgment (design decisions, scope questions)
```bash
# Commands to run:
git diff --name-only origin/main
git diff --stat origin/main
```
Automated Checks:
- 🟠 Modified files are related: All changed files relate to the stated purpose (no unrelated changes)
  - Pattern: Check if files span multiple unrelated modules
  - Action: List any files that seem unrelated to the PR description
- 🟠 All modifications in a file are related: All changes within a file relate to the stated purpose (no unrelated changes)
  - Pattern: Check if unrelated parts of a file are modified
  - Action: List any changes that seem unrelated to the PR description
- 🟡 File organization: New files are in appropriate directories
  - Pattern: `.py` files in module dirs, tests in `tests/`, docs in `doc/`
  - Action: Flag any files in unexpected locations
```bash
# Check commit messages and PR title for prefixes
git log origin/main..HEAD --oneline
```

Automated Detection:
- Identify the PR type from its title: FIX, FEAT, ENH, DOC, MAINT, CI, API
- Extract issue numbers from the PR description (e.g., "Fixes #1234") — see the sketch below
- Check that the issue exists and is open: `gh issue view <number>`
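
For the issue-linking check, a minimal sketch of extracting referenced issue numbers, assuming the PR body is available as a string (e.g., from `gh pr view --json body`); the helper name and keyword list are illustrative, not part of any existing tooling:

```python
import re

# Closing keywords mirror GitHub's linking syntax (Fixes #N, Closes #N, Resolves #N).
CLOSING_KEYWORDS = r"(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)"

def linked_issue_numbers(pr_body: str) -> list[int]:
    """Return issue numbers referenced with closing keywords, e.g. 'Fixes #1234'."""
    matches = re.findall(rf"{CLOSING_KEYWORDS}\s+#(\d+)", pr_body, flags=re.IGNORECASE)
    return [int(n) for n in matches]
```
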
```bash
# Search for problematic imports
grep -r "import \*" sklearn/ --include="*.py"
grep -r "from .* import \*" sklearn/ --include="*.py"
```

Automated Checks:
- 🔴 No wildcard imports: `import *` is not used anywhere (an AST-based alternative is sketched below)
  - Command: `grep -r "import \*" <changed_files>`
  - Report: File paths where found
- 🟡 Import order: Absolute imports are used (not relative imports where inappropriate)
  - Pattern: Check for excessive relative imports in new code
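
The grep commands above catch most cases; to avoid false positives (e.g., the string `import *` inside a docstring), an AST-based check is an option. A minimal sketch, not part of the checklist's required tooling:

```python
import ast
from pathlib import Path

def wildcard_import_lines(path: str) -> list[int]:
    """Return line numbers of `from ... import *` statements in a Python file."""
    tree = ast.parse(Path(path).read_text(), filename=path)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]
```
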
```bash
# Patterns to check in changed code
```

Automated Checks:
- 🟡 Underscore naming: Non-class names use underscores (e.g., `n_samples`, not `nsamples`)
  - Pattern: Look for camelCase in parameter names
  - Report: Suspicious variable names
- 🟠 Fitted attributes: Learned attributes have a trailing underscore (e.g., `coef_`) — see the sketch after this list
  - Pattern: In the `fit()` method, check that attributes set on `self` end with `_`
  - Exception: Input parameters (no underscore)
- 🟡 Private attributes: Internal attributes have a leading underscore
  - Pattern: Check that attributes not documented in the docstring have a leading `_`
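
A minimal sketch showing all three naming rules together (the class is hypothetical, for illustration only):

```python
class ExampleEstimator:  # hypothetical estimator, not a real sklearn class
    def __init__(self, n_components=2):
        self.n_components = n_components        # input parameter: no underscore

    def fit(self, X, y=None):
        self._centering = X.mean(axis=0)        # internal detail: leading underscore
        self.components_ = X - self._centering  # learned in fit: trailing underscore
        return self
```
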
Automated Checks:
- 🟠 No logic in `__init__`: The constructor only assigns parameters
  - Pattern: Search for `if`, `for`, `while`, or function calls beyond `self.param = param`
  - Report: Any logic found in `__init__` methods
- 🟠 Parameters stored as-is: `self.param = param` (no transformation)
  - Pattern: Check that each parameter is assigned verbatim
  - Report: Parameters that are modified before assignment
- 🟠 Parameter-attribute matching: Every `__init__` parameter has a matching attribute
  - Pattern: Parse the signature and check `self.param_name = param_name` for each parameter
  - Report: Mismatches
```python
# Example pattern to detect:
def __init__(self, param1=1, param2=2):
    # BAD - has validation
    if param1 > 1:  # ❌ Logic found!
        param2 += 1
    self.param1 = param1
    self.param3 = param2  # ❌ Mismatched name!
```
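
For contrast, a compliant constructor with the same signature (sketch only):

```python
# Good pattern: parameters stored verbatim, names match, no logic
def __init__(self, param1=1, param2=2):
    self.param1 = param1
    self.param2 = param2
```
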
```bash
# Check if test files exist for new modules
# (heuristic mapping: sklearn/<pkg>/<mod>.py -> sklearn/<pkg>/tests/test_<mod>.py)
find sklearn/ -name "*.py" -not -path "*/tests/*" -not -name "__init__.py" | while read -r f; do
    test_file="$(dirname "$f")/tests/test_$(basename "$f")"
    [ -f "$test_file" ] || echo "Missing test: $test_file"
done
```
Automated Checks:
- 🔴 Tests exist: Every new public function/class has tests
  - Command: Check that `tests/test_<module>.py` exists for new modules
  - Report: Missing test files
- 🟠 Test naming: Test functions are named `test_*` (example below)
  - Pattern: Check that all test functions follow the convention
  - Report: Non-conforming names
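
A hypothetical test sketch that follows the conventions above (file under `sklearn/<module>/tests/`, `test_*` name, parametrized variants); the assertion is a placeholder:

```python
import numpy as np
import pytest

@pytest.mark.parametrize("n_samples", [5, 50])
def test_example_estimator_output_shape(n_samples):
    # Placeholder check; a real test would exercise the new public API.
    X = np.ones((n_samples, 3))
    assert X.shape == (n_samples, 3)
```
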
```bash
# Run tests on changed modules
pytest <changed_test_files> -v
pytest sklearn/<changed_module> -v --cov=sklearn/<changed_module> --cov-report=term
```
Automated Checks:
- 🔴 Tests pass: All tests pass locally
  - Command: Run pytest on the modified modules
  - Report: Failed tests with error messages
- 🟠 Test coverage: Coverage >= 80% for changed lines
  - Command: `pytest --cov --cov-report=term-missing`
  - Report: Uncovered lines with line numbers
- 🔴 No new warnings: Tests don't generate FutureWarnings or other warnings (see the sketch below)
  - Command: `pytest -W error::FutureWarning`
  - Report: Any warnings generated
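
Besides the `-W error::FutureWarning` flag, an individual test can turn FutureWarnings into errors locally. A minimal sketch; the body is a placeholder for the code under test:

```python
import warnings

def test_fit_emits_no_future_warning():
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=FutureWarning)
        ...  # call the code under test here, e.g. estimator.fit(X, y)
```
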
Automated Detection:
- 🟠 Bug fix has regression test: If the PR fixes a bug, a test exists that would fail on main
  - Pattern: PR title/description contains "fix" or "bug"
  - Action: Check for a new test that references the issue number
  - Report: If it is a fix PR but no obvious regression test is found
```python
# Pattern to detect in tests:
def test_issue_12345():  # ✓ Good - references the issue
    """Non-regression test for issue #12345."""
```
```bash
# Check all public functions have docstrings
```

Automated Checks:
- 🟠 Docstrings present: All public classes/functions have docstrings
  - Pattern: Check that functions/classes without a leading `_` have a `"""..."""` docstring
  - Report: Missing docstrings with file:line
- 🟡 Docstring structure: Follows the numpydoc format with Parameters, Returns, etc.
  - Pattern: Check for section headers: Parameters, Returns, Examples
  - Report: Docstrings missing required sections
Automated Checks:
- 🟡 Parameter documentation: All parameters in the signature are documented (a full numpydoc sketch follows this list)
  - Pattern: Extract parameters from the signature and check they appear in the docstring
  - Report: Undocumented parameters
- 🟡 Type annotations: Parameters specify types (e.g., `array-like`, `int`)
  - Pattern: Check that parameter lines have type specifications
  - Report: Parameters without types
- 🟡 Default values: Defaults are documented (e.g., `default=None`)
  - Pattern: Parameters with defaults should mention them
  - Report: Missing default documentation
- 🟡 Examples section: At least one runnable example
  - Pattern: Check for `>>>` code blocks in the Examples section
  - Report: Missing or empty Examples section
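
A minimal numpydoc-style sketch that satisfies the checks above (the function itself is hypothetical):

```python
import numpy as np

def scale_values(X, factor=1.0):
    """Scale all values in X by a constant factor.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data.
    factor : float, default=1.0
        Multiplicative factor applied to every entry.

    Returns
    -------
    X_scaled : ndarray of shape (n_samples, n_features)
        The scaled data.

    Examples
    --------
    >>> import numpy as np
    >>> scale_values(np.array([[1.0, 2.0]]), factor=2.0)
    array([[2., 4.]])
    """
    return np.asarray(X) * factor
```
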
```bash
# Run doctests on changed files
pytest --doctest-modules sklearn/<changed_module>
```

Automated Checks:
- 🟡 Doctests pass: Examples in docstrings execute correctly
  - Command: `pytest --doctest-modules <files>`
  - Report: Failed doctests
```bash
# Check for changelog entry
ls doc/whats_new/upcoming_changes/*.rst 2>/dev/null | wc -l
```

Automated Checks:
- 🟠 Changelog exists: User-facing changes have an entry in `doc/whats_new/upcoming_changes/` (a Python variant of the check is sketched below)
  - Pattern: PR changes public API → changelog required
  - Command: Check for new `.rst` files in `upcoming_changes`
  - Report: If it is a user-facing change but there is no changelog
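
A sketch of the same check in Python, assuming `git` is on PATH and `origin/main` is the comparison base; the helper name is illustrative:

```python
import subprocess

def has_changelog_fragment(base: str = "origin/main") -> bool:
    """Return True if the diff against `base` adds a changelog .rst fragment."""
    added = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=A", base],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return any(
        path.startswith("doc/whats_new/upcoming_changes/") and path.endswith(".rst")
        for path in added
    )
```
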
```bash
# Run code quality tools
ruff check sklearn/ --diff
mypy sklearn/<changed_files>
```

Automated Checks:
- 🟡 Ruff passes: No linting errors
  - Command: `ruff check <changed_files>`
  - Report: Linting errors with line numbers
- 🟡 Mypy passes: No new type-checking errors
  - Command: `mypy <changed_files>`
  - Report: New type errors
- 🟡 Line length: Lines <= 88 characters for code, or reasonable for docs
  - Pattern: Check line lengths in changed code
  - Report: Overly long lines
Automated Checks:
- 🟡 Single statement per line: No multiple statements on one line (except common patterns)
  - Pattern: Look for `;` in code
  - Report: Lines with multiple statements
- 🟡 No commented-out code: Commented-out code is removed (see the heuristic sketch after this list)
  - Pattern: Lines starting with `#` that look like code
  - Report: Suspicious commented code
- 🟡 No comments that merely restate the next line(s) of code
  - Pattern: Lines starting with `#` that just describe what the next line(s) do
  - Report: Suspicious comments
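
One possible heuristic for the commented-out-code check; a sketch only, and it will have both false positives and false negatives:

```python
import ast

def looks_like_commented_out_code(line: str) -> bool:
    """Heuristic: a comment whose text parses as Python is probably dead code."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return False
    candidate = stripped.lstrip("#").strip()
    if not candidate or candidate.isidentifier():
        return False  # empty comments and bare words are likely prose or TODO tags
    try:
        ast.parse(candidate)
    except SyntaxError:
        return False  # ordinary English rarely parses as Python
    return True
```
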
```bash
# Search for validation patterns in changed code
```

Automated Checks:
- 🟠 validate_data used: Estimators use `validate_data()` or similar in `fit`
  - Pattern: Check that `fit()` methods call `validate_data`, `check_array`, or `check_X_y`
  - Report: `fit` methods without validation
- 🟡 Validation location: Validation happens in `fit`, not `__init__`
  - Pattern: Check that `__init__` doesn't call validation functions
  - Report: Validation found in `__init__`
```python
# Good pattern:
def fit(self, X, y=None):
    X, y = validate_data(self, X, y)  # ✓ Validation in fit

# Bad pattern:
def __init__(self, param=None):
    self.param = check_array(param)  # ❌ Validation in __init__
```
Automated Checks:
- 🟡 random_state parameter: If randomness is used, a `random_state` parameter exists (see the sketch below)
  - Pattern: Search for `np.random` or `random.`, then check whether `random_state` is in `__init__`
  - Report: Uses randomness but has no `random_state` parameter
- 🟡 check_random_state used: Uses `check_random_state()` in `fit`
  - Pattern: If the estimator has a `random_state` parameter, check for a `check_random_state` call
  - Report: Has the parameter but doesn't use the helper
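
A minimal sketch of the expected `random_state` handling (the estimator class is hypothetical):

```python
from sklearn.utils import check_random_state

class RandomizedEstimator:  # hypothetical class, for illustration only
    def __init__(self, random_state=None):
        self.random_state = random_state  # stored as-is, no validation here

    def fit(self, X, y=None):
        rng = check_random_state(self.random_state)  # resolved only at fit time
        self.offset_ = rng.normal(size=X.shape[1])
        return self
```
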
```bash
# Search for deprecation patterns
grep -r "@deprecated\|FutureWarning\|DeprecationWarning" <changed_files>
```

Automated Checks:
- 🟠 Deprecation decorator: Deprecated functions use the `@deprecated` decorator
  - Pattern: Check for the decorator on deprecated functions
  - Report: Deprecations without the proper decorator
- 🟠 Warning messages complete: Deprecation warnings mention both versions (when deprecated, when removed)
  - Pattern: Check that the warning text contains "in X.Y" and "in X.Z"
  - Report: Incomplete version information
- 🟠 Docstring directive: Deprecated items have `.. deprecated::` in the docstring
  - Pattern: If `@deprecated` is used, check for the docstring directive
  - Report: Missing docstring deprecation note
```python
# Good pattern:
@deprecated(
    "Function was deprecated in 1.5 and will be removed in 1.7."
)
def old_function():
    """
    .. deprecated:: 1.5
        Deprecated in 1.5, will be removed in 1.7.
    """
```
```bash
# Use GitHub CLI to check CI status
gh pr checks
gh pr view --json statusCheckRollup
```

Automated Checks:
- 🔴 CI passing: All CI checks (Azure, CircleCI, GitHub Actions) are green
  - Command: Check the PR status via the GitHub API
  - Report: Failed CI jobs with links
- 🟠 Build artifacts: Wheels build successfully (check the CD job)
  - Pattern: Check whether the `[cd build]` workflow passed
  - Report: Build failures
Automated Detection:
- 🟡 No np.asanyarray: Code uses `np.asarray`, not `np.asanyarray`
  - Command: `grep "asanyarray" <changed_files>`
  - Report: Usage locations
- 🟡 No np.atleast_2d: Don't use `np.atleast_2d` for validation
  - Command: `grep "atleast_2d" <changed_files>`
  - Report: Usage locations
- 🟡 No print statements: No `print()` in production code (logging is OK)
  - Pattern: `grep "^\s*print(" <changed_files>` (excluding tests)
  - Report: print statements found
For Classifiers:
- 🟠 classes_ attribute: Classifiers set `self.classes_` in `fit` (see the sketch after this list)
  - Pattern: Check that ClassifierMixin subclasses set this attribute
  - Report: Missing `classes_` attribute

For Transformers:
- 🟡 get_feature_names_out: Transformers implement this method (also sketched below)
  - Pattern: Check that TransformerMixin subclasses have this method
  - Report: Missing method (note: some exceptions exist)
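
Minimal sketches of the two expectations; both classes are hypothetical and deliberately simplified (real estimators would also validate their input):

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin

class MajorityClassifier(ClassifierMixin, BaseEstimator):  # hypothetical
    def fit(self, X, y):
        self.classes_, counts = np.unique(y, return_counts=True)  # classes_ set in fit
        self._majority = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self._majority)

class PassthroughTransformer(TransformerMixin, BaseEstimator):  # hypothetical
    def fit(self, X, y=None):
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        return X

    def get_feature_names_out(self, input_features=None):
        # Simplified: real transformers usually derive names from the
        # input feature names when the input was a DataFrame.
        return np.asarray([f"x{i}" for i in range(self.n_features_in_)], dtype=object)
```
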
AI Action: Provide context and flag these items for human reviewers
- 🟣 Scope appropriateness: Is this feature appropriate for scikit-learn?
  - AI Task: Summarize what the feature does; search for similar issues/PRs
  - Flag if: New algorithm, new public API, or breaks conventions
- 🟣 Maintenance cost: Is the ongoing maintenance burden justified?
  - AI Task: Estimate complexity, count lines added, identify dependencies
  - Flag if: Large addition (>500 lines), new dependencies, or complex logic
- 🟣 API design quality: Is the API intuitive and consistent?
  - AI Task: Compare with similar estimators; check that parameter names match conventions
  - Flag if: Unusual new parameter names, or patterns that differ from similar classes
- 🟣 SLEP requirement: Does this require a SLEP?
  - AI Task: Check whether the change affects the API, dependencies, or supported versions
  - Flag if: Any of the above is detected in the PR
AI Action: Provide analysis, but a human must judge
- 🟣 Algorithmic correctness: Is the algorithm implemented correctly?
  - AI Task: Search for papers mentioned in docstrings; compare the implementation to their description
  - Flag if: Complex algorithm without clear documentation
- 🟣 Literature review: Is this algorithm well-established or state-of-the-art?
  - AI Task: Extract references; check whether the paper is highly cited
  - Flag if: New algorithm without references, or only low-citation papers
- 🟣 Performance optimization appropriate: Are optimizations premature or necessary?
  - AI Task: Check whether profiling results are mentioned; estimate complexity
  - Flag if: Complex optimizations without performance justification
AI Action: Summarize the design and flag alternatives
- 🟣 Alternative approaches considered: Are there better design alternatives?
  - AI Task: Search for similar implementations in the codebase
  - Flag if: A novel approach differs from existing patterns
- 🟣 Parameter choices: Are default values sensible?
  - AI Task: Compare defaults with similar estimators
  - Flag if: Unusual defaults with no justification in the docstring
AI Action: Simulate the user perspective
- 🟣 Error messages helpful: Are error messages clear and actionable?
  - AI Task: Extract all exception messages; check for specificity
  - Flag if: Generic errors like "Invalid input"
- 🟣 Documentation clarity: Is the documentation understandable for the target audience?
  - AI Task: Check reading level, jargon usage, and example complexity
  - Flag if: Very technical without intuition, or no examples
When the AI completes the review, output the results in this format:
# Automated PR Review Results
**PR**: #<number> - <title>
**Type**: <FIX|FEAT|DOC|etc>
**Files Changed**: <count> files (+<additions> -<deletions>)
**Review Date**: <timestamp>
## Summary
<1-2 sentence summary of what PR does>
## Critical Issues (Must Fix) 🔴
<List of CRITICAL findings, or "None found">
1. <Issue description>
- Location: `file.py:line`
- Found: <code snippet>
- Required: <what needs to change>
## High Priority Issues (Should Fix) 🟠
<List of HIGH findings, or "None found">
## Medium Priority Issues (Recommended) 🟡
<List of MEDIUM findings, or "None found">
## Low Priority Suggestions 🔵
<List of LOW findings, or "None found">
## Items Flagged for Human Review 🟣
<List of items needing human judgment with context>
1. **<Item name>**: <Why flagged>
- Context: <relevant info AI gathered>
- Question: <specific question for human>
## Positive Findings ✅
<Things done well - helps provide balanced feedback>
- Tests have good coverage (95%)
- Documentation is comprehensive
- Follows all naming conventions
## CI Status
- Azure Pipelines: ✅ Passed
- CircleCI (docs): ✅ Passed
- GitHub Actions: ⚠️ 1 job failed (link)
## Test Coverage
- Overall: 92%
- New code: 95%
- Uncovered lines: `file.py:123-125`
## Automated Check Summary
- Total checks run: 87
- Passed: 79
- Failed: 5 (3 critical, 2 high)
- Warnings: 3
- Need human review: 8
## Recommended Actions
1. <Priority 1 action>
2. <Priority 2 action>
...
## Additional Notes
<Any other observations AI made>
- Initialize Review

  ```bash
  # Fetch PR information
  gh pr view <number> --json title,body,files,additions,deletions
  gh pr diff <number>
  ```

- Run All Automated Checks (Part 1)
  - Execute each check in order
  - Collect findings with severity levels
  - Note line numbers and file locations
- Analyze for Human Review Items (Part 2)
  - Gather context for each item
  - Prepare specific questions
  - Highlight unusual patterns
- Generate Structured Report (Part 3)
  - Format findings according to the template
  - Prioritize by severity
  - Include positive findings for balanced feedback
- Suggest Next Steps
  - Critical issues block merge
  - High issues should be addressed
  - Medium/Low issues can be discussed
  - Human review items need maintainer input
```bash
# Get PR diff
gh pr diff <number> > /tmp/pr_diff.txt

# Get changed files list
gh pr diff <number> --name-only

# Run tests on changed files
pytest <changed_test_files> -v --tb=short

# Check coverage
pytest <changed_modules> --cov --cov-report=term-missing

# Run linters
ruff check <changed_files>
mypy <changed_files>

# Search for patterns
grep -n "pattern" <files>

# Check if CI passed
gh pr checks <number>
```
What the AI can check automatically:
- Check code patterns and style
- Verify test existence and coverage
- Validate documentation structure
- Run automated tools (linters, tests)
- Search for specific patterns
- Compare with existing code
- Flag inconsistencies

What the AI cannot do reliably:
- Judge feature appropriateness for scikit-learn
- Assess the maintenance cost vs. benefit tradeoff
- Evaluate algorithmic correctness deeply
- Make strategic design decisions
- Understand community context and history
- Judge user experience qualitatively
- Review mathematical correctness
- Assess performance without benchmarks

Always defer to human reviewers for:
- All strategic decisions (scope, design, API)
- Complex algorithm implementations
- Trade-offs between approaches
- Community guidelines interpretation
- Edge cases requiring domain expertise
- The final merge decision
```bash
# PR info
gh pr view <number>
gh pr diff <number>
gh pr checks <number>

# Testing
pytest sklearn/ -k <test_name> -v
pytest --cov=sklearn/<module> --cov-report=term-missing
pytest --doctest-modules sklearn/<module>

# Linting
ruff check sklearn/
mypy sklearn/

# Search patterns
grep -r "pattern" sklearn/ --include="*.py" -n
git grep "pattern" -- "*.py"

# Check git
git diff origin/main --stat
git diff origin/main --name-only
git log origin/main..HEAD --oneline
```
- ❌ `import *` anywhere
- ❌ Logic in `__init__`
- ❌ Fitted attributes without a trailing `_`
- ❌ Missing tests for new code
- ❌ No docstrings on public functions
- ❌ Tests failing
- ❌ CI failing
- ❌ No validation in `fit()`
- ✅ Complete docstrings with examples
- ✅ High test coverage (>90%)
- ✅ Uses scikit-learn validation utilities
- ✅ Proper deprecation warnings
- ✅ Changelog entry present
- ✅ Clear, focused PR scope
- ✅ References issue numbers
- ✅ Follows naming conventions