AI-Assisted PR Review Checklist for Scikit-learn

Purpose: This checklist is optimized for AI assistants (like Cursor) to perform automated PR reviews. It separates automatable checks from those requiring human judgment, provides specific patterns to detect, and includes commands to run.


How to Use This Checklist

For AI Agents:

  1. Run all AUTOMATED checks first and report findings with severity levels
  2. Flag items for HUMAN REVIEW with context and relevant code snippets
  3. Use the structured output format at the end of this document
  4. Prioritize by severity: CRITICAL > HIGH > MEDIUM > LOW

Severity Levels:

  • 🔴 CRITICAL: Must be fixed before merge (breaking changes, test failures, security issues)
  • 🟠 HIGH: Should be fixed (missing tests, incomplete docs, API violations)
  • 🟡 MEDIUM: Recommended fixes (style issues, minor inefficiencies)
  • 🔵 LOW: Suggestions (code clarity, optional improvements)
  • 🟣 HUMAN: Requires human judgment (design decisions, scope questions)

PART 1: AUTOMATED CHECKS (AI Can Verify)

1.1 File and Structure Analysis

Check Changed Files

# Commands to run:
git diff --name-only origin/main
git diff --stat origin/main

Automated Checks:

  • 🟠 Modified files are related: All changed files relate to stated purpose (no unrelated changes)

    • Pattern: Check if files span multiple unrelated modules
    • Action: List any files that seem unrelated to PR description
  • 🟠 All modifications in a file are related: All changes in a file relate to stated purpose (no unrelated changes)

    • Pattern: Check if unrelated parts of a file are modified
    • Action: List any changes that seem unrelated to PR description
  • 🟡 File organization: New files in appropriate directories

    • Pattern: .py files in module dirs, tests in tests/, docs in doc/
    • Action: Flag any files in unexpected locations

Detect PR Type

# Check commit messages and PR title for prefixes
git log origin/main..HEAD --oneline

Automated Detection:

  • Identify PR type from title: FIX, FEAT, ENH, DOC, MAINT, CI, API
  • Extract issue numbers from PR description (e.g., "Fixes #1234")
  • Check if issue exists and is open: gh issue view <number>
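
A small illustrative helper for this detection step, assuming the PR title and body have already been fetched (for example via gh pr view); the prefix list and regex below are assumptions for illustration, not an official tool:

import re

PR_PREFIXES = ("FIX", "FEAT", "ENH", "DOC", "MAINT", "CI", "API")

def classify_pr(title, body):
    # Detect the type prefix from the PR title and any "Fixes #1234"-style references.
    prefix = next((p for p in PR_PREFIXES if title.upper().startswith(p)), None)
    issues = [int(n) for n in re.findall(r"(?:fixes|closes|towards)\s+#(\d+)", body, flags=re.I)]
    return prefix, issues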

1.2 Code Structure Checks

Import Analysis

# Search for problematic imports
grep -r "import \*" sklearn/ --include="*.py"
grep -r "from .* import \*" sklearn/ --include="*.py"

Automated Checks:

  • 🔴 No wildcard imports: import * not used anywhere

    • Command: grep -r "import \*" <changed_files>
    • Report: File paths where found
  • 🟡 Import style: Absolute imports used (avoid relative imports where inappropriate)

    • Pattern: Check for excessive relative imports in new code

Naming Conventions

# Patterns to check in changed code

Automated Checks:

  • 🟡 Underscore naming: Non-class names use underscores (e.g., n_samples not nsamples)

    • Pattern: Look for camelCase in parameter names
    • Report: Suspicious variable names
  • 🟠 Fitted attributes: Learned attributes have trailing underscore (e.g., coef_)

    • Pattern: In fit() method, check attributes set on self end with _
    • Exception: Input parameters (no underscore)
  • 🟡 Private attributes: Internal attributes have leading underscore

    • Pattern: Check attributes not in docstring have leading _
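
As a quick illustration of these naming checks (a sketch, not a real estimator):

class ExampleEstimator:
    def fit(self, X, y=None):
        n_samples = len(X)              # ✓ underscores, not nSamples
        self._cache = {}                # ✓ leading underscore: internal attribute
        self.coef_ = [0.0] * n_samples  # ✓ trailing underscore: learned in fit
        return self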

__init__ Method Validation

Automated Checks:

  • 🟠 No logic in __init__: Constructor only assigns parameters

    • Pattern: Search for if, for, while, function calls beyond self.param = param
    • Report: Any logic found in __init__ methods
  • 🟠 Parameters stored as-is: self.param = param (no transformation)

    • Pattern: Check each parameter is assigned verbatim
    • Report: Parameters that are modified before assignment
  • 🟠 Parameter-attribute matching: Every __init__ parameter has matching attribute

    • Pattern: Parse signature and check self.param_name = param_name for each
    • Report: Mismatches
# Example pattern to detect:
def __init__(self, param1=1, param2=2):
    # BAD - has validation
    if param1 > 1:  # ❌ Logic found!
        param2 += 1
    self.param1 = param1
    self.param3 = param2  # ❌ Mismatched name!
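
For contrast, a minimal sketch of the expected pattern:

# GOOD - only verbatim assignments, names match
def __init__(self, param1=1, param2=2):
    self.param1 = param1
    self.param2 = param2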

1.3 Testing Checks

Test File Existence

# Check if test files exist for new modules
find sklearn/ -name "*.py" -not -path "*/tests/*" | while read -r f; do
  base=$(basename "$f" | sed 's/^_//')
  test_file="$(dirname "$f")/tests/test_${base}"
  [ -f "$test_file" ] || echo "Missing test: $test_file"
done

Automated Checks:

  • 🔴 Tests exist: Every new public function/class has tests

    • Command: Check if tests/test_<module>.py exists for new modules
    • Report: Missing test files
  • 🟠 Test naming: Test functions named test_*

    • Pattern: Check all test functions follow convention
    • Report: Non-conforming names

Run Tests

# Run tests on changed modules
pytest <changed_test_files> -v
pytest sklearn/<changed_module> -v --cov=sklearn/<changed_module> --cov-report=term

Automated Checks:

  • 🔴 Tests pass: All tests pass locally

    • Command: Run pytest on modified modules
    • Report: Failed tests with error messages
  • 🟠 Test coverage: Coverage >= 80% for changed lines

    • Command: pytest --cov --cov-report=term-missing
    • Report: Uncovered lines with line numbers
  • 🔴 No new warnings: Tests don't generate FutureWarnings or other warnings

    • Command: pytest -Werror::FutureWarning
    • Report: Any warnings generated

Non-Regression Tests (for bug fixes)

Automated Detection:

  • 🟠 Bug fix has regression test: If PR fixes bug, test exists that would fail on main
    • Pattern: PR title/description contains "fix" or "bug"
    • Action: Check for new test that references issue number
    • Report: If fix PR but no obvious regression test found
# Pattern to detect in tests:
def test_issue_12345():  # ✓ Good - references issue
    """Non-regression test for issue #12345."""

1.4 Documentation Checks

Docstring Existence

# Check all public functions have docstrings

Automated Checks:

  • 🟠 Docstrings present: All public classes/functions have docstrings

    • Pattern: Check functions/classes without leading _ have """..."""
    • Report: Missing docstrings with file:line
  • 🟡 Docstring structure: Follow numpydoc format with Parameters, Returns, etc.

    • Pattern: Check for section headers: Parameters, Returns, Examples
    • Report: Docstrings missing required sections

Docstring Content

Automated Checks:

  • 🟡 Parameter documentation: All parameters in signature are documented

    • Pattern: Extract parameters from signature and check in docstring
    • Report: Undocumented parameters
  • 🟡 Type annotations: Parameters specify types (e.g., array-like, int)

    • Pattern: Check parameter lines have type specifications
    • Report: Parameters without types
  • 🟡 Default values: Defaults documented (e.g., default=None)

    • Pattern: Parameters with defaults should mention them
    • Report: Missing default documentation
  • 🟡 Examples section: At least one runnable example

    • Pattern: Check for >>> code blocks in Examples section
    • Report: Missing or empty Examples section
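
A minimal numpydoc-style skeleton illustrating these checks (the function itself is hypothetical):

import numpy as np

def example_function(X, copy=True):
    """Return the input as a NumPy array.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        The input samples.

    copy : bool, default=True
        Whether to copy `X` before converting.

    Returns
    -------
    X_new : ndarray of shape (n_samples, n_features)
        The converted samples.

    Examples
    --------
    >>> example_function([[1, 2]])
    array([[1, 2]])
    """
    return np.array(X, copy=copy)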

Doctest Validation

# Run doctests on changed files
pytest --doctest-modules sklearn/<changed_module>

Automated Checks:

  • 🟡 Doctests pass: Examples in docstrings execute correctly
    • Command: pytest --doctest-modules <files>
    • Report: Failed doctests

Changelog Entry

# Check for changelog entry
find doc/whats_new/upcoming_changes -name "*.rst" 2>/dev/null | wc -l

Automated Checks:

  • 🟠 Changelog exists: User-facing changes have entry in doc/whats_new/upcoming_changes/
    • Pattern: PR changes public API → changelog required
    • Command: Check for new .rst files in upcoming_changes
    • Report: If user-facing change but no changelog

1.5 Style and Linting

Run Linters

# Run code quality tools
ruff check sklearn/ --diff
mypy sklearn/<changed_files>

Automated Checks:

  • 🟡 Ruff passes: No linting errors

    • Command: ruff check <changed_files>
    • Report: Linting errors with line numbers
  • 🟡 Mypy passes: No new type checking errors

    • Command: mypy <changed_files>
    • Report: New type errors
  • 🟡 Line length: Lines <= 88 characters (for code) or reasonable for docs

    • Pattern: Check line lengths in changed code
    • Report: Overly long lines

Code Patterns

Automated Checks:

  • 🟡 Single statement per line: No multiple statements on one line (except common patterns)

    • Pattern: Look for ; in code
    • Report: Lines with multiple statements
  • 🟡 No commented code: Commented-out code is removed

    • Pattern: Lines starting with # that look like code
    • Report: Suspicious commented code
  • 🟡 No redundant comments: Comments explain "why", not merely what the next line(s) of code do

    • Pattern: Lines starting with # that look like they just state what the next line(s) are doing
    • Report: Suspicious comments

1.6 Validation and Error Handling

Input Validation Patterns

# Search for validation patterns in changed code

Automated Checks:

  • 🟠 validate_data used: Estimators use validate_data() or similar in fit

    • Pattern: Check fit() methods call validate_data, check_array, or check_X_y
    • Report: fit methods without validation
  • 🟡 Validation location: Validation in fit, not __init__

    • Pattern: Check __init__ doesn't call validation functions
    • Report: Validation found in __init__
# Good pattern (validate_data lives in sklearn.utils.validation):
def fit(self, X, y=None):
    X, y = validate_data(self, X, y)  # ✓ Validation in fit

# Bad pattern:
def __init__(self, param=None):
    self.param = check_array(param)  # ❌ Validation in __init__

Random State Handling

Automated Checks:

  • 🟡 random_state parameter: If randomness used, random_state parameter exists

    • Pattern: Search for np.random, random., check if random_state in __init__
    • Report: Uses randomness but no random_state param
  • 🟡 check_random_state used: Uses check_random_state() in fit

    • Pattern: If has random_state param, check for check_random_state call
    • Report: Has param but doesn't use helper
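
A minimal sketch of the expected pattern (class and attribute names are illustrative):

from sklearn.utils import check_random_state

class ExampleSampler:
    def __init__(self, random_state=None):
        self.random_state = random_state  # stored as-is, resolved later

    def fit(self, X, y=None):
        rng = check_random_state(self.random_state)  # always yields a RandomState instance
        self.offset_ = rng.uniform()
        return self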

1.7 Deprecation and Backwards Compatibility

Deprecation Warnings

# Search for deprecation patterns
grep -r "@deprecated\|FutureWarning\|DeprecationWarning" <changed_files>

Automated Checks:

  • 🟠 Deprecation decorator: Deprecated functions use @deprecated decorator

    • Pattern: Check for decorator on deprecated functions
    • Report: Deprecations without proper decorator
  • 🟠 Warning messages complete: Deprecation warnings mention both versions (when deprecated, when removed)

    • Pattern: Check warning text contains "in X.Y" and "in X.Z"
    • Report: Incomplete version information
  • 🟠 Docstring directive: Deprecated items have .. deprecated:: in docstring

    • Pattern: If @deprecated used, check for docstring directive
    • Report: Missing docstring deprecation note
# Good pattern:
@deprecated(
    "Function was deprecated in 1.5 and will be removed in 1.7."
)
def old_function():
    """
    .. deprecated:: 1.5
        Deprecated in 1.5, will be removed in 1.7.
    """

1.8 CI/CD Status

Check CI Status

# Use GitHub CLI to check CI status
gh pr checks
gh pr view --json statusCheckRollup

Automated Checks:

  • 🔴 CI passing: All CI checks (Azure, CircleCI, GitHub Actions) are green

    • Command: Check PR status via GitHub API
    • Report: Failed CI jobs with links
  • 🟠 Build artifacts: Wheels build successfully (check CD job)

    • Pattern: Check if [cd build] workflow passed
    • Report: Build failures

1.9 Specific Anti-Patterns

Code Smells

Automated Detection:

  • 🟡 No np.asanyarray: Code uses np.asarray not np.asanyarray

    • Command: grep "asanyarray" <changed_files>
    • Report: Usage locations
  • 🟡 No np.atleast_2d: Don't use np.atleast_2d for validation

    • Command: grep "atleast_2d" <changed_files>
    • Report: Usage locations
  • 🟡 No print statements: No print() in production code (logging is ok)

    • Pattern: grep -nE "^[[:space:]]*print\(" <changed_files> (excluding tests)
    • Report: print statements found

Estimator-Specific Patterns

For Classifiers:

  • 🟠 classes_ attribute: Classifiers set self.classes_ in fit
    • Pattern: Check ClassifierMixin subclasses set this attribute
    • Report: Missing classes_ attribute

For Transformers:

  • 🟡 get_feature_names_out: Transformers implement this method
    • Pattern: Check TransformerMixin subclasses have this method
    • Report: Missing method (note: some exceptions exist)
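
Illustrative fragments for the two checks above (not complete estimators):

import numpy as np

class ExampleClassifierFragment:
    def fit(self, X, y):
        self.classes_ = np.unique(y)  # ✓ classifiers expose classes_ after fit
        return self

class ExampleTransformerFragment:
    def get_feature_names_out(self, input_features=None):
        # ✓ transformers report output feature names (here: passthrough)
        return np.asarray([] if input_features is None else input_features, dtype=object)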

PART 2: ITEMS REQUIRING HUMAN REVIEW

2.1 Strategic Questions (Flag for Human)

AI Action: Provide context and flag these for human reviewers

  • 🟣 Scope appropriateness: Is this feature appropriate for scikit-learn?

    • AI Task: Summarize what feature does, search similar issues/PRs
    • Flag if: New algorithm, new public API, breaks conventions
  • 🟣 Maintenance cost: Is ongoing maintenance burden justified?

    • AI Task: Estimate complexity, count lines added, identify dependencies
    • Flag if: Large addition (>500 lines), new dependencies, complex logic
  • 🟣 API design quality: Is API intuitive and consistent?

    • AI Task: Compare with similar estimators, check parameter names match conventions
    • Flag if: New unusual parameter names, different patterns than similar classes
  • 🟣 SLEP requirement: Does this require a SLEP?

    • AI Task: Check if changes: API, dependencies, supported versions
    • Flag if: Any of above detected in PR

2.2 Algorithm and Implementation Quality

AI Action: Provide analysis, but human must judge

  • 🟣 Algorithmic correctness: Is algorithm implemented correctly?

    • AI Task: Search for papers mentioned in docstrings, compare implementation to description
    • Flag if: Complex algorithm without clear documentation
  • 🟣 Literature review: Is this algorithm well-established or state-of-art?

    • AI Task: Extract references, search if paper is highly cited
    • Flag if: New algorithm without references or low-citation papers only
  • 🟣 Performance optimization appropriate: Are optimizations premature or necessary?

    • AI Task: Check if profiling results mentioned, estimate complexity
    • Flag if: Complex optimizations without performance justification

2.3 Design Decisions

AI Action: Summarize design and flag alternatives

  • 🟣 Alternative approaches considered: Are there better design alternatives?

    • AI Task: Search for similar implementations in codebase
    • Flag if: Novel approach differs from existing patterns
  • 🟣 Parameter choices: Are default values sensible?

    • AI Task: Compare defaults with similar estimators
    • Flag if: Unusual defaults, no justification in docstring

2.4 User Experience

AI Action: Simulate user perspective

  • 🟣 Error messages helpful: Are error messages clear and actionable?

    • AI Task: Extract all exception messages, check for specificity
    • Flag if: Generic errors like "Invalid input"
  • 🟣 Documentation clarity: Is documentation understandable for target audience?

    • AI Task: Check reading level, jargon usage, example complexity
    • Flag if: Very technical without intuition, no examples

PART 3: STRUCTURED OUTPUT FORMAT

When AI completes review, output in this format:

# Automated PR Review Results

**PR**: #<number> - <title>
**Type**: <FIX|FEAT|DOC|etc>
**Files Changed**: <count> files (+<additions> -<deletions>)
**Review Date**: <timestamp>

## Summary
<1-2 sentence summary of what PR does>

## Critical Issues (Must Fix) 🔴
<List of CRITICAL findings, or "None found">
1. <Issue description>
   - Location: `file.py:line`
   - Found: <code snippet>
   - Required: <what needs to change>

## High Priority Issues (Should Fix) 🟠
<List of HIGH findings, or "None found">

## Medium Priority Issues (Recommended) 🟡
<List of MEDIUM findings, or "None found">

## Low Priority Suggestions 🔵
<List of LOW findings, or "None found">

## Items Flagged for Human Review 🟣
<List of items needing human judgment with context>
1. **<Item name>**: <Why flagged>
   - Context: <relevant info AI gathered>
   - Question: <specific question for human>

## Positive Findings ✅
<Things done well - helps provide balanced feedback>
- Tests have good coverage (95%)
- Documentation is comprehensive
- Follows all naming conventions

## CI Status
- Azure Pipelines: ✅ Passed
- CircleCI (docs): ✅ Passed  
- GitHub Actions: ⚠️ 1 job failed (link)

## Test Coverage
- Overall: 92%
- New code: 95%
- Uncovered lines: `file.py:123-125`

## Automated Check Summary
- Total checks run: 87
- Passed: 79
- Failed: 5 (3 critical, 2 high)
- Warnings: 3
- Need human review: 8

## Recommended Actions
1. <Priority 1 action>
2. <Priority 2 action>
...

## Additional Notes
<Any other observations AI made>

PART 4: AI REVIEW WORKFLOW

Step-by-Step Process

  1. Initialize Review

    # Fetch PR information
    gh pr view <number> --json title,body,files,additions,deletions
    gh pr diff <number>
  2. Run All Automated Checks (Part 1)

    • Execute each check in order
    • Collect findings with severity levels
    • Note line numbers and file locations
  3. Analyze for Human Review Items (Part 2)

    • Gather context for each item
    • Prepare specific questions
    • Highlight unusual patterns
  4. Generate Structured Report (Part 3)

    • Format findings according to template
    • Prioritize by severity
    • Include positive findings for balanced feedback
  5. Suggest Next Steps

    • Critical issues block merge
    • High issues should be addressed
    • Medium/Low can be discussed
    • Human review items need maintainer input

Example Commands for AI

# Get PR diff
gh pr diff <number> > /tmp/pr_diff.txt

# Get changed files list
gh pr diff <number> --name-only

# Run tests on changed files
pytest <changed_test_files> -v --tb=short

# Check coverage
pytest <changed_modules> --cov --cov-report=term-missing

# Run linters
ruff check <changed_files>
mypy <changed_files>

# Search for patterns
grep -n "pattern" <files>

# Check if CI passed
gh pr checks <number>

PART 5: LIMITATIONS & DISCLAIMERS

What AI Can Do Well ✅

  • Check code patterns and style
  • Verify test existence and coverage
  • Validate documentation structure
  • Run automated tools (linters, tests)
  • Search for specific patterns
  • Compare with existing code
  • Flag inconsistencies

What AI Cannot Do ❌

  • Judge feature appropriateness for scikit-learn
  • Assess maintenance cost vs benefit tradeoff
  • Evaluate algorithmic correctness deeply
  • Make strategic design decisions
  • Understand community context and history
  • Judge user experience qualitatively
  • Review mathematical correctness
  • Assess performance without benchmarks

Human Review Required For 🟣

  • All strategic decisions (scope, design, API)
  • Complex algorithm implementations
  • Trade-offs between approaches
  • Community guidelines interpretation
  • Edge cases requiring domain expertise
  • Final merge decision

APPENDIX: Quick Reference

Common Commands

# PR info
gh pr view <number>
gh pr diff <number>
gh pr checks <number>

# Testing
pytest sklearn/ -k <test_name> -v
pytest --cov=sklearn/<module> --cov-report=term-missing
pytest --doctest-modules sklearn/<module>

# Linting
ruff check sklearn/
mypy sklearn/

# Search patterns
grep -r "pattern" sklearn/ --include="*.py" -n
git grep "pattern" -- "*.py"

# Check git
git diff origin/main --stat
git diff origin/main --name-only
git log origin/main..HEAD --oneline

Critical Patterns to Always Check

  1. ❌ import * anywhere
  2. ❌ Logic in __init__
  3. ❌ Fitted attributes without trailing _
  4. ❌ Missing tests for new code
  5. ❌ No docstrings on public functions
  6. ❌ Tests failing
  7. ❌ CI failing
  8. ❌ No validation in fit()

Good Patterns to Recognize

  1. ✅ Complete docstrings with examples
  2. ✅ High test coverage (>90%)
  3. ✅ Uses scikit-learn validation utilities
  4. ✅ Proper deprecation warnings
  5. ✅ Changelog entry present
  6. ✅ Clear, focused PR scope
  7. ✅ References issue numbers
  8. ✅ Follows naming conventions

Scikit-learn Pull Request Review Checklist

This comprehensive checklist is synthesized from the contributing, developer, and maintainer documentation for scikit-learn. It serves as a guide for expert reviewers to ensure thorough and consistent PR reviews.


1. High-Level Strategic Questions

Scope and Fit

  • Is this feature/fix appropriate for scikit-learn? Does it align with the project's scope and selectiveness criteria?
  • Is there evidence of usefulness? For new features, is there evidence it's often useful and well-established in literature or practice?
  • Has duplication been checked? Search issue tracker and PR list to ensure this doesn't duplicate existing work
  • Is the maintenance cost justified? Consider the long-term burden on volunteer maintainers
  • Has a corresponding issue been triaged? For major features, an issue should be discussed and triaged first (label "triage" should be removed)
  • Does this require a SLEP? API changes, dependency changes, or supported version changes require a SLEP

Design and API

  • Is the API consistent with scikit-learn conventions? Public functions/classes/parameters well-named and intuitively designed?
  • Does it add new public methods to an estimator or other class? Adding new public methods is almost always inappropriate and needs very good justification.
  • Does it follow the estimator API? Proper use of fit, predict, transform, etc.
  • Are estimator tags correct? Does __sklearn_tags__() return appropriate values for this estimator's capabilities?
  • Is the interface minimal? Avoid exposing unnecessary internal details
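
For the estimator-tags item above, a sketch of the usual override pattern in recent scikit-learn versions (the specific tag fields shown are examples; check the Tags dataclass for the exact names relevant to the estimator):

from sklearn.base import BaseEstimator

class ExampleEstimator(BaseEstimator):
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.sparse = True      # declares sparse-input support
        tags.input_tags.allow_nan = False  # declares NaN handling
        return tags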

2. Code Quality and Standards

Coding Guidelines

  • PEP8 compliance: Code follows Python style guidelines
  • Naming conventions: Use underscores in non-class names (n_samples not nsamples)
  • No unrelated changes: PR should be focused; don't modify unrelated lines
  • Avoid multiple statements per line: Prefer line return after control flow statements
  • Use absolute imports: Never use import *
  • Proper inheritance order: Mixins on the left, BaseEstimator on the right (for MRO)

Implementation Quality

  • Input validation: Use check_array, check_X_y, or validate_data appropriately
  • Random state handling: If using randomness, accept random_state parameter and use check_random_state
  • No np.matrix compatibility: Use np.asarray, not np.asanyarray or np.atleast_2d
  • Proper __init__ implementation:
    • Accept only keyword arguments with defaults
    • Store parameters as-is (no validation or transformation)
    • Every parameter should correspond to an attribute
    • No logic in __init__, put it in fit
  • Fitted attributes: Use trailing underscore for fitted attributes (e.g., coef_)
  • Private attributes: Use leading underscore for internal attributes not exposed to users
  • Universal attributes: Set n_features_in_ (and feature_names_in_ if applicable) in fit
  • Efficient implementation: No obvious inefficiencies; proper use of vectorized NumPy operations
  • Sparse matrix support: If applicable, handle scipy.sparse matrices correctly
  • Cython usage: For performance-critical sections, is Cython used appropriately?

Code Readability

  • Meaningful comments: Comments explain "why" not "what"; avoid obvious comments
  • Code clarity: Variable names are clear and descriptive
  • Low redundancy: No duplicated code that could be refactored

3. Testing Requirements

Test Coverage

  • Tests exist: Every public function/class is tested
  • Comprehensive coverage: At least 80%, ideally 100%, of the pull request's diff is covered
  • Parameter testing: Reasonable set of parameters, values, types, and combinations tested
  • Edge cases: Tests cover boundary conditions and edge cases
  • Common checks: Run estimator checks with check_estimator or parametrize_with_checks
  • Tests pass: All tests pass in CI (check Azure, CircleCI, GitHub Actions)
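
For the common-checks item above, a typical pytest pattern looks like this (Ridge is only a stand-in; use the estimator touched by the PR):

from sklearn.linear_model import Ridge  # stand-in for the estimator under review
from sklearn.utils.estimator_checks import parametrize_with_checks

@parametrize_with_checks([Ridge()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)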

Test Quality

  • Non-regression tests: Bug fixes include a test that fails on main but passes with the PR, and the test references the issue that reported the bug
  • Test correctness: Tests validate code does what documentation says
  • Minimal reproducible examples: Tests are focused and minimal
  • Proper test structure: Tests in appropriate tests/ subdirectory
  • Test imports: Match how users would import (e.g., from sklearn.foo not sklearn.foo.bar.baz)
  • No warnings in tests: FutureWarnings and other warnings properly handled with @pytest.mark.filterwarnings
  • Numerical assertions: Use assert_allclose with appropriate tolerance for continuous values
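
For the numerical-assertions item, a minimal example of tolerance-based comparison:

from numpy.testing import assert_allclose

# Compare floats with a tolerance instead of exact equality.
assert_allclose(0.1 + 0.2, 0.3, rtol=1e-12)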

4. Documentation Requirements

API Documentation (Docstrings)

  • Docstring completeness: All public functions/classes have complete docstrings
  • numpydoc format: Follow numpydoc conventions
  • Section order: Parameters, Returns, See Also, Notes, Examples
  • Parameter formatting: Correct format (e.g., array-like of shape (n_samples, n_features))
  • Type annotations: Proper type specifications for parameters
  • Default values: Defaults clearly documented (e.g., default=None)
  • Return types: Returns section with proper type specification
  • Attributes: Fitted attributes (with trailing _) documented in Attributes section
  • See Also section: Related functions/classes linked with explanations
  • Examples section: Runnable code snippets (1-2 brief examples)
  • Doctest compliance: Run pytest --doctest-modules on modified files

User Guide

  • User guide updates: New features/changes documented in narrative docs (doc/modules/)
  • Mathematical description: Balance between math details and intuitive explanation
  • Complexity noted: Time/space complexity and scalability discussed
  • Figures included: Relevant plots from examples for intuition
  • References included: Links to papers (use :doi: or :arxiv: directives)
  • Cross-references work: Use proper sphinx directives (:func:, :class:, :ref:, etc.)
  • Documentation renders: Check CircleCI build, or build locally with make html
  • No sphinx warnings: Build documentation and check for new warnings

Examples

  • Examples provided: New features demonstrated with full examples in examples/
  • Example quality: Examples show why feature is useful, compare to other methods if applicable
  • Example execution: Examples run without errors
  • No warnings in examples: Examples should not generate warnings

What's New / Changelog

  • Changelog entry: Add RST fragment to doc/whats_new/upcoming_changes/ following README
  • Entry quality: Clearly describes change from user perspective
  • Proper categorization: In correct section (Enhancement, Fix, Documentation, etc.)

5. Backwards Compatibility

Deprecation Process

  • Proper deprecation cycle: Old behavior supported for 2 releases with warnings
  • Deprecation warnings: Use @deprecated decorator or warnings.warn with FutureWarning
  • Warning messages: Include version when deprecated and version when removed
  • Docstring updates: Use .. deprecated:: directive
  • API reference updates: Move from API_REFERENCE to DEPRECATED_API_REFERENCE in doc/api_reference.py
  • Tests for deprecation: Ensure warnings are raised appropriately
  • No warnings in unrelated tests: Catch warnings with @pytest.mark.filterwarnings in other tests

API Changes

  • Parameter defaults: Changing defaults uses interim sentinel value (e.g., "warn") with FutureWarning
  • Docstring version notes: Use .. versionchanged:: directive
  • Pickle compatibility: Consider impact on pickled models from previous versions
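
A sketch of the interim-sentinel pattern for changing a default (parameter name, values, and wording here are hypothetical; use the actual versions in the warning message):

import warnings

class ExampleEstimator:
    def __init__(self, algorithm="warn"):
        self.algorithm = algorithm

    def fit(self, X, y=None):
        algorithm = self.algorithm
        if algorithm == "warn":
            warnings.warn(
                "The default of `algorithm` will change from 'old' to 'new' in a "
                "future release. Set the value explicitly to silence this warning.",
                FutureWarning,
            )
            algorithm = "old"  # keep the current behaviour during the transition
        return self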

6. Performance Considerations

Efficiency

  • No obvious bottlenecks: Profile code if performance-critical
  • Vectorization: Use NumPy vectorized operations instead of Python loops
  • Memory efficiency: Avoid unnecessary copies of large arrays
  • Parallel execution: Consider if joblib.Parallel can be used for coarse-grained parallelism
  • Benchmarks: Performance-sensitive PRs include benchmark results

Optimization

  • Algorithmic soundness: Check if algorithm is state-of-the-art (review literature)
  • Cython profiling: If using Cython, has it been profiled appropriately?
  • Hot path optimization: Focus on actual bottlenecks, not premature optimization

7. Validation and Error Handling

Input Validation

  • Validation in fit: Input validation happens in fit, not __init__
  • Appropriate validators: Use check_array, check_X_y, validate_data with correct parameters
  • Finite check: Arrays checked for NaN/Inf with assert_all_finite (unless intentionally disabled)
  • Shape validation: Check X and y have compatible shapes
  • Type validation: Proper handling of different input types (list, numpy array, sparse matrix, dataframe)

Error Messages

  • Clear error messages: Errors are descriptive and helpful
  • Appropriate exceptions: Use ValueError, TypeError, etc. appropriately
  • Standardized errors: Use helpers like check_is_fitted for standard checks

8. CI/CD and Build

Continuous Integration

  • All CI checks pass: Azure Pipelines, CircleCI, GitHub Actions all green
  • Platform coverage: Tests pass on Linux, macOS, and Windows
  • Python versions: Tests pass on all supported Python versions
  • Dependency variations: Tests pass with min/default dependencies

Commit Markers

  • Appropriate markers: Use [cd build], [doc skip], [lint skip], etc. when appropriate
  • Final commit message: Title is descriptive and will make sense as commit message when squashed

Code Quality Tools

  • Linting passes: Code passes ruff checks
  • Type checking: mypy doesn't produce new errors in modified code
  • Pre-commit hooks: If using pre-commit, hooks pass

9. Special Cases

Meta-estimators

  • get_params/set_params: Properly handle nested parameters with __ separator
  • Cloning: Sub-estimators are cloned appropriately
  • Delegation: Methods properly delegated to sub-estimators
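
A short illustration of the nested-parameter and cloning conventions:

from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
pipe.set_params(clf__C=0.5)  # `__` reaches into the sub-estimator's parameters
fresh = clone(pipe)          # unfitted copy with identical parameters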

Transformers

  • Transform output: Never changes number of samples
  • feature_names_out: Implement get_feature_names_out if applicable
  • set_output API: Works correctly with pandas/polars output

Classifiers

  • classes_ attribute: Set correctly from unique labels in y
  • Label handling: Accepts string or integer labels; doesn't assume contiguous integers
  • predict_proba: If implemented, output order matches classes_
  • decision_function: If implemented, consistent with predict

Sparse Matrices

  • Sparse support: If advertised in tags, sparse matrices handled correctly
  • Efficient sparse operations: Use sparsefuncs utilities where appropriate
  • Format handling: Specify accepted formats (CSR, CSC, etc.) in validation

Experimental Features

  • Experimental module: New experimental features use sklearn.experimental.enable_* pattern
  • Clear warnings: Documentation clearly states feature is experimental
  • Type checking: Add protected import with if typing.TYPE_CHECKING
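
One example of the enable_* pattern, as used when successive halving search was introduced as experimental:

# The enable_* import activates the feature; `# noqa` keeps linters from
# flagging the apparently unused import.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV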

10. Process and Communication

PR Description

  • Descriptive title: Clear summary that works as commit message
  • Issue links: Uses "Fixes #123" or "Towards #123" to link related issues
  • Change description: Explains what changed and why
  • Rationale: Explains decisions made
  • Multiple co-authors: If multiple code contributors, includes Co-authored-by: tags

Review Process

  • Constructive feedback: Review is welcoming and constructive
  • Focus on important issues: Don't let perfect be the enemy of good
  • Objective improvements: Distinguish between objective improvements and subjective nits
  • Clear explanations: Suggestions are well-justified
  • Author understanding: PR author demonstrates understanding of their changes (not blind AI-generated)

Draft vs Ready

  • Draft status: If incomplete, marked as Draft with TODO list
  • Ready for review: Only marked ready when author believes it's complete

Maintainer Actions

  • Two approvals: Need two core dev approvals before merge
  • Milestone: Assign to appropriate milestone
  • Labels: Apply relevant labels (Bug, Enhancement, Documentation, etc.)
  • Module tags: Tag relevant modules (e.g., sklearn.linear_model)

11. Security and Safety

Code Security

  • No security vulnerabilities: No obvious security issues
  • Safe dependencies: No new dependencies added (very rare exceptions)
  • Input sanitization: User inputs properly validated

Data Handling

  • No data retention: fit doesn't store X and y unless necessary (e.g., precomputed kernels)
  • Memory safety: No obvious memory leaks (especially in Cython code)

12. Special Considerations

AI-Generated Code

  • Human review: PR author demonstrates they understand AI-generated code
  • Not exploratory: PR is focused, not a blind AI-generated exploration
  • Proper context: Author shows understanding of broader project context

Large PRs

  • Reasonable scope: PR isn't trying to do too much at once
  • Logical commits: If very large, consider if it can be split into multiple PRs
  • Clear structure: Easy to understand the changes

First-time Contributors

  • Welcoming: Be especially welcoming and encouraging
  • Educational: Explain conventions and reasons behind feedback
  • Patience: Allow time for learning curve

Quick Reference: Common Issues

❌ Common Problems to Watch For

  • __init__ includes validation or logic
  • Parameters modified in __init__ instead of stored as-is
  • Estimated attributes don't have trailing underscore
  • Tests don't cover edge cases
  • Documentation missing or incomplete
  • Backwards compatibility broken without deprecation
  • import * used anywhere
  • No changelog entry for user-facing changes
  • Unrelated lines modified
  • Performance not considered for critical paths

✅ Signs of High-Quality PR

  • Clear, focused scope
  • Comprehensive tests with good coverage
  • Complete, clear documentation
  • Follows all scikit-learn conventions
  • Author demonstrates understanding
  • Backwards compatible or properly deprecated
  • Responds to review feedback constructively

Resources

  • Contributing Guide: doc/developers/contributing.rst
  • Developer Guide: doc/developers/develop.rst
  • Coding Guidelines: Search for coding-guidelines in docs
  • Code Review Guidelines: Search for code_review in docs
  • Saved Replies: Use GitHub saved replies for common feedback
  • Glossary: Refer to scikit-learn glossary for terminology

Notes for Reviewers

  1. Start with high-level questions (scope, design) before diving into details
  2. Review the literature for new algorithms before focusing on implementation
  3. Be welcoming and constructive, especially for new contributors
  4. Focus on preventing bugs and technical debt, not just style
  5. Distinguish between blocking issues and nits
  6. Take time to make comments clear and well-justified
  7. Remember you represent the project - be professional even on bad days
  8. Use saved replies to be efficient while remaining polite
  9. Don't rush - quality reviews are worth the time
  10. It's OK to ask for help from other core devs on complex PRs

Checklist Template for PR Comments

## Review Checklist

### High Priority
- [ ] Appropriate scope for scikit-learn
- [ ] Tests comprehensive and passing
- [ ] Documentation complete and correct
- [ ] API consistent with conventions
- [ ] Backwards compatible or properly deprecated

### Code Quality
- [ ] Follows coding guidelines
- [ ] Input validation appropriate
- [ ] No unrelated changes
- [ ] Efficient implementation

### Documentation
- [ ] Docstrings complete and correct
- [ ] User guide updated
- [ ] Changelog entry added
- [ ] Examples provided

### Other
- [ ] Issue properly linked
- [ ] CI passing
- [ ] No new warnings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment