🤠
Not my first rodeo

Tim Head betatim


AI-Assisted PR Review Checklist for Scikit-learn

Purpose: This checklist is optimized for AI assistants (like Cursor) to perform automated PR reviews. It separates automatable checks from those requiring human judgment, provides specific patterns to detect, and includes commands to run.


How to Use This Checklist

For AI Agents:

  1. Run all AUTOMATED checks first and report findings with severity levels

Summary of Issues

  • Classification Metrics Sparse Support Bug (Issue #32036): Classification metrics in scikit-learn claim sparse matrix support in their docstrings but raise an error when given sparse inputs. The report is reliably reproducible: it provides code steps, the expected behavior (sparse support) versus the actual behavior (a TypeError), and environment details in the traceback. No major elements are missing. Link

  • RandomizedSearchCV Feature Request (Issue #32032): A proposal to add weights for controlling the probability of selecting items in a list of parameter distributions, useful for complex pipelines with interdependent hyperparameters. This is a feature enhancement, not a bug, and includes clear examples and rationale. Link

  • CI Failure on Linux Build (Issue #32022): Reported CI failure on a specific build configuration, with a reference to logs but no detailed steps to rep
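
The weighting idea in the RandomizedSearchCV item can be sketched in a few lines. When given a list of dicts, scikit-learn currently picks one dict uniformly at random and then samples parameters from it. The sketch below swaps the uniform pick for a weighted one. The function name `sample_weighted_params` and its `weights` argument are hypothetical; they are not part of scikit-learn and only illustrate what the proposal asks for, using lists of candidate values instead of full distributions.

```python
import random


def sample_weighted_params(param_distributions, weights, n_iter, seed=0):
    """Hypothetical helper: draw n_iter parameter settings.

    weights[i] is the relative chance of drawing from param_distributions[i].
    """
    if len(param_distributions) != len(weights):
        raise ValueError("need exactly one weight per parameter dict")
    rng = random.Random(seed)
    settings = []
    for _ in range(n_iter):
        # Step 1: pick one dict, weighted instead of uniform.
        grid = rng.choices(param_distributions, weights=weights, k=1)[0]
        # Step 2: pick one value per parameter from the chosen dict.
        settings.append({name: rng.choice(values) for name, values in grid.items()})
    return settings


param_distributions = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
]
# The "rbf" dict is three times as likely to be chosen as the "linear" one.
samples = sample_weighted_params(param_distributions, weights=[1, 3], n_iter=8)
```

Because each setting comes from a single dict, interdependent parameters such as `gamma` only appear together with the kernel that uses them.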

betatim / log-test.py
Created July 17, 2025 08:17
Associate log output with the (top-level) line of a script. Makes it easier to know which log messages were output while a particular line was running. Try it with `python -m log_tracer -o traced_output.txt log-test.py`
import numpy as np
import logging
import time

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('calculations.log'),
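
The `log_tracer` module itself is not shown in this preview. As a rough illustration of the idea in the description, the following sketch tags each log record with the top-level script line that was executing when the message was emitted. It is not the gist's implementation: the `TopLevelLineFilter` class, the `script_line` attribute, and the in-memory `demo_script.py` are all assumptions made for this example, and the approach (walking the call stack from a `logging.Filter`) is just one possible way to do it.

```python
import io
import logging
import sys


class TopLevelLineFilter(logging.Filter):
    """Hypothetical filter: attach the script's current top-level line to each record."""

    def __init__(self, script_filename):
        super().__init__()
        self.script_filename = script_filename

    def filter(self, record):
        record.script_line = None
        frame = sys._getframe()
        # Walk outwards until we reach the module-level frame of the script.
        while frame is not None:
            code = frame.f_code
            if code.co_filename == self.script_filename and code.co_name == "<module>":
                record.script_line = frame.f_lineno
                break
            frame = frame.f_back
        return True


# A tiny stand-in script, compiled under an assumed file name.
SCRIPT = '''import logging
log = logging.getLogger("demo")
def work():
    log.info("inside work")
log.info("first")
work()
'''

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("line %(script_line)s: %(message)s"))
handler.addFilter(TopLevelLineFilter("demo_script.py"))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

exec(compile(SCRIPT, "demo_script.py", "exec"), {"__name__": "__main__"})
output = stream.getvalue()
# output == "line 5: first\nline 6: inside work\n"
```

The message logged inside `work()` is attributed to line 6, the top-level call, rather than to line 4 where `log.info` appears.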
import sklearn
import numpy as np
import torch

sklearn.set_config(array_api_dispatch=True)

def my_code(X, cdist=False):
    if cdist:
        dist = torch.cdist(X, X, p=2)
[215/275] Linking CXX shared library libcuml++.so
FAILED: libcuml++.so
: && /datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/bin/x86_64-conda-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/include -I/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/include -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/lib -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/lib/stubs -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -Wl,-rpath-link,/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -L/datasets/thead/mamb
betatim / mamba.log
Created April 8, 2024 12:27
Result of running `mamba env create -n sklearn-min-docs -f .build_tools/circle/doc_min_dependencies_environment.yml`
conda-forge/osx-arm64 Using cache
conda-forge/noarch Using cache
Looking for: ['python=3.9', 'numpy=1.19.5', 'blas', 'scipy=1.6.0', 'cython=3.0.10', 'joblib', 'threadpoolctl', 'matplotlib=3.3.4', 'pandas=1.1.5', 'pyamg', "pytest[version='<8']", 'pytest-xdist', 'pillow', 'pip', 'ninja', 'meson-python', 'scikit-image=0.17.2', 'seaborn', 'memory_profiler', 'compilers', 'sphinx=6.0.0', 'sphinx-gallery=0.15.0', 'sphinx-copybutton=0.5.2', 'numpydoc=1.2.0', 'sphinx-prompt=1.3.0', 'plotly=5.14.0', 'polars=0.19.12', 'pooch', 'pip']
Could not solve for environment specs
The following packages are incompatible
├─ numpy 1.19.5** is installable with the potential options

Building for bare metal

Check out https://github.com/rapidsai/gpu-xb-ai:

git clone https://github.com/rapidsai/gpu-xb-ai

Create a conda environment from conda/environments/gpu-xb-ai-legate-all.yaml:

conda env create -f conda/environments/gpu-xb-ai-legate-all.yaml
==================================================================================== short test summary info ====================================================================================
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-cupy.array_api-None-None] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy-None-None] - ValueError: kind can only be None or 'stable'
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy.array_api-None-None] - ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-cupy.array_api-None-None] - ValueError: The truth value of an array with
betatim / myscript.py
Last active September 22, 2023 07:35
Working out how to `mpirun` dask with cuda
from dask_mpi import initialize
from dask import distributed

def dask_info():
    distributed.print("woah i'm running!")
    distributed.print("ncores:", client.ncores())
    distributed.print()
    distributed.print(client.scheduler_info())
betatim / output.txt
Created August 9, 2023 07:44
`conda list` output from the newly created conda env in Vertex AI (https://gist.github.com/betatim/15bd6a2a53b561d81a7b27b0405e6c5c). env created with `conda create -n rapids-23.06 -c rapidsai -c conda-forge -c nvidia rapids=23.06 python=3.10 cudatoolkit=11.8`
# packages in environment at /opt/conda/envs/rapids-23.06:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiohttp                   3.8.5           py310h2372a71_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
anyio                     3.7.1              pyhd8ed1ab_0    conda-forge
aom                       3.5.0                h27087fc_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge