@betatim
Created August 28, 2025 11:47

Summary of Issues

  • Classification Metrics Sparse Support Bug (Issue #32036): Classification metrics in scikit-learn claim sparse matrix support in their docstrings but raise a TypeError when given sparse inputs. The report is reliably reproducible: it provides code steps, expected behavior (sparse support) versus actual behavior (TypeError), and environment details in the traceback. No major elements are missing.

  • RandomizedSearchCV Feature Request (Issue #32032): A proposal to add weights that control the probability of selecting each item in a list of parameter distributions, useful for complex pipelines with interdependent hyperparameters. This is a feature enhancement, not a bug, and includes clear examples and rationale.

  • CI Failure on Linux Build (Issue #32022): A reported CI failure on a specific build configuration, with a reference to logs but no steps to reproduce, expected behavior, or root cause analysis. Error logs or reproduction steps would help resolve it faster.

  • Website Logo Truncation (Issue #32011): A UI issue where the scikit-learn logo appears truncated on the website, with a suggestion to use the existing SVG file for better scaling. It is easily reproducible by visiting the site and includes visual examples; no specific environment details are needed. Low-impact cosmetic fix.
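
For the sparse-labels report (Issue #32036), one possible stopgap until the validation or the docstrings change is to densify the labels before calling the metric. The sketch below is a workaround under that assumption, not a fix proposed in the issue; to_dense_labels is a hypothetical helper name and the label values are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics import accuracy_score


def to_dense_labels(y):
    """Return y as a dense 1-D array; sparse inputs are densified first.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    if sparse.issparse(y):
        y = y.toarray()
    return np.asarray(y).ravel()


# Labels stored as sparse column vectors, the input shape described in the report.
y_true = sparse.csr_matrix(np.array([[0], [1], [1], [0]]))
y_pred = sparse.csr_matrix(np.array([[0], [1], [0], [0]]))

# Densify explicitly instead of passing the sparse matrices to the metric.
score = accuracy_score(to_dense_labels(y_true), to_dense_labels(y_pred))
print(score)
```

Densifying is only reasonable when the label vector fits in memory, which is usually the case for a single column of labels.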

General Themes and Prioritization

  • Themes: The issues cover core functionality bugs (e.g., sparse data handling), feature enhancements for advanced users (e.g., hyperparameter tuning), infrastructure reliability (e.g., CI failures), and minor UI improvements (e.g., website aesthetics). A common thread is improving usability and accuracy in data handling and development workflows.

  • Prioritization Based on Impact:

    • High Priority: Address the sparse matrix bug and CI failure first, as they could affect user functionality and team productivity (e.g., sparse data is critical for large-scale applications, and CI issues may block merges).
    • Medium Priority: The RandomizedSearchCV feature request could enhance efficiency for complex models, benefiting users with advanced needs.
    • Low Priority: The logo truncation is a quick win for polish but has minimal impact on core operations—consider it if resources allow for minor updates.

betatim commented Aug 28, 2025

Using openai/gpt-5-mini. All the previous ones used xai/grok-3-mini

  • Classification metrics don't seem to support sparse labels (bug)

    • Issue: scikit-learn/scikit-learn#32036
    • Summary: Docstrings claim classification metrics accept sparse matrices for y, but calling e.g. accuracy_score with a sparse column vector raises a TypeError from check_array ("Sparse data was passed for y, but dense data is required...").
    • Reproducible? Yes. The report includes a minimal code example and a full traceback showing where column_or_1d → check_array rejects sparse input. Expected behavior: either sparse label arrays are accepted or the docs are corrected. The author also notes that classification metrics have no sparse tests.
    • Impact / priority: Medium–High — doc/behavior mismatch and likely affects users with sparse label representations; adding tests and either fixing validation or docstrings should be prioritized.
  • Allow weighting of list items passed to RandomizedSearchCV (feature request)

    • Issue: scikit-learn/scikit-learn#32032
    • Summary: When param_distributions is a list of dicts, RandomizedSearchCV samples list items uniformly. Request to add a way (e.g. distribution_weights) to control probabilities of selecting each dict so users can bias exploration toward certain pipeline branches. Example use cases and a tentative API are provided.
    • Impact / priority: Low–Medium — helpful for complex/nested pipelines where uniform selection over-represents some configurations; not urgent but useful for advanced users.
  • CI failure: Linux_Nightly.pylatest_pip_scipy_dev job (infrastructure/ci)

    • Issue: scikit-learn/scikit-learn#32022
    • Summary: CI is failing for job Linux_Nightly.pylatest_pip_scipy_dev (last failure listed Aug 28, 2025). Error shown: "Unable to find junit file." Link to Azure logs provided.
    • Missing info / suggested next steps: check the linked job logs/artifacts to see why junit was not produced (test runner crash, path mismatch, or missing artifact upload).
    • Impact / priority: High — CI failures block confidence in merges; investigate soon.
  • Website logo is truncated; propose using SVG from repo (UI/website)

    • Issue: scikit-learn/scikit-learn#32011
    • Summary: The site uses a PNG that appears truncated; reporter suggests replacing it with the repo's SVG to preserve resolution and offers to open a PR. Images attached.
    • Impact / priority: Low — visual/website polish; easy PR if maintainers agree.
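
To make the RandomizedSearchCV request (#32032) concrete, here is a minimal standard-library sketch of weighted selection over a list of parameter dicts. It is not scikit-learn's sampler and not the API proposed in the issue: sample_weighted_params is a hypothetical function, distribution_weights is used only as the tentative name mentioned in the summary, the two pipeline branches are invented, and parameter values are assumed to be plain lists rather than scipy distributions.

```python
import random


def sample_weighted_params(param_distributions, distribution_weights, n_iter, seed=None):
    """Pick one dict per iteration according to the weights, then sample inside it.

    Hypothetical illustration; values must be lists of candidate settings.
    """
    if len(param_distributions) != len(distribution_weights):
        raise ValueError("need exactly one weight per parameter dict")
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        # Step 1: weighted choice of a dict (a uniform choice would ignore the weights).
        chosen = rng.choices(param_distributions, weights=distribution_weights, k=1)[0]
        # Step 2: uniform choice of one value for each parameter in that dict.
        candidates.append({name: rng.choice(values) for name, values in chosen.items()})
    return candidates


# Two invented pipeline branches; the first is weighted three times as heavily.
param_distributions = [
    {"clf": ["svc"], "C": [0.1, 1.0, 10.0]},
    {"clf": ["forest"], "n_estimators": [50, 100]},
]
candidates = sample_weighted_params(param_distributions, [3, 1], n_iter=8, seed=0)
for params in candidates:
    print(params)
```

With weights [3, 1] the first branch is expected to appear in roughly three quarters of the draws over many iterations; a weight of 0 excludes a branch entirely.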
