- Classification Metrics Sparse Support Bug (Issue #32036): A bug where classification metrics in scikit-learn claim sparse matrix support in their docstrings but raise an error when given sparse inputs. The issue is reliably reproducible: the report provides code steps, expected behavior (sparse input is supported) vs. actual behavior (a TypeError), and environment details in the traceback. No major elements are missing.
- RandomizedSearchCV Feature Request (Issue #32032): A proposal to add weights that control the probability of selecting each item in a list of parameter distributions, useful for complex pipelines with interdependent hyperparameters. This is a feature enhancement, not a bug, and it includes clear examples and rationale.
- CI Failure on Linux Build (Issue #32022): A CI failure reported on the Linux_Nightly.pylatest_pip_scipy_dev build, with a reference to logs but no detailed steps to reproduce, expected behavior, or root cause analysis. More context on the failure would help speed up resolution; feel free to add details like error log excerpts or reproduction steps!
- Website Logo Truncation (Issue #32011): A UI issue where the scikit-learn logo appears truncated on the website, with a suggestion to use the existing SVG file for better scaling. It is easily reproducible by visiting the site and includes visual examples; no specific environment details are needed. Low-impact cosmetic fix.
- Themes: The issues cover core functionality bugs (e.g., sparse data handling), feature enhancements for advanced users (e.g., hyperparameter tuning), infrastructure reliability (e.g., CI failures), and minor UI improvements (e.g., website aesthetics). A common thread is improving usability and accuracy in data handling and development workflows.
- Prioritization Based on Impact:
- High Priority: Address the sparse matrix bug and CI failure first, as they could affect user functionality and team productivity (e.g., sparse data is critical for large-scale applications, and CI issues may block merges).
- Medium Priority: The RandomizedSearchCV feature request could enhance efficiency for complex models, benefiting users with advanced needs.
- Low Priority: The logo truncation is a quick win for polish but has minimal impact on core operations—consider it if resources allow for minor updates.
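For the RandomizedSearchCV request (#32032), it may help to see what weighted selection could look like. As documented for scikit-learn's ParameterSampler, when a list of dicts is given, one dict is currently picked uniformly and then each parameter is sampled from it. The sketch below keeps that two-stage scheme and only makes the first stage weighted. It is plain Python, not scikit-learn code: sample_params and its distribution_weights argument (named after the proposal) are hypothetical, and only list-valued parameters are handled (scipy.stats distributions are left out).

```python
import random
from typing import Any, Dict, List, Optional, Sequence


def sample_params(
    param_distributions: List[Dict[str, Sequence[Any]]],
    distribution_weights: Optional[Sequence[float]] = None,  # hypothetical, named after the proposal
    rng: Optional[random.Random] = None,
) -> Dict[str, Any]:
    """Pick one parameter dict (by weight, or uniformly if no weights are given),
    then draw one value per parameter from that dict's lists."""
    if rng is None:
        rng = random.Random()
    if distribution_weights is not None and len(distribution_weights) != len(param_distributions):
        raise ValueError("expected one weight per parameter dict")
    # weights=None makes random.choices a uniform draw, like the current behavior
    chosen = rng.choices(param_distributions, weights=distribution_weights, k=1)[0]
    # Only list-valued parameters are sampled here, each uniformly
    return {name: rng.choice(list(values)) for name, values in chosen.items()}


# Two sub-spaces for a pipeline; favor the first one roughly 9:1
space = [
    {"model": ["svc"], "C": [0.1, 1.0, 10.0]},
    {"model": ["tree"], "max_depth": [3, 5, 10]},
]
rng = random.Random(0)
draws = [sample_params(space, distribution_weights=[0.9, 0.1], rng=rng) for _ in range(1000)]
print(sum(d["model"] == "svc" for d in draws) / len(draws))  # close to 0.9
```

Weighting at the dict level leaves interdependent hyperparameters grouped inside each dict, so a rarely useful sub-space can be down-weighted without splitting the search into separate runs.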
Issue Summary
Below is a concise summary of the provided issues in bullet-point format. I've included links to the original issues. For bug-related issues, I've assessed reproducibility based on the presence of steps to reproduce, expected vs. actual behavior, and environment details. If any elements are missing, I've added a friendly note suggesting what could be improved for better clarity. Finally, I've pulled out general themes and prioritization advice at the end.
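The reproducibility assessment described above amounts to a three-item checklist. Here is a minimal sketch, assuming each report can be reduced to three yes/no flags; the BugReport class and its field names are hypothetical (not part of any GitHub API), and the flags in the usage lines are this summary's own reading of #32036 and #32022.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BugReport:
    # Hypothetical fields mirroring the three criteria used in this summary
    number: int
    has_repro_steps: bool
    has_expected_vs_actual: bool
    has_environment: bool


def missing_elements(report: BugReport) -> List[str]:
    """Return the reproducibility elements a report lacks (empty list = complete)."""
    checks = [
        (report.has_repro_steps, "steps to reproduce"),
        (report.has_expected_vs_actual, "expected vs. actual behavior"),
        (report.has_environment, "environment details"),
    ]
    return [name for present, name in checks if not present]


def friendly_note(report: BugReport) -> str:
    """Build the short note suggesting what to add, if anything."""
    missing = missing_elements(report)
    if not missing:
        return f"#{report.number}: reproducible as reported."
    return f"#{report.number}: consider adding " + ", ".join(missing) + "."


print(friendly_note(BugReport(32036, True, True, True)))
# -> #32036: reproducible as reported.
print(friendly_note(BugReport(32022, False, False, True)))
# -> #32022: consider adding steps to reproduce, expected vs. actual behavior.
```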
Issue #32036: Classification metrics don't seem to support sparse?
Classification metrics (e.g., accuracy_score) claim sparse matrix support in their docstrings but fail with a TypeError when used with sparse data. The user provided clear repro steps via a code snippet, expected behavior (no error), actual behavior (the error message), and an error trace. This issue is reliably reproducible with the given details.
Issue #32032: Setting weights on items when passing list of dicts to RandomizedSearchCV
A request to support weights on the items of a list of parameter dicts passed to RandomizedSearchCV, allowing better control over sampling probabilities (e.g., for complex pipelines). The user described the desired workflow, proposed a distribution_weights parameter, and provided examples; this is not a bug, just an enhancement idea.
Issue #32022: ⚠️ CI failed on Linux_Nightly.pylatest_pip_scipy_dev (last failure: Aug 28, 2025) ⚠️
Issue #32011: Scikit-learn logo on the website is truncated
General Themes and Prioritization Advice
Key Themes:
Functionality and UI Bugs: Two issues (sparse matrix support and the website logo) highlight gaps in sparse data handling and UI consistency, which could affect user experience in critical areas like metrics and public-facing documentation.
Feature Enhancements: One request focuses on improving hyperparameter tuning (RandomizedSearchCV), emphasizing better support for complex workflows.
Infrastructure Issues: The CI failure points to potential build system problems, which might indicate broader reliability concerns in development processes.
Prioritization Based on Impact:
High Priority: Start with the sparse matrix bug (#32036) as it directly impacts core ML functionality and could affect many users dealing with large datasets. It's well-documented and fixable, potentially preventing errors in production code.
Overall, focus on bugs with clear repro steps first to ensure stability, then tackle features and infra issues. If you have more context or can provide additional details on any issue, it would help refine this further! 😊