This file is the TypeScript-specific Konveyor rule generator for the semver-analyzer project. It transforms a structured analysis report (AnalysisReport<TypeScript>) — which describes breaking API changes, behavioral changes, and manifest changes between two versions of a TypeScript/React component library — into machine-readable Konveyor rules and fix guidance that downstream tooling (Kantra) uses to detect and auto-fix migration issues in consumer codebases.
In concrete terms: given a diff between, say, PatternFly v5 and v6, this module produces YAML rule files that tell a static analysis engine "if a consumer imports <Modal> from @patternfly/react-core, flag it and provide this migration message."
The module's primary responsibilities are:
- Converts every breaking change into a KonveyorRule, a structured detection rule with:
  - A regex pattern to match affected symbols in consumer code
  - A detection location (IMPORT, JSX_COMPONENT, JSX_PROP, filecontent)
  - A human-readable migration message
  - Labels for categorization (change-type, package, family, has-codemod)
  - A fix strategy (Rename, LlmAssisted, PropValueChange, Manual, etc.)
- Generates rules that detect outdated package.json dependencies and provide version-update fix strategies using frontend.dependency conditions.
- Produces a FixGuidanceDoc, a manifest of all detected changes with per-change fix strategies, confidence levels, search patterns, and replacement text for a downstream fix engine.
- Serializes rules into partitioned YAML files (API, CSS, composition, deps) inside a ruleset directory, matching the Konveyor ruleset specification.
The core generate_rules function (~2,000 lines) uses a multi-pass scan architecture:
Input: AnalysisReport<TypeScript>

Pre-scan passes:
1. component→family map
2. composition_required
3. children→prop merge
4. P0-C coverage set
5. public_symbols set
6. constant collapsing
7. hierarchy coverage

Main rule emission:
- Per-file API changes
- Per-file behavioral
- Manifest changes
- P0-C composition rules

Output: (Vec<KonveyorRule>, HashMap<String, FixGuidanceEntry>)
Pre-scan passes build lookup tables and coverage sets that the main emission loop consults to:
- Avoid duplicate rules (P0-C coverage suppresses individual prop rules)
- Collapse high-cardinality changes (2,000+ token constants become one rule)
- Consolidate related patterns (children→prop migrations become one parent-level rule)
- Filter out internal/test symbols
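The constant-collapsing idea can be sketched as a grouping step followed by a size filter. This is a minimal illustration, not the module's code: `ConstantChange`, `collapsible_groups`, and the threshold value are hypothetical, and the real `CONSTANT_COLLAPSE_THRESHOLD` in konveyor_core may have a different value.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one constant-level change.
#[derive(Debug, Clone, PartialEq)]
struct ConstantChange {
    package: String,
    change_type: String,
    strategy: String,
    symbol: String,
}

/// Illustrative value only; the real threshold is defined in konveyor_core.
const COLLAPSE_THRESHOLD: usize = 3;

/// Grouping key: (package, change_type, strategy), borrowed from the input.
type GroupKey<'a> = (&'a str, &'a str, &'a str);

/// Groups changes by key and keeps only groups large enough to collapse
/// into a single rule. Nothing is cloned; all strings are borrowed.
fn collapsible_groups<'a>(
    changes: &'a [ConstantChange],
) -> BTreeMap<GroupKey<'a>, Vec<&'a str>> {
    let mut groups: BTreeMap<GroupKey<'a>, Vec<&'a str>> = BTreeMap::new();
    for c in changes {
        groups
            .entry((c.package.as_str(), c.change_type.as_str(), c.strategy.as_str()))
            .or_default()
            .push(c.symbol.as_str());
    }
    // Small groups stay as individual rules, so drop them here.
    groups.retain(|_, symbols| symbols.len() >= COLLAPSE_THRESHOLD);
    groups
}
```

Each surviving group would become one rule, and its symbols would go into a coverage set so the main loop skips them.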
Main emission iterates file changes and delegates to type-specific generators:
- api_change_to_rules(): handles renamed, removed, type-changed, and signature-changed symbols
- behavioral_change_to_rule(): handles DOM, CSS, a11y, and rendering changes
- manifest_change_to_rule(): handles peer deps, module system, and entry points
| Function | Purpose |
|---|---|
| classify_removed_props() | Maps removed props to child components using disposition data + name-suffix heuristics |
| build_migration_message_v2() | Generates the human-readable migration message for component-level rules |
| find_sibling_replacement_in_report() | Discovers implicit component merges (e.g., Text → Content) via rename correlation |
| detect_collapsible_constant_groups() | Groups thousands of constant changes by (package, change_type, strategy) |
| derive_import_path() | Resolves npm subpath imports (e.g., @pkg/core/deprecated) from qualified names |
| api_change_to_strategy() | Maps API changes to fix strategies (Rename, ImportPathChange, CssVariablePrefix, etc.); defined in konveyor_core |
| build_frontend_condition() | Constructs the detection condition (frontend.referenced vs builtin.filecontent) |
The module is parameterized over TypeScript — a marker type (zero-sized) that specializes generic core types (AnalysisReport<L>, BehavioralChange<L>, etc.). This is the phantom type pattern from semver_analyzer_core, allowing the core crate to define language-agnostic data structures while this module provides TypeScript-specific behavior.
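A minimal sketch of the marker-type idea follows. The `Language` trait, its `NAME` constant, and the fields of this `AnalysisReport` are assumptions made for illustration; the actual definitions in semver_analyzer_core are not shown in this review and may differ.

```rust
use std::marker::PhantomData;

/// Hypothetical language trait; the real one may expose different items.
trait Language {
    const NAME: &'static str;
}

/// Zero-sized marker type: it holds no data and only selects an implementation.
struct TypeScript;

impl Language for TypeScript {
    const NAME: &'static str = "typescript";
}

/// Language-agnostic report, tagged at the type level with its language.
struct AnalysisReport<L: Language> {
    changed_symbols: Vec<String>,
    _language: PhantomData<L>,
}

impl<L: Language> AnalysisReport<L> {
    fn new(changed_symbols: Vec<String>) -> Self {
        Self { changed_symbols, _language: PhantomData }
    }

    /// Language-specific detail comes from the marker, not from stored data.
    fn describe(&self) -> String {
        format!("{} report with {} changes", L::NAME, self.changed_symbols.len())
    }
}
```

Because `TypeScript` has no fields, tagging a report with it costs nothing at runtime while still preventing a TypeScript report from being passed where another language's report is expected.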
Multiple pre-scan passes build HashSet<String> coverage sets (covered_components, covered_props, collapsed_symbols) that subsequent loops check via .contains(). This avoids duplicate/redundant rule emission without post-hoc deduplication.
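The coverage-set approach can be sketched roughly as below. `RemovedProp`, both function names, the rule ID format, and the cutoff of two removed props are hypothetical and chosen only to keep the example small.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, reduced view of one removed-prop change.
struct RemovedProp {
    component: String,
    prop: String,
}

/// Illustrative cutoff only; not the real P0-C qualification rule.
const MIN_REMOVED_FOR_COVERAGE: usize = 2;

/// Pre-scan: components with enough removed props are assumed to get one
/// component-level rule, so they are recorded in the coverage set.
fn build_covered_components(changes: &[RemovedProp]) -> HashSet<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for change in changes {
        *counts.entry(change.component.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|(_, n)| *n >= MIN_REMOVED_FOR_COVERAGE)
        .map(|(component, _)| component.to_string())
        .collect()
}

/// Emission: per-prop rule IDs are produced only for uncovered components.
fn emit_prop_rule_ids(changes: &[RemovedProp], covered: &HashSet<String>) -> Vec<String> {
    changes
        .iter()
        .filter(|c| !covered.contains(&c.component))
        .map(|c| format!("{}-{}-removed", c.component, c.prop))
        .collect()
}
```

The emission loop never needs a deduplication pass, because anything already covered is skipped at the `.contains()` check.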
generate_rules accumulates into Vec<KonveyorRule> and HashMap<String, FixGuidanceEntry> via .push() and .insert(), returning both as a tuple. Rules are built inline using struct literals with all fields specified.
Fix strategies are represented as FixStrategyEntry structs with a string strategy field and optional mappings. The downstream fix engine selects behavior based on the strategy name — effectively a data-driven strategy pattern without trait objects.
Functions like manifest_change_to_fix(), behavioral_category_label(), and manifest_effort() use exhaustive match on domain enums to ensure all variants are handled. This is idiomatic Rust and gives compile-time guarantees when new variants are added.
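As a small illustration of that guarantee, consider the sketch below. The enum, its variants, and the effort values are hypothetical and do not mirror the real manifest-change types.

```rust
/// Hypothetical subset of a manifest-change enum; real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ManifestChangeKind {
    PeerDependencyChanged,
    ModuleSystemChanged,
    EntryPointRemoved,
}

/// There is deliberately no `_ =>` arm: adding a new variant makes this
/// function fail to compile until the new case is handled.
fn effort_for(kind: ManifestChangeKind) -> u32 {
    match kind {
        ManifestChangeKind::PeerDependencyChanged => 1,
        ManifestChangeKind::ModuleSystemChanged => 3,
        ManifestChangeKind::EntryPointRemoved => 5,
    }
}
```

A wildcard arm would silently assign a default to new variants, which is exactly what the exhaustive form avoids.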
Many classification decisions follow a priority chain:
- Explicit data (e.g., RemovalDisposition)
- Known member lookup
- Name-suffix heuristic
- Default/unmapped
This is visible in classify_removed_props() and find_sibling_replacement_in_report().
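The priority chain might look roughly like the following sketch. The `Tier` enum, the `classify_prop` signature, and the way the suffix is derived (child name minus parent name, lowercased) are assumptions; only the `starts_with` comparison on a lowercased prop name is taken from the review.

```rust
use std::collections::HashMap;

/// Which tier of the chain produced the answer (hypothetical type).
#[derive(Debug, PartialEq)]
enum Tier {
    Explicit,
    KnownMember,
    NameSuffix,
    Unmapped,
}

/// Tries each source in priority order and stops at the first hit.
fn classify_prop(
    parent: &str,
    prop: &str,
    explicit: &HashMap<&str, &str>,
    known_members: &HashMap<&str, &str>,
    children: &[&str],
) -> (Option<String>, Tier) {
    // 1. Explicit disposition data wins outright.
    if let Some(target) = explicit.get(prop) {
        return (Some(target.to_string()), Tier::Explicit);
    }
    // 2. Known member lookup.
    if let Some(target) = known_members.get(prop) {
        return (Some(target.to_string()), Tier::KnownMember);
    }
    // 3. Name-suffix heuristic: "ModalFooter" under "Modal" yields "footer".
    let prop_lower = prop.to_lowercase();
    let by_suffix = children.iter().find(|child| {
        child
            .strip_prefix(parent)
            .map(|s| s.to_lowercase())
            .map_or(false, |suffix| !suffix.is_empty() && prop_lower.starts_with(&suffix))
    });
    // 4. Fall back to unmapped.
    match by_suffix {
        Some(child) => (Some(child.to_string()), Tier::NameSuffix),
        None => (None, Tier::Unmapped),
    }
}
```

Returning the tier alongside the result makes it possible to attach a lower confidence to heuristic matches than to explicit ones.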
- Exhaustive match on enums throughout; the compiler enforces completeness
- Correct lifetime usage in detect_collapsible_constant_groups<'a>, which borrows report data without cloning
- Standard library collections used appropriately (HashMap, HashSet, BTreeMap, BTreeSet)
- Option and iterator combinators used well (filter_map, map, and_then, unwrap_or)
- pub(crate) visibility for internal helpers, which gives good encapsulation
- Re-export pattern (pub use semver_analyzer_konveyor_core::*) consolidates the public API
- tracing for structured logging, which gives production-grade instrumentation
- Stringly-typed strategies. Fix strategies use FixStrategyEntry::new("LlmAssisted"), "Rename", "Manual", etc. A typo compiles fine. An enum would catch mismatches at compile time.
- Mechanism field is a string. PropClassification.mechanism is a String with values "prop", "children", "removed", "unmapped". This should be an enum.
- Clone-heavy code. Many name.clone(), pkg_name.clone(), and from_pkg.clone() calls appear where borrows could suffice. The .clone() calls on String inside hot loops (e.g., per-prop classification) add allocation pressure.
- HashMap for small lookups. Several HashMaps (e.g., prop_dispositions, prop_to_absorber) hold only a handful of entries, where a linear scan of a Vec or slice would likely be faster and avoid hashing overhead.
- No impl blocks. All functions are free-standing. Grouping related functions under impl KonveyorRule or introducing a RuleGenerator struct would improve discoverability and allow method chaining.
- Inconsistent error handling. generate_rules silently continues on errors (.unwrap_or, continue), while write_ruleset_dir returns Result<()> with anyhow::Context. The rule generation path would benefit from at least logging when it skips changes due to unexpected data shapes.
- String-based format for migration messages. Messages are built via format!() + push_str() chains. A structured message type that serializes to text would be more testable and make format changes safer.
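To make the stringly-typed concern concrete, here is a sketch of an enum that still serializes to the same strategy names. `FixStrategyKind` and its four variants are illustrative; the real set of strategies is larger, and the error type is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical enum covering a few of the strategy names mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FixStrategyKind {
    Rename,
    LlmAssisted,
    PropValueChange,
    Manual,
}

impl FixStrategyKind {
    /// The wire name the downstream fix engine would see.
    fn as_str(self) -> &'static str {
        match self {
            FixStrategyKind::Rename => "Rename",
            FixStrategyKind::LlmAssisted => "LlmAssisted",
            FixStrategyKind::PropValueChange => "PropValueChange",
            FixStrategyKind::Manual => "Manual",
        }
    }
}

impl fmt::Display for FixStrategyKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for FixStrategyKind {
    type Err = String;

    /// Parsing is the only place a misspelled name can appear, and it fails loudly.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Rename" => Ok(FixStrategyKind::Rename),
            "LlmAssisted" => Ok(FixStrategyKind::LlmAssisted),
            "PropValueChange" => Ok(FixStrategyKind::PropValueChange),
            "Manual" => Ok(FixStrategyKind::Manual),
            other => Err(format!("unknown fix strategy: {other}")),
        }
    }
}
```

Inside the generator, a misspelled variant such as `FixStrategyKind::Renmae` would be a compile error, while the serialized output stays unchanged for the fix engine.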
- Section comments using // ── Section Name ── dividers make scrolling through the 10,000+ line file navigable
- Doc comments on public functions explain purpose and return types
- Inline rationale comments explain why decisions are made (e.g., why constants with migration_target are skipped)
- Consistent naming conventions: snake_case functions, PascalCase types
- The file is 10,473 lines. This is the single biggest readability problem. A developer new to this codebase cannot hold the mental model of generate_rules (2,000+ lines, 7 pre-scan phases, multiple emission loops) in their head. Functions that exceed ~100 lines are hard to review; this function exceeds 2,000.
- generate_rules does too much. It handles pre-scans, constant collapsing, composition consolidation, P0-C rule generation, per-file API/behavioral/manifest rule generation, hierarchy delta coverage, and deprecated import handling, all in one function. Each pre-scan phase could be its own function returning its lookup table.
- Deep nesting. The composition consolidation loop (for file_changes → for comp_change → match → if → match → if) reaches 5-6 levels of indentation. The P0-C block has a similar nesting depth.
- Implicit dependencies between pre-scan phases. Phase 7 (hierarchy deltas) mutates covered_components and covered_props that were initialized in phase 4. A reader must trace these mutable sets across 300+ lines to understand which phases contribute to them.
- Magic numbers. CONSTANT_COLLAPSE_THRESHOLD is defined in konveyor_core but the threshold values for P0-C qualification (removed >= 3 && removal_ratio > 0.5 || removed >= 5) are inline. These should be named constants.
- Struct built inside a function. ChildrenToPropMigration is defined inside generate_rules. While valid Rust, it makes the function harder to scan because the struct definition breaks the flow of the pre-scan logic.
- 102 unit tests covering utility functions and rule generation scenarios
- make_report helper reduces test boilerplate for constructing AnalysisReport instances
- Tests verify rule IDs, categories, effort, labels, and condition types, giving good coverage of structural output properties
- Tests for duplicate-ID handling (test_duplicate_rule_ids_get_suffix)
- generate_rules is untestable in isolation. Because it's one monolithic function, you can't test individual phases (e.g., "does the P0-C pre-scan correctly compute coverage?") without constructing a full AnalysisReport and inspecting the final output. The pre-scan logic is buried inside the function.
- No tests for heuristic edge cases. classify_removed_props has a name-suffix heuristic (name_lower.starts_with(&suffix)) but the test suite doesn't exercise ambiguous cases (what if two children share a suffix? what if the common prefix consumes the entire name?).
- build_migration_message_v2 is tested only via integration. The 280-line message builder has no direct unit tests verifying message format for each branch (migration target, fully removed, restructured, etc.).
- find_sibling_replacement_in_report scoring logic is untested. The quality scoring (before_contains_comp, prefix_ratio, etc.) determines which sibling is selected, but no tests exercise the ranking.
- No property-based or fuzz testing. The regex pattern construction (regex_escape, build_pattern, build_token_prefix_pattern) is safety-critical: a bad pattern causes missed detections or panics in the regex engine. Property-based tests would catch edge cases.
- Extract generate_rules into smaller functions. Each pre-scan phase should return its lookup table:

      fn build_family_map(report: &AnalysisReport<TypeScript>) -> HashMap<String, String>;
      fn build_p0c_coverage(report: &AnalysisReport<TypeScript>) -> (HashSet<String>, HashSet<(String, String)>);
      fn consolidate_children_to_prop(report: &...) -> BTreeMap<(String, String), ChildrenToPropMigration>;
      fn collapse_constants(report: &..., ...) -> (Vec<KonveyorRule>, HashSet<...>, HashSet<...>);

  This makes each phase independently testable and reduces generate_rules to an orchestrator.
- Split the file. Natural boundaries:
  - konveyor_rules.rs: generate_rules, api_change_to_rules, behavioral_change_to_rule, manifest_change_to_rule
  - konveyor_fix.rs: generate_fix_guidance, api_change_to_fix, behavioral_change_to_fix, manifest_change_to_fix
  - konveyor_classify.rs: classify_removed_props, find_sibling_replacement_in_report, build_migration_message_v2
  - konveyor_output.rs: write_ruleset_dir, partition logic
  - konveyor_util.rs: extract_clean_type, derive_import_path, extract_prop_name_from_signature, label functions
- Replace string-typed fields with enums.

      enum Mechanism { Prop, Children, Removed, Unmapped }
      enum FixStrategyKind { Rename, LlmAssisted, Manual, PropValueChange, ... }

- Move ChildrenToPropMigration out of generate_rules. Define it at module scope with pub(crate) visibility.
- Name the magic numbers.

      const P0C_MIN_REMOVED: usize = 3;
      const P0C_MIN_RATIO: f64 = 0.5;
      const P0C_ABSOLUTE_MIN: usize = 5;

- Add direct tests for build_migration_message_v2. Test each branch (has migration target, fully removed, restructured with children, etc.) with minimal TypeSummary fixtures.
- Reduce cloning. In classify_removed_props, use &str references throughout and only produce owned String in the final PropClassification. Similarly, generate_rules clones from_pkg per change; consider computing it once per file and borrowing.
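With the P0-C thresholds named, the qualification check could be pulled into a small predicate like the one below. The function name, the definition of the ratio as removed props over total props, and the zero-total guard are assumptions; the constants and the boolean shape follow the inline expression quoted earlier (`&&` binds tighter than `||`).

```rust
const P0C_MIN_REMOVED: usize = 3;
const P0C_MIN_RATIO: f64 = 0.5;
const P0C_ABSOLUTE_MIN: usize = 5;

/// Hypothetical predicate: does a component qualify for a P0-C rule?
/// `total` is assumed to be the component's prop count before the change.
fn qualifies_for_p0c(removed: usize, total: usize) -> bool {
    if total == 0 {
        // Assumed guard: avoid dividing by zero for components with no props.
        return false;
    }
    let removal_ratio = removed as f64 / total as f64;
    // Parentheses make the original operator precedence explicit.
    (removed >= P0C_MIN_REMOVED && removal_ratio > P0C_MIN_RATIO)
        || removed >= P0C_ABSOLUTE_MIN
}
```

A standalone predicate like this can be unit-tested at its boundaries (for example a ratio of exactly 0.5) without building a full report.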
- Introduce a RuleBuilder to replace the verbose KonveyorRule { ... } struct literals that repeat boilerplate (labels always include "source=semver-analyzer", effort defaults, etc.).
- Add property tests for regex construction. Ensure regex_escape(s) always produces valid regex, and build_pattern output compiles without panic for arbitrary symbol names.
- Log skipped changes. When generate_rules skips a change due to P0-C coverage or constant collapsing, emit a tracing::trace! so developers can debug "why didn't rule X get generated?"
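A builder along the lines of the first suggestion could look like this sketch. The `Rule` struct is a deliberately reduced, hypothetical stand-in for KonveyorRule, and the default effort of 1 is an assumption; only the always-present "source=semver-analyzer" label comes from the review.

```rust
/// Reduced, hypothetical stand-in for KonveyorRule.
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    rule_id: String,
    message: String,
    labels: Vec<String>,
    effort: u8,
}

/// Builder that fills in the boilerplate every rule shares.
struct RuleBuilder {
    rule: Rule,
}

impl RuleBuilder {
    fn new(rule_id: impl Into<String>, message: impl Into<String>) -> Self {
        Self {
            rule: Rule {
                rule_id: rule_id.into(),
                message: message.into(),
                // Label the review says is always present.
                labels: vec!["source=semver-analyzer".to_string()],
                // Assumed default; the real default effort is not stated.
                effort: 1,
            },
        }
    }

    fn label(mut self, label: impl Into<String>) -> Self {
        self.rule.labels.push(label.into());
        self
    }

    fn effort(mut self, effort: u8) -> Self {
        self.rule.effort = effort;
        self
    }

    fn build(self) -> Rule {
        self.rule
    }
}
```

A call site then reads as `RuleBuilder::new(id, msg).label("change-type=removed").effort(3).build()`, stating only what differs from the defaults.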
konveyor.rs is a domain-heavy, production-grade rule generator that solves a hard problem well: turning structured breaking-change data into actionable migration rules. The code is correct and well-instrumented with tracing. Its main weaknesses are size (10K+ lines in one file), monolithic function structure (one 2,000-line function), and stringly-typed domain values. Breaking it into smaller functions and files would dramatically improve readability and testability without changing any external behavior.