Analysis: `crates/ts/src/konveyor.rs`

Purpose

This file is the TypeScript-specific Konveyor rule generator for the semver-analyzer project. It transforms a structured analysis report (AnalysisReport<TypeScript>) — which describes breaking API changes, behavioral changes, and manifest changes between two versions of a TypeScript/React component library — into machine-readable Konveyor rules and fix guidance that downstream tooling (Kantra) uses to detect and auto-fix migration issues in consumer codebases.

In concrete terms: given a diff between, say, PatternFly v5 and v6, this module produces YAML rule files that tell a static analysis engine "if a consumer imports <Modal> from @patternfly/react-core, flag it and provide this migration message."

What It Does

The module has four primary responsibilities:

1. Rule Generation (`generate_rules`)

Converts every breaking change into a KonveyorRule — a structured detection rule with:

- A regex pattern to match affected symbols in consumer code
- A detection location (IMPORT, JSX_COMPONENT, JSX_PROP, filecontent)
- A human-readable migration message
- Labels for categorization (change-type, package, family, has-codemod)
- A fix strategy (Rename, LlmAssisted, PropValueChange, Manual, etc.)

2. Dependency Update Rules (`generate_dependency_update_rules`)

Generates rules that detect outdated package.json dependencies and provide version-update fix strategies using frontend.dependency conditions.

3. Fix Guidance (`generate_fix_guidance`)

Produces a FixGuidanceDoc — a manifest of all detected changes with per-change fix strategies, confidence levels, search patterns, and replacement text for a downstream fix engine.

4. Ruleset Output (`write_ruleset_dir`)

Serializes rules into partitioned YAML files (API, CSS, composition, deps) inside a ruleset directory, matching the Konveyor ruleset specification.

How It Does It

Architecture: Multi-Pass Pipeline

The core generate_rules function (~2,000 lines) uses a multi-pass scan architecture:

Input: AnalysisReport<TypeScript>
                │
    ┌───────────┼───────────────────────┐
    │       Pre-scan passes             │
    │  ┌─────────────────────────┐      │
    │  │ 1. component→family map │      │
    │  │ 2. composition_required │      │
    │  │ 3. children→prop merge  │      │
    │  │ 4. P0-C coverage set    │      │
    │  │ 5. public_symbols set   │      │
    │  │ 6. constant collapsing  │      │
    │  │ 7. hierarchy coverage   │      │
    │  └─────────────────────────┘      │
    │               │                   │
    │       Main rule emission          │
    │  ┌─────────────────────────┐      │
    │  │ Per-file API changes    │      │
    │  │ Per-file behavioral     │      │
    │  │ Manifest changes        │      │
    │  │ P0-C composition rules  │      │
    │  └─────────────────────────┘      │
    └───────────────┼───────────────────┘
                    │
Output: (Vec<KonveyorRule>, HashMap<String, FixGuidanceEntry>)

Pre-scan passes build lookup tables and coverage sets that the main emission loop consults to:

- Avoid duplicate rules (P0-C coverage suppresses individual prop rules)
- Collapse high-cardinality changes (2,000+ token constants become one rule)
- Consolidate related patterns (children→prop migrations become one parent-level rule)
- Filter out internal/test symbols
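
The collapsing step can be pictured with a minimal sketch. It assumes a simplified, hypothetical `ConstantChange` record and an illustrative threshold parameter; the real types, the real `CONSTANT_COLLAPSE_THRESHOLD` value, and whether the comparison is inclusive are not shown in this analysis. It groups constants by (package, change_type, strategy) and keeps only groups large enough to be worth a single rule.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a constant-level change record.
pub struct ConstantChange {
    pub package: String,
    pub change_type: String,
    pub strategy: String,
    pub symbol: String,
}

/// Group key: (package, change_type, strategy), borrowed from the input.
pub type GroupKey<'a> = (&'a str, &'a str, &'a str);

/// Returns only the groups with at least `threshold` symbols (an assumed,
/// inclusive cutoff). Everything is borrowed from `changes`; nothing is cloned.
pub fn collapsible_groups<'a>(
    changes: &'a [ConstantChange],
    threshold: usize,
) -> BTreeMap<GroupKey<'a>, Vec<&'a str>> {
    let mut groups: BTreeMap<GroupKey<'a>, Vec<&'a str>> = BTreeMap::new();
    for change in changes {
        let key = (
            change.package.as_str(),
            change.change_type.as_str(),
            change.strategy.as_str(),
        );
        groups.entry(key).or_default().push(change.symbol.as_str());
    }
    // Small groups stay as individual rules, so drop them from the result.
    groups.retain(|_, symbols| symbols.len() >= threshold);
    groups
}
```

Each surviving group would become one rule, and its symbols would go into a coverage set so the main loop skips them.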

Main emission iterates file changes and delegates to type-specific generators:

- api_change_to_rules(): handles renamed, removed, type-changed, signature-changed symbols
- behavioral_change_to_rule(): handles DOM, CSS, a11y, rendering changes
- manifest_change_to_rule(): handles peer deps, module system, entry points

Key Helper Functions

| Function | Purpose |
| --- | --- |
| `classify_removed_props()` | Maps removed props to child components using disposition data + name-suffix heuristics |
| `build_migration_message_v2()` | Generates the human-readable migration message for component-level rules |
| `find_sibling_replacement_in_report()` | Discovers implicit component merges (e.g., Text → Content) via rename correlation |
| `detect_collapsible_constant_groups()` | Groups thousands of constant changes by (package, change_type, strategy) |
| `derive_import_path()` | Resolves npm subpath imports (e.g., `@pkg/core/deprecated`) from qualified names |
| `api_change_to_strategy()` | Maps API changes to fix strategies (Rename, ImportPathChange, CssVariablePrefix, etc.); defined in `konveyor_core` |
| `build_frontend_condition()` | Constructs the detection condition (frontend.referenced vs builtin.filecontent) |

Patterns Used

1. Marker Type Parameterization

The module is parameterized over TypeScript — a marker type (zero-sized) that specializes generic core types (AnalysisReport<L>, BehavioralChange<L>, etc.). This is the phantom type pattern from semver_analyzer_core, allowing the core crate to define language-agnostic data structures while this module provides TypeScript-specific behavior.
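
A minimal sketch of the marker-type idea follows. The `Language` trait, the `NAME` constant, the `changed_symbols` field, and `rule_id_prefix` are hypothetical stand-ins chosen for illustration; the real definitions in semver_analyzer_core are not shown in this analysis and may differ (for example, they may not need `PhantomData` at all).

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the core crate's language abstraction.
pub trait Language {
    const NAME: &'static str;
}

/// Zero-sized marker: it carries no data and only selects behavior by type.
pub struct TypeScript;

impl Language for TypeScript {
    const NAME: &'static str = "typescript";
}

/// Language-agnostic report shape, specialized by the marker `L`.
pub struct AnalysisReport<L: Language> {
    pub changed_symbols: Vec<String>,
    _lang: PhantomData<L>,
}

impl<L: Language> AnalysisReport<L> {
    pub fn new(changed_symbols: Vec<String>) -> Self {
        Self { changed_symbols, _lang: PhantomData }
    }

    /// Available for every language.
    pub fn language(&self) -> &'static str {
        L::NAME
    }
}

/// Language-specific behavior lives in an impl on the specialized type,
/// so it only exists for `AnalysisReport<TypeScript>`.
impl AnalysisReport<TypeScript> {
    pub fn rule_id_prefix(&self) -> String {
        format!("semver-{}", TypeScript::NAME)
    }
}
```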

2. Pre-scan / Coverage Set Pattern

Multiple pre-scan passes build HashSet<String> coverage sets (covered_components, covered_props, collapsed_symbols) that subsequent loops check via .contains(). This avoids duplicate/redundant rule emission without post-hoc deduplication.
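
As an illustration of the pattern (not the module's actual code), the sketch below uses a hypothetical `PropChange` record and an illustrative `min_changes` cutoff. A pre-scan builds the coverage set; the emission step consults it via `.contains()` and never needs a deduplication pass afterwards.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified change record (not the real core type).
pub struct PropChange {
    pub component: String,
    pub prop: String,
}

/// Pre-scan: components with at least `min_changes` prop changes are
/// treated as covered by a single consolidated rule.
pub fn build_covered_components(changes: &[PropChange], min_changes: usize) -> HashSet<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for change in changes {
        *counts.entry(change.component.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter(|(_, count)| *count >= min_changes)
        .map(|(component, _)| component.to_string())
        .collect()
}

/// Main emission: per-prop rule IDs are produced only for components
/// that the pre-scan did not mark as covered.
pub fn emit_prop_rule_ids(changes: &[PropChange], covered: &HashSet<String>) -> Vec<String> {
    changes
        .iter()
        .filter(|change| !covered.contains(&change.component))
        .map(|change| format!("{}-{}", change.component, change.prop))
        .collect()
}
```

Because the pre-scan is a plain function returning its set, it can be unit-tested without running the emission loop.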

3. Builder/Accumulator Pattern

generate_rules accumulates into Vec<KonveyorRule> and HashMap<String, FixGuidanceEntry> via .push() and .insert(), returning both as a tuple. Rules are built inline using struct literals with all fields specified.

4. Strategy Pattern (Data-Driven)

Fix strategies are represented as FixStrategyEntry structs with a string strategy field and optional mappings. The downstream fix engine selects behavior based on the strategy name — effectively a data-driven strategy pattern without trait objects.

5. Exhaustive Match on Enums

Functions like manifest_change_to_fix(), behavioral_category_label(), and manifest_effort() use exhaustive match on domain enums to ensure all variants are handled. This is idiomatic Rust and gives compile-time guarantees when new variants are added.
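
A small sketch of what this buys, using a hypothetical `BehavioralCategory` enum whose variants are guessed from the DOM/CSS/a11y/rendering categories mentioned earlier; the real enum and label strings may differ. Because the match has no wildcard arm, adding a variant makes this function fail to compile until a label is chosen for it.

```rust
/// Hypothetical enum; variant names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BehavioralCategory {
    Dom,
    Css,
    Accessibility,
    Rendering,
}

/// Exhaustive match with no `_` arm: the compiler rejects this function
/// if a new variant is added and not handled here.
pub fn category_label_sketch(category: BehavioralCategory) -> &'static str {
    match category {
        BehavioralCategory::Dom => "dom",
        BehavioralCategory::Css => "css",
        BehavioralCategory::Accessibility => "a11y",
        BehavioralCategory::Rendering => "rendering",
    }
}
```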

6. Heuristic Fallback Chains

Many classification decisions follow a priority chain:

1. Explicit data (e.g., RemovalDisposition)
2. Known member lookup
3. Name-suffix heuristic
4. Default/unmapped

This is visible in classify_removed_props() and find_sibling_replacement_in_report().
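
The chain can be sketched as follows. This is an illustration under assumptions, not the real `classify_removed_props()`: the `Child` and `Source` types, the `explicit` map, and the way the suffix is derived (child name minus parent name, lowercased) are all hypothetical. Only the `starts_with` comparison on a lowercased name is taken from the analysis above.

```rust
use std::collections::HashMap;

/// Which step of the chain produced the answer (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Explicit,
    KnownMember,
    SuffixHeuristic,
    Unmapped,
}

/// Hypothetical child component with its known prop names.
pub struct Child {
    pub name: String,
    pub members: Vec<String>,
}

pub fn classify_prop(
    prop: &str,
    parent: &str,
    explicit: &HashMap<String, String>,
    children: &[Child],
) -> (Option<String>, Source) {
    // 1. Explicit data wins outright.
    if let Some(target) = explicit.get(prop) {
        return (Some(target.clone()), Source::Explicit);
    }
    // 2. A child that already declares a prop with this name.
    if let Some(child) = children
        .iter()
        .find(|c| c.members.iter().any(|m| m.as_str() == prop))
    {
        return (Some(child.name.clone()), Source::KnownMember);
    }
    // 3. Name-suffix heuristic: "ModalFooter" under "Modal" yields "footer".
    let prop_lower = prop.to_lowercase();
    for child in children {
        if let Some(suffix) = child.name.strip_prefix(parent) {
            let suffix = suffix.to_lowercase();
            if !suffix.is_empty() && prop_lower.starts_with(suffix.as_str()) {
                return (Some(child.name.clone()), Source::SuffixHeuristic);
            }
        }
    }
    // 4. Nothing matched.
    (None, Source::Unmapped)
}
```

Returning the `Source` alongside the result makes it possible to assert in tests which step fired, which is useful for the ambiguous cases discussed under Testability.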

Idiomatic Rust Grade: B-

Strengths

Exhaustive match on enums throughout; the compiler enforces completeness
Correct lifetime usage in detect_collapsible_constant_groups<'a> — borrows report data without cloning
Standard library collections used appropriately (HashMap, HashSet, BTreeMap, BTreeSet)
Option combinators used well (filter_map, map, and_then, unwrap_or)
pub(crate) visibility for internal helpers — good encapsulation
Re-export pattern (pub use semver_analyzer_konveyor_core::*) consolidates the public API
tracing for structured logging — production-grade instrumentation

Shortcomings

Stringly-typed strategies. Fix strategies use FixStrategyEntry::new("LlmAssisted"), "Rename", "Manual", etc. A typo compiles fine. An enum would catch mismatches at compile time.
Mechanism field is a string. PropClassification.mechanism is String with values "prop", "children", "removed", "unmapped". This should be an enum.
Clone-heavy code. Many name.clone(), pkg_name.clone(), from_pkg.clone() calls where borrows could suffice. The .clone() calls on String inside hot loops (e.g., per-prop classification) add allocation pressure.
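HashMap for small lookups. Several HashMaps (e.g., prop_dispositions, prop_to_absorber) hold only a handful of entries. At that size a linear scan of a Vec or slice is usually at least as fast and avoids hashing overhead, though this is a minor cost and worth changing only if profiling shows it matters.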
No impl blocks. All functions are free-standing. Grouping related functions under impl KonveyorRule or introducing a RuleGenerator struct would improve discoverability and allow method chaining.
Inconsistent error handling. generate_rules silently continues on errors (.unwrap_or, continue), while write_ruleset_dir returns Result<()> with anyhow::Context. The rule generation path would benefit from at least logging when it skips changes due to unexpected data shapes.
String-based format for migration messages. Messages are built via format!() + push_str() chains. A structured message type that serializes to text would be more testable and make format changes safer.

Readability Grade: C+

Strengths

Section comments using // ── Section Name ── dividers make scrolling through the 10,000+ line file navigable
Doc comments on public functions explain purpose and return types
Inline rationale comments explain why decisions are made (e.g., why constants with migration_target are skipped)
Consistent naming conventions — snake_case functions, PascalCase types

Problems

The file is 10,473 lines. This is the single biggest readability problem. A developer new to this codebase cannot hold the mental model of generate_rules (2,000+ lines, 7 pre-scan phases, multiple emission loops) in their head. Functions that exceed ~100 lines are hard to review; this function exceeds 2,000.
generate_rules does too much. It handles pre-scans, constant collapsing, composition consolidation, P0-C rule generation, per-file API/behavioral/manifest rule generation, hierarchy delta coverage, and deprecated import handling — all in one function. Each pre-scan phase could be its own function returning its lookup table.
Deep nesting. The composition consolidation loop (for file_changes → for comp_change → match → if → match → if) reaches 5-6 levels of indentation. The P0-C block has a similar nesting depth.
Implicit dependencies between pre-scan phases. Phase 7 (hierarchy deltas) mutates covered_components and covered_props that were initialized in phase 4. A reader must trace these mutable sets across 300+ lines to understand which phases contribute to them.
Magic numbers. CONSTANT_COLLAPSE_THRESHOLD is defined in konveyor_core but the threshold values for P0-C qualification (removed >= 3 && removal_ratio > 0.5 || removed >= 5) are inline. These should be named constants.
Struct built inside a function. ChildrenToPropMigration is defined inside generate_rules. While valid Rust, it makes the function harder to scan — the struct definition breaks the flow of the pre-scan logic.

Testability Grade: B-

Strengths

102 unit tests covering utility functions and rule generation scenarios
make_report helper reduces test boilerplate for constructing AnalysisReport instances
Tests verify rule IDs, categories, effort, labels, and condition types — good coverage of structural output properties
Tests for duplicate-ID handling (test_duplicate_rule_ids_get_suffix)

Problems

generate_rules is untestable in isolation. Because it's one monolithic function, you can't test individual phases (e.g., "does the P0-C pre-scan correctly compute coverage?") without constructing a full AnalysisReport and inspecting the final output. The pre-scan logic is buried inside the function.
No tests for heuristic edge cases. classify_removed_props has a name-suffix heuristic (name_lower.starts_with(&suffix)) but the test suite doesn't exercise ambiguous cases (what if two children share a suffix? what if the common prefix consumes the entire name?).
build_migration_message_v2 is tested only via integration. The 280-line message builder has no direct unit tests verifying message format for each branch (migration target, fully removed, restructured, etc.).
find_sibling_replacement_in_report scoring logic is untested. The quality scoring (before_contains_comp, prefix_ratio, etc.) determines which sibling is selected, but no tests exercise the ranking.
No property-based or fuzz testing. The regex pattern construction (regex_escape, build_pattern, build_token_prefix_pattern) is safety-critical: a bad pattern causes missed detections, or fails to compile when the pattern is loaded (a panic if that result is unwrapped). Property-based tests would catch edge cases.

Recommended Improvements

High Impact

Extract generate_rules into smaller functions. Each pre-scan phase should return its lookup table:

fn build_family_map(report: &AnalysisReport<TypeScript>) -> HashMap<String, String>;
fn build_p0c_coverage(report: &AnalysisReport<TypeScript>) -> (HashSet<String>, HashSet<(String, String)>);
fn consolidate_children_to_prop(report: &...) -> BTreeMap<(String, String), ChildrenToPropMigration>;
fn collapse_constants(report: &..., ...) -> (Vec<KonveyorRule>, HashSet<...>, HashSet<...>);

This makes each phase independently testable and reduces generate_rules to an orchestrator.

Split the file. Natural boundaries:
- konveyor_rules.rs — generate_rules, api_change_to_rules, behavioral_change_to_rule, manifest_change_to_rule
- konveyor_fix.rs — generate_fix_guidance, api_change_to_fix, behavioral_change_to_fix, manifest_change_to_fix
- konveyor_classify.rs — classify_removed_props, find_sibling_replacement_in_report, build_migration_message_v2
- konveyor_output.rs — write_ruleset_dir, partition logic
- konveyor_util.rs — extract_clean_type, derive_import_path, extract_prop_name_from_signature, label functions

Replace string-typed fields with enums.

enum Mechanism { Prop, Children, Removed, Unmapped }
enum FixStrategyKind { Rename, LlmAssisted, Manual, PropValueChange, ... }
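
If the strategy name still has to be emitted as a string for the downstream fix engine, the enum can own that mapping. The sketch below assumes only the four variants named above (the original list is elided with `...`, so the real set is larger), and the `as_str`/`FromStr` pairing is one possible design rather than the project's API.

```rust
use std::str::FromStr;

/// Subset of strategies named in this analysis; the full set is not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixStrategyKind {
    Rename,
    LlmAssisted,
    Manual,
    PropValueChange,
}

impl FixStrategyKind {
    /// The wire name, defined in exactly one place.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Rename => "Rename",
            Self::LlmAssisted => "LlmAssisted",
            Self::Manual => "Manual",
            Self::PropValueChange => "PropValueChange",
        }
    }
}

impl FromStr for FixStrategyKind {
    type Err = String;

    /// Parsing rejects unknown or misspelled names instead of passing them on.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Rename" => Ok(Self::Rename),
            "LlmAssisted" => Ok(Self::LlmAssisted),
            "Manual" => Ok(Self::Manual),
            "PropValueChange" => Ok(Self::PropValueChange),
            other => Err(format!("unknown fix strategy: {other}")),
        }
    }
}
```

With this shape, a typo such as `FixStrategyKind::LLMAssisted` is a compile error, and a misspelled string read from external data becomes an explicit `Err`.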

Medium Impact

Move ChildrenToPropMigration out of generate_rules. Define it at module scope with pub(crate) visibility.

Name the magic numbers.

const P0C_MIN_REMOVED: usize = 3;
const P0C_MIN_RATIO: f64 = 0.5;
const P0C_ABSOLUTE_MIN: usize = 5;
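
With the constants named, the qualification check can become a small, directly testable predicate. The sketch below is an assumption about shape only: the function name is hypothetical, and computing `removal_ratio` as removed props divided by total props is a guess, since the analysis quotes the condition but not how the ratio is derived. The parentheses make explicit that `&&` binds tighter than `||` in the quoted expression.

```rust
const P0C_MIN_REMOVED: usize = 3;
const P0C_MIN_RATIO: f64 = 0.5;
const P0C_ABSOLUTE_MIN: usize = 5;

/// Hypothetical predicate mirroring the inline condition
/// `removed >= 3 && removal_ratio > 0.5 || removed >= 5`.
pub fn qualifies_for_p0c(removed: usize, total: usize) -> bool {
    // Assumption: the ratio is removed / total; guard against division by zero.
    if total == 0 {
        return false;
    }
    let removal_ratio = removed as f64 / total as f64;
    (removed >= P0C_MIN_REMOVED && removal_ratio > P0C_MIN_RATIO)
        || removed >= P0C_ABSOLUTE_MIN
}
```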

Add direct tests for build_migration_message_v2. Test each branch (has migration target, fully removed, restructured with children, etc.) with minimal TypeSummary fixtures.
Reduce cloning. In classify_removed_props, use &str references throughout and only produce owned String in the final PropClassification. Similarly, generate_rules clones from_pkg per change — consider computing it once per file and borrowing.

Low Impact

Introduce a RuleBuilder to replace the verbose KonveyorRule { ... } struct literals that repeat boilerplate (labels always include "source=semver-analyzer", effort defaults, etc.).
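
One possible shape for such a builder is sketched below. `SketchRule` is a deliberately reduced, hypothetical stand-in for `KonveyorRule`, whose real fields are not listed in this analysis; the default effort of 1 is likewise an assumption. Only the "source=semver-analyzer" label is taken from the text above.

```rust
/// Reduced, hypothetical stand-in for the real rule struct.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchRule {
    pub rule_id: String,
    pub message: String,
    pub labels: Vec<String>,
    pub effort: u32,
}

pub struct RuleBuilder {
    rule: SketchRule,
}

impl RuleBuilder {
    /// Required fields go in the constructor; boilerplate defaults are set once here.
    pub fn new(rule_id: impl Into<String>, message: impl Into<String>) -> Self {
        Self {
            rule: SketchRule {
                rule_id: rule_id.into(),
                message: message.into(),
                labels: vec!["source=semver-analyzer".to_string()],
                effort: 1, // assumed default
            },
        }
    }

    /// Appends a label after the default source label.
    pub fn label(mut self, label: impl Into<String>) -> Self {
        self.rule.labels.push(label.into());
        self
    }

    /// Overrides the default effort.
    pub fn effort(mut self, effort: u32) -> Self {
        self.rule.effort = effort;
        self
    }

    pub fn build(self) -> SketchRule {
        self.rule
    }
}
```

A call site would then read `RuleBuilder::new(id, msg).label("change-type=removed").effort(3).build()`, so each rule states only what differs from the defaults.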
Add property tests for regex construction. Ensure regex_escape(s) always produces valid regex, and build_pattern output compiles without panic for arbitrary symbol names.
Log skipped changes. When generate_rules skips a change due to P0-C coverage or constant collapsing, emit a tracing::trace! so developers can debug "why didn't rule X get generated?"

Summary

konveyor.rs is a domain-heavy, production-grade rule generator that solves a hard problem well: turning structured breaking-change data into actionable migration rules. The code is correct and well-instrumented with tracing. Its main weaknesses are size (10K+ lines in one file), monolithic function structure (one 2,000-line function), and stringly-typed domain values. Breaking it into smaller functions and files would dramatically improve readability and testability without changing any external behavior.

jwmatthews/konveyor-rs-analysis.md
