@jwmatthews
Created May 7, 2026 17:59

Analysis: crates/ts/src/konveyor.rs

Purpose

This file is the TypeScript-specific Konveyor rule generator for the semver-analyzer project. It transforms a structured analysis report (AnalysisReport<TypeScript>) — which describes breaking API changes, behavioral changes, and manifest changes between two versions of a TypeScript/React component library — into machine-readable Konveyor rules and fix guidance that downstream tooling (Kantra) uses to detect and auto-fix migration issues in consumer codebases.

In concrete terms: given a diff between, say, PatternFly v5 and v6, this module produces YAML rule files that tell a static analysis engine "if a consumer imports <Modal> from @patternfly/react-core, flag it and provide this migration message."


What It Does

The module has four primary responsibilities:

1. Rule Generation (generate_rules)

Converts every breaking change into a KonveyorRule — a structured detection rule with:

  • A regex pattern to match affected symbols in consumer code
  • A detection location (IMPORT, JSX_COMPONENT, JSX_PROP, filecontent)
  • A human-readable migration message
  • Labels for categorization (change-type, package, family, has-codemod)
  • A fix strategy (Rename, LlmAssisted, PropValueChange, Manual, etc.)

2. Dependency Update Rules (generate_dependency_update_rules)

Generates rules that detect outdated package.json dependencies and provide version-update fix strategies using frontend.dependency conditions.

3. Fix Guidance (generate_fix_guidance)

Produces a FixGuidanceDoc — a manifest of all detected changes with per-change fix strategies, confidence levels, search patterns, and replacement text for a downstream fix engine.

4. Ruleset Output (write_ruleset_dir)

Serializes rules into partitioned YAML files (API, CSS, composition, deps) inside a ruleset directory, matching the Konveyor ruleset specification.


How It Does It

Architecture: Multi-Pass Pipeline

The core generate_rules function (~2,000 lines) uses a multi-pass scan architecture:

Input: AnalysisReport<TypeScript>
                │
    ┌───────────┼───────────────────────┐
    │       Pre-scan passes             │
    │  ┌─────────────────────────┐      │
    │  │ 1. component→family map │      │
    │  │ 2. composition_required │      │
    │  │ 3. children→prop merge  │      │
    │  │ 4. P0-C coverage set    │      │
    │  │ 5. public_symbols set   │      │
    │  │ 6. constant collapsing  │      │
    │  │ 7. hierarchy coverage   │      │
    │  └─────────────────────────┘      │
    │               │                   │
    │       Main rule emission          │
    │  ┌─────────────────────────┐      │
    │  │ Per-file API changes    │      │
    │  │ Per-file behavioral     │      │
    │  │ Manifest changes        │      │
    │  │ P0-C composition rules  │      │
    │  └─────────────────────────┘      │
    └───────────────┼───────────────────┘
                    │
Output: (Vec<KonveyorRule>, HashMap<String, FixGuidanceEntry>)

Pre-scan passes build lookup tables and coverage sets that the main emission loop consults to:

  • Avoid duplicate rules (P0-C coverage suppresses individual prop rules)
  • Collapse high-cardinality changes (2,000+ token constants become one rule)
  • Consolidate related patterns (children→prop migrations become one parent-level rule)
  • Filter out internal/test symbols
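The constant-collapsing pre-scan can be pictured roughly as follows. This is a minimal sketch, not the module's code: `ConstantChange`, `collapsible_groups`, the `COLLAPSE_THRESHOLD` value, and the `>=` comparison are all assumptions made for illustration. Only the grouping key (package, change_type, strategy) comes from the description of detect_collapsible_constant_groups().

```rust
use std::collections::BTreeMap;

/// Assumed value for illustration; the real threshold is defined in konveyor_core.
const COLLAPSE_THRESHOLD: usize = 3;

/// Hypothetical, simplified record of one changed constant.
struct ConstantChange<'a> {
    package: &'a str,
    change_type: &'a str,
    strategy: &'a str,
    symbol: &'a str,
}

/// Grouping key: (package, change_type, strategy).
type GroupKey<'a> = (&'a str, &'a str, &'a str);

/// Group constant changes by key and keep only the groups large enough
/// to be collapsed into a single rule. Borrows from the input, no cloning.
fn collapsible_groups<'a>(
    changes: &[ConstantChange<'a>],
) -> BTreeMap<GroupKey<'a>, Vec<&'a str>> {
    let mut groups: BTreeMap<GroupKey<'a>, Vec<&'a str>> = BTreeMap::new();
    for change in changes {
        groups
            .entry((change.package, change.change_type, change.strategy))
            .or_default()
            .push(change.symbol);
    }
    // Small groups stay as individual rules; large ones collapse.
    groups.retain(|_, symbols| symbols.len() >= COLLAPSE_THRESHOLD);
    groups
}
```

The symbols in each surviving group would then feed a collapsed-symbols set that the main emission loop checks before emitting per-symbol rules.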

Main emission iterates file changes and delegates to type-specific generators:

  • api_change_to_rules() — handles renamed, removed, type-changed, signature-changed symbols
  • behavioral_change_to_rule() — handles DOM, CSS, a11y, rendering changes
  • manifest_change_to_rule() — handles peer deps, module system, entry points

Key Helper Functions

Function                              Purpose
classify_removed_props()              Maps removed props to child components using disposition data + name-suffix heuristics
build_migration_message_v2()          Generates the human-readable migration message for component-level rules
find_sibling_replacement_in_report()  Discovers implicit component merges (e.g., Text → Content) via rename correlation
detect_collapsible_constant_groups()  Groups thousands of constant changes by (package, change_type, strategy)
derive_import_path()                  Resolves npm subpath imports (e.g., @pkg/core/deprecated) from qualified names
api_change_to_strategy()              Maps API changes to fix strategies (Rename, ImportPathChange, CssVariablePrefix, etc.); defined in konveyor_core
build_frontend_condition()            Constructs the detection condition (frontend.referenced vs builtin.filecontent)

Patterns Used

1. Marker Type Parameterization

The module is parameterized over TypeScript — a marker type (zero-sized) that specializes generic core types (AnalysisReport<L>, BehavioralChange<L>, etc.). This is the phantom type pattern from semver_analyzer_core, allowing the core crate to define language-agnostic data structures while this module provides TypeScript-specific behavior.
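A minimal sketch of the marker-type idea, assuming a `Language` trait and a `PhantomData` field. Neither is confirmed by this analysis; the real definitions in semver_analyzer_core may differ, and the `AnalysisReport` below is a simplified stand-in.

```rust
use std::marker::PhantomData;

/// Assumed abstraction: each supported language is a zero-sized marker type.
trait Language {
    /// Language name, used here only for illustration.
    const NAME: &'static str;
}

/// Marker type: carries no data, only selects an implementation.
struct TypeScript;

impl Language for TypeScript {
    const NAME: &'static str = "typescript";
}

/// Simplified stand-in for the core crate's generic report type.
struct AnalysisReport<L: Language> {
    changed_symbols: Vec<String>,
    _language: PhantomData<L>,
}

impl<L: Language> AnalysisReport<L> {
    fn new(changed_symbols: Vec<String>) -> Self {
        Self { changed_symbols, _language: PhantomData }
    }
}

/// Language-specific behavior: this function only accepts TypeScript reports.
fn describe(report: &AnalysisReport<TypeScript>) -> String {
    format!(
        "{} changed symbols ({})",
        report.changed_symbols.len(),
        TypeScript::NAME
    )
}
```

Because the marker is zero-sized, the specialization costs nothing at runtime; passing a report for another language to `describe` would be a compile error.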

2. Pre-scan / Coverage Set Pattern

Multiple pre-scan passes build HashSet<String> coverage sets (covered_components, covered_props, collapsed_symbols) that subsequent loops check via .contains(). This avoids duplicate/redundant rule emission without post-hoc deduplication.
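In miniature, the pattern looks something like this. `PropRemoval`, both function names, and the rule-ID format are hypothetical; only the `HashSet<String>` plus `.contains()` shape is taken from the description above.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified change record: a prop removed from a component.
struct PropRemoval {
    component: String,
    prop: String,
}

/// Pre-scan: collect components already handled by a component-level rule.
fn build_covered_components(component_level_rules: &[&str]) -> HashSet<String> {
    component_level_rules.iter().map(|c| c.to_string()).collect()
}

/// Main emission: skip any removal whose component is already covered,
/// so no post-hoc deduplication pass is needed.
fn emit_prop_rule_ids(removals: &[PropRemoval], covered: &HashSet<String>) -> Vec<String> {
    removals
        .iter()
        .filter(|r| !covered.contains(&r.component))
        .map(|r| format!("{}-{}-removed", r.component, r.prop))
        .collect()
}
```

Passing the set in as an immutable borrow, rather than mutating it across phases, also makes it obvious which phase produced it.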

3. Builder/Accumulator Pattern

generate_rules accumulates into Vec<KonveyorRule> and HashMap<String, FixGuidanceEntry> via .push() and .insert(), returning both as a tuple. Rules are built inline using struct literals with all fields specified.

4. Strategy Pattern (Data-Driven)

Fix strategies are represented as FixStrategyEntry structs with a string strategy field and optional mappings. The downstream fix engine selects behavior based on the strategy name — effectively a data-driven strategy pattern without trait objects.

5. Exhaustive Match on Enums

Functions like manifest_change_to_fix(), behavioral_category_label(), and manifest_effort() use exhaustive match on domain enums to ensure all variants are handled. This is idiomatic Rust and gives compile-time guarantees when new variants are added.
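A small sketch of what that guarantee looks like. The enum, its variants, and the effort values are hypothetical; they only loosely mirror the manifest change kinds named earlier (peer deps, module system, entry points).

```rust
/// Hypothetical enum for illustration; not the crate's actual type.
enum ManifestChangeKind {
    PeerDependency,
    ModuleSystem,
    EntryPoint,
}

/// No wildcard arm: adding a variant to the enum makes this function
/// fail to compile until the new case is handled explicitly.
fn effort_for(kind: &ManifestChangeKind) -> u8 {
    match kind {
        ManifestChangeKind::PeerDependency => 1,
        ManifestChangeKind::ModuleSystem => 5,
        ManifestChangeKind::EntryPoint => 3,
    }
}
```

The guarantee only holds while the match has no `_ =>` arm; a catch-all would silently absorb new variants.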

6. Heuristic Fallback Chains

Many classification decisions follow a priority chain:

  1. Explicit data (e.g., RemovalDisposition)
  2. Known member lookup
  3. Name-suffix heuristic
  4. Default/unmapped

This is visible in classify_removed_props() and find_sibling_replacement_in_report().
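Such a priority chain maps naturally onto `Option::or_else`. The sketch below is an illustration under assumptions: `resolve_target`, its parameters, and the way the suffix is derived (child name minus parent name, lowercased) are invented here and are not the actual logic of classify_removed_props().

```rust
use std::collections::HashMap;

/// Resolve which child component absorbs a removed prop, trying each
/// source in priority order. `None` means the prop stays unmapped.
fn resolve_target(
    parent: &str,
    prop: &str,
    explicit: &HashMap<&str, &str>,      // 1. explicit disposition data
    known_members: &HashMap<&str, &str>, // 2. known member lookup
    children: &[&str],                   // 3. candidates for the suffix heuristic
) -> Option<String> {
    explicit
        .get(prop)
        .map(|target| target.to_string())
        .or_else(|| known_members.get(prop).map(|target| target.to_string()))
        .or_else(|| {
            let prop_lower = prop.to_lowercase();
            children.iter().find_map(|child| {
                // "ModalHeader" with parent "Modal" yields the suffix "header".
                let suffix = child.strip_prefix(parent)?.to_lowercase();
                (!suffix.is_empty() && prop_lower.starts_with(&suffix))
                    .then(|| child.to_string())
            })
        })
}
```

Each later step runs only if the earlier ones returned `None`, so the ordering of the chain is the priority order.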


Idiomatic Rust Grade: B-

Strengths

  • Exhaustive match on enums throughout; the compiler enforces completeness
  • Correct lifetime usage in detect_collapsible_constant_groups<'a> — borrows report data without cloning
  • Standard library collections used appropriately (HashMap, HashSet, BTreeMap, BTreeSet)
  • Option combinators used well (filter_map, map, and_then, unwrap_or)
  • pub(crate) visibility for internal helpers — good encapsulation
  • Re-export pattern (pub use semver_analyzer_konveyor_core::*) consolidates the public API
  • tracing for structured logging — production-grade instrumentation

Shortcomings

  1. Stringly-typed strategies. Fix strategies use FixStrategyEntry::new("LlmAssisted"), "Rename", "Manual", etc. A typo compiles fine. An enum would catch mismatches at compile time.

  2. Mechanism field is a string. PropClassification.mechanism is String with values "prop", "children", "removed", "unmapped". This should be an enum.

  3. Clone-heavy code. Many name.clone(), pkg_name.clone(), from_pkg.clone() calls where borrows could suffice. The .clone() calls on String inside hot loops (e.g., per-prop classification) add allocation pressure.

  4. HashMap for small lookups. Several HashMaps (e.g., prop_dispositions, prop_to_absorber) hold only a handful of entries, where a linear scan of a Vec or slice would likely be at least as fast and would avoid hashing overhead.

  5. No impl blocks. All functions are free-standing. Grouping related functions under impl KonveyorRule or introducing a RuleGenerator struct would improve discoverability and allow method chaining.

  6. Inconsistent error handling. generate_rules silently continues on errors (.unwrap_or, continue), while write_ruleset_dir returns Result<()> with anyhow::Context. The rule generation path would benefit from at least logging when it skips changes due to unexpected data shapes.

  7. String-based format for migration messages. Messages are built via format!() + push_str() chains. A structured message type that serializes to text would be more testable and make format changes safer.
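For shortcoming 7, one possible shape for a structured message type is sketched below. `MigrationMessage` and its fields are hypothetical, and the output format is illustrative rather than the format the module emits.

```rust
use std::fmt;

/// Hypothetical structured form of a migration message.
struct MigrationMessage {
    symbol: String,
    summary: String,
    steps: Vec<String>,
}

impl fmt::Display for MigrationMessage {
    /// The text layout lives in exactly one place, so format changes
    /// do not require touching every call site.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "{}: {}", self.symbol, self.summary)?;
        for (index, step) in self.steps.iter().enumerate() {
            writeln!(f, "  {}. {}", index + 1, step)?;
        }
        Ok(())
    }
}
```

Tests can then assert on the fields (for example, that a rename produces two steps) instead of matching substrings of the rendered text.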


Readability Grade: C+

Strengths

  • Section comments using // ── Section Name ── dividers make scrolling through the 10,000+ line file navigable
  • Doc comments on public functions explain purpose and return types
  • Inline rationale comments explain why decisions are made (e.g., why constants with migration_target are skipped)
  • Consistent naming conventions: snake_case functions, PascalCase types

Problems

  1. The file is 10,473 lines. This is the single biggest readability problem. A developer new to this codebase cannot hold the mental model of generate_rules (2,000+ lines, 7 pre-scan phases, multiple emission loops) in their head. Functions that exceed ~100 lines are hard to review; this function exceeds 2,000.

  2. generate_rules does too much. It handles pre-scans, constant collapsing, composition consolidation, P0-C rule generation, per-file API/behavioral/manifest rule generation, hierarchy delta coverage, and deprecated import handling — all in one function. Each pre-scan phase could be its own function returning its lookup table.

  3. Deep nesting. The composition consolidation loop (for file_changes → for comp_change → match → if → match → if) reaches 5-6 levels of indentation. The P0-C block has a similar nesting depth.

  4. Implicit dependencies between pre-scan phases. Phase 7 (hierarchy deltas) mutates covered_components and covered_props that were initialized in phase 4. A reader must trace these mutable sets across 300+ lines to understand which phases contribute to them.

  5. Magic numbers. CONSTANT_COLLAPSE_THRESHOLD is defined in konveyor_core but the threshold values for P0-C qualification (removed >= 3 && removal_ratio > 0.5 || removed >= 5) are inline. These should be named constants.

  6. Struct built inside a function. ChildrenToPropMigration is defined inside generate_rules. While valid Rust, it makes the function harder to scan — the struct definition breaks the flow of the pre-scan logic.


Testability Grade: B-

Strengths

  • 102 unit tests covering utility functions and rule generation scenarios
  • make_report helper reduces test boilerplate for constructing AnalysisReport instances
  • Tests verify rule IDs, categories, effort, labels, and condition types — good coverage of structural output properties
  • Tests for duplicate-ID handling (test_duplicate_rule_ids_get_suffix)

Problems

  1. generate_rules is untestable in isolation. Because it's one monolithic function, you can't test individual phases (e.g., "does the P0-C pre-scan correctly compute coverage?") without constructing a full AnalysisReport and inspecting the final output. The pre-scan logic is buried inside the function.

  2. No tests for heuristic edge cases. classify_removed_props has a name-suffix heuristic (name_lower.starts_with(&suffix)) but the test suite doesn't exercise ambiguous cases (what if two children share a suffix? what if the common prefix consumes the entire name?).

  3. build_migration_message_v2 is tested only via integration. The 280-line message builder has no direct unit tests verifying message format for each branch (migration target, fully removed, restructured, etc.).

  4. find_sibling_replacement_in_report scoring logic is untested. The quality scoring (before_contains_comp, prefix_ratio, etc.) determines which sibling is selected, but no tests exercise the ranking.

  5. No property-based or fuzz testing. The regex pattern construction (regex_escape, build_pattern, build_token_prefix_pattern) is safety-critical — a bad pattern causes missed detections or panics in the regex engine. Property-based tests would catch edge cases.


Recommended Improvements

High Impact

  1. Extract generate_rules into smaller functions. Each pre-scan phase should return its lookup table:

    fn build_family_map(report: &AnalysisReport<TypeScript>) -> HashMap<String, String>;
    fn build_p0c_coverage(report: &AnalysisReport<TypeScript>) -> (HashSet<String>, HashSet<(String, String)>);
    fn consolidate_children_to_prop(report: &...) -> BTreeMap<(String, String), ChildrenToPropMigration>;
    fn collapse_constants(report: &..., ...) -> (Vec<KonveyorRule>, HashSet<...>, HashSet<...>);

    This makes each phase independently testable and reduces generate_rules to an orchestrator.

  2. Split the file. Natural boundaries:

    • konveyor_rules.rs: generate_rules, api_change_to_rules, behavioral_change_to_rule, manifest_change_to_rule
    • konveyor_fix.rs: generate_fix_guidance, api_change_to_fix, behavioral_change_to_fix, manifest_change_to_fix
    • konveyor_classify.rs: classify_removed_props, find_sibling_replacement_in_report, build_migration_message_v2
    • konveyor_output.rs: write_ruleset_dir, partition logic
    • konveyor_util.rs: extract_clean_type, derive_import_path, extract_prop_name_from_signature, label functions
  3. Replace string-typed fields with enums.

    enum Mechanism { Prop, Children, Removed, Unmapped }
    enum FixStrategyKind { Rename, LlmAssisted, Manual, PropValueChange, ... }
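If the downstream fix engine still expects strategy names as strings, the enum could keep the wire format while moving typos to compile time. A rough sketch, covering only the four variants named above; `as_str` and the `FromStr` error type are assumptions:

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FixStrategyKind {
    Rename,
    LlmAssisted,
    Manual,
    PropValueChange,
}

impl FixStrategyKind {
    /// The string the downstream engine would see; unchanged from today.
    fn as_str(self) -> &'static str {
        match self {
            Self::Rename => "Rename",
            Self::LlmAssisted => "LlmAssisted",
            Self::Manual => "Manual",
            Self::PropValueChange => "PropValueChange",
        }
    }
}

impl fmt::Display for FixStrategyKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for FixStrategyKind {
    type Err = String;

    /// Parsing is the only place a misspelled name can still appear,
    /// and it surfaces as an error instead of a silent mismatch.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Rename" => Ok(Self::Rename),
            "LlmAssisted" => Ok(Self::LlmAssisted),
            "Manual" => Ok(Self::Manual),
            "PropValueChange" => Ok(Self::PropValueChange),
            other => Err(format!("unknown fix strategy: {other}")),
        }
    }
}
```

With this in place, a call such as `FixStrategyKind::LlmAsisted` fails to compile, whereas the equivalent string literal would not.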

Medium Impact

  1. Move ChildrenToPropMigration out of generate_rules. Define it at module scope with pub(crate) visibility.

  2. Name the magic numbers.

    const P0C_MIN_REMOVED: usize = 3;
    const P0C_MIN_RATIO: f64 = 0.5;
    const P0C_ABSOLUTE_MIN: usize = 5;
  3. Add direct tests for build_migration_message_v2. Test each branch (has migration target, fully removed, restructured with children, etc.) with minimal TypeSummary fixtures.

  4. Reduce cloning. In classify_removed_props, use &str references throughout and only produce owned String in the final PropClassification. Similarly, generate_rules clones from_pkg per change — consider computing it once per file and borrowing.
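With the named thresholds from item 2, the P0-C qualification check could become a small, directly testable predicate. `qualifies_for_p0c` is a hypothetical name, and taking `removal_ratio` as a parameter is an assumption that avoids guessing how the module computes it.

```rust
const P0C_MIN_REMOVED: usize = 3;
const P0C_MIN_RATIO: f64 = 0.5;
const P0C_ABSOLUTE_MIN: usize = 5;

/// Mirrors the inline condition
/// `removed >= 3 && removal_ratio > 0.5 || removed >= 5`.
/// `&&` binds tighter than `||`, so the parentheses only make the
/// existing grouping explicit; they do not change behavior.
fn qualifies_for_p0c(removed: usize, removal_ratio: f64) -> bool {
    (removed >= P0C_MIN_REMOVED && removal_ratio > P0C_MIN_RATIO)
        || removed >= P0C_ABSOLUTE_MIN
}
```

Note that the ratio comparison is strict: three removals at exactly 0.5 do not qualify.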

Low Impact

  1. Introduce a RuleBuilder to replace the verbose KonveyorRule { ... } struct literals that repeat boilerplate (labels always include "source=semver-analyzer", effort defaults, etc.).

  2. Add property tests for regex construction. Ensure regex_escape(s) always produces valid regex, and build_pattern output compiles without panic for arbitrary symbol names.

  3. Log skipped changes. When generate_rules skips a change due to P0-C coverage or constant collapsing, emit a tracing::trace! so developers can debug "why didn't rule X get generated?"
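As a dependency-free illustration of the property-test idea in item 2, the sketch below checks a round-trip property over pseudo-random inputs. Everything here is a stand-in: this `regex_escape` is not the module's implementation, the metacharacter set is an assumption, and a real test would more likely use a property-testing crate and compile the pattern with the regex engine.

```rust
/// Characters treated as regex metacharacters in this sketch (assumed set).
const META: &str = r"\.+*?()|[]{}^$";

/// Hypothetical stand-in for the module's `regex_escape`.
fn regex_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if META.contains(ch) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Inverse of `regex_escape`. Returns `None` if a metacharacter appears
/// unescaped or a backslash is not followed by a metacharacter.
fn unescape(escaped: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = escaped.chars();
    while let Some(ch) = chars.next() {
        if ch == '\\' {
            let next = chars.next()?;
            if !META.contains(next) {
                return None;
            }
            out.push(next);
        } else if META.contains(ch) {
            return None;
        } else {
            out.push(ch);
        }
    }
    Some(out)
}

/// Small linear congruential generator so the sketch needs no crates.
fn next_u64(state: &mut u64) -> u64 {
    *state = (*state)
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Deterministic pseudo-random strings drawn from a metacharacter-heavy alphabet.
fn sample_inputs(seed: u64, count: usize) -> Vec<String> {
    let alphabet: Vec<char> = "ab_-.$()[]\\|*+?{}^ ".chars().collect();
    let mut state = seed;
    let mut inputs = Vec::with_capacity(count);
    for _ in 0..count {
        let len = next_u64(&mut state) % 12;
        let mut s = String::new();
        for _ in 0..len {
            let idx = (next_u64(&mut state) as usize) % alphabet.len();
            s.push(alphabet[idx]);
        }
        inputs.push(s);
    }
    inputs
}
```

The property to assert is `unescape(&regex_escape(s)) == Some(s)` for every sampled `s`: no metacharacter is left unescaped and nothing is lost.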


Summary

konveyor.rs is a domain-heavy, production-grade rule generator that solves a hard problem well: turning structured breaking-change data into actionable migration rules. The code is correct and well-instrumented with tracing. Its main weaknesses are size (10K+ lines in one file), monolithic function structure (one 2,000-line function), and stringly-typed domain values. Breaking it into smaller functions and files would dramatically improve readability and testability without changing any external behavior.
