Created
November 11, 2025 23:34
SERF entity resolution - round two results
| ============================================================ | |
| 2025-11-11 15:33:37,107 - abzu.spark.er_eval - INFO - ENTITY RESOLUTION EVALUATION SUMMARY - ITERATION 2 | |
| 2025-11-11 15:33:37,107 - abzu.spark.er_eval - INFO - ============================================================ | |
| 2025-11-11 15:33:37,107 - abzu.spark.er_eval - INFO - Original raw companies (before matching): 13,641 unique | |
| 2025-11-11 15:33:37,107 - abzu.spark.er_eval - INFO - Companies that went into matching: 11,093 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Skipped (singletons/errors): 2,548 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - MATCHING RESULTS: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - BAML-processed companies: 4,930 unique | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Companies merged: 6,163 (55.56%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - IDs dropped by BAML: 82 (0.74%) - recovered via UUID tracking | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - FINAL OUTPUT: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Total companies: 7,478 (4,930 matched + 2,548 skipped) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Total reduction: 6,163 companies (45.18%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - UUID VERIFICATION (BAML-processed companies only): | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Overlap with original: 0 UUIDs (0.00%) - should be 0% | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Overlap with previous: 0 UUIDs (0.00%) - should be 0% | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - SOURCE UUID COVERAGE: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Original companies tracked: 11,253/13,641 (82.49%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Previous iteration tracked: 4,922/7,149 (68.85%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Total unique source_uuids: 13,912 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - SOURCE UUID VALIDATION: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Valid references: 21,410/21,410 (100.00%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Invalid references: 0/21,410 (0.00%) | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - RECOVERY STATISTICS: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Total recovered (match_skip=True): 2,548 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - BAML-processed records: 4,930 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Skipped in iteration 2: 2,548 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - Recovery reasons: | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - - Error recovery: 0 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - - Missing in match output (legacy): 82 | |
| 2025-11-11 15:33:37,108 - abzu.spark.er_eval - INFO - ============================================================ |
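
The summary's derived figures can be cross-checked from the raw counts it reports. The sketch below is not the `abzu.spark.er_eval` code; the helper `pct` and all variable names are hypothetical. It assumes that "merged" is the matching input minus the BAML output, and that the two percentages for 6,163 use different denominators (matching input for 55.56%, original raw companies for 45.18%), which is what the logged numbers suggest.

```python
# Hypothetical sanity check; counts are copied from the log summary above.
# None of these names come from the abzu codebase.


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


original = 13_641        # "Original raw companies (before matching)"
matched_input = 11_093   # "Companies that went into matching"
baml_output = 4_930      # "BAML-processed companies"
dropped_ids = 82         # "IDs dropped by BAML"

# Derived counts, under the assumptions stated in the lead-in.
skipped = original - matched_input     # singletons/errors that bypass matching
merged = matched_input - baml_output   # records collapsed by matching
final_total = baml_output + skipped    # matched + skipped

print(f"Skipped:        {skipped:,}")
print(f"Merged:         {merged:,} ({pct(merged, matched_input)}% of matching input)")
print(f"IDs dropped:    {dropped_ids} ({pct(dropped_ids, matched_input)}% of matching input)")
print(f"Final total:    {final_total:,}")
print(f"Net reduction:  {merged:,} ({pct(merged, original)}% of original)")
```

Under these assumptions, the same 6,163 merged records appear as 55.56% when measured against the 11,093 records that entered matching and as 45.18% when measured against all 13,641 original records.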