@jklymak
Created March 8, 2026 15:57

Hash-Based Baseline Image Storage for Matplotlib

Problem: Matplotlib's baseline test images account for 507MB of the .git directory (40MB of current baselines + 467MB of history). With 2,330 baseline images and ~106 baseline-touching commits per year, this keeps growing, which makes FreeType updates painful and new-contributor clones slow.

Solution: Store perceptual hashes (~200KB total) instead of images in the main repo. Download actual images on-demand only for test failures.


Core Architecture

Perceptual Hash Comparison

import imagehash
from PIL import Image

# Generate and compare hashes
baseline_hash = imagehash.phash(Image.open('baseline.png'))
generated_hash = imagehash.phash(Image.open('test_output.png'))

# Compare with tolerance (Hamming distance)
tolerance = 5  # bits difference allowed
if baseline_hash - generated_hash <= tolerance:
    # Test passes - images perceptually similar
    pass

Hash Properties:

  • Perceptual hash (not cryptographic): similar images = similar hashes
  • 64-bit hash (imagehash's default hash_size=8), stored as 16 hex characters
  • Hamming distance comparison allows configurable tolerance
  • Tolerances: 0 (identical hash, which does not guarantee pixel-identical images), 1-3 (minor antialiasing), 5-8 (small differences), 10+ (likely failure)
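Since only hex strings are stored, the comparison can be done on the strings themselves. A minimal stdlib sketch, assuming both hashes are equal-length hex strings such as those produced by `str(imagehash.phash(img))`; `hash_distance` and `classify_distance` are hypothetical helpers, and how distances of 4 and 9 (not covered by the bands above) are classified is an assumption of this sketch. `imagehash.hex_to_hash(a) - imagehash.hex_to_hash(b)` should give the same count.

```python
def hash_distance(hash_a: str, hash_b: str) -> int:
    """Hamming distance between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR the integer values; each set bit marks one differing hash bit
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def classify_distance(distance: int) -> str:
    """Map a distance onto the tolerance bands above.

    Assumption: 4 is folded into "small differences" and 9 into
    "likely failure", since the listed bands leave them unassigned.
    """
    if distance == 0:
        return "identical hash"
    if distance <= 3:
        return "minor antialiasing"
    if distance <= 8:
        return "small differences"
    return "likely failure"


d = hash_distance("a1b2c3d4e5f6a7b8", "a1b2c3d4e5f7a7b8")
print(d, classify_distance(d))  # 1 minor antialiasing
```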

Storage Schema: lib/matplotlib/tests/baseline_hashes.json

{
  "test_backend_pdf::test_kerning": {
    "primary": "a1b2c3d4e5f6g7h8",
    "variants": {
      "macos-arm64": "a1b2c3d4e5f7g7h8",
      "windows": "a1b2c3d4e5f9g7h8",
      "freetype-2.13": "a1b2c3d4e6f6g7h8"
    },
    "tolerance": 5,
    "metadata": {
      "created": "2024-01-15",
      "last_updated": "2026-03-01",
      "format": "pdf"
    }
  },
  "test_backend_pdf::test_hatching_legend": {
    "primary": "x9y8z7w6v5u4t3s2",
    "tolerance": 3,
    "metadata": {
      "created": "2023-05-20",
      "format": "pdf"
    }
  }
}
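Because every stored hash must decode as hex to be compared, a CI lint step could validate the file before any test runs. A minimal sketch, assuming the schema above (a `primary` hash, optional `variants`, and an integer `tolerance` defaulting to 5); `validate_baseline_hashes` is a hypothetical helper. Note that the example values above are placeholders: strings such as `a1b2c3d4e5f6g7h8` contain non-hex letters and would be flagged.

```python
import json

HEX_DIGITS = set("0123456789abcdef")


def validate_baseline_hashes(text: str) -> list:
    """Return a list of problems found in a baseline_hashes.json document."""
    problems = []
    for key, entry in json.loads(text).items():
        if "::" not in key:
            problems.append(f"{key}: key should look like 'module::test'")
        primary = entry.get("primary")
        if primary is None:
            problems.append(f"{key}: missing 'primary' hash")
            continue
        # Check the primary and every platform variant the same way
        hashes = {"primary": primary, **entry.get("variants", {})}
        for name, value in hashes.items():
            if not isinstance(value, str) or not set(value.lower()) <= HEX_DIGITS:
                problems.append(f"{key}: {name} hash is not hexadecimal")
            elif len(value) != len(primary):
                problems.append(f"{key}: {name} hash length differs from primary")
        tolerance = entry.get("tolerance", 5)
        if not isinstance(tolerance, int) or tolerance < 0:
            problems.append(f"{key}: tolerance must be a non-negative integer")
    return problems
```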

Platform Variant Handling

Automatic Tolerance: Most platform differences (antialiasing, minor font rendering) are handled automatically by the hash tolerance.

Explicit Variants: For legitimate platform-specific rendering:

  1. Detection: Test runs on macOS, generates hash a1b2c3d4e5f7g7h8
  2. Comparison: Primary hash a1b2c3d4e5f6g7h8 has distance = 1 (within tolerance!)
  3. CI Action: Detects acceptable-but-new hash, uploads image, requests approval
  4. Approval: Maintainer reviews side-by-side images, approves via comment
  5. Update: CI commits variant to baseline_hashes.json

Safety: New variants must be within tolerance of primary hash AND require human visual approval.
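The detection and safety rules above can be read as a four-way classification of a freshly computed hash. A minimal sketch under one reading of the design: an exact hit on a known hash needs no action; a hash within tolerance of the primary passes but is flagged as a variant candidate (upload plus approval); a hash within tolerance only of an existing variant passes but cannot become a new variant. `classify_hash` and its return labels are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def classify_hash(result: str, entry: dict) -> str:
    """Classify a computed hash against one baseline_hashes.json entry."""
    primary = entry["primary"]
    variants = entry.get("variants", {})
    tolerance = entry.get("tolerance", 5)

    if result == primary or result in variants.values():
        return "match"  # exact hit on a known hash: nothing to do
    if hamming(result, primary) <= tolerance:
        # Passes, but is new: upload the image and request approval
        return "variant-candidate"
    if any(hamming(result, v) <= tolerance for v in variants.values()):
        # Passes via an existing variant, but is too far from the primary
        # to be accepted as a new variant under the safety rule
        return "near-variant"
    return "mismatch"  # fall back to pixel comparison
```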


Developer Workflow

Creating New Baseline

# 1. Write test with @image_comparison decorator
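import matplotlib.pyplot as plt
from matplotlib.testing.decorators import image_comparison

@image_comparison(['my_new_test.pdf'])
def test_my_feature():
    fig, ax = plt.subplots()
    # ... test code ...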

# 2. Generate baseline locally
pytest test_backend_pdf.py::test_my_feature --accept-new-baseline

# This creates:
# - result_images/test_backend_pdf/my_new_test.pdf
# - Updates baseline_hashes.json with computed hash

# 3. Commit hash file (not the image!)
git add lib/matplotlib/tests/baseline_hashes.json
git commit -m "Add test_my_feature with baseline"
git push

# 4. CI automatically:
# - Runs test, generates image
# - Computes hash, verifies match with baseline_hashes.json
# - Uploads image to storage with hash-based filename
# - Posts image URL in PR for reviewer inspection
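One way `--accept-new-baseline` could update the hash file, sketched with file I/O left to the caller. Assumptions: `record_baseline` and `dump_hashes` are hypothetical helpers, the default tolerance of 5 mirrors the decorator code, and existing variants are kept unchanged when the primary is refreshed.

```python
import datetime
import json
from typing import Optional


def record_baseline(hashes: dict, key: str, new_hash: str, fmt: str,
                    tolerance: int = 5,
                    today: Optional[datetime.date] = None) -> dict:
    """Create or refresh the entry for *key*; returns the updated entry."""
    stamp = (today or datetime.date.today()).isoformat()
    entry = hashes.get(key)
    if entry is None:
        # New test: start a fresh entry with creation metadata
        entry = hashes[key] = {
            "primary": new_hash,
            "tolerance": tolerance,
            "metadata": {"created": stamp, "format": fmt},
        }
    else:
        # Existing test: replace the primary (variants kept as-is, an
        # assumption of this sketch) and record the update date
        entry["primary"] = new_hash
        entry.setdefault("metadata", {})["last_updated"] = stamp
    return entry


def dump_hashes(hashes: dict) -> str:
    """Serialize with sorted keys so `git diff` output stays small and stable."""
    return json.dumps(hashes, indent=2, sort_keys=True) + "\n"
```

Sorting keys means two developers adding different tests produce diffs that touch only their own entries.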

Updating Existing Baseline

# 1. Make code changes that affect rendering
# 2. Regenerate hash
pytest test_backend_pdf.py::test_kerning --accept-new-baseline

# 3. Review diff in baseline_hashes.json
git diff lib/matplotlib/tests/baseline_hashes.json

# 4. Commit and push - CI handles image upload
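A raw `git diff` of hex strings says little about how large a rendering change is. A minimal reviewer aid, assuming the two versions are obtained separately (for example the output of `git show HEAD:lib/matplotlib/tests/baseline_hashes.json` and the working copy); `summarize_changes` is a hypothetical helper that reports the Hamming distance each primary hash moved.

```python
import json


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def summarize_changes(old_text: str, new_text: str) -> list:
    """Summarize primary-hash changes between two baseline_hashes.json versions."""
    old, new = json.loads(old_text), json.loads(new_text)
    report = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            report.append(f"added: {key}")
        elif key not in new:
            report.append(f"removed: {key}")
        elif old[key]["primary"] != new[key]["primary"]:
            bits = hamming(old[key]["primary"], new[key]["primary"])
            report.append(f"changed: {key} (distance {bits})")
    return report
```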

Running Tests Locally

# Normal test run (no image downloads)
pytest test_backend_pdf.py::test_kerning
# If hash matches: test passes immediately
# If hash misses: downloads baseline, compares pixels, shows diff

# Accept hash on mismatch (update local baseline)
pytest test_backend_pdf.py::test_kerning --accept-hash

# Accept platform variant
pytest test_backend_pdf.py::test_kerning --accept-hash-variant=macos-arm64
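The two accept flags could share one update routine that also enforces the safety rule from the variant-handling section (a variant must stay within tolerance of the primary). A minimal sketch; `accept_hash` is hypothetical, and refusing out-of-tolerance variants with a `ValueError` is this sketch's choice.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def accept_hash(entry: dict, new_hash: str, variant: Optional[str] = None) -> dict:
    """Apply --accept-hash (variant=None) or --accept-hash-variant=<variant>."""
    if variant is None:
        entry["primary"] = new_hash  # --accept-hash: replace the primary
        return entry
    # Safety rule: a variant must lie within tolerance of the primary hash
    distance = hamming(new_hash, entry["primary"])
    tolerance = entry.get("tolerance", 5)
    if distance > tolerance:
        raise ValueError(
            f"{variant!r} is {distance} bits from the primary hash "
            f"(tolerance {tolerance}); refusing to record it as a variant")
    entry.setdefault("variants", {})[variant] = new_hash
    return entry
```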

CI Workflow

Image Upload (PR CI)

# .github/workflows/tests.yml
- name: Upload new baseline images
  if: github.event_name == 'pull_request'
  run: |
    python tools/upload_baseline_images.py \
      --source result_images/ \
      --hashes lib/matplotlib/tests/baseline_hashes.json \
      --storage github-release
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Script Logic:

  1. Find all images in result_images/ from test run
  2. Check which hashes exist in baseline_hashes.json but not in storage
  3. Upload missing images: storage/{hash}.pdf
  4. Comment on PR with uploaded image list and preview URLs
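Step 2 of the script logic reduces to a set difference once storage contents are listed. A minimal sketch, assuming storage objects are named `{hash}.{format}` as in step 3, that the format comes from each entry's `metadata.format`, and that entries without one default to `png` (an assumption); `plan_uploads` is hypothetical, and matching keys back to files in `result_images/` is left to the caller.

```python
def plan_uploads(hashes: dict, stored: set, default_format: str = "png") -> list:
    """Storage keys referenced by the hash file but not yet present in storage."""
    wanted = set()
    for entry in hashes.values():
        fmt = entry.get("metadata", {}).get("format", default_format)
        # Both the primary and every approved variant need an image in storage
        for h in [entry["primary"], *entry.get("variants", {}).values()]:
            wanted.add(f"{h}.{fmt}")
    return sorted(wanted - stored)
```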

Platform Variant Detection

⚠️ New platform variant detected for test_kerning
Platform: macos-arm64
Hash distance from primary: 1 (within tolerance 5)
Hash: a1b2c3d4e5f7g7h8

Side-by-side comparison:
Primary:  https://artifacts.github.com/{primary_hash}.pdf
Variant:  https://artifacts.github.com/{variant_hash}.pdf

To approve: Comment "@matplotlib-bot approve-hash test_kerning macos-arm64"
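The bot then needs to recognize approval comments. A minimal parsing sketch, assuming the command syntax shown above, one approval per line, and that trust is decided from the comment's `author_association` field in GitHub's webhook payload; which association values count as maintainers is an assumption, and `parse_approvals` is hypothetical.

```python
import re

# One approval per line: "@matplotlib-bot approve-hash <test> <variant>".
# [ \t\r]* before $ tolerates the \r\n line endings GitHub comments often use.
APPROVE_RE = re.compile(
    r"^@matplotlib-bot[ \t]+approve-hash[ \t]+(?P<test>\S+)[ \t]+(?P<variant>\S+)[ \t\r]*$",
    re.MULTILINE,
)

# author_association values accepted as maintainers (an assumption)
TRUSTED = {"OWNER", "MEMBER", "COLLABORATOR"}


def parse_approvals(comment_body: str, author_association: str) -> list:
    """Return (test, variant) pairs approved in a PR comment."""
    if author_association not in TRUSTED:
        return []  # ignore approvals from non-maintainers
    return [(m["test"], m["variant"]) for m in APPROVE_RE.finditer(comment_body)]
```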

Implementation Files

1. lib/matplotlib/testing/compare.py (additions)

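import json
import urllib.request
from pathlib import Path

import imagehash
from PIL import Image

def compute_image_hash(image_path, hash_size=8):
    """Compute the perceptual hash of an image as a hex string.

    hash_size=8 gives the 64-bit (16 hex character) hash stored in
    baseline_hashes.json; hash_size=16 would give a 256-bit hash.
    """
    image_path = str(image_path)
    if not image_path.endswith('.png'):
        # Pillow cannot read PDF/SVG/EPS, so rasterize first with the
        # existing convert() helper in this module (as compare_images does)
        image_path = convert(image_path, cache=True)
    with Image.open(image_path) as img:
        return str(imagehash.phash(img, hash_size=hash_size))

def load_baseline_hashes():
    """Load baseline_hashes.json from tests directory."""
    hash_file = Path(__file__).parent.parent / 'tests/baseline_hashes.json'
    if hash_file.exists():
        return json.loads(hash_file.read_text())
    return {}

def fetch_baseline_image(test_name, image_name, baseline_hash, ext='png'):
    """Download baseline image from storage if not cached locally."""
    cache_dir = Path.home() / '.matplotlib/baseline_cache'
    cache_dir.mkdir(parents=True, exist_ok=True)

    cached_path = cache_dir / f"{baseline_hash}.{ext}"
    if cached_path.exists():
        return cached_path

    # Download from storage (placeholder URL); write to a temporary name
    # first so an interrupted download never leaves a truncated cache entry
    url = f"https://storage.example.com/baselines/{baseline_hash}.{ext}"
    tmp_path = cached_path.with_name(cached_path.name + '.part')
    urllib.request.urlretrieve(url, tmp_path)
    tmp_path.replace(cached_path)
    return cached_path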

2. lib/matplotlib/testing/decorators.py (modifications)

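import imagehash

from matplotlib.testing.compare import (
    compare_images, compute_image_hash, fetch_baseline_image,
    load_baseline_hashes)
from matplotlib.testing.exceptions import ImageComparisonFailure


def hash_distance(hash_a, hash_b):
    """Hamming distance between two hex-encoded perceptual hashes."""
    return imagehash.hex_to_hash(hash_a) - imagehash.hex_to_hash(hash_b)


# Called from the image_comparison wrapper for each generated image
def compare_with_hash_first(result_path, expected_path, test_module,
                            test_name, image_name, tol):
    # Compute hash of generated image
    result_hash = compute_image_hash(result_path)

    # Load expected hashes
    hashes = load_baseline_hashes()
    expected_hash_data = hashes.get(f"{test_module}::{test_name}")

    if expected_hash_data:
        primary_hash = expected_hash_data['primary']
        tolerance = expected_hash_data.get('tolerance', 5)

        # Try the primary hash, then any platform variants
        candidates = [primary_hash,
                      *expected_hash_data.get('variants', {}).values()]
        if any(hash_distance(result_hash, h) <= tolerance for h in candidates):
            return  # Test passes without needing the baseline image

        # Hash mismatch - download baseline for pixel comparison
        expected_path = fetch_baseline_image(test_name, image_name,
                                             primary_hash)

    # Fall back to traditional pixel comparison. compare_images returns
    # None on success and an error message on failure; it does not raise.
    err = compare_images(str(expected_path), str(result_path), tol=tol)
    if err:
        raise ImageComparisonFailure(err)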

3. lib/matplotlib/tests/conftest.py (pytest configuration)

def pytest_addoption(parser):
    parser.addoption('--accept-new-baseline', action='store_true',
                     help='Accept new baseline images and update hashes')
    parser.addoption('--accept-hash', action='store_true',
                     help='Accept current output hash as new baseline')
    parser.addoption('--accept-hash-variant', 
                     help='Accept hash as platform variant (e.g., macos-arm64)')
    parser.addoption('--baseline-source', default='storage',
                     choices=['storage', 'ci-artifacts', 'local'],
                     help='Where to fetch baseline images from')
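Tests and fixtures would read these flags through pytest's `config.getoption`, which accepts the literal `--option` spelling. A minimal sketch that collapses the accept flags into one mode and rejects conflicting combinations; `resolve_hash_mode` and the mode names are hypothetical, and in a real conftest it would be called with `request.config` (or from `pytest_configure`), ideally raising `pytest.UsageError` instead of `ValueError`.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def resolve_hash_mode(config) -> Tuple[str, Optional[str]]:
    """Turn the hash-related options into one mode string plus variant name."""
    variant = config.getoption("--accept-hash-variant")
    flags = {
        "accept-new-baseline": config.getoption("--accept-new-baseline"),
        "accept-hash": config.getoption("--accept-hash"),
        "accept-hash-variant": variant is not None,
    }
    chosen = [name for name, enabled in flags.items() if enabled]
    if len(chosen) > 1:
        raise ValueError("choose only one of: " + ", ".join(chosen))
    return (chosen[0] if chosen else "compare"), variant


# Usage with a stand-in for pytest's Config object
fake = SimpleNamespace(getoption={
    "--accept-new-baseline": False,
    "--accept-hash": False,
    "--accept-hash-variant": "macos-arm64",
}.get)
print(resolve_hash_mode(fake))  # ('accept-hash-variant', 'macos-arm64')
```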

4. tools/upload_baseline_images.py (new)

#!/usr/bin/env python
"""Upload baseline images to storage for hash entries."""

import argparse
import json
from pathlib import Path


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--source', required=True, help='result_images/ directory')
    parser.add_argument('--hashes', required=True, help='baseline_hashes.json path')
    parser.add_argument('--storage', choices=['github-release', 's3', 'gcs'])
    args = parser.parse_args()

    hashes = json.loads(Path(args.hashes).read_text())

    for test_name, hash_data in hashes.items():
        # Find corresponding image in result_images/
        # Compute its hash, verify match
        # Upload if not already in storage
        ...  # storage-specific upload logic not yet specified


if __name__ == '__main__':
    main()

Migration Strategy

Phase 1: Infrastructure (1-2 weeks)

  • Add imagehash dependency
  • Implement hash computation/comparison in compare.py
  • Create baseline_hashes.json schema
  • Add pytest flags (--accept-new-baseline, etc.)

Phase 2: Storage Setup (1 week)

  • Decide: GitHub releases vs separate repo vs cloud storage
  • Implement upload_baseline_images.py script
  • Configure CI workflow for image uploads

Phase 3: Pilot Migration (2-3 weeks)

  • Migrate PDF backend tests (~20 tests)
  • Generate hashes for existing baselines
  • Upload existing images to storage
  • Test workflows on real PRs
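Generating hashes for existing baselines is mostly a directory walk. A minimal sketch with the hash function injected (in practice something like `compute_image_hash` from compare.py), assuming baselines live under `lib/matplotlib/tests/baseline_images/<test_module>/`. `build_hash_table` is hypothetical, and keying entries by image file name rather than test name is a simplification: baseline files are named after images, several formats of one image would otherwise collide, and mapping them onto the schema's `module::test` keys still has to be decided.

```python
from pathlib import Path
from typing import Callable, Iterable

IMAGE_SUFFIXES = {".png", ".pdf", ".svg", ".eps"}


def build_hash_table(images: Iterable[Path], hash_fn: Callable[[Path], str],
                     tolerance: int = 5) -> dict:
    """Build initial baseline_hashes.json entries from existing baseline files."""
    table = {}
    for image in sorted(images):
        if image.suffix not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        # "<test module dir>::<image file name>" (simplified key, see above)
        key = f"{image.parent.name}::{image.name}"
        table[key] = {
            "primary": hash_fn(image),
            "tolerance": tolerance,
            "metadata": {"format": image.suffix[1:]},
        }
    return table
```

For the PDF pilot, a call might look like `build_hash_table(Path("lib/matplotlib/tests/baseline_images").glob("test_backend_pdf/*"), compute_image_hash)`.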

Phase 4: Gradual Rollout (ongoing)

  • Migrate backend-by-backend
  • Monitor hash tolerance effectiveness
  • Collect platform variant data
  • Adjust tolerances per test as needed

Benefits

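Repo Size: 507MB → ~50MB (remove baseline images, keep hashes; reaching this figure also requires rewriting history, since the 467MB of past baselines otherwise stays in .git)

Clone Speed: 40MB less in the working tree and in shallow clones; full clones only get faster if history is rewritten

FreeType Updates: Update a single JSON file instead of committing 100+ regenerated images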

Platform Flexibility: Tolerance automatically handles minor differences

Developer UX: Same @image_comparison decorator, faster tests (no downloads on pass)

Backward Compatible: Traditional pixel comparison still available as fallback


Dependencies

  • imagehash library (BSD license, 10KB, but depends on numpy and SciPy; SciPy would be a new test dependency)
  • Pillow (already a matplotlib dependency)
  • Storage solution (GitHub releases = free, no new infrastructure)

Open Questions

  1. Storage location: GitHub releases (free, simple) vs dedicated repo (cleaner) vs cloud (more complex)?
  2. Hash tolerance defaults: Single global default or per-backend defaults?
  3. Variant approval: Fully automated for core devs or always require review?
  4. Migration timeline: All at once vs gradual backend-by-backend?