@junkblocker
Forked from mbafford/README.md
Created July 4, 2024 15:57

Revisions

  1. Matthew Bafford revised this gist Jul 2, 2024. 1 changed file with 15 additions and 9 deletions.
    24 changes: 15 additions & 9 deletions README.md
    @@ -1,14 +1,20 @@
    +PDF tools for comparing PDFs visually (overlaying two PDFs to see changed areas) and with a perceptual hash (a numerical value indicating the visual difference between the two files).
    +
    +Useful for command-line review of PDFs and for de-duplication. Configure `git` to use these tools for better PDF history and comparison.
    +
    +These scripts require `imagemagick` and `poppler`. Both are available from Homebrew.
    +
    +---
    +
     Set up `git` to use a custom diff:

    -.gitattributes:
    +`.gitattributes`:

    -*.pdf binary diff=pdf
    +*.pdf binary diff=pdf

    -.gitconfig:
    +`.gitconfig`:

    -[diff "pdf"]
    -; textconv = ~/bin/pdf2layout
    -command = ~/bin/git-diff-pdf
    -
    -
    -These scripts require `imagemagick` and `poppler`. Both are available from Homebrew.
    +[diff "pdf"]
    +; textconv = ~/bin/pdf2layout
    +command = ~/bin/git-diff-pdf
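
The same configuration can also be applied from the command line. This is a minimal sketch, assuming `git-diff-pdf` lives in `~/bin` as in the README; the function name `setup_pdf_diff` is hypothetical and not part of the gist. Unlike the README, it writes to the repository's own config rather than `~/.gitconfig` (adding `--global` to the `git config` call would target the latter).

```shell
# Hypothetical helper (not part of the original gist): register the "pdf"
# diff driver for the repository in the given directory (default: current).
setup_pdf_diff() {
    local repo="${1:-.}"
    local attr_line='*.pdf binary diff=pdf'

    # Append the attribute line only if it is not already present.
    if ! grep -qxF "$attr_line" "$repo/.gitattributes" 2>/dev/null; then
        printf '%s\n' "$attr_line" >> "$repo/.gitattributes"
    fi

    # Equivalent to the [diff "pdf"] section of the README, stored in .git/config.
    git -C "$repo" config diff.pdf.command "$HOME/bin/git-diff-pdf"
}
```

Running `setup_pdf_diff /path/to/repo` twice leaves a single attribute line in `.gitattributes`.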

  2. Matthew Bafford revised this gist Jul 2, 2024. 2 changed files with 29 additions and 0 deletions.
    14 changes: 14 additions & 0 deletions README.md
    @@ -0,0 +1,14 @@
    Set up `git` to use a custom diff:

    .gitattributes:

    *.pdf binary diff=pdf

    .gitconfig:

    [diff "pdf"]
    ; textconv = ~/bin/pdf2layout
    command = ~/bin/git-diff-pdf


    These scripts require `imagemagick` and `poppler`. Both are available from Homebrew.
    15 changes: 15 additions & 0 deletions git-diff-pdf
    @@ -0,0 +1,15 @@
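    #!/bin/bash

    # git calls a diff.<driver>.command with seven arguments:
    #   path old-file old-hex old-mode new-file new-hex new-mode
    # so the two files to compare are $2 and $5 in that case.
    if [[ $# -eq 7 ]]; then
        OLD="$2"
        NEW="$5"
    elif [[ -n "$1" && -n "$2" ]]; then
        OLD="$1"
        NEW="$2"
    else
        echo "Usage: $0 <pdf1> <pdf2>" >&2
        exit 1
    fi

    echo "comparing [$OLD] and [$NEW]"

    # pdf2layout from poppler on homebrew (brew install poppler)
    echo "*** text content"
    diff <(~/bin/pdf2layout "$OLD") <(~/bin/pdf2layout "$NEW")

    echo "*** image perceptual hash"
    ~/bin/pdf-compare-phash "$OLD" "$NEW"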
  3. Matthew Bafford revised this gist Jul 2, 2024. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions pdf-compare-phash.py
    @@ -0,0 +1,8 @@
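    #!/bin/bash

    if [[ -z "$1" || -z "$2" ]]; then
        echo "Usage: $0 <pdf1> <pdf2>"
        exit 1
    fi

    # Composite the second file over the first with the Difference operator and
    # print the mean of each resulting difference image (0 means the pages
    # render identically).
    convert -metric phash "$1" null: "$2" -compose Difference -layers composite -format '%[fx:mean]\n' info: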
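
ImageMagick's `compare` tool also accepts `-metric phash` directly and writes the resulting distance to standard error. The command above prints the mean of a Difference composite, and as far as I can tell `-metric` only takes effect when a compare operation runs, so the two numbers are not the same measure. The following is a minimal sketch, assuming ImageMagick 7's `magick` binary is on the `PATH`; the function name `phash_distance` is made up for illustration and is not part of the gist.

```shell
# Hypothetical helper (not part of the original gist): print the perceptual
# hash distance between two images or single PDF pages.
phash_distance() {
    local out
    local status=0
    # compare writes the metric to stderr, so fold it into stdout;
    # null: discards the difference image itself.
    out=$(magick compare -metric phash "$1" "$2" null: 2>&1) || status=$?
    # compare exits 0 for similar images, 1 for dissimilar ones, 2 on error.
    if [ "$status" -gt 1 ]; then
        echo "compare failed: $out" >&2
        return "$status"
    fi
    printf '%s\n' "$out"
}
```

A call such as `phash_distance "a.pdf[0]" "b.pdf[0]"` compares only the first page of each file.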
  4. Matthew Bafford created this gist Jul 2, 2024.
    52 changes: 52 additions & 0 deletions pdf-compare-visual.py
    @@ -0,0 +1,52 @@
    #!/bin/bash

    if [[ -z "$1" || -z "$2" ]]; then
        echo "Usage: $0 <pdf1> <pdf2>"
        exit 1
    fi

    # note: --suffix is a GNU coreutils mktemp option
    TMP=$(mktemp --suffix=.png)
    echo "Comparing [$1] to [$2]"
    echo "Saving difference in $TMP"
    echo

    # resolution (DPI) used when rasterizing PDF pages
    DENSITY=100

    # this supports both simple file names and page-indexed file names like
    # file[0] or file[1]: identify prints one line per page, or a single
    # line if a single page is specified
    PAGES1=$(magick identify "$1" | wc -l)
    PAGES2=$(magick identify "$2" | wc -l)

    if (( PAGES1 != PAGES2 )); then
        echo "Number of pages between documents does not match: $PAGES1 != $PAGES2"
        echo "Only comparing the first page."

        magick compare -density "$DENSITY" -background white "$1[0]" "$2[0]" "$TMP"
        PHASH_DIFF=$(~/bin/pdf-compare-phash "$1[0]" "$2[0]")
    elif (( PAGES1 > 5 )); then
        echo "Too many pages ($PAGES1 > 5) to create hyper-image with all pages."
        echo "Only comparing the first page."
        magick compare -density "$DENSITY" -background white "$1[0]" "$2[0]" "$TMP"
        PHASH_DIFF=$(~/bin/pdf-compare-phash "$1[0]" "$2[0]")
    else
        # convert the PDFs into a single image with the pages vertically stacked
        ALL1=$(mktemp --suffix=.png)
        magick -density "$DENSITY" "$1" -append "$ALL1"
        ALL2=$(mktemp --suffix=.png)
        magick -density "$DENSITY" "$2" -append "$ALL2"

        magick compare -density "$DENSITY" -background white "$ALL1" "$ALL2" "$TMP"
        PHASH_DIFF=$(~/bin/pdf-compare-phash "$ALL1" "$ALL2")

        # the stacked intermediates are no longer needed
        rm -f "$ALL1" "$ALL2"
    fi

    if [ "$TERM_PROGRAM" = "iTerm.app" ]; then
        # show the difference image inline in iTerm
        echo "Visual difference between images:"
        echo "--------------------------------"
        imgcat-small "$TMP"
        echo "--------------------------------"
    else
        # otherwise hand the image to the default viewer (macOS `open`)
        open "$TMP"
    fi

    echo "Perceptual hash difference (0 is exactly the same): $PHASH_DIFF"
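
`mktemp --suffix` is a GNU coreutils option, and the `mktemp` that ships with macOS may not accept it. A minimal sketch of a more portable alternative, assuming only the template form of `mktemp -d` is available: create a private temporary directory and place a fixed-name PNG inside it. The function name `make_temp_png` and the `pdfdiff` prefix are made up for illustration.

```shell
# Hypothetical helper (not part of the original gist): print the path of a
# not-yet-created .png file inside a freshly created private directory.
make_temp_png() {
    local dir
    # The XXXXXX template form is accepted by both GNU and BSD mktemp.
    dir=$(mktemp -d "${TMPDIR:-/tmp}/pdfdiff.XXXXXX") || return 1
    # ${dir%/} drops a trailing slash in case TMPDIR ends with one.
    printf '%s/%s.png\n' "${dir%/}" "${1:-diff}"
}
```

In the script above this could stand in for the `mktemp --suffix=.png` calls, for example `TMP=$(make_temp_png)`, with `rm -rf "$(dirname "$TMP")"` once the image is no longer needed.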