Created December 14, 2021 22:48
A shell script that tries to remove exploits and malware from PDFs
#!/bin/bash
# References:
# https://security.stackexchange.com/questions/103323/effectiveness-of-flattening-a-pdf-to-remove-malware
# https://superuser.com/a/373740
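# Abort early if no input file was given
[ -n "$1" ] || { echo "Usage: $0 FILE.pdf" >&2; exit 1; }
TEMPFILE=$(mktemp /tmp/pdfsanitize.XXXXXXXXX)
# Derive the output name by replacing only the trailing extension (.pdf or .PDF)
OUTFILE=${1/%.PDF/.pdf}
OUTFILE=${OUTFILE/%.pdf/_sanitized.pdf}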
# Rewrite the PDF and decompress all images to strip image metadata (EXIF)
gs -sDEVICE=pdfwrite -dColorConversionStrategy=/LeaveColorUnchanged -dPassThroughJPEGImages=false -dPassThroughJPXImages=false -dEncodeColorImages=false -dEncodeGrayImages=false -dEncodeMonoImages=false -dNOPAUSE -dBATCH -sOutputFile="$TEMPFILE" "$1"
# Recompress images and downgrade the PDF version to (hopefully) destroy any malware and exploits
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile="$OUTFILE" "$TEMPFILE"
# Clean up
rm "$TEMPFILE"
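
The output-name step in the script can be hard to follow, so here is a minimal sketch of one way to derive it in isolation. The function name `sanitized_name` is hypothetical and not part of the script above; it assumes the goal is to touch only a trailing `.pdf` extension (in any letter case) and to leave any `.pdf` that appears earlier in the path alone.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original script):
# build the "_sanitized.pdf" output name from an input path.
sanitized_name() {
    local in="$1"
    # Strip a trailing .pdf extension in any letter case; a ".pdf" that
    # appears elsewhere in the path is left untouched.
    local base="${in%.[Pp][Dd][Ff]}"
    printf '%s_sanitized.pdf\n' "$base"
}

sanitized_name "Report.PDF"
sanitized_name "scans.pdf.d/invoice.pdf"
```

Removing the suffix with `%` rather than substituting the first match means a directory such as `scans.pdf.d/` is not rewritten by accident. An input without a `.pdf` extension simply gets `_sanitized.pdf` appended, so the output never has the same name as the input.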