mhoye/99b10089cb53a038c942, last active May 18, 2016 22:04
# This fetches all the ICSE 2015/2016 papers, names each file after the Title field in its PDF
# metadata, and puts them in a folder on your desktop. It's been tested on OS X and depends on wget
# and pdfinfo; pdfinfo is available via the MacPorts xpdf package on pre-Yosemite OS X, and here for Yosemite users:
#
# ftp://ftp.foolabs.com/pub/xpdf/xpdfbin-mac-3.04.tar.gz
#
# It's a manual install, unfortunately, but I trust you're geared up for that.
#
# The username and password are available from these public-facing PDFs:
# http://atlantis.isti.cnr.it/ICSE2015ProgramBrochureOnLineVersion.pdf (2015: username icse15, password conf15)
# http://2016.icse.cs.txstate.edu/static/downloads/conference-brochure.pdf (2016: username icse16, password conf16)
# The "grep [ab]" part is what separates the papers from the schedules, posters and other PDFs, which
# may or may not have Title metadata. Because someone thought, "We'll use the fifth character in the
# filename as meaningful metadata. That is entirely sane and reasonable."
#
# This script was first written in 2015, and has been updated for ICSE 2016 mostly by replacing all
# instances of "15" with "16".
#
# I wish I was joking.
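wget -c --user=icse16 --password=conf16 http://conferences.computer.org/icse/2016/content/icsefull.zip && unzip -d icsefull icsefull.zip
cd icsefull/content/papers/ || exit 1
mkdir sanity && cd sanity || exit 1
# '[ab]' is quoted so the shell can't glob it; read -r keeps each path intact.
ls ../*.pdf | grep '[ab]' | while IFS= read -r x; do
  # Strip "Title:" and pdfinfo's alignment padding, then turn "/" (illegal in filenames) into "-".
  cp "$x" "$(pdfinfo -meta "$x" | grep '^Title:' | sed -e 's/^Title:[[:space:]]*//' -e 's/\//-/g').pdf"
done
cd .. && mv sanity ~/Desktop/ICSE2016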
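Two details in the loop above can bite: a paper whose PDF has no Title produces a file named just `.pdf`, and two papers with the same title overwrite each other. The helpers below are a minimal sketch of one way to guard against both. The function names `clean_title` and `target_name` are hypothetical, not part of the original script, and the " (2)" suffix scheme is an arbitrary choice.

```shell
# Hypothetical helpers (not in the original gist) for naming the copied papers.

# clean_title: read pdfinfo output on stdin and print the first Title value,
# minus the "Title:" label and its padding, with "/" replaced by "-".
# Prints nothing if there is no Title line.
clean_title() {
  grep -m1 '^Title:' | sed -e 's/^Title:[[:space:]]*//' -e 's/\//-/g'
}

# target_name SRC TITLE: print a filename that is free in the current directory.
# Uses "TITLE.pdf", or SRC's own basename when TITLE is empty, and appends
# " (2)", " (3)", ... until the name is not already taken.
target_name() {
  local src=$1 title=$2 base candidate n=2
  if [ -n "$title" ]; then
    base=$title
  else
    base=${src##*/}      # strip the directories
    base=${base%.pdf}    # strip the extension
  fi
  candidate="$base.pdf"
  while [ -e "$candidate" ]; do
    candidate="$base ($n).pdf"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

Inside the loop, the `cp` line would then become `cp "$x" "$(target_name "$x" "$(pdfinfo -meta "$x" | clean_title)")"`. Falling back to the original filename keeps untitled papers instead of silently collapsing them into one `.pdf`.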
Here's an alternative which uses fuse-zip, so the archive never needs to be decompressed (except when actually reading a PDF from within it), avoiding the tripled disk usage of the original (the zip, its decompressed contents, and the cp'd copies).
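The commenter's actual script isn't shown here. As a rough sketch of the idea, assuming fuse-zip is installed and can mount the archive read-only with its `-r` option, the papers can be symlinked by title instead of copied, so the only thing on disk is the zip itself. The function name `link_papers` is hypothetical; it keeps the original's `[ab]` filename filter and skips PDFs without a Title.

```shell
# Hypothetical helper, not the commenter's code: symlink each paper PDF in
# SRC_DIR (e.g. a fuse-zip mount of icsefull.zip) into DEST_DIR as "TITLE.pdf".
# Usage: link_papers SRC_DIR DEST_DIR
link_papers() {
  local src dest=$2 f name title
  src=$(cd "$1" && pwd) || return 1   # absolute path, so the links resolve from DEST_DIR
  mkdir -p "$dest" || return 1
  for f in "$src"/*.pdf; do
    [ -e "$f" ] || continue           # the glob matched nothing
    name=${f##*/}
    case "$name" in
      *[ab]*) ;;                      # same filter as the original grep '[ab]'
      *) continue ;;
    esac
    title=$(pdfinfo -meta "$f" | grep -m1 '^Title:' | sed -e 's/^Title:[[:space:]]*//' -e 's/\//-/g')
    [ -n "$title" ] || continue       # no Title metadata: skip this PDF
    # Keep the first paper with a given title rather than overwriting it.
    if [ ! -e "$dest/$title.pdf" ] && [ ! -L "$dest/$title.pdf" ]; then
      ln -s "$f" "$dest/$title.pdf"
    fi
  done
}
```

Usage would look like `mkdir -p ~/icse-mnt && fuse-zip -r icsefull.zip ~/icse-mnt`, then `link_papers ~/icse-mnt/content/papers ~/Desktop/ICSE2016` (the `content/papers` path mirrors the layout the original script `cd`s into). The links only resolve while the archive is mounted; unmount with `fusermount -u ~/icse-mnt` on Linux or `umount ~/icse-mnt` on OS X.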