Last active
March 10, 2017 22:26
Set a list of Biographical Directory of the United States Congress (Bioguide) IDs to download members' photos. I used wget instead of requests because of a TLS handshake issue. To get the IDs, see https://gist.github.com/greglinch/5197267b6ff8fcb19192ba5443f1f71d
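import os
import subprocess

# dimensions = '225x275'
dimensions = 'original'

## add a list of IDs here based on http://bioguide.congress.gov/biosearch/biosearch.asp
id_list = []

images_downloaded = 0

# expand '~' explicitly; only a shell expands it, and wget is run without one
file_path = os.path.expanduser('~/Downloads/images/')
# wget -O does not create missing directories
if not os.path.isdir(file_path):
    os.makedirs(file_path)

urls = ''

for bio_id in id_list:
    img_url = 'https://theunitedstates.io/images/congress/%s/%s.jpg' % (dimensions, bio_id)
    # print(img_url)
    file_name = '%s.jpg' % (bio_id)
    # named out_file to avoid shadowing the built-in file()
    out_file = os.path.join(file_path, file_name)
    # pass the arguments as a list so no shell quoting is needed
    command = ['wget', '-O', out_file, img_url]
    try:
        # wget reports failure through a non-zero exit status, not an exception
        if subprocess.call(command) == 0:
            images_downloaded += 1
            # urls += img_url + ','
        else:
            print('Error:\t\t' + bio_id + '\n')
    except OSError:
        # raised when wget itself is missing or not executable
        print('Error:\t\t' + bio_id + ' (could not run wget)\n')

# print(urls)
print('Images downloaded:\t\t' + str(images_downloaded))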
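
Before running the download, it can help to preview which URLs and local paths the script would use. The following is a minimal dry-run sketch, not part of the original gist: the helper name `build_downloads` and its `dest_dir` parameter are hypothetical, and the ID in the usage example is a made-up placeholder. It only builds strings from the same URL pattern as the script above and never touches the network.

```python
import os


def build_downloads(id_list, dimensions='original', dest_dir='~/Downloads/images/'):
    """Return (url, local_path) pairs for each ID without downloading anything.

    Hypothetical helper: it mirrors the URL pattern used in the script above.
    """
    dest_dir = os.path.expanduser(dest_dir)
    pairs = []
    for bio_id in id_list:
        url = 'https://theunitedstates.io/images/congress/%s/%s.jpg' % (dimensions, bio_id)
        pairs.append((url, os.path.join(dest_dir, '%s.jpg' % bio_id)))
    return pairs


if __name__ == '__main__':
    # 'A000000' is a made-up placeholder ID, not a real member
    for url, path in build_downloads(['A000000'], dimensions='225x275'):
        print('%s -> %s' % (url, path))
```

Checking this output against a handful of known IDs is a cheap way to catch a typo in `dimensions` or the destination folder before wget is invoked for the whole list.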