Created January 31, 2011 07:26
Queries the RSS/XML feeds of Reddit to save post titles to a file.
require 'rubygems'
require 'open-uri'
require 'nokogiri'

count = 0

def get_submissions(ident, count)
  url = "http://www.reddit.com/new/.xml?count=25&after=t3_#{ident}"
  # URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
  xml = Nokogiri::XML(URI.open(url))
  puts "\n\n#{url}"

  titles = xml.xpath("//channel/item/title")
  links = xml.xpath("//channel/item/link")

  # Collect one title per line
  output = titles.map { |title| title.inner_text + "\n" }

  # Array#to_s only concatenates on Ruby 1.8; join does so on every version
  puts output.join
  File.open('reddit_posts.txt', 'a') { |f| f.puts output.join }

  # Stop if the page came back empty; there is no identifier to continue from
  return if links.empty?

  # Retrieve the last Reddit post identifier on the results page to use in the next query
  ident = links.last.to_s.split("/")[6]
  count += 1

  # Reddit API restriction of no more than 1 call / 2 seconds; sleeping 5 stays well under it
  sleep(5)

  get_submissions(ident, count) if count < 500
end

get_submissions("fb883", 0)
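The script finds the identifier for the next query by splitting the last link on "/" and taking element 6, which depends on the exact shape of the URL. A minimal sketch of a slightly more defensive way to do the same step follows. The helper names `post_id_from_link` and `next_page_url` and the sample link are hypothetical, not part of the original gist, and the sketch assumes post links contain a `/comments/<id>/` segment.

```ruby
# Hypothetical helpers, not part of the original script.

# Pull the post identifier out of a Reddit comments link.
# Assumes the link contains a /comments/<id>/ segment; returns nil otherwise.
def post_id_from_link(link)
  match = link.to_s.match(%r{/comments/([a-z0-9]+)})
  match && match[1]
end

# Build the URL for the page of results that follows the given post identifier,
# using the same query string as the script above.
def next_page_url(ident, per_page = 25)
  "http://www.reddit.com/new/.xml?count=#{per_page}&after=t3_#{ident}"
end

# Made-up sample link, in the form Nokogiri returns when a <link> node is converted with to_s
link = "<link>http://www.reddit.com/r/ruby/comments/fb883/some_title/</link>"
id = post_id_from_link(link)
puts id
puts next_page_url(id)
```

Because `post_id_from_link` returns nil when no identifier is found, a caller can stop paging instead of requesting a malformed URL.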