@mbuckbee
Created January 31, 2011 07:26
Queries Reddit's RSS/XML feed of new posts, page by page, and appends the post titles to a file.
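```ruby
require 'open-uri'
require 'nokogiri'

# Fetch one page of Reddit's /new feed, append its titles to a file,
# then follow the "after" cursor to the next page.
def get_submissions(ident, count)
  url = "http://www.reddit.com/new/.xml?count=25&after=t3_#{ident}"
  puts "\n\n#{url}"

  # URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs.
  # Reddit tends to reject the default Ruby user agent; this UA string is a placeholder.
  xml = Nokogiri::XML(URI.open(url, 'User-Agent' => 'reddit-title-saver/0.1'))
  titles = xml.xpath('//channel/item/title')
  links = xml.xpath('//channel/item/link')

  # One title per line. Array#to_s would write the array's inspect form on Ruby 1.9+.
  output = titles.map { |title| title.inner_text + "\n" }.join
  puts output
  File.open('reddit_posts.txt', 'a') { |f| f.puts output }

  # Retrieve the last Reddit post identifier on the results page to use in the next query.
  # Permalinks are assumed to look like http://www.reddit.com/r/<sub>/comments/<id>/<slug>/
  last_link = links.last
  return if last_link.nil? # empty page: nothing left to follow
  ident = last_link.inner_text.split('/')[6]

  count += 1
  # Reddit API restriction of no more than 1 call / 2 seconds; 5 s leaves a margin.
  sleep(5)
  get_submissions(ident, count) if count < 500
end

get_submissions('fb883', 0)
```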
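The paging in the script hinges on two string operations: pulling the post ID out of the last item's permalink, and building the next request URL with `after=t3_<id>` (`t3_` is Reddit's fullname prefix for submissions). A minimal sketch that isolates that logic so it can be checked offline follows; the helper names `post_id_from_permalink` and `next_page_url` and the `FEED_URL` constant are hypothetical, and it assumes permalinks shaped like `http://www.reddit.com/r/<sub>/comments/<id>/<slug>/`, which is what the script's `split('/')[6]` relies on.

```ruby
# Hypothetical helpers mirroring the script's pagination logic, no network needed.

FEED_URL = 'http://www.reddit.com/new/.xml'

# Assumes http://www.reddit.com/r/<sub>/comments/<id>/<slug>/ permalinks.
# Returns nil for anything else, so a caller can stop paging instead of
# requesting a URL with a bogus cursor.
def post_id_from_permalink(permalink)
  parts = permalink.to_s.split('/')
  return nil unless parts[5] == 'comments'
  parts[6]
end

# Builds the URL for the page that follows the given post ID.
def next_page_url(post_id, page_size = 25)
  "#{FEED_URL}?count=#{page_size}&after=t3_#{post_id}"
end

link = 'http://www.reddit.com/r/ruby/comments/fb883/some_title/'
id = post_id_from_permalink(link)
puts id                # prints fb883
puts next_page_url(id) # prints http://www.reddit.com/new/.xml?count=25&after=t3_fb883
```

Checking the `comments` segment before indexing is a small guard the original lacks: if the link format ever differs, the sketch returns `nil` rather than silently paging from a wrong ID.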