Created
May 6, 2010 20:10
#!/usr/bin/ruby
# Ruby clone of the Meguro map-reduce sample file
# http://www.sevenforge.com/meguro
#
require 'rubygems'
require 'json'
require 'pp'
# Each line of stream.log is expected to hold one JSON-encoded status.
timelines = File.readlines("stream.log")
# Map phase: emit one { hashtag => 1 } pair per occurrence, e.g.
# mapper = [{"#hello" => 1}, {"#hello" => 1}, {"#world" => 1}]
mapper = []
# start_time = Time.now
timelines.each{|value|
  json = JSON.parse(value)
  text = json["text"]
  next unless text
  words = text.split(/[\s\.:?!]+/)
  words.each{|word|
    # word[0].chr works on both Ruby 1.8 (Fixnum) and 1.9+ (String)
    mapper << { word => 1 } if word[0] && word[0].chr == '#'
  }
}
# end_time = Time.now
# Reduce phase: sum the emitted counts per hashtag and keep the result.
counts = mapper.reduce({}){|total, current|
  key = current.keys.first
  value = current.values.first
  total[key] = total[key] ? total[key] + value : value
  total
}
# Write one line per hashtag; the block form of File.open closes the file.
File.open("out.rb.out", "w"){|output|
  counts.each{|k, v|
    output.write "#{k} \t\t #{v} \n"
  }
}
# p "#{start_time} - #{end_time} (#{end_time - start_time} sec)"
# Result
#
# [tmp]$ time meguro -j our.js -o stream.log
# ---------------Meguro------------------
# Javascript: our.js
# Mapper output: map.out
# Reducer output: reduce.out
# Number of threads: 2
# Javascript runtime memory size: 95.37M
# Mapper buckets: 1.00M
# Mapper memory size: 64.00M
# Mapping stream.log: 100%
# Mapper Complete: 5.88K Emits
# Estimated Map File Size: 58.87K
# Reducing: 100%
#
# real 0m2.791s
# user 0m3.580s
# sys 0m0.536s
#
# [tmp]$ time ruby our.rb
#
# real 0m5.755s
# user 0m5.361s
# sys 0m0.298s
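
The map and reduce steps in the script above can also be folded into a single pass, which avoids building the intermediate array of one-pair hashes. The following is a minimal sketch rather than the original author's method: `count_hashtags` is a hypothetical helper name, the sample lines are made up for illustration, and skipping unparseable or non-object lines is an added assumption (the script above does not guard against them).

```ruby
require 'json'

# Hypothetical helper (not part of the original script): counts hashtags
# across an array of JSON-encoded lines in a single pass.
def count_hashtags(lines)
  counts = Hash.new(0) # missing keys default to 0, so += 1 just works
  lines.each do |line|
    begin
      status = JSON.parse(line)
    rescue JSON::ParserError
      next # assumption: silently skip blank or malformed lines
    end
    next unless status.is_a?(Hash)
    text = status["text"]
    next unless text.is_a?(String)
    # Same tokenizer as the script above: split on whitespace and . : ? !
    text.split(/[\s\.:?!]+/).each do |word|
      counts[word] += 1 if word.start_with?('#')
    end
  end
  counts
end

# Made-up sample input standing in for stream.log
sample = [
  '{"text": "#hello world #hello"}',
  '{"delete": {"status": {"id": 1}}}',
  'not json',
  '{"text": "Hi! #world."}'
]
count_hashtags(sample).each { |tag, n| puts "#{tag} \t\t #{n}" }
```

Using `Hash.new(0)` replaces the `total[key] ? total[key] + value : value` check, and passing the lines in as an argument keeps the counting logic testable without a `stream.log` on disk.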