#!/usr/bin/ruby
# Give this a file and an enlargement factor and pipe the resulting output to a bigger file
lines = File.readlines(ARGV.shift)
factor = ARGV.shift.to_i
factor.times do
  lines.each do |line|
    puts line
  end
end
#!/usr/bin/ruby
# ruby clone of meguro map reduce sample file
# http://www.sevenforge.com/meguro
#
require 'rubygems'
require 'json'
require 'pp'
timelines = File.readlines("stream.log")
# mapper = [{"#hello" => 1}, {"#hello" => 1}, {"#world" => 1}]
mapper = []
# start_time = Time.now
timelines.each{|value|
  json = JSON.parse(value)
  text = json["text"]
  next unless text
  words = text.split(/[\s\.:?!]+/)
  words.each{|word|
    mapper << { word => 1 } if word[0] && word[0].chr == '#'
  }
}
# end_time = Time.now
# Sum the per-word counts; keep the result, since reduce does not modify mapper
totals = mapper.reduce({}){|total, current|
  key = current.keys.first
  value = current.values.first
  total[key] = total[key] ? total[key] + value : value
  total
}
# The block form closes the output file once all totals are written
File.open("out.rb.out", "w"){|output|
  totals.each{|k, v|
    output.write "#{k} \t\t #{v} \n"
  }
}
# p "#{start_time} - #{end_time} (#{end_time - start_time} sec)"
# Result
#
# [tmp]$ time meguro -j our.js -o stream.log
# ---------------Meguro------------------
# Javascript: our.js
# Mapper output: map.out
# Reducer output: reduce.out
# Number of threads: 2
# Javascript runtime memory size: 95.37M
# Mapper buckets: 1.00M
# Mapper memory size: 64.00M
# Mapping stream.log: 100%
# Mapper Complete: 5.88K Emits
# Estimated Map File Size: 58.87K
# Reducing: 100%
#
# real 0m2.791s
# user 0m3.580s
# sys 0m0.536s
#
# [tmp]$ time ruby our.rb
#
# real 0m5.755s
# user 0m5.361s
# sys 0m0.298s
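The script above builds one single-entry hash per hashtag and then folds them together. A minimal sketch of the same count done in a single pass, assuming the same one-JSON-object-per-line input; `count_hashtags` is a hypothetical helper name and the sample lines are made up for illustration, not taken from `stream.log`:

```ruby
require 'json'

# Hypothetical helper: tally hashtags from an enumerable of JSON lines.
# Uses the same word-splitting regex as the script above.
def count_hashtags(lines)
  counts = Hash.new(0) # missing keys start at 0, so no nil check is needed
  lines.each do |line|
    begin
      record = JSON.parse(line)
    rescue JSON::ParserError
      next # skip blank or malformed lines instead of aborting
    end
    next unless record.is_a?(Hash)
    text = record["text"]
    next unless text.is_a?(String)
    text.split(/[\s\.:?!]+/).each do |word|
      counts[word] += 1 if word.start_with?('#')
    end
  end
  counts
end

# Made-up sample input; the second line has no "text" key, the third is not JSON.
sample = [
  '{"text": "#hello world #hello"}',
  '{"delete": {"status": {"id": 1}}}',
  'not json'
]
count_hashtags(sample).each { |tag, n| puts "#{tag}\t#{n}" }
```

Passing `File.foreach("stream.log")` instead of `sample` would read the log one line at a time rather than loading it all with `readlines`; whether that narrows the gap to meguro has not been measured here.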
$ time ruby count-hashtags.rb large-stream.log > output

real 4m4.605s
user 3m16.452s
sys 0m7.309s

$ time meguro -j count-hashtags.js large-stream.log
---------------Meguro------------------
Javascript: count-hashtags.js
Mapper output: map.out
Reducer output: reduce.out
Number of threads: 2
Javascript runtime memory size: 95.37M
Mapper buckets: 1.00M
Mapper memory size: 64.00M
Mapping large-stream.log: 100%
Mapper Complete: 176.55K Emits
Reducing: 100%

real 2m19.187s
user 3m9.071s
sys 0m17.464s
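To compare the two runs, the `real` times can be converted to seconds and divided. The sketch below does that arithmetic for the wall-clock figures reported above; `time_to_seconds` and `speedup` are hypothetical helper names, and the parser assumes the `NmS.SSSs` format that `time` printed here:

```ruby
# Convert a `time`-style duration such as "4m4.605s" into seconds.
def time_to_seconds(str)
  match = str.match(/\A(\d+)m(\d+(?:\.\d+)?)s\z/)
  raise ArgumentError, "unrecognized duration: #{str}" unless match
  match[1].to_i * 60 + match[2].to_f
end

# Ratio of the slower wall-clock time to the faster one.
def speedup(slower, faster)
  time_to_seconds(slower) / time_to_seconds(faster)
end

# `real` times copied from the runs above (ruby first, meguro second).
puts format("stream.log:       %.2fx", speedup("0m5.755s", "0m2.791s"))   # 2.06x
puts format("large-stream.log: %.2fx", speedup("4m4.605s", "2m19.187s")) # 1.76x
```

These ratios use wall-clock time only. On the large log the `user` times are close (3m16s vs 3m9s), which suggests much of meguro's advantage there comes from spreading work across its 2 threads.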