Last active
November 12, 2016 02:57
Data Sciences Commands
# Command pipeline
# Non-Hadoop way, using only Linux commands.
cat data | map | sort | reduce
# Send some text to the mapper. (<dirto/> is a placeholder for the directory that holds the scripts.)
echo "foo foo quux labs foo bar quux" | <dirto/>mapper.py
# Send some text to the mapper, then sort by the first field, then run the reducer.
echo "foo foo quux labs foo bar quux" | <dirto/>mapper.py | sort -k1,1 | <dirto/>reducer.py
# Do the same with a file.
cat myfilename.txt | <dirto/>mapper.py | sort -k1,1 | <dirto/>reducer.py
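
# The gist does not include mapper.py or reducer.py. As a rough illustration of
# what the map | sort | reduce pipeline above computes, here is a minimal
# in-process word-count sketch. The function names (map_words, reduce_counts,
# word_count) are hypothetical and are not taken from the original scripts.

```python
from itertools import groupby


def map_words(lines):
    """Map step: emit one (word, 1) pair per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of adjacent pairs that share a key.

    The input must already be sorted by key, which is the job of the
    `sort -k1,1` stage in the shell pipeline.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain map, sort, and reduce, mirroring `map | sort | reduce`."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    # Same sample text as the echo command above.
    for word, count in word_count(["foo foo quux labs foo bar quux"]):
        print(f"{word}\t{count}")
```

# The reduce step relies on equal keys being adjacent, which is why the sort
# stage sits between the mapper and the reducer.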
# Hadoop-specific commands
# Use the shell scripts to start/stop Hadoop.
# Copy the file to Hadoop
# HDFS:
>> hadoop dfs -copyFromLocal /my/tmp/location /user/hduser/gutenberg
>> hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
-file /home/hduser/mapper.py -mapper /home/hduser/mapper.py \
-file /home/hduser/reducer.py -reducer /home/hduser/reducer.py \
-input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
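
# With Hadoop streaming, the mapper and reducer exchange plain text lines over
# stdin/stdout, and the framework sorts mapper output by key before the reducer
# sees it. The sketch below shows one way a reducer could consume such lines,
# assuming the mapper emits "word<TAB>count". That line format and the name
# reduce_stream are assumptions; the gist's actual reducer.py is not shown.

```python
import sys


def reduce_stream(lines):
    """Sum counts for runs of identical keys in sorted 'word<TAB>count' lines."""
    current_word, current_count = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, raw_count = line.partition("\t")
        try:
            count = int(raw_count)
        except ValueError:
            # Skip lines whose count field is not an integer.
            continue
        if word == current_word:
            current_count += count
        else:
            # Key changed: flush the finished run before starting a new one.
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, count
    # Flush the final run, if any.
    if current_word is not None:
        yield f"{current_word}\t{current_count}"


if __name__ == "__main__":
    for out_line in reduce_stream(sys.stdin):
        print(out_line)
```

# Keeping only the current key and its running total means memory use stays
# constant regardless of input size, which suits the streamed, pre-sorted input.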
# For macOS
# Install Homebrew.
# Install berkeley-db4 through brew.
# Don't install the latest "berkeley-db" formula; that version requires a license.
brew install berkeley-db4
# Set the BERKELEYDB_DIR environment variable to point to the brew installation before running pip install bsddb3.
export BERKELEYDB_DIR=/usr/local/Cellar/berkeley-db4/4.8.30
pip install bsddb3
pip install gutenberg
# You should be good to go.
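
# One hedged way to confirm that the two pip installs above are visible to the
# current Python interpreter is to look up their import specs without importing
# them. The helper name missing_modules is hypothetical, and this check only
# shows that the packages can be found, not that Berkeley DB was linked correctly.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["bsddb3", "gutenberg"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("bsddb3 and gutenberg are importable.")
```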