mkdir spark_install && cd spark_install
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar -xzvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7
./bin/spark-shell

If you want to install Python under your home directory, get the tarball from here and use ./configure --prefix=any/dir/of/your/choice/where/you/have/write/access . Then run make install and add Python's bin directory to the $PATH environment variable.
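A minimal sketch of that from-source install (the version and prefix here are examples, not requirements):

wget https://www.python.org/ftp/python/2.7.13/Python-2.7.13.tgz
tar -xzf Python-2.7.13.tgz
cd Python-2.7.13
./configure --prefix=$HOME/local/python27
make
make install
export PATH=$HOME/local/python27/bin:$PATH   # add this line to ~/.bashrc to persist it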
To install virtualenv:
pip install virtualenv
cd ~
virtualenv jupyter_pyspark
source jupyter_pyspark/bin/activate
pip install numpy
pip install scipy
pip install scikit-learn
pip install pandas
pip install jupyter

nano ~/.bashrc

Paste the following into ~/.bashrc (opened above) or into spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh (this file doesn't exist initially; you have to create it):
export SPARK_HOME=/path/to/spark-2.1.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_DRIVER_PYTHON=/path/to/jupyter_pyspark/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
export PYSPARK_PYTHON=/path/to/jupyter_pyspark/bin/python
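A quick sanity check that the new settings are picked up (assuming you put the exports in ~/.bashrc as above):

source ~/.bashrc
echo $SPARK_HOME     # should print your Spark directory
which spark-shell    # should resolve to $SPARK_HOME/bin/spark-shell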
cd spark_install/spark-2.1.0-bin-hadoop2.7
./bin/pyspark --master local[4]

You can open an SSH tunnel as follows. This way, you can open the Jupyter notebook in your local browser instead of having to use the browser on the remote machine via ssh -X. With the following tunnel, open your local browser at http://localhost:8889 and enter the token printed in your terminal in the previous step.
ssh -N -f -L localhost:8889:localhost:8880 yourusername@remotehost

(The above gist has been successfully tested with Ubuntu 14.04 LTS on Intel Xeon E5-2620 and Intel Celeron N3160.)
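Once the tunnel is up, you can check it from your local machine before opening the browser (a quick check; expect an HTTP status line back from the remote Jupyter server), and re-list the token on the remote side if you lost it:

curl -sI http://localhost:8889/ | head -n 1   # run locally; e.g. "HTTP/1.1 302 Found"
jupyter notebook list                         # run on the remote machine, inside the virtualenv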
Hi, thanks for this. Could you please tell me how to create spark-2.1.0-bin-hadoop2.7/conf/spark-env.sh (this file doesn't originally exist, you have to create it)?
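One way to create that file, assuming the standard Spark binary distribution (which ships a template in conf/):

cd spark-2.1.0-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh   # copy the bundled template
nano spark-env.sh                       # then add the exports from above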