How to run multiple jobs in one SparkContext from separate threads in PySpark?
# Source: http://stackoverflow.com/questions/30214474/how-to-run-multiple-jobs-in-one-sparkcontext-from-separate-threads-in-pyspark
# Prereqs: set
#   spark.dynamicAllocation.enabled true
#   spark.shuffle.service.enabled   true
# in spark-defaults.conf

import threading

from pyspark import SparkContext, SparkConf


def task(sc, i):
    # Each thread submits its own Spark job through the shared SparkContext.
    print(sc.parallelize(range(i * 10000)).count())


def run_multiple_jobs():
    conf = SparkConf().setMaster('local[*]').setAppName('appname')
    # Set scheduler to FAIR: http://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
    conf.set('spark.scheduler.mode', 'FAIR')
    sc = SparkContext(conf=conf)
    for i in range(4):
        t = threading.Thread(target=task, args=(sc, i))
        t.start()
        print('spark task', i, 'has started')


run_multiple_jobs()

# OUTPUT:
# spark task 0 has started
# spark task 1 has started
# spark task 2 has started
# spark task 3 has started
# 30000
# 0
# 10000
# 20000
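
With FAIR scheduling enabled, each thread can also be assigned to a named scheduler pool so that its jobs get the pool's share of resources. Below is a minimal sketch of that variant; the pool name 'fair_pool' is purely illustrative (pools are normally declared in fairscheduler.xml, see the job-scheduling docs linked above), and note that in some PySpark versions local properties are only reliably per-Python-thread when pinned thread mode is enabled.

def task_in_pool(sc, i):
    # setLocalProperty applies to jobs submitted from the calling thread,
    # so each thread can route its jobs into its own FAIR pool.
    # 'fair_pool' is a hypothetical pool name used here for illustration.
    sc.setLocalProperty('spark.scheduler.pool', 'fair_pool')
    print(sc.parallelize(range(i * 10000)).count())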