@hclockzz
Last active March 27, 2020 18:49
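from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType

# Create (or reuse) a local Spark session for the streaming job
spark = SparkSession.builder \
    .master("local") \
    .appName("Structured Streaming - Twitter Sentiment") \
    .getOrCreate()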
# Schema of the JSON tweet records: three nullable string fields
pythonSchema = StructType() \
    .add("id", StringType(), True) \
    .add("tweet", StringType(), True) \
    .add("ts", StringType(), True)
awsAccessKeyId = ""  # update the access key
awsSecretKey = ""  # update the secret key
kinesisStreamName = ""  # update the Kinesis stream name (set up the stream and ingest data first)
kinesisRegion = ""  # update the AWS region the stream lives in
# Open a streaming read on the Kinesis stream, starting from the newest records
kinesisDF = spark \
    .readStream \
    .format("kinesis") \
    .option("streamName", kinesisStreamName) \
    .option("region", kinesisRegion) \
    .option("initialPosition", "LATEST") \
    .option("format", "json") \
    .option("awsAccessKey", awsAccessKeyId) \
    .option("awsSecretKey", awsSecretKey) \
    .option("inferSchema", "true") \
    .load()
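
The stream has to contain data before the reader above returns anything, and each record is expected to be a JSON object with the three string fields from pythonSchema. Below is a minimal sketch of that record shape using only the standard library. The helpers `encode_record` and `decode_record` and the sample values are hypothetical and not part of the original snippet.

```python
import json
from datetime import datetime, timezone

# Field names mirror pythonSchema: id, tweet, ts (all strings, all nullable).
SCHEMA_FIELDS = ("id", "tweet", "ts")


def encode_record(tweet_id, text, ts=None):
    """Hypothetical helper: serialize one tweet as UTF-8 JSON bytes."""
    if ts is None:
        # Default to the current UTC time in ISO 8601 form.
        ts = datetime.now(timezone.utc).isoformat()
    record = {"id": str(tweet_id), "tweet": text, "ts": str(ts)}
    return json.dumps(record).encode("utf-8")


def decode_record(payload):
    """Hypothetical helper: parse the bytes back; absent fields become None."""
    parsed = json.loads(payload.decode("utf-8"))
    return {name: parsed.get(name) for name in SCHEMA_FIELDS}


# Illustrative values only.
payload = encode_record(1, "spark streaming is fun", "2020-03-27T18:49:00Z")
print(decode_record(payload))
```

A producer would send the bytes from `encode_record` as the record data, for example through boto3's Kinesis client `put_record` call. On the Spark side the same fields could then be recovered from the record payload with pythonSchema.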