@ceteri
Last active May 14, 2020 13:12
Intro to Apache Spark: code example for RDD animation
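# base RDD (`spark` is assumed to be an existing SparkContext)
lines = spark.textFile("hdfs://…")
# transformed RDDs (lazy: nothing is computed yet)
errors = lines.filter(lambda s: s.startswith("ERROR"))
messages = errors.map(lambda s: s.split("\t")[2])
# mark messages to be kept in memory once it has been computed
messages.cache()
# actions: the first count() runs the pipeline and fills the cache,
# the second count() reuses the cached messages
messages.filter(lambda s: "mysql" in s).count()
messages.filter(lambda s: "php" in s).count()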
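
To try the same steps without a cluster, here is a minimal plain-Python sketch of the filter / map / count sequence. It is not Spark: it runs eagerly on a local list, so it shows only what each step computes, not the lazy evaluation or caching. The sample log lines are invented for illustration, and the layout (tab-separated fields with the message in the third field) is an assumption inferred from the `split("\t")[2]` call.

```python
# Hypothetical sample log lines (invented for illustration).
# Assumed layout: level <TAB> date <TAB> message
sample_lines = [
    "ERROR\t2014-01-01\tmysql connection refused",
    "INFO\t2014-01-01\tserver started",
    "ERROR\t2014-01-02\tphp fatal error in index.php",
    "ERROR\t2014-01-03\tmysql timeout after php request",
    "WARN\t2014-01-03\tdisk almost full",
]

# Same steps as the RDD pipeline, evaluated eagerly on a local list.
errors = [s for s in sample_lines if s.startswith("ERROR")]
messages = [s.split("\t")[2] for s in errors]

# Counterparts of the two count() actions.
mysql_count = sum(1 for s in messages if "mysql" in s)
php_count = sum(1 for s in messages if "php" in s)

print(mysql_count, php_count)
```

In the Spark version, `messages` is computed only when the first `count()` runs, and `cache()` lets the second `count()` reuse that result instead of re-reading the input.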