Gist by @ceteri, last active May 14, 2020 13:12
Intro to Apache Spark: code example for RDD animation
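Scala (in spark-shell, where sc is the predefined SparkContext):
// load error messages from a log into memory
// then interactively search for various patterns
// base RDD
val lines = sc.textFile("log.txt")
// transformed RDDs: keep ERROR lines, then keep the tab-separated message field
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()
// actions: count the cached messages that mention each pattern
messages.filter(_.contains("mysql")).count()
messages.filter(_.contains("php")).count()

Python (in the pyspark shell, where sc is also predefined):
# base RDD
lines = sc.textFile("log.txt")
# transformed RDDs: index 1 is the message field, matching r(1) in the Scala version
errors = lines.filter(lambda s: s.startswith("ERROR"))
messages = errors.map(lambda s: s.split("\t")[1])
messages.cache()
# actions
messages.filter(lambda s: "mysql" in s).count()
messages.filter(lambda s: "php" in s).count()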
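Sample log input (each line is a level followed by a message; the code's split("\t") assumes the two are tab-separated):
ERROR php: dying for unknown reasons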
WARN dave, are you angry at me?
ERROR did mysql just barf?
WARN xylons approaching
ERROR mysql cluster: replace with spark cluster
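To see what the two count() actions should return on this sample, here is a plain-Python mirror of the same pipeline using list comprehensions instead of RDDs. It is a sketch under one assumption: each line is "LEVEL<TAB>message", since the displayed sample may have lost its tab characters.

```python
# Plain-Python mirror of the RDD pipeline, run on the sample log.
# Assumption: each line is "LEVEL<TAB>message".
sample_log = [
    "ERROR\tphp: dying for unknown reasons",
    "WARN\tdave, are you angry at me?",
    "ERROR\tdid mysql just barf?",
    "WARN\txylons approaching",
    "ERROR\tmysql cluster: replace with spark cluster",
]

# transformations: keep ERROR lines, then keep the message field
errors = [s for s in sample_log if s.startswith("ERROR")]
messages = [s.split("\t")[1] for s in errors]   # fields are [level, message]

# actions: count the messages that mention each pattern
mysql_count = sum(1 for s in messages if "mysql" in s)
php_count = sum(1 for s in messages if "php" in s)

print(messages)
# ['php: dying for unknown reasons', 'did mysql just barf?', 'mysql cluster: replace with spark cluster']
print(mysql_count, php_count)                   # 2 1
```

Because every line splits into exactly two fields, split("\t")[2] would raise IndexError on this input; index 1 selects the message.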