@ceteri
Last active May 14, 2020 13:12

Revisions

  1. ceteri revised this gist May 18, 2014. 1 changed file with 0 additions and 4 deletions.
    4 changes: 0 additions & 4 deletions 0.foo.java
    @@ -1,4 +0,0 @@
    -Tuple2 pair = new Tuple2(a, b);
    -
    -pair._1 // => a
    -pair._2 // => b
  2. ceteri renamed this gist May 18, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion 0.foo.scala → 0.foo.java
    @@ -1,4 +1,4 @@
    -val pair = (a, b)
    +Tuple2 pair = new Tuple2(a, b);

     pair._1 // => a
     pair._2 // => b
  3. ceteri revised this gist May 18, 2014. 2 changed files with 4 additions and 4 deletions.
    4 changes: 0 additions & 4 deletions 0.foo.py
    @@ -1,4 +0,0 @@
    -pair = (a, b)
    -
    -pair[0] # => a
    -pair[1] # => b
    4 changes: 4 additions & 0 deletions 0.foo.scala
    @@ -0,0 +1,4 @@
    +val pair = (a, b)
    +
    +pair._1 // => a
    +pair._2 // => b
  4. ceteri revised this gist May 18, 2014. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions 0.foo.py
    @@ -0,0 +1,4 @@
    +pair = (a, b)
    +
    +pair[0] # => a
    +pair[1] # => b
  5. ceteri revised this gist May 18, 2014. 1 changed file with 7 additions and 0 deletions.
    7 changes: 7 additions & 0 deletions z.console.scala
    @@ -0,0 +1,7 @@
    +scala> messages.toDebugString
    +res5: String =
    +MappedRDD[4] at map at <console>:16 (1 partitions)
    +MappedRDD[3] at map at <console>:16 (1 partitions)
    +FilteredRDD[2] at filter at <console>:14 (1 partitions)
    +MappedRDD[1] at textFile at <console>:12 (1 partitions)
    +HadoopRDD[0] at textFile at <console>:12 (1 partitions)
  6. ceteri revised this gist May 18, 2014. 1 changed file with 5 additions and 6 deletions.
    11 changes: 5 additions & 6 deletions log.scala
    @@ -2,17 +2,16 @@
     // then interactively search for various patterns

     // base RDD
    -val lines = sc.textFile("logtxt")
    +val lines = sc.textFile("log.txt")

     // transformed RDDs
    -val errors = lines.filter(_.startsWith("ERROR") )
    -errors = lines.filter(lambda s: s.startswith("ERROR"))
    -messages = errors.map(lambda s: s.split("\t")[2])
    +val errors = lines.filter(_.startsWith("ERROR"))
    +val messages = errors.map(_.split("\t")).map(r => r(1))
     messages.cache()

     // actions
    -messages.filter(lambda s: "mysql" in s).count()
    -messages.filter(lambda s: "php" in s).count()
    +messages.filter(_.contains("mysql")).count()
    +messages.filter(_.contains("php")).count()



  7. ceteri revised this gist May 18, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion log.scala
    @@ -2,7 +2,7 @@
     // then interactively search for various patterns

     // base RDD
    -val lines = sc.textFile("logftxt")
    +val lines = sc.textFile("logtxt")

     // transformed RDDs
     val errors = lines.filter(_.startsWith("ERROR") )
  8. ceteri revised this gist May 18, 2014. 2 changed files with 16 additions and 2 deletions.
    13 changes: 11 additions & 2 deletions sample.scala → log.scala
    @@ -2,13 +2,22 @@
     // then interactively search for various patterns

     // base RDD
    -lines = spark.textFile("hdfs://…")
    +val lines = sc.textFile("logftxt")

     // transformed RDDs
    +val errors = lines.filter(_.startsWith("ERROR") )
     errors = lines.filter(lambda s: s.startswith("ERROR"))
     messages = errors.map(lambda s: s.split("\t")[2])
     messages.cache()

     // actions
     messages.filter(lambda s: "mysql" in s).count()
    -messages.filter(lambda s: "php" in s).count()
    +messages.filter(lambda s: "php" in s).count()
    +
    +
    +
    +val messages = errors.map(_.split("\t")).map(r => r(1))
    +messages.cache()
    +
    +messages.filter(_.contains("mysql")).count()
    +messages.filter(_.contains("php")).count()
    5 changes: 5 additions & 0 deletions log.txt
    @@ -0,0 +1,5 @@
    +ERROR php: dying for unknown reasons
    +WARN dave, are you angry at me?
    +ERROR did mysql just barf?
    +WARN xylons approaching
    +ERROR mysql cluster: replace with spark cluster
  9. ceteri revised this gist May 18, 2014. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions sample.scala
    @@ -1,3 +1,6 @@
    +// load error messages from a log into memory
    +// then interactively search for various patterns
    +
     // base RDD
     lines = spark.textFile("hdfs://…")

  10. ceteri created this gist May 5, 2014.
    11 changes: 11 additions & 0 deletions sample.scala
    @@ -0,0 +1,11 @@
    +// base RDD
    +lines = spark.textFile("hdfs://…")
    +
    +// transformed RDDs
    +errors = lines.filter(lambda s: s.startswith("ERROR"))
    +messages = errors.map(lambda s: s.split("\t")[2])
    +messages.cache()
    +
    +// actions
    +messages.filter(lambda s: "mysql" in s).count()
    +messages.filter(lambda s: "php" in s).count()
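
The filter, map, and count steps of the log-mining example can be traced by hand without a Spark cluster. What follows is a minimal plain-Python sketch, not the gist's own code. It assumes each line of log.txt is a level and a message separated by a single tab (the separator is not visible in the listing, so this is a guess), and the names `LOG_LINES`, `error_messages`, and `count_matching` are hypothetical. It also evaluates eagerly on lists, whereas Spark RDD transformations are lazy and only run when an action such as `count()` is called.

```python
# Plain-Python sketch of the filter -> map -> count pipeline (no Spark needed).
# ASSUMPTION: each log line is "LEVEL<TAB>message"; the tab separator is a
# guess, since it is not visible in the log.txt listing.
LOG_LINES = [
    "ERROR\tphp: dying for unknown reasons",
    "WARN\tdave, are you angry at me?",
    "ERROR\tdid mysql just barf?",
    "WARN\txylons approaching",
    "ERROR\tmysql cluster: replace with spark cluster",
]


def error_messages(lines):
    """Keep ERROR lines and return their message field (index 1)."""
    errors = [s for s in lines if s.startswith("ERROR")]
    return [s.split("\t")[1] for s in errors]


def count_matching(messages, pattern):
    """Count the messages that contain the given substring."""
    return sum(1 for m in messages if pattern in m)


messages = error_messages(LOG_LINES)
mysql_count = count_matching(messages, "mysql")  # 2 under the tab assumption
php_count = count_matching(messages, "php")      # 1 under the tab assumption
```

Under the two-field assumption the message sits at index 1, which matches the `r(1)` used in the Scala revisions; indexing with `[2]`, as in the earliest Python-style revision, would raise an `IndexError` on these lines.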