Created
April 16, 2016 14:22
-
-
Save geoHeil/3dff11860ae042792cea6970447c4592 to your computer and use it in GitHub Desktop.
Spark window function fails in SQL
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import org.apache.spark.sql.{ Row, SQLContext } | |
import org.apache.spark.sql.expressions.Window | |
val sql = new org.apache.spark.sql.SQLContext(sc) | |
val dataset = Seq( | |
("Thin", "cell phone", 6000), | |
("Normal", "tablet", 1500), | |
("Mini", "tablet", 5500), | |
("Ultra thin", "cell phone", 5000), | |
("Very thin", "cell phone", 6000), | |
("Big", "tablet", 2500), | |
("Bendable", "cell phone", 3000), | |
("Foldable", "cell phone", 3000), | |
("Pro", "tablet", 4500), | |
("Pro2", "tablet", 6500)) | |
val df = dataset.toDF("product", "category", "revenue") | |
df.show | |
df.where('category === "tablet").show | |
val overCategory = Window.partitionBy('category).orderBy('revenue.desc) | |
val rank = dense_rank.over(overCategory) | |
val ranked = df.withColumn("rank", dense_rank.over(overCategory)) | |
ranked.show | |
ranked.where('rank <= 2).show | |
################ | |
Plain SQL fails | |
################ | |
df.registerTempTable("productRevenue") | |
sql.sql("SELECT product,category,revenue FROM (SELECT product,category,revenue,dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank FROM productRevenue) tmp WHERE rank <= 2") | |
sql.sql("SELECT product,category,revenue,dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank FROM productRevenue") |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I just found https://forums.databricks.com/questions/7409/can-not-run-code-in-blog-post-introducing-window-f.html apparently it only works in a Hive-context. Stil need to verify this.