@FurcyPin
Created January 23, 2022 11:46
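from pyspark.sql import SparkSession
from pyspark.sql import functions as f

# Start a local, single-threaded Spark session.
spark = SparkSession.builder.master("local[1]").getOrCreate()

# Build a small sample DataFrame with SQL; other_col is always NULL.
df = spark.sql("""
    SELECT 1 as id, "Bulbasaur" as name, ARRAY("Grass", "Poison") as types, NULL as other_col
    UNION ALL
    SELECT 2 as id, "Ivysaur" as name, ARRAY("Grass", "Poison") as types, NULL as other_col
""")

# Drop other_col and register the result as a temporary view that SQL can query.
df.select("id", "name", "types").createOrReplaceTempView("pokedex")

# Mix SQL and the DataFrame API: query the view, then add or replace columns
# using SQL expressions.
df2 = (
    spark.sql("""SELECT * FROM pokedex""")
    .withColumn("nb_types", f.expr("SIZE(types)"))
    .withColumn("name", f.expr("LOWER(name)"))
)
df2.show()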
# +---+---------+---------------+--------+
# | id|     name|          types|nb_types|
# +---+---------+---------------+--------+
# |  1|bulbasaur|[Grass, Poison]|       2|
# |  2|  ivysaur|[Grass, Poison]|       2|
# +---+---------+---------------+--------+