Created
February 9, 2019 21:26
Write data directly to an Azure blob storage container from an Azure Databricks notebook
# Configure the blob storage account access key globally
spark.conf.set(
    "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
    sas_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe as a single CSV file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to
# Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from the sub-folder (wrangled_data_folder)
# to the root of the blob container, renaming it in the same operation
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
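Note that the write above also leaves Spark's marker files (such as _SUCCESS) in the output folder. A minimal sketch, assuming the Databricks-provided SparkSession `spark`: the _SUCCESS marker written by the standard Hadoop output committer can be suppressed with a single configuration setting. (Files such as _started_* and _committed_* come from Databricks' transactional commit protocol and may still appear.)

```python
# Suppress the _SUCCESS marker file written by the Hadoop output committer.
# Assumes a live SparkSession named `spark`, as provided in a Databricks notebook.
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
```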
I believe that was the case when I ran the script myself as well. One thing I would suggest is writing an additional step to delete the temporary files from the Azure blob container once the dataframe has been written successfully. There should be a simple way to achieve that with another dbutils.fs method.
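A sketch of that cleanup pattern, demonstrated on a local temp folder standing in for the blob sub-folder (the folder and file names here are hypothetical; on Databricks you would use `dbutils.fs.ls` and `dbutils.fs.rm` instead of the `os` functions):

```python
import os
import tempfile

def clean_marker_files(folder):
    """Delete Spark marker/metadata files (e.g. _SUCCESS, _committed_*,
    _started_*) from `folder`, keeping the real data files. All of Spark's
    marker files start with an underscore, so that prefix is the filter."""
    removed = []
    for name in os.listdir(folder):
        if name.startswith("_"):
            os.remove(os.path.join(folder, name))
            removed.append(name)
    return removed

# Demo: a local temp folder standing in for wrangled_data_folder
demo = tempfile.mkdtemp()
for name in ["_SUCCESS", "_committed_123", "_started_123", "part-00000.csv"]:
    open(os.path.join(demo, name), "w").close()

removed = clean_marker_files(demo)
print(sorted(removed))     # → ['_SUCCESS', '_committed_123', '_started_123']
print(os.listdir(demo))    # → ['part-00000.csv']
```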
This code creates many temporary files, for instance _committed..., _started..., and _SUCCESS. How can I avoid this?