@RajaShyam
Created May 28, 2018 12:19
Parquet Benefits:
=================
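- Columnar storage: all values of a column are stored together
- Efficient storage: similar adjacent values encode and compress well
- Efficient data I/O and CPU utilisation
- Reads fewer blocks: a query reads only the columns it needs
- Key concepts
  Block size: the HDFS block size (and Parquet's target row group size)
  Row group: a horizontal slice of rows, holding one column chunk per column
  Page: the unit inside a column chunk; encoding and compression are applied per page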
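- Default compression depends on the writer: Spark 2.0+ writes Snappy by default (spark.sql.parquet.compression.codec), while Spark 1.x defaulted to GZIP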
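A toy, stdlib-only Python model of the columnar-storage benefits listed above. It is not Parquet's real on-disk format. The same hypothetical three-column table is laid out row by row and column by column. The model then compares how many bytes a query touching a single column must scan, and how well that one column compresses with zlib (DEFLATE, the algorithm GZIP uses).

```python
import zlib

# Hypothetical toy table: 1,000 rows of (id, country, amount).
rows = [(i, "US" if i % 2 else "CA", i * 0.5) for i in range(1000)]

def row_layout(rows):
    # Row-oriented: each record's fields are stored next to each other.
    return b"".join(f"{a},{b},{c}\n".encode() for a, b, c in rows)

def column_layout(rows):
    # Column-oriented: all values of one column are stored contiguously,
    # so each column can be read (and compressed) on its own.
    cols = list(zip(*rows))
    return {
        name: "\n".join(str(v) for v in col).encode()
        for name, col in zip(("id", "country", "amount"), cols)
    }

row_bytes = row_layout(rows)
col_bytes = column_layout(rows)

# A query on "country" alone must scan every record in the row layout,
# but only the "country" chunk in the column layout.
bytes_scanned_rows = len(row_bytes)
bytes_scanned_cols = len(col_bytes["country"])
print("scanned (row layout):   ", bytes_scanned_rows)
print("scanned (column layout):", bytes_scanned_cols)

# Repetitive values sit next to each other in a column, so it compresses well.
country_compressed = len(zlib.compress(col_bytes["country"]))
print("country column raw/compressed:", len(col_bytes["country"]), country_compressed)
```

Real Parquet adds encodings (such as dictionary and run-length encoding) on top of the compression codec, so actual savings differ from this toy.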
Write & read as Parquet:
========================
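df.write.parquet("path_to_save")          # DataFrameWriter: save the DataFrame as Parquet
df = spark.read.parquet("path_to_read")   # read through the SparkSession, not the DataFrame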
Params:
=======
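Change block size: set parquet.block.size (in bytes; the parquet-mr default is 128 MB). In parquet-mr this is the target row group size.
Block size and row group size largely dictate performance: larger row groups give longer sequential column reads but need more memory while writing, and a row group should fit within one HDFS block so that it is not split across blocks.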
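A rough back-of-the-envelope sketch, in plain Python, of how parquet.block.size relates to file layout. plan_row_groups is a hypothetical helper. It assumes the writer starts a new row group each time the buffered data reaches the configured size, which only approximates what parquet-mr actually does.

```python
import math

MB = 1024 * 1024

def plan_row_groups(data_bytes, row_group_bytes, hdfs_block_bytes):
    """Hypothetical estimate of the row-group layout for one Parquet file.

    data_bytes:       approximate size of the data written to the file
    row_group_bytes:  target row group size (parquet.block.size)
    hdfs_block_bytes: HDFS block size of the filesystem
    """
    # Assumption: a row group is closed once it reaches the target size.
    n_groups = math.ceil(data_bytes / row_group_bytes)
    # A row group larger than an HDFS block must span several blocks,
    # which can force remote reads when scanning a single row group.
    fits_in_block = row_group_bytes <= hdfs_block_bytes
    return n_groups, fits_in_block

print(plan_row_groups(1024 * MB, 128 * MB, 128 * MB))  # (8, True)
print(plan_row_groups(1024 * MB, 256 * MB, 128 * MB))  # (4, False)
```

The second call shows the mismatch to avoid. If the row group size is raised, the HDFS block size should be raised with it.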