RajaShyam · May 28, 2018 12:19
diff --git a/Parquet_with_Spark b/Parquet_with_Spark
 Parquet Benefits:
 =================
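 - Columnar storage: each column's values are stored together, so a query reads only the columns it needs
 - Efficient storage: similar values sit side by side, so per-column encoding (e.g. dictionary, run-length) and compression work well
 - Efficient data I/O and CPU utilisation
 - Reads fewer blocks: unneeded columns are skipped, and whole row groups can be skipped using the min/max statistics stored in the file footer
 - Key concepts:
   - Block size: the HDFS block; Parquet's parquet.block.size property actually sets the row group size in bytes
   - Row group: a horizontal slice of the rows; inside it, each column's data is stored contiguously as a column chunk
   - Page: the unit within a column chunk at which encoding and compression are applied
 - Compression: Spark 2.x writes Parquet with snappy by default (spark.sql.parquet.compression.codec); gzip was the default in Spark 1.x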
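The columnar layout and row-group skipping described above can be sketched with a toy in-memory model. This is an illustration only, not Parquet's on-disk format: the names to_row_groups, read_column and groups_to_scan are hypothetical, pages, encoding and compression are left out, and max() is computed on the fly where a real Parquet reader would use the statistics stored in the footer.

```python
from typing import Any, Dict, List

# Toy model only: NOT the real Parquet file format.
Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]  # column name -> column chunk (that column's values)


def to_row_groups(rows: List[Row], rows_per_group: int) -> List[RowGroup]:
    """Split rows into row groups, storing each column's values contiguously."""
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        groups.append({col: [row[col] for row in chunk] for col in chunk[0]})
    return groups


def read_column(groups: List[RowGroup], column: str) -> List[Any]:
    """Column pruning: touch only the requested column chunk in each row group."""
    values: List[Any] = []
    for group in groups:
        values.extend(group[column])
    return values


def groups_to_scan(groups: List[RowGroup], column: str, lower_bound: Any) -> List[int]:
    """Row-group skipping for `column >= lower_bound`: keep groups whose max can match.

    Real Parquet readers use min/max statistics from the footer; here max() is
    computed from the values for simplicity.
    """
    return [i for i, group in enumerate(groups) if max(group[column]) >= lower_bound]


rows = [{"id": i, "amount": i * 10} for i in range(6)]
groups = to_row_groups(rows, rows_per_group=2)
print(read_column(groups, "amount"))         # [0, 10, 20, 30, 40, 50]
print(groups_to_scan(groups, "amount", 35))  # [2]
```

Only the amount chunks are read, and for the filter amount >= 35 the first two row groups can be ruled out from their maximums alone.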



 Write & read Parquet:
 =====================
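 df.write.parquet("path_to_save")          # DataFrameWriter: write df as Parquet
 df2 = spark.read.parquet("path_to_read")  # spark is the SparkSession (sqlContext in Spark 1.x)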

 Params:
 =======
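 Change row group size: set parquet.block.size (in bytes; parquet-mr default 128 MB) in the Hadoop configuration used by the writer.
 Row group size drives overall performance: larger row groups allow bigger sequential reads, but the writer must buffer a whole row group in memory. The Parquet documentation recommends large row groups (512 MB to 1 GB) that each fit within one HDFS block, since an entire row group may need to be read; the HDFS block size should be raised to match.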
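The sizing trade-off above can be estimated with some back-of-the-envelope arithmetic. A minimal sketch, assuming parquet-mr's default parquet.block.size of 128 MB and that each open writer holds roughly one row group in memory; the helper names are hypothetical, and real row group sizes vary with encoding and compression.

```python
MB = 1024 * 1024
DEFAULT_ROW_GROUP_BYTES = 128 * MB  # assumed parquet-mr default for parquet.block.size


def estimated_row_groups(data_bytes: int, row_group_bytes: int = DEFAULT_ROW_GROUP_BYTES) -> int:
    """Approximate number of row groups needed for data_bytes of buffered data."""
    if row_group_bytes <= 0:
        raise ValueError("row_group_bytes must be positive")
    return -(-data_bytes // row_group_bytes)  # ceiling division


def writer_buffer_bytes(open_files: int, row_group_bytes: int = DEFAULT_ROW_GROUP_BYTES) -> int:
    """Rough write-side memory: each open file buffers about one row group."""
    return open_files * row_group_bytes


def fits_in_hdfs_block(row_group_bytes: int, hdfs_block_bytes: int) -> bool:
    """A row group should not span HDFS blocks, so it must be <= the block size."""
    return row_group_bytes <= hdfs_block_bytes


print(estimated_row_groups(1024 * MB))            # 8 row groups at 128 MB
print(estimated_row_groups(1024 * MB, 512 * MB))  # 2 row groups at 512 MB
print(writer_buffer_bytes(4, 512 * MB) // MB)     # 2048 (MB) for 4 concurrent writers
print(fits_in_hdfs_block(512 * MB, 128 * MB))     # False: raise the HDFS block size too
```

Larger row groups mean fewer, longer sequential reads, but the write-side memory estimate grows linearly with both the row group size and the number of files being written at once.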