@RajaShyam
Created October 5, 2018 16:02
A developer's view into the Spark memory model
Notes taken from Spark Summit Europe 2018 (talk by Wenchen Fan, Databricks)
Executor:
=========
1. Each executor contains a memory manager and a thread pool
2. The 5 key areas in the memory model of an executor are:
   1. Data sources - such as JSON, CSV, Parquet, etc.
   2. Internal format - data represented in a binary format
   3. Operators - such as filter, join, substr, regexp, etc.
   4. Memory manager - tracks and allocates the memory shared between execution and storage
   5. Cache manager - holds cached data (e.g. persisted DataFrames/tables) for reuse
3. Internal format: with plain JVM objects, a row's data is scattered across the heap.
   For example, Row(123, "data", "bricks") needs at least 5 memory locations:
   - the Row object itself - 1 memory location
   - the array holding the 3 field values - 1 memory location
   - 123 - Integer - 1 memory location
   - "data" - String - 1 memory location
   - "bricks" - String - 1 memory location
   The binary internal format instead packs the whole row into one contiguous buffer.
1. Sort and hash are 2 important algorithms used in big data
2. Naive sort: each comparison needs to access 2 different memory locations, which makes it hard for the CPU cache to pre-fetch data - poor cache locality
3. Cache-aware sort: go through the key prefixes in a linear fashion - good cache locality
4. Naive hash map: each lookup needs many pointer dereferences, plus key comparisons when a hash collision happens, and jumps between 2 memory regions - bad cache locality
5. Cache-aware hash map: store the key's hash code next to the record pointer in one contiguous array, so most lookups and comparisons stay within a single memory region - good cache locality
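The cache-aware sort above can be sketched in plain Python. This is an illustrative toy, not Spark's implementation: `records`, `prefix`, and the 8-byte prefix length are all assumptions for the example.

```python
import struct

# Hypothetical records: (key bytes, payload)
records = [(b"bricks", 3), (b"data", 1), (b"alpha", 2)]

def prefix(key: bytes) -> int:
    # First 8 bytes of the key, padded, as a big-endian integer so that
    # integer order matches byte order of the key prefix.
    return int.from_bytes(key[:8].ljust(8, b"\x00"), "big")

# Build a compact (prefix, record index) array. Sorting scans this one
# contiguous region linearly - good cache locality - instead of
# dereferencing two scattered records per comparison.
pointer_array = [(prefix(k), i) for i, (k, _) in enumerate(records)]
pointer_array.sort()

# Only on prefix ties would the full keys need to be dereferenced.
sorted_records = [records[i] for _, i in pointer_array]
```

The design point is that comparisons touch only the small, densely packed prefix array; the full records are fetched once, at the end.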