Strata 2014 Notes

Hadoop 2.0

Rich Raposa from Hortonworks

"Data Operating System"

  • Not just a framework for storing & processing data
  • More like an OS
  • Write applications to run "on top of Hadoop"

Hadoop 1.x

  • Toolbelt with 1 tool: mapreduce
  • Batch applications
  • 4000 node limit
    • Not a bottleneck with multiple namespaces and federation

YARN

  • Yet Another Resource Negotiator
  • Complete re-imagination, re-architecture
  • "Cluster Resource Management"
  • MapReduce was being misapplied, e.g. LinkedIn uses it to do graph processing
  • No longer tied to MapReduce; MapReduce is still possible but no longer gets the "grand treatment"
  • YARN sits between HDFS and MapReduce; MapReduce could be swapped for any other data-processing application
  • HBase on Yarn
    • Instead of installing HBase on cluster, you can run "disposable" HBase jobs on Yarn
    • Normally a very big footprint, now that's gone
  • "Customers are scratching their head wondering what to do with it"
    • Not writing custom YARN applications
    • Having a hard time wrapping their heads around YARN
    • But, excited about HBase and Spark on YARN
    • It's a "brand new" OS
  • Customers struggle with
    • Cluster utilization
    • Cluster access control
  • Ecosystem
    • Still pretty immature
    • Doesn't have too many apps out there
  • New Java process: Resource Manager
    • JobTracker is gone
    • TaskTrackers are gone
    • Allocates resources with a "pluggable" scheduling algorithm
    • No HA yet, does not scale yet. Just like JobTracker
    • NodeManager: slave to the ResourceManager, runs on every node that you'd like to allocate resources from
  • Works in concert with ApplicationMasters
    • Wait. RMs have ApplicationManagers that run and manage ApplicationMasters
    • Wait. Runs on an arbitrary node in the cluster
      • Always container 0
    • Does the work a JobTracker would have done.
    • Asks the ResourceManager for a "Container"
      • Has an amount of memory, and a number of cores
      • Very limited right now
      • cgroups
    • Can be always running: implies it can be on-demand like jobs. Hmm.
    • 1 per job.
  • Data Locality
    • DataNodes are gone
    • Look at source for file input in Hadoop
    • Look at source for MapReduce on YARN
    • Ask for containers that contain specific data blocks - ???
  • Upgrading "just the binary" from Hadoop 1.x is completely backwards compatible
    • Caveat: Java code might need to be recompiled
  • Writing a YARN application (basically a framework; rough client sketch after this list):
    1. Write a Client: 100s of LOC, submits job (Resource) to RM
      • Client sends ApplicationRequest, RM returns max resources to client (memory and cores per container)
      • Client sends Resource request, creates ApplicationSubmissionContext (to RM) & ContainerLaunchContext (to NM)
      • Client sets the command to run and queue to submit to
      • Can talk directly to ApplicationMaster via URL provided by RM
      • Client can "kill" a job
    2. Write an ApplicationMaster: Requests resources from RM, works with NMs to execute and monitor Containers
  • NM heartbeats to RM
    • Deemed dead if no heartbeats for 10 mins
    • RM tells NM to start/stop AM or kill Container
  • RM generates secret key to manage access to containers: not security, resource control
  • Distributed Shell application is a good starting base for a new YARN application
    • 700 LOC, hello world of Yarn applications
  • Log aggregations
    • logs from all containers get put in a single location on HDFS
    • yarn logs
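
A rough sketch of step 1 (the Client) based on the flow above, using the Hadoop 2.x YarnClient API; the application name, queue, container size and launch command here are placeholders, not something from the talk:

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class HelloYarnClient {
      public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // ApplicationRequest: the RM hands back an application id and the
        // maximum resources per container (memory and cores)
        YarnClientApplication app = yarnClient.createApplication();
        System.out.println("Max container resources: "
            + app.getNewApplicationResponse().getMaximumResourceCapability());

        // ContainerLaunchContext: the command the NM runs in container 0,
        // i.e. the ApplicationMaster (placeholder command)
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/bin/date"));

        // ApplicationSubmissionContext: name, queue to submit to, AM container size
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("hello-yarn");
        appContext.setQueue("default");
        Resource amResource = Records.newRecord(Resource.class);
        amResource.setMemory(256);
        amResource.setVirtualCores(1);
        appContext.setResource(amResource);
        appContext.setAMContainerSpec(amContainer);

        ApplicationId appId = yarnClient.submitApplication(appContext);

        // The RM's application report carries the AM's tracking URL;
        // the client could also kill the job via killApplication(appId)
        System.out.println(yarnClient.getApplicationReport(appId).getTrackingUrl());

        yarnClient.stop();
      }
    }

Once an application finishes, the aggregated container logs mentioned above can be pulled with "yarn logs -applicationId <application id>".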

Sidebar: He asked if anybody had any questions. Somebody asked how YARN compares to Mesos; the presenter reacted by making fun of him and accusing him of "having an agenda".

Capacity Scheduler

  • Yahoo
    • has a 10,000 node cluster
    • was only using 55% of this cluster
    • with YARN they got to 90% utilization at peak
  • Memory is the only resource YARN currently knows about
  • Cores not yet implemented
    • but present in the API
  • Queues
    • configured with ACLs (in capacity-scheduler.xml; see the example after this list)
    • have capacity
    • jobs are submitted here
    • old JobTracker also had these
  • Can enforce SLAs by killing containers
  • Question: Multi-tenancy & security
  • vs Fair Scheduler
    • still exists
    • works well with lots of short jobs
    • favors killing jobs
    • whereas capacity scheduler favors scheduling only things that are likely to complete
  • Works in Hadoop 1.x
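
For reference, a minimal capacity-scheduler.xml along the lines of what was described; the queue names, percentages and users are made up:

    <!-- capacity-scheduler.xml: two queues splitting the cluster 60/40 -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,engineering</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>60</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.engineering.capacity</name>
        <value>40</value>
      </property>
      <!-- ACL controlling who may submit jobs to the engineering queue -->
      <property>
        <name>yarn.scheduler.capacity.root.engineering.acl_submit_applications</name>
        <value>alice,bob</value>
      </property>
    </configuration>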

Sidebar: Giraph does "graph processing" without using MapReduce; a good candidate for a YARN (or Mesos, but he won't say that) framework.

Sidebar: Tez executes Directed Acyclic Graphs.

HDFS Federation

  • NameNodes are independent; each manages its own namespace (sample config after this list)
  • Allows multi-tenancy of sorts
    • Used to require separate clusters (e.g. 1 for engineering, 1 for R&D)
    • Now NameNodes allow dynamic partitioning of a single cluster
    • More efficient use of excess capacity
    • But it isn't a hard partition, so doesn't provide security between namespaces
    • Any application can grab any file from any namespace if it knows the URL
    • You'd have to use Kerberos or something outside of Hadoop to prevent access to specific NameNodes
  • Balancer works with multiple NameNodes
  • This is not specific to YARN
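
A sketch of what the federated hdfs-site.xml looks like, as I understand it (nameservice ids and hostnames are made up); each NameNode owns one namespace and dfs.nameservices lists all of them:

    <!-- hdfs-site.xml: two independent NameNodes / namespaces in one cluster -->
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>engineering,research</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.engineering</name>
        <value>nn-eng.example.com:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.research</name>
        <value>nn-rnd.example.com:8020</value>
      </property>
    </configuration>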

HDFS High Availability

  • NameNode HA is part of 2.0 (sample config after this list)
  • In the past you'd use VMware & Red Hat for HA; no longer needed
  • 2 sets of identical hardware
    • One becomes "active"
    • One becomes "standby"
    • Relies on ZooKeeper to track a lock?
  • Some handwaving about quorums and majorities
    • JournalNodes, many of these.
    • Edits that are replicated to a majority of JournalNodes are incorporated into the Standby NN
    • JournalNodes do not provide failover, they provide fault-tolerant replication to the standby
    • With ZooKeeper you can do automatic failover (more hand waving)
      • More about needing "odd numbers" to establish a majority (WAT? I need to learn more about this. But what if 1 node goes down and we're now even?)
  • Fencing (for correct failover at client)
    • "fence off" active: e.g. default kills the Java process for the old active NameNode
    • can be any shell script
    • fencing might fail if there's a network partition
    • lots of hand waving, write your own scripts etc.
    • to be fair, it's an "intro" talk
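
My reading of the quorum/fencing part as an hdfs-site.xml sketch (hostnames made up): the JournalNode quorum is the qjournal:// URI, and the fencing method can be sshfence or any shell script, as he said:

    <!-- hdfs-site.xml: one logical nameservice backed by an active/standby NN pair -->
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>nn1.example.com:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>nn2.example.com:8020</value>
      </property>
      <!-- edits must reach a majority of these JournalNodes -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
      </property>
      <!-- "fence off" the old active: sshfence or any shell script -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <!-- automatic failover via ZooKeeper (the ZK quorum itself goes in core-site.xml) -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    </configuration>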

HDFS Snapshots

  • Point-in-time copies
  • Read-only (copy on write?)
  • Entire filesystem or subtree
  • Instantaneous
  • No adverse effect on regular HDFS operations
  • Folders need to be made "snapshottable" (commands below)
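
The corresponding shell commands (path and snapshot name are just examples):

    # an admin marks a folder as snapshottable
    hdfs dfsadmin -allowSnapshot /data/projects

    # take an instantaneous, read-only snapshot, then list and delete it
    hdfs dfs -createSnapshot /data/projects before-cleanup
    hdfs dfs -ls /data/projects/.snapshot
    hdfs dfs -deleteSnapshot /data/projects before-cleanup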

.....

My brain has panic'd due to an XML overdose.
