Rich Raposa from Hortonworks
- Not just a framework for storing & processing data
- More like an OS
- Write applications to run "on top of Hadoop"
- Toolbelt with 1 tool: MapReduce
- Batch applications
- 4000 node limit
- Not a bottleneck with multiple namespaces and federation
- Yet Another Resource Negotiator
- Complete re-imagination, re-architecture
- "Cluster Resource Management"
- MapReduce was being misapplied, e.g. LinkedIn uses it to do graph processing
- No longer tied to MapReduce; it's still available but no longer gets the "grand treatment"
- YARN sits between HDFS and MapReduce; MapReduce is now just one possible data-processing application among many
- HBase on Yarn
- Instead of installing HBase on the cluster, you can run "disposable" HBase jobs on YARN
- Normally a very big footprint, now that's gone
- "Customers are scratching their head wondering what to do with it"
- Not writing custom YARN applications
- Having hard time wrapping their heads around YARN
- But, excited about HBase and Spark on YARN
- It's a "brand new" OS
- Customers struggle with
- Cluster utilization
- Cluster access control
- Ecosystem
- Still pretty immature
- Doesn't have too many apps out there
- New Java process: ResourceManager
- JobTracker is gone
- TaskTrackers are gone
- Allocates resources with a "pluggable" scheduling algorithm
- No HA yet, does not scale yet. Just like JobTracker
- NodeManager: slave to the ResourceManager, runs on every node that you'd like to allocate resources from
- Works in concert with ApplicationMasters
- Wait. The RM has an ApplicationsManager that launches and manages ApplicationMasters
- Wait. The ApplicationMaster runs on an arbitrary node in the cluster
- Always container 0
- Does the work a JobTracker would have done.
- Asks ResourceManager for a "Container"
- Has an amount of memory, and a number of cores
- Very limited right now
- cgroups
- Can be always running: implies it can be on-demand like jobs. Hmm.
- 1 per job.
- Data Locality
- DataNodes still exist; it's the TaskTrackers that ran beside them that are gone, so locality isn't automatic anymore
- Look at source for file input in Hadoop
- Look at source for MapReduce on YARN
- The AM can ask for containers on the nodes that hold specific data blocks (see the sketch below)
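Rough sketch (mine, not from the talk) of an ApplicationMaster asking the RM for a container on a specific node, using the AMRMClient helper from hadoop-yarn-client; the hostname, memory, and core counts are made-up values:

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LocalityAwareAm {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();
    rm.registerApplicationMaster("", 0, ""); // host, RPC port, tracking URL

    // A "Container" is just an amount of memory plus a number of cores.
    Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore
    Priority priority = Priority.newInstance(0);

    // Data locality: ask for the container on the node(s) holding the
    // blocks we want to process (hostname below is hypothetical).
    String[] nodes = { "datanode07.example.com" };
    rm.addContainerRequest(new ContainerRequest(capability, nodes, null, priority));

    // Heartbeat to the RM; allocated containers come back in the response.
    AllocateResponse response = rm.allocate(0.0f);
    for (Container c : response.getAllocatedContainers()) {
      System.out.println("Got container " + c.getId() + " on " + c.getNodeId());
      // launch work on it via an NMClient...
    }
  }
}
```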
- Upgrading "just the binary" from Hadoop 1.x is completely backwards compatible
- Caveat: Java code might need to be recompiled
- Writing a YARN application: (basically a framework)
- Write a Client: 100s of LOC, submits job (Resource) to RM (see the client sketch after this list)
- Client sends ApplicationRequest, RM returns max resources to client (memory and cores per container)
- Client sends Resource request, creates ApplicationSubmissionContext (to RM) & ContainerLaunchContext (to NM)
- Client sets the command to run and queue to submit to
- Can talk directly to ApplicationMaster via URL provided by RM
- Client can "kill" a job
- Write an ApplicationMaster: Requests resources from RM, works with NMs to execute and monitor Containers
- NM heartbeats to RM
- Deemed dead if no heartbeats for 10 mins
- RM tells NM to start/stop AM or kill Container
- RM generates secret key to manage access to containers: not security, resource control
- Distributed Shell application is a good starting base for a new YARN application
- 700 LOC, the "hello world" of YARN applications
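Rough sketch (mine, not from the talk) of the client side using the YarnClient API; the AM class name com.example.MyApplicationMaster, the queue, and the container sizes are placeholders:

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class HelloYarnClient {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();

    // ApplicationRequest: the RM answers with the max resources a container may get.
    YarnClientApplication app = client.createApplication();
    Resource max = app.getNewApplicationResponse().getMaximumResourceCapability();
    System.out.println("Max per-container capability: " + max);

    // ContainerLaunchContext: the command the NodeManager runs for container 0 (the AM).
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java com.example.MyApplicationMaster"
        + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
        + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));

    // ApplicationSubmissionContext: name, queue to submit to, and the AM container's resources.
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("hello-yarn");
    ctx.setQueue("default");
    ctx.setResource(Resource.newInstance(512, 1)); // MB of memory, vcores
    ctx.setAMContainerSpec(amContainer);

    ApplicationId appId = client.submitApplication(ctx);
    System.out.println("Submitted " + appId + ", tracking URL: "
        + client.getApplicationReport(appId).getTrackingUrl());

    // The client can also kill the job:
    // client.killApplication(appId);
  }
}
```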
- Log aggregation
- logs from all containers get put in a single location on HDFS
- Retrieve them with: yarn logs -applicationId <application_id>
Sidebar: He asked if anybody has any questions. Somebody asked how YARN compared to Mesos, presenter reacted by making fun of him and accusing him of "having an agenda"
- Yahoo
- has a 10,000 node cluster
- was only using 55% of this cluster
- with YARN they got to 90% utilization at peak
- Memory is the only resource YARN currently knows about
- Cores not yet implemented
- but present in the API
- Queues
- configured with ACLs (in capacity-scheduler.xml)
- have capacity
- jobs are submitted here
- old JobTracker also had these
- Can enforce SLAs by killing containers
- Question: Multi-tenancy & security
- vs Fair Scheduler
- still exists
- works well with lots of short jobs
- favors killing jobs
- whereas capacity scheduler favors scheduling only things that are likely to complete
- Works in Hadoop 1.x
Sidebar: Giraph, "graph processing" without using Map Reduce, good candidate for a YARN (or Mesos, but he won't say that) framework.
Sidebar: Tez executes Directed Acyclic Graphs.
- HDFS Federation: NameNodes are independent, each owning its own namespace
- Allows multi-tenancy of sorts
- Used to require separate clusters (e.g. 1 for engineering, 1 for R&D)
- Now NameNodes allow dynamic partitioning of a single cluster
- More efficient use of excess capacity
- But it isn't a hard partition, so doesn't provide security between namespaces
- Any application can grab any file from any namespace if it knows the URL (see the sketch after this list)
- You'd have to use Kerberos or something outside of Hadoop to prevent access to specific NameNodes
- Balancer works with multiple NameNodes
- This is not specific to YARN
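Rough sketch (mine) of why federation isn't a security boundary: the same client can open both namespaces if it knows their URIs. The hostnames and paths below are made up:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoNamespaces {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // One FileSystem handle per namespace (per NameNode).
    FileSystem eng = FileSystem.get(new URI("hdfs://nn-eng.example.com:8020"), conf);
    FileSystem rnd = FileSystem.get(new URI("hdfs://nn-rnd.example.com:8020"), conf);

    // Same client, both namespaces: only HDFS permissions/Kerberos stand in the way,
    // not the federation itself.
    System.out.println(eng.exists(new Path("/projects/engineering/data.csv")));
    System.out.println(rnd.exists(new Path("/projects/research/data.csv")));
  }
}
```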
- NameNode HA is part of 2.0
- In the past you'd use VMware & Red Hat clustering for HA, no longer needed
- 2 sets of identical hardware
- One becomes "active"
- One becomes "standby"
- Relies on ZooKeeper to track a lock?
- Some handwaving about quorums and majorities
- JournalNodes, many of these.
- Edits that are replicated to a majority of JournalNodes get incorporated into the standby NameNode
- JournalNodes do not provide failover; they provide fault-tolerant replication of edits to the standby
- With ZooKeeper you can do automatic failover (more hand waving)
- More about needing "odd numbers" to establish a majority (answering my own WAT: the majority is counted against the configured total, so if 1 of 3 JournalNodes goes down the remaining 2 are still a majority of 3; a 4th node would raise the majority to 3 without letting you survive any extra failures, which is why even counts don't help)
- Fencing (for correct failover at client)
- "fence off" active: e.g. default kills the Java process for the old active NameNode
- can be any shell script
- fencing might fail if there's a network partition
- lots of hand waving, write your own scripts etc.
- to be fair, it's an "intro" talk
- Snapshots: point-in-time copies
- Read-only (copy on write?)
- Entire filesystem or subtree
- Instantaneous
- No adverse effect on regular HDFS operations
- Folders need to be made "snapshottable"
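Rough sketch (mine) of the programmatic equivalent of hdfs dfsadmin -allowSnapshot plus hdfs dfs -createSnapshot, using the FileSystem / DistributedFileSystem APIs; it assumes the default filesystem is HDFS, and the path and snapshot name are made up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration()); // default FS assumed to be HDFS
    Path dir = new Path("/data/projects");               // hypothetical folder

    // Mark the folder snapshottable (admin operation), then snapshot it.
    ((DistributedFileSystem) fs).allowSnapshot(dir);
    Path snapshot = fs.createSnapshot(dir, "before-cleanup");

    // The snapshot is a read-only, point-in-time view under <dir>/.snapshot/<name>.
    System.out.println("Snapshot at: " + snapshot);
  }
}
```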
.....
My brain has panic'd due to an XML overdose.