Strata 2014 Notes

Hadoop 2.0

Rich Raposa from Hortonworks

"Data Operating System"

  • Not just a framework for storing & processing data
  • More like an OS
  • Write applications to run "on top of Hadoop"

Hadoop 1.x

  • Toolbelt with 1 tool: mapreduce
  • Batch applications
  • 4000 node limit
    • Not a bottleneck with multiple namespaces and federation

YARN

  • Yet Another Resource Negotiator
  • Complete re-imagination, re-architecture
  • "Cluster Resource Management"
  • MapReduce was being misapplied, e.g. LinkedIn uses it to do graph processing
  • No longer tied to MapReduce; MapReduce is still possible but no longer gets the "grand treatment"
  • YARN sits between HDFS and MapReduce; MapReduce could be swapped for any other data-processing application
  • HBase on Yarn
    • Instead of installing HBase on cluster, you can run "disposable" HBase jobs on Yarn
    • Normally a very big footprint, now that's gone
  • "Customers are scratching their head wondering what to do with it"
    • Not writing custom YARN applications
    • Having a hard time wrapping their heads around YARN
    • But, excited about HBase and Spark on YARN
    • It's a "brand new" OS
  • Customers struggle with
    • Cluster utilization
    • Cluster access control
  • Ecosystem
    • Still pretty immature
    • Doesn't have too many apps out there
  • New Java process: Resource Manager
    • JobTracker is gone
    • TaskTrackers are gone
    • Allocates resources with a "pluggable" scheduling algorithm
    • No HA yet, does not scale yet. Just like JobTracker
    • NodeManager: slave to the ResourceManager, runs on every node that you'd like to allocate resources from
  • Works in concert with ApplicationMasters
    • Wait. RMs have ApplicationManagers that run and manage ApplicationMasters
    • Wait. Runs on an arbitrary node in the cluster
      • Always container 0
    • Does the work a JobTracker would have done.
    • Asks the ResourceManager for a "Container"
      • Has an amount of memory, and a number of cores
      • Very limited right now
      • cgroups
    • Can be always running: implies it can be on-demand like jobs. Hmm.
    • 1 per job.
  • Data Locality
    • DataNodes are gone
    • Look at source for file input in Hadoop
    • Look at source for MapReduce on YARN
    • Ask for containers that contain specific data blocks - ???
  • Upgrading "just the binary" from Hadoop 1.x is completely backwards compatible
    • Caveat: Java code might need to be recompiled
  • Writing a YARN application (basically a framework; rough client sketch after this list):
    1. Write a Client: 100s of LOC, submits job (Resource) to RM
      • Client sends ApplicationRequest, RM returns max resources to client (memory and cores per container)
      • Client sends Resource request, creates ApplicationSubmissionContext (to RM) & ContainerLaunchContext (to NM)
      • Client sets the command to run and queue to submit to
      • Can talk directly to ApplicationMaster via URL provided by RM
      • Client can "kill" a job
    2. Write an ApplicationMaster: Requests resources from RM, works with NMs to execute and monitor Containers
  • NM heartbeats to RM
    • Deemed dead if no heartbeats for 10 mins
    • RM tells NM to start/stop AM or kill Container
  • RM generates secret key to manage access to containers: not security, resource control
  • Distributed Shell application is a good starting base for a new YARN application
    • 700 LOC, hello world of Yarn applications
  • Log aggregations
    • logs from all containers get put in a single location on HDFS
    • yarn logs
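
A rough sketch of step 1 (the Client) based on the flow above, using the Hadoop 2.x YarnClient API; the application name, queue, container size and launch command here are placeholders, not something from the talk:

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class HelloYarnClient {
      public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // ApplicationRequest: the RM hands back an application id and the
        // maximum resources per container (memory and cores)
        YarnClientApplication app = yarnClient.createApplication();
        System.out.println("Max container resources: "
            + app.getNewApplicationResponse().getMaximumResourceCapability());

        // ContainerLaunchContext: the command the NM runs in container 0,
        // i.e. the ApplicationMaster (placeholder command)
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/bin/date"));

        // ApplicationSubmissionContext: name, queue to submit to, AM container size
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("hello-yarn");
        appContext.setQueue("default");
        Resource amResource = Records.newRecord(Resource.class);
        amResource.setMemory(256);
        amResource.setVirtualCores(1);
        appContext.setResource(amResource);
        appContext.setAMContainerSpec(amContainer);

        ApplicationId appId = yarnClient.submitApplication(appContext);

        // The RM's application report carries the AM's tracking URL;
        // the client could also kill the job via killApplication(appId)
        System.out.println(yarnClient.getApplicationReport(appId).getTrackingUrl());

        yarnClient.stop();
      }
    }

Once an application finishes, the aggregated container logs mentioned above can be pulled with "yarn logs -applicationId <application id>".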

Sidebar: He asked if anybody had any questions. Somebody asked how YARN compares to Mesos; the presenter reacted by making fun of him and accusing him of "having an agenda".

Capacity Scheduler

  • Yahoo
    • has a 10,000 node cluster
    • was only using 55% of this cluster
    • with YARN they got to 90% utilization at peak
  • Memory is the only resource YARN currently knows about
  • Cores not yet implemented
    • but present in the API
  • Queues
    • configured with ACLs (in capacity-scheduler.xml; see the example after this list)
    • have capacity
    • jobs are submitted here
    • old JobTracker also had these
  • Can enforce SLAs by killing containers
  • Question: Multi-tenancy & security
  • vs Fair Scheduler
    • still exists
    • works well with lots of short jobs
    • favors killing jobs
    • whereas capacity scheduler favors scheduling only things that are likely to complete
  • Works in Hadoop 1.x
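
For reference, a minimal capacity-scheduler.xml along the lines of what was described; the queue names, percentages and users are made up:

    <!-- capacity-scheduler.xml: two queues splitting the cluster 60/40 -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,engineering</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>60</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.engineering.capacity</name>
        <value>40</value>
      </property>
      <!-- ACL controlling who may submit jobs to the engineering queue -->
      <property>
        <name>yarn.scheduler.capacity.root.engineering.acl_submit_applications</name>
        <value>alice,bob</value>
      </property>
    </configuration>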

Sidebar: Giraph does "graph processing" without using MapReduce; a good candidate for a YARN (or Mesos, but he won't say that) framework.

Sidebar: Tez executes Directed Acyclic Graphs.

HDFS Federation

  • NameNodes are independent; each manages its own namespace (sample config after this list)
  • Allows multi-tenancy of sorts
    • Used to require separate clusters (e.g. 1 for engineering, 1 for R&D)
    • Now NameNodes allow dynamic partitioning of a single cluster
    • More efficient use of excess capacity
    • But it isn't a hard partition, so doesn't provide security between namespaces
    • Any application can grab any file from any namespace if it knows the URL
    • You'd have to use Kerberos or something outside of Hadoop to prevent access to specific NameNodes
  • Balancer works with multiple NameNodes
  • This is not specific to YARN
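
A sketch of what the federated hdfs-site.xml looks like, as I understand it (nameservice ids and hostnames are made up); each NameNode owns one namespace and dfs.nameservices lists all of them:

    <!-- hdfs-site.xml: two independent NameNodes / namespaces in one cluster -->
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>engineering,research</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.engineering</name>
        <value>nn-eng.example.com:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.research</name>
        <value>nn-rnd.example.com:8020</value>
      </property>
    </configuration>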

HDFS High Availability

  • NameNode HA is part of 2.0 (sample config after this list)
  • In the past you'd use VMware & Red Hat for HA; no longer needed
  • 2 sets of identical hardware
    • One becomes "active"
    • One becomes "standby"
    • Relies on ZooKeeper to track a lock?
  • Some handwaving about quorums and majorities
    • JournalNodes, many of these.
    • Edits that are replicated to a majority of JournalNodes are incorporated into the Standby NN
    • JournalNodes do not provide failover, they provide fault-tolerant replication to the standby
    • With ZooKeeper you can do automatic failover (more hand waving)
      • More about needing "odd numbers" to establish a majority (WAT? I need to learn more about this. But what if 1 node goes down and we're now even?)
  • Fencing (for correct failover at client)
    • "fence off" active: e.g. default kills the Java process for the old active NameNode
    • can be any shell script
    • fencing might fail if there's a network partition
    • lots of hand waving, write your own scripts etc.
    • to be fair, it's an "intro" talk
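
My reading of the quorum/fencing part as an hdfs-site.xml sketch (hostnames made up): the JournalNode quorum is the qjournal:// URI, and the fencing method can be sshfence or any shell script, as he said:

    <!-- hdfs-site.xml: one logical nameservice backed by an active/standby NN pair -->
    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>nn1.example.com:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>nn2.example.com:8020</value>
      </property>
      <!-- edits must reach a majority of these JournalNodes -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
      </property>
      <!-- "fence off" the old active: sshfence or any shell script -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <!-- automatic failover via ZooKeeper (the ZK quorum itself goes in core-site.xml) -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    </configuration>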

HDFS Snapshots

  • Point-in-time copies
  • Read-only (copy on write?)
  • Entire filesystem or subtree
  • Instantaneous
  • No adverse effect on regular HDFS operations
  • Folders need to be made "snapshottable" (commands below)
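
The corresponding shell commands (path and snapshot name are just examples):

    # an admin marks a folder as snapshottable
    hdfs dfsadmin -allowSnapshot /data/projects

    # take an instantaneous, read-only snapshot, then list and delete it
    hdfs dfs -createSnapshot /data/projects before-cleanup
    hdfs dfs -ls /data/projects/.snapshot
    hdfs dfs -deleteSnapshot /data/projects before-cleanup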

.....

My brain has panic'd due to an XML overdose.
