
Yarn


YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to remove the bottleneck of the single Job Tracker in Hadoop 1.0. YARN was described as a “Redesigned Resource Manager” at the time of its launch, but it has since evolved into a large-scale distributed operating system for Big Data processing.


Why YARN?

In Hadoop 1.0, also referred to as MRV1 (MapReduce Version 1), MapReduce performed both processing and resource management. A single master, the Job Tracker, allocated resources, performed scheduling, and monitored processing jobs. It assigned map and reduce tasks to a number of subordinate processes called Task Trackers, which periodically reported their progress back to it. Because every job in the cluster depended on this one Job Tracker, the design created a scalability bottleneck.


Architecture

The YARN architecture separates the resource management layer from the processing layer.

The Apache Hadoop YARN architecture consists of the following main components:


  • Resource Manager: Runs as the master daemon and manages resource allocation across the cluster.
  • Node Manager: Runs on each slave node and is responsible for executing tasks on every single Data Node.
  • Application Master: Manages the lifecycle and resource needs of an individual application. It works with the Node Manager and monitors the execution of tasks.
  • Container: A package of resources, including RAM, CPU, network, and disk, on a single node.

The Resource Manager has two main components: the Scheduler and the Applications Manager.

  • Scheduler: Responsible for allocating resources to the running applications, subject to familiar constraints of capacities, queues, etc. It schedules based on the resource requirements of the applications, using the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, and network.
  • Applications Manager: Responsible for accepting job submissions, negotiating the first container for executing the application-specific Application Master, and restarting the Application Master container on failure.

The per-application Application Master is responsible for negotiating appropriate resource containers from the Scheduler, tracking their status, and monitoring progress.
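
To make this division of labor concrete, here is a minimal sketch of a client submitting an application through the Resource Manager's Applications Manager, using Hadoop's public YarnClient API in Java. The application name, launch command, queue, and resource sizes below are illustrative assumptions, not a complete submission client.

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SubmitToYarn {
        public static void main(String[] args) throws Exception {
            YarnConfiguration conf = new YarnConfiguration();

            // Client-side handle to the Resource Manager.
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(conf);
            yarnClient.start();

            // The Applications Manager accepts the submission and hands out
            // a new application id.
            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
            appContext.setApplicationName("demo-app"); // illustrative name

            // Launch context for the first container, which will run the
            // application-specific Application Master (placeholder command).
            ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                    Collections.emptyMap(),                 // local resources
                    Collections.emptyMap(),                 // environment
                    Collections.singletonList("sleep 60"),  // AM command (illustrative)
                    null, null, null);
            appContext.setAMContainerSpec(amContainer);

            // The Container abstraction: memory in MB and virtual cores.
            appContext.setResource(Resource.newInstance(1024, 1));
            appContext.setQueue("default"); // Scheduler queue (assumed)

            ApplicationId appId = yarnClient.submitApplication(appContext);
            System.out.println("Submitted application " + appId);
        }
    }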

Application Workflow in Hadoop YARN

  1. The client submits an application.
  2. The Resource Manager allocates a container to start the Application Master.
  3. The Application Master registers with the Resource Manager.
  4. The Application Master asks the Resource Manager for containers.
  5. The Application Master notifies the Node Manager to launch the containers.
  6. The application code executes in the containers.
  7. The client contacts the Resource Manager or the Application Master to monitor the application's status.
  8. The Application Master unregisters with the Resource Manager.
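
The same workflow can be sketched from the Application Master's side using Hadoop's AMRMClient and NMClient APIs in Java. The polling loop, resource sizes, and container command below are simplified assumptions rather than a production Application Master.

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SimpleAppMaster {
        public static void main(String[] args) throws Exception {
            YarnConfiguration conf = new YarnConfiguration();

            // Step 3: register this Application Master with the Resource Manager.
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(conf);
            rmClient.start();
            rmClient.registerApplicationMaster("", 0, "");

            NMClient nmClient = NMClient.createNMClient();
            nmClient.init(conf);
            nmClient.start();

            // Step 4: ask the Resource Manager's Scheduler for one container.
            Priority priority = Priority.newInstance(0);
            Resource capability = Resource.newInstance(512, 1); // 512 MB, 1 vcore (illustrative)
            rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

            // Heartbeat until the container is allocated (simplified polling loop).
            int launched = 0;
            while (launched < 1) {
                AllocateResponse response = rmClient.allocate(0.1f);
                for (Container container : response.getAllocatedContainers()) {
                    // Step 5: tell the Node Manager to launch the container.
                    ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                            Collections.emptyMap(), Collections.emptyMap(),
                            Collections.singletonList("echo hello-from-container"), // placeholder
                            null, null, null);
                    nmClient.startContainer(container, ctx);
                    launched++;
                }
                Thread.sleep(1000);
            }

            // Step 8: unregister once the work is done.
            rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        }
    }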



Yarn Features

  • Scalability: The scheduler in the Resource Manager allows Hadoop to scale to thousands of nodes and manage very large clusters.
  • Compatibility: Applications written for Hadoop 1.0 (MapReduce) run on YARN without changes.
  • Cluster Utilization: Containers are allocated dynamically, so cluster resources are used more efficiently than with the static map and reduce slots of MRV1.
  • Multi-tenancy: Multiple processing engines (batch, interactive, streaming) can share and operate on the same cluster and data.


