Zookeeper

The ZooKeeper framework was originally built at Yahoo!, and Apache ZooKeeper later became a standard for organizing services used by Hadoop, HBase, and other distributed frameworks. For instance, Apache HBase uses ZooKeeper to track the status of distributed data.

Apache ZooKeeper is a coordination service for distributed applications that enables synchronization across a cluster.

In the case of Hadoop, ZooKeeper helps with coordination between Hadoop nodes.

For example, it makes it easier to:

1. Manage configuration across nodes: If you have dozens or hundreds of nodes, it becomes hard to keep configuration in sync across them and to make changes quickly. ZooKeeper helps you push configuration changes quickly.
2. Implement reliable messaging: With ZooKeeper, you can implement a producer/consumer queue that guarantees delivery, even if some consumers or one of the ZooKeeper servers fails.
3. Implement redundant services: With ZooKeeper, a group of identical nodes (e.g. database servers) can elect a leader/master and let ZooKeeper refer all clients to that master server. If the master fails, a new leader is chosen through ZooKeeper and all clients are notified.
4. Synchronize process execution: With ZooKeeper, multiple nodes can coordinate the start and end of a process or calculation. This ensures that any follow-up processing is done only after all nodes have finished their calculations.



What is a Distributed Application?

A distributed application runs on multiple systems in a network at the same time in order to complete a particular task quickly and efficiently. Configuring the application to run on more systems can reduce the time to complete the task further. The group of systems on which a distributed application runs is called a cluster, and each machine running in a cluster is called a node.

What are the challenges of Distributed Applications?

Along with the benefits, there are several challenges:

1. Race condition
2. Deadlock
3. Inconsistency

Architecture of Zookeeper

ZooKeeper follows a client-server architecture. All servers store a copy of the data, and a leader is elected at startup.

Server: The server sends an acknowledgment when a client connects. If the connected server does not respond, the client automatically redirects its messages to another server.

Client: A client is one of the nodes in the distributed application cluster. It accesses information from the server. Every client sends a message to the server at regular intervals, which lets the server know that the client is alive.

Leader: One of the servers is designated the leader. It gives information to the clients along with an acknowledgment that the server is alive, and it performs automatic recovery if any of the connected nodes fails.

Follower: A server node that follows the leader's instructions is called a follower.

Client read requests are handled by the ZooKeeper server the client is connected to. Client write requests are handled by the ZooKeeper leader.
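The read and write paths described above can be sketched with a small in-memory model. This is a toy simulation, not the ZooKeeper client API: the names Server, Ensemble, write_log and the path /config/db_url are hypothetical, the leader is simply assumed to be the first server, and replication is shown as an immediate copy to every server, whereas a real ensemble uses an agreement protocol.

```python
class Server:
    """One ZooKeeper-like server holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Ensemble:
    """Toy ensemble: one leader, the remaining servers are followers."""

    def __init__(self, names):
        self.servers = [Server(n) for n in names]
        self.leader = self.servers[0]  # assumption: leader fixed at startup
        self.write_log = []            # name of the server that handled each write

    def read(self, connected_server, path):
        # Reads are answered by the server the client is connected to.
        return connected_server.data.get(path)

    def write(self, connected_server, path, value):
        # A write received by any server is handled by the leader,
        # which then updates every server's copy (simplified replication).
        self.write_log.append(self.leader.name)
        for server in self.servers:
            server.data[path] = value


ensemble = Ensemble(["zk1", "zk2", "zk3"])
follower = ensemble.servers[2]

# The client is connected to a follower, but the leader handles the write.
ensemble.write(follower, "/config/db_url", "db.example.internal")
print(ensemble.read(follower, "/config/db_url"))
print(ensemble.write_log)
```

Because every server holds a copy of the data, a read can be answered locally by any of them, while routing writes through a single leader keeps the copies in the same order.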

Ensemble/Cluster: A group of ZooKeeper servers is called an ensemble or a cluster. You can run the ZooKeeper infrastructure in cluster mode to get the most out of the system.

ZooKeeper WebUI: If you want to work with ZooKeeper resource management through a web user interface instead of the command line, you can use a WebUI. It offers fast and effective communication with the ZooKeeper application.


Concept
ZooKeeper runs in one of two modes: standalone and quorum. In standalone mode there is a single server, and ZooKeeper state is not replicated. In quorum mode a group of ZooKeeper servers, called a ZooKeeper ensemble, replicates the state and serves client requests together.
At any given time, however, a ZooKeeper client is connected to exactly one ZooKeeper server.
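To make quorum mode more concrete, the arithmetic below shows how many servers must agree and how many failures an ensemble can survive. It assumes ZooKeeper's default majority quorum, which the text above does not spell out; the function names quorum_size and tolerated_failures are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (1, 3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

Under this assumption a 4-server ensemble tolerates one failure, the same as a 3-server ensemble, which is why ensembles are usually sized with an odd number of servers.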

Each server handles a large number of client connections simultaneously. Each client periodically sends pings to the ZooKeeper server it is connected to, to confirm that it is alive and connected. The server responds with an acknowledgment of the ping, which indicates that the server is alive as well. If the client does not receive an acknowledgment within the specified time, it connects to another server in the ensemble, and the client session is transparently transferred to the new ZooKeeper server.
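The ping and failover behaviour can be sketched as a toy simulation. Nothing here is ZooKeeper's real session protocol: ToyServer, ToyClient and heartbeat are hypothetical names, and a missed acknowledgment is modelled as a simple boolean instead of a real timeout.

```python
class ToyServer:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def ping(self):
        # A live server acknowledges the ping; a failed one never answers.
        return self.alive


class ToyClient:
    def __init__(self, servers):
        self.servers = servers
        self.current = servers[0]  # connected to exactly one server at a time

    def heartbeat(self):
        """Ping the current server; on a missed ack, move to another server."""
        if self.current.ping():
            return self.current.name
        for candidate in self.servers:
            if candidate is not self.current and candidate.ping():
                self.current = candidate  # session moves to the new server
                return self.current.name
        raise ConnectionError("no server in the ensemble answered")


servers = [ToyServer("zk1"), ToyServer("zk2"), ToyServer("zk3")]
client = ToyClient(servers)

before = client.heartbeat()   # zk1 acknowledges
servers[0].alive = False      # zk1 stops answering
after = client.heartbeat()    # client reconnects to another server
print(before, after)
```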


Why Apache ZooKeeper?

A cluster uses Apache ZooKeeper to coordinate its group of nodes and to maintain shared data with robust synchronization techniques. The common services offered by ZooKeeper are:

a. Naming service
b. Configuration management
c. Cluster management
d. Leader election
e. Locking and synchronization service
f. Highly reliable data registry
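As an illustration of the leader election service, here is a minimal sketch of a commonly used approach: every candidate registers and receives an increasing sequence number, and the candidate with the lowest number is the leader. It is an in-memory simulation under that assumption; ElectionSim and the node names db-a, db-b and db-c are hypothetical, and no real ZooKeeper calls are made.

```python
class ElectionSim:
    """Lowest sequence number wins; leaving triggers an implicit re-election."""

    def __init__(self):
        self._next_seq = 0
        self.candidates = {}  # sequence number -> node name

    def join(self, node_name):
        seq = self._next_seq
        self._next_seq += 1
        self.candidates[seq] = node_name
        return seq

    def leave(self, seq):
        # Models a node failing or disconnecting.
        self.candidates.pop(seq, None)

    def leader(self):
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]


election = ElectionSim()
seq_a = election.join("db-a")
seq_b = election.join("db-b")
seq_c = election.join("db-c")

first_leader = election.leader()
election.leave(seq_a)              # the current leader fails
second_leader = election.leader()  # the next-lowest candidate takes over
print(first_leader, second_leader)
```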

What is the CLI in ZooKeeper?

The ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble for development purposes. To perform ZooKeeper CLI operations, first start the ZooKeeper server (“bin/zkServer.sh start”) and then the ZooKeeper client (“bin/zkCli.sh”).

Znodes

Every node in a ZooKeeper tree is referred to as a znode. Each znode maintains a stat structure, which holds metadata such as version numbers and timestamps.

Types of Zookeeper Nodes

There are three types of znodes:

Persistent znode: This type of znode stays alive even after the client that created it has disconnected. By default, all znodes in ZooKeeper are persistent unless specified otherwise.

Ephemeral znode: This type of znode stays alive only as long as the client that created it is alive. When that client disconnects from ZooKeeper, the znode is deleted. Ephemeral znodes are not allowed to have children.

Sequential znode: Sequential znodes can be either ephemeral or persistent. When a new znode is created as a sequential znode, ZooKeeper sets its path by appending a 10-digit sequence number to the original name.
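The three znode types can be sketched with a small in-memory store. This is a toy model, not ZooKeeper's API: ZNodeStore, close_session and the paths used are hypothetical, the store does not check that parent znodes exist, and the sequence counter is assumed to be kept per parent path.

```python
class ZNodeStore:
    def __init__(self):
        self.nodes = set()         # all existing znode paths
        self.ephemeral_owner = {}  # ephemeral path -> owning session id
        self.counters = {}         # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        parent = path.rsplit("/", 1)[0]
        if parent in self.ephemeral_owner:
            raise ValueError("ephemeral znodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            seq = self.counters.get(parent, 0)
            self.counters[parent] = seq + 1
            path = f"{path}{seq:010d}"  # append a 10-digit sequence number
        self.nodes.add(path)
        if ephemeral:
            self.ephemeral_owner[path] = session
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear when their client goes away.
        owned = [p for p, s in self.ephemeral_owner.items() if s == session]
        for path in owned:
            del self.ephemeral_owner[path]
            self.nodes.discard(path)


store = ZNodeStore()
store.create("/app")  # persistent by default
lock1 = store.create("/app/lock-", session="s1", ephemeral=True, sequential=True)
lock2 = store.create("/app/lock-", session="s2", ephemeral=True, sequential=True)
store.close_session("s1")  # removes only the znode owned by session s1
print(lock1, lock2, sorted(store.nodes))
```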

Characteristics of Znodes

1. Watches
2. Data access
3. Ephemeral nodes
4. Sequence nodes (unique naming)
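Watches, the first characteristic in the list above, can be illustrated with a toy model. It assumes the classic one-time behaviour, where a watch set while reading a znode fires once on the next change and must then be set again; WatchedStore and its methods are hypothetical names, not the ZooKeeper client API.

```python
class WatchedStore:
    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> callbacks waiting for the next change

    def get(self, path, watch=None):
        # Reading with a watch registers interest in the next change.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Watches are removed before firing, so each one triggers only once.
        for callback in self.watches.pop(path, []):
            callback(path)


events = []
watched = WatchedStore()
watched.set("/config", "v1")
watched.get("/config", watch=events.append)
watched.set("/config", "v2")  # the watch fires here
watched.set("/config", "v3")  # no watch is registered any more
print(events)
```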
