
Apache Avro

Apache Avro is a language-neutral data serialization system developed by Doug Cutting, the father of Hadoop. When it comes to serializing data in Hadoop, Avro is the most widely preferred tool.



What is Data Serialization?

Data serialization is the mechanism of translating data in a computer environment into binary or textual form. This serialization process makes it easy to transport data over a network or to store it on some persistent storage medium. Both Hadoop and Java offer serialization APIs, but they are Java-based. Avro, by contrast, is not only language-independent but also schema-based: it deals with data formats that can be processed by multiple languages.

Key Points About Apache Avro

  • Avro is a language-neutral data serialization system.
  • Avro uses JSON-based schemas.
  • Avro uses RPC calls to send data.
  • During data exchange, the schema is sent along with the data.


Data Types in Avro

Avro supports eight primitive types: null, boolean, int, long, float, double, bytes, and string. It also supports six complex types: record, enum, array, map, union, and fixed.

General Working of Avro

To use Avro, you need to follow the workflow given below -

Step 1 - Create schemas. Here you need to design an Avro schema according to your data.
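For example, a minimal schema describing a user record could look like the sketch below (the file name user.avsc and the field names are illustrative, not part of any standard):

    {
      "namespace": "example.avro",
      "type": "record",
      "name": "User",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
      ]
    }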

Step 2 - Read the schemas into your program. This can be done in two ways -

By Generating a Class Corresponding to the Schema - Compile the schema using Avro. This generates a class file corresponding to the schema.

By Using the Parsers Library - You can directly read the schema using the parsers library.
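As a rough sketch of the second approach, assuming the user.avsc file from the example above, the schema can be read at runtime with the parsers library:

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.Schema;

    public class ReadSchema {
        public static void main(String[] args) throws IOException {
            // Parse the schema file at runtime; no code generation is needed.
            Schema schema = new Schema.Parser().parse(new File("user.avsc"));
            System.out.println(schema.toString(true)); // pretty-print the parsed schema
        }
    }

For the first approach, the avro-tools jar can compile the schema into a Java class instead (roughly, java -jar avro-tools-<version>.jar compile schema user.avsc <output-dir>, where the version depends on your installation), and the generated class is then used directly in code.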

Step 3 - Serialize the data using the serialization API provided for Avro, which is found in the package org.apache.avro.specific.
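A minimal serialization sketch, assuming the user.avsc schema and the parser-based approach from the previous step. It uses the generic API in org.apache.avro.generic; with a generated class you would use org.apache.avro.specific.SpecificDatumWriter instead:

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class SerializeUsers {
        public static void main(String[] args) throws IOException {
            Schema schema = new Schema.Parser().parse(new File("user.avsc"));

            // Build a record that conforms to the schema.
            GenericRecord user = new GenericData.Record(schema);
            user.put("name", "Alyssa");
            user.put("favorite_number", 256);
            user.put("favorite_color", null);

            // Write the record to an Avro data file; the schema is embedded in the file.
            DataFileWriter<GenericRecord> writer =
                    new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, new File("users.avro"));
            writer.append(user);
            writer.close();
        }
    }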

Step 4 - Deserialize the data using the deserialization API provided for Avro, which is found in the package org.apache.avro.specific.
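And a matching deserialization sketch, assuming the users.avro file written above (again using the generic API; with a generated class you would use org.apache.avro.specific.SpecificDatumReader):

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class DeserializeUsers {
        public static void main(String[] args) throws IOException {
            // The reader obtains the writer's schema from the data file itself.
            DataFileReader<GenericRecord> reader =
                    new DataFileReader<>(new File("users.avro"), new GenericDatumReader<GenericRecord>());
            while (reader.hasNext()) {
                GenericRecord user = reader.next();
                System.out.println(user);
            }
            reader.close();
        }
    }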
