Introduction

What is Big Data

Before I explain what Big Data is, let me tell you what it is not. The most common myth about Big Data is that it is only about the size or volume of data. In fact, it is about more than the “big” amounts of data being collected.


Big Data is much more than a collection of datasets in different formats; it is an important asset that can be used to obtain innumerable benefits.


"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.


Big data also encompasses a wide variety of data types, including:

-Structured data in SQL databases, data lakes and data warehouses;
-Unstructured data -- such as text and document files held in Hadoop clusters or NoSQL systems; and
-Semi-structured data -- such as web server logs or streaming data from sensors.

Further, big data includes multiple, simultaneous data sources, which may not otherwise be integrated.


Importance of big data

Big data has the potential to provide companies with valuable insights into their customers, which can be used to refine marketing campaigns and techniques and to increase customer engagement and conversion rates. Brands and businesses that use big data hold a competitive advantage over those that ignore it, since they can make faster and better-informed business decisions.


The V's of Big Data

The above image depicts the five V’s of Big Data. 

Velocity

First, let’s talk about velocity. Velocity refers to the speed at which vast amounts of data are generated, collected and analyzed. Every day the number of emails, Twitter messages, photos, video clips, etc. increases at lightning speed around the world. Every second of every day, data is increasing. Not only must it be analyzed, but transmission of and access to the data must also remain near-instantaneous to allow for real-time access to websites, credit card verification and instant messaging. Big data technology now allows us to analyze the data while it is being generated, without first putting it into databases.
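
To make "analyzing data while it is being generated" a little more concrete, here is a minimal Python sketch of a running average that is updated one value at a time, so the full history never has to be stored. The names `running_mean` and `sensor_readings` are made up for this illustration, and the short list of readings is a stand-in for a live feed; real stream-processing tools are far more involved.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one result per incoming value."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: only the count and the current mean are kept,
        # not the values themselves.
        mean += (value - mean) / count
        yield mean


def sensor_readings() -> Iterator[float]:
    """Hypothetical stand-in for a live feed of readings."""
    yield from [10.0, 20.0, 30.0, 40.0]


if __name__ == "__main__":
    for current_mean in running_mean(sensor_readings()):
        print(current_mean)
```

Each incoming reading produces an up-to-date answer immediately, which is the basic idea behind processing data "in motion" rather than loading it into a database first.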

Volume

Volume refers to the incredible amounts of data generated each second from social media, cell phones, cars, credit cards, M2M sensors, photographs, video, etc. The amounts of data have become so large that we can no longer store and analyze them using traditional database technology. We now use distributed systems, where parts of the data are stored in different locations and brought together by software. On Facebook alone, 10 billion messages are sent, the “like” button is pressed 4.5 billion times, and over 350 million new pictures are uploaded every day. Collecting and analyzing this data is clearly an engineering challenge of vast proportions.
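
The idea of storing parts of the data in different locations and bringing them together with software can be sketched in a few lines of Python. This is only a toy illustration that runs in a single process: the "nodes" are plain lists, and the helper names `partition`, `count_locally` and `merge` are assumptions made up for this example, not the API of any particular framework.

```python
from collections import Counter
from typing import List


def partition(records: List[str], num_nodes: int) -> List[List[str]]:
    """Spread records across num_nodes partitions in round-robin order."""
    parts: List[List[str]] = [[] for _ in range(num_nodes)]
    for index, record in enumerate(records):
        parts[index % num_nodes].append(record)
    return parts


def count_locally(part: List[str]) -> Counter:
    """Work each node can do on its own slice, without seeing the rest."""
    return Counter(part)


def merge(partials: List[Counter]) -> Counter:
    """Bring the per-node results together into one overall answer."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    events = ["like", "message", "like", "photo", "like", "message"]
    parts = partition(events, num_nodes=3)
    totals = merge([count_locally(part) for part in parts])
    print(totals)
```

No single "node" ever holds all the events, yet the merged counts match what a single machine would have computed over the whole dataset.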

Value

When we talk about value, we’re referring to the worth of the data being extracted.  Having endless amounts of data is one thing, but unless it can be turned into value it is useless.  While there is a clear link between data and insights, this does not always mean there is value in Big Data.  The most important part of embarking on a big data initiative is to understand the costs and benefits of collecting and analyzing the data to ensure that ultimately the data that is reaped can be monetized. 

Variety

Variety is defined as the different types of data we can now use. Data today looks very different from data in the past. We no longer have only structured data (name, phone number, address, financials, etc.) that fits neatly into a data table. Much of today’s data is unstructured. In fact, by a commonly cited estimate, 80% of all the world’s data fits into this category, including photos, video sequences, social media updates, etc. New and innovative big data technology now allows structured and unstructured data to be harvested, stored, and used simultaneously.

Veracity


Last, but certainly not least, there is veracity. Veracity is the quality or trustworthiness of the data. Just how accurate is all this data? For example, think about all the Twitter posts with hashtags, abbreviations, typos, etc., and the reliability and accuracy of all that content. Gleaning loads and loads of data is of no use if the data cannot be trusted. Another good example relates to the use of GPS data. Often the GPS position will “drift” off course as you move through an urban area, because satellite signals are lost or distorted as they bounce off tall buildings or other structures. When this happens, location data has to be fused with another data source, such as road data or data from an accelerometer, to provide an accurate position.
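
One simple way to picture "fusing" two data sources is to weight each estimate by how much we trust it. The sketch below uses inverse-variance weighting on a one-dimensional position; this is a technique chosen here for illustration, not necessarily what any real navigation system does, and the function name `fuse_estimates` and the example numbers are made up.

```python
from typing import Tuple


def fuse_estimates(
    value_a: float, variance_a: float, value_b: float, variance_b: float
) -> Tuple[float, float]:
    """Combine two noisy estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the more
    trustworthy source (smaller variance) has more influence on the result.
    Returns the fused value and the variance of the fused value.
    """
    if variance_a <= 0 or variance_b <= 0:
        raise ValueError("variances must be positive")
    weight_a = 1.0 / variance_a
    weight_b = 1.0 / variance_b
    fused_value = (value_a * weight_a + value_b * weight_b) / (weight_a + weight_b)
    fused_variance = 1.0 / (weight_a + weight_b)
    return fused_value, fused_variance


if __name__ == "__main__":
    # Hypothetical numbers: a drifting GPS fix (high variance) and a position
    # derived from accelerometer data (lower variance), in metres along a road.
    position, variance = fuse_estimates(105.0, 25.0, 100.0, 1.0)
    print(position, variance)
```

In this example the fused position stays close to the more reliable source while still taking the GPS reading into account, and the fused variance is smaller than either input variance.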

Big Data Analytics
Big Data Analytics is used by companies to support their growth and development. It mainly involves applying various data mining algorithms to a given set of data, which then helps them make better decisions.


There are multiple tools for processing Big Data, such as Hadoop, Pig, Hive, Cassandra, Spark and Kafka, and the choice depends on the requirements of the organisation.

Big Data Applications
These are some of the domains that Big Data applications have revolutionized:

  • Entertainment
  • Insurance
  • Driver-less Cars
  • Education
  • Automobile
  • Government


