Big Data Technologies

In this article, I am going to discuss Big Data Technologies. Please read our previous article, where we discussed the Architecture of Big Data. By the end of this article, you will know the main types of Big Data Technologies and the most widely used tools for storage, mining, analytics, and visualization.

Big Data Technologies

Big data technologies are the utilities and tools used to analyze, process, and extract information from very large data sets drawn from various sources. New technologies are emerging all the time, and many of them handle these workloads faster than traditional data-processing tools.

Types of Big Data Technologies:

Big Data Technology is mainly classified into two types:

  1. Operational Big Data Technologies
  2. Analytical Big Data Technologies

Operational Big Data Technologies

Operational big data is also called raw data. It is generated on a day-to-day basis and can be collected from any source, such as online ticket booking systems, social media, and online shopping websites.

Analytical Big Data Technologies

Analytical big data technologies are a more advanced level built on top of operational big data. Here the data is analyzed and real-time business decisions are made. Examples include weather forecasting and patient health monitoring systems.

Top Big Data Technologies

The top big data technologies can be grouped into four fields:

  1. Data Storage
  2. Data Mining
  3. Data Analytics
  4. Data Visualization

Data Storage

Hadoop

The Hadoop framework was designed to store and process data in a distributed environment on commodity hardware, using a simple programming model. It can store and analyze data spread across different machines at high speed and low cost.

  1. Developed by: Apache Software Foundation; first released in 2006, with version 1.0 following in December 2011
  2. Written in: Java
  3. Current stable version: Hadoop 3.1.1
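The simple programming model Hadoop is best known for is MapReduce. The sketch below simulates its three steps (map, shuffle, reduce) in a single Python process for the classic word-count job. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: add up the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data tools", "Big Data storage"]))
```

Because each mapper only sees its own lines and each reducer only sees one key, the same logic can be spread over many machines without changing the functions themselves.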

MongoDB

NoSQL document databases such as MongoDB offer a direct alternative to the rigid schema used in relational databases. This gives MongoDB the flexibility to handle a wide variety of data types at large volumes and across distributed architectures.

  1. Developed by: MongoDB Inc.; first released on 11 February 2009
  2. Written in: C++, Go, JavaScript, Python
  3. Current stable version: MongoDB 4.0.10
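To see what a flexible schema means in practice, here is a minimal sketch that uses plain Python dictionaries in place of a real MongoDB collection. The sample documents and the helpers `find` and `field_names` are assumptions made for this example; `find` only imitates a simple equality filter and is not the MongoDB driver API.

```python
# Three documents in the same "collection", each with a different set of fields.
documents = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
    {"_id": 3, "name": "Mei", "address": {"city": "Pune", "zip": "411001"}},
]


def find(collection, query):
    """Return documents whose top-level fields equal every key/value pair in `query`."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


def field_names(collection):
    """Collect every field name that appears in at least one document."""
    names = set()
    for doc in collection:
        names.update(doc)
    return sorted(names)


if __name__ == "__main__":
    print(find(documents, {"name": "Ravi"}))
    print(field_names(documents))
```

In a relational table, the `phones` list and the nested `address` would each need their own columns or tables defined in advance; here every document simply carries the fields it has.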

RainStor

RainStor is a software company that developed a database management system of the same name, designed to manage and analyze big data for large enterprises. It uses deduplication techniques to reduce the space needed to store large amounts of data for reference.

  1. Developed by: RainStor Software company in the year 2004.
  2. Works like: SQL
  3. Current stable version: RainStor 5.5
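The idea behind deduplication can be shown with a small sketch: every distinct value is stored only once, and each record keeps just the positions of its values. This is a generic illustration of the technique, not RainStor's actual (proprietary) algorithm; the functions `deduplicate` and `restore` and the sample booking records are assumptions made for the example.

```python
def deduplicate(records):
    """Store each distinct value once and encode records as tuples of positions."""
    values = []      # unique values, in first-seen order
    index_of = {}    # value -> its position in `values`
    encoded = []
    for record in records:
        row = []
        for value in record:
            if value not in index_of:
                index_of[value] = len(values)
                values.append(value)
            row.append(index_of[value])
        encoded.append(tuple(row))
    return values, encoded


def restore(values, encoded):
    """Rebuild the original records from the unique values and the encoded rows."""
    return [tuple(values[i] for i in row) for row in encoded]


if __name__ == "__main__":
    bookings = [
        ("2019-06-01", "DEL", "BOM"),
        ("2019-06-01", "DEL", "BLR"),
        ("2019-06-02", "DEL", "BOM"),
    ]
    values, encoded = deduplicate(bookings)
    print(values)
    print(encoded)
    print(restore(values, encoded) == bookings)
```

The more often the same dates, codes, or names repeat across records, the more space this kind of encoding saves.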

Data Mining

Presto

Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. It can query data in Hive, Cassandra, relational databases, and proprietary data stores.

  1. Developed by: Facebook; open-sourced in 2013
  2. Written in: Java
  3. Current stable version: Presto 0.22
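An interactive analytic query is usually a short piece of SQL that groups and aggregates data. The sketch below runs such a query with Python's built-in `sqlite3` module, used here only as a stand-in so the example is runnable without a cluster; it is not Presto. The `page_views` table, its columns, and the `top_pages` function are hypothetical. With Presto, a query of this shape would be sent to the engine and run against a table in a connected source such as Hive.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Load (page, user_id) rows into an in-memory table and rank pages by views."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    query = """
        SELECT page,
               COUNT(*) AS views,
               COUNT(DISTINCT user_id) AS visitors
        FROM page_views
        GROUP BY page
        ORDER BY views DESC, page
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("/home", 1), ("/home", 2), ("/home", 1),
              ("/cart", 2), ("/help", 3), ("/cart", 3)]
    for page, views, visitors in top_pages(sample):
        print(page, views, visitors)
```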

RapidMiner

RapidMiner is a centralized solution with a powerful and robust graphical user interface that enables users to create, deliver, and maintain predictive analytics. It allows users to build advanced workflows and supports scripting in several languages.

  1. Developed by: RapidMiner in the year 2001
  2. Written in: Java
  3. Current stable version: RapidMiner 9.2

Elasticsearch

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

  1. Developed by: Elastic NV (company founded in 2012; Elasticsearch itself was first released in 2010)
  2. Written in: Java
  3. Current stable version: Elasticsearch 7.1
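Full-text search engines built on Lucene rely on an inverted index, which maps each word to the documents that contain it. The following is a minimal in-memory sketch of that idea over JSON-like documents. It is not the Elasticsearch API; the functions `tokenize`, `build_index`, and `search` and the sample documents are assumptions made for the example, and a real engine adds relevance scoring, analyzers, and distribution across nodes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["body"]):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


if __name__ == "__main__":
    docs = {
        1: {"title": "Intro", "body": "Big data storage with Hadoop"},
        2: {"body": "Full-text search over big data", "tags": ["search"]},
        3: {"body": "Streaming data with Kafka"},
    }
    index = build_index(docs)
    print(search(index, "big data"))
    print(search(index, "kafka"))
```

Looking up a word in the index is a dictionary access, so the cost of a search depends on the number of query terms and matching documents rather than on the total amount of text stored.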

Data Analytics

Kafka

Apache Kafka is a distributed streaming platform. A streaming platform has three key capabilities:

  1. Publish and subscribe to streams of records
  2. Store streams of records durably
  3. Process streams of records as they occur

The first capability makes Kafka similar to a message queue or an enterprise messaging system.

  1. Developed by: originally LinkedIn; open-sourced in 2011 and now maintained by the Apache Software Foundation
  2. Written in: Scala, Java
  3. Current stable version: Apache Kafka 2.2.0
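Kafka keeps each topic as an append-only log, and each group of consumers tracks its own read position (offset) in that log. The sketch below imitates this behaviour in a single Python process. It is not the Kafka client API; the class `MiniBroker`, its methods `publish` and `poll`, and the topic and group names are hypothetical, and a real broker also handles partitions, replication, and persistence to disk.

```python
from collections import defaultdict


class MiniBroker:
    """A toy, in-memory stand-in for a broker with topics and consumer-group offsets."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only list of records
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record to the topic log and return its offset."""
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1

    def poll(self, group, topic, max_records=10):
        """Return the next unread records for this group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_records]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = MiniBroker()
    for order in ("order-1", "order-2", "order-3"):
        broker.publish("orders", order)

    print(broker.poll("billing", "orders", max_records=2))
    print(broker.poll("audit", "orders"))
    print(broker.poll("billing", "orders"))
```

Records are not removed when they are read, so the `audit` group still sees all three orders even though `billing` has already consumed two of them. This is the main difference from a traditional queue, where a message is usually gone once it has been delivered.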

Splunk

Splunk captures, indexes, and correlates real-time data in a searchable repository, from which it can generate graphs, reports, alerts, dashboards, and data visualizations. It is also used for application management, security, and compliance, as well as business and web analytics.

  1. Developed by: Splunk Inc., founded in 2003
  2. Written in: AJAX, C++, Python, XML
  3. Current stable version: Splunk 7.3

KNIME

KNIME allows users to visually create data flows, selectively execute some or all analysis steps, and inspect the results, models, and interactive views. KNIME is written in Java and based on Eclipse, and it uses the Eclipse extension mechanism to add plugins that provide additional functionality.

  1. Developed by: KNIME in the year 2008
  2. Written in: Java
  3. Current stable version: KNIME 3.7.2

Data Visualization

Tableau

Tableau is a powerful and fast-growing data visualization tool used in the business intelligence industry. Data analysis is very fast with Tableau, and the visualizations it creates take the form of dashboards and worksheets.

  1. Developed by: Tableau Software, founded in 2003 (the company went public on 17 May 2013)
  2. Written in: Java, C++, Python, C
  3. Current stable version: Tableau 8.2

Plotly

Plotly is mainly used to make creating graphs faster and more efficient. It offers API libraries for Python, R, MATLAB, Node.js, Julia, and Arduino, as well as a REST API. Plotly can also be used to style interactive graphs in Jupyter notebooks.

  1. Developed by: Plotly in the year 2012
  2. Written in: JavaScript
  3. Current stable version: Plotly 1.47.4
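A Plotly graph is described by a JSON-style figure with two parts: `data` (a list of traces) and `layout` (titles, axes, and styling). The sketch below builds such a figure as a plain Python dictionary, so it runs without the Plotly library installed. The function names `line_figure` and `to_json` and the sample numbers are assumptions made for this example.

```python
import json


def line_figure(x, y, title, x_label="x", y_label="y"):
    """Describe a line chart as a Plotly-style figure dictionary."""
    return {
        "data": [
            {"type": "scatter", "mode": "lines+markers", "x": list(x), "y": list(y)}
        ],
        "layout": {
            "title": {"text": title},
            "xaxis": {"title": {"text": x_label}},
            "yaxis": {"title": {"text": y_label}},
        },
    }


def to_json(figure):
    """Serialize the figure description so it can be handed to a renderer."""
    return json.dumps(figure)


if __name__ == "__main__":
    fig = line_figure([1, 2, 3], [10, 15, 13], "Daily orders", "Day", "Orders")
    print(to_json(fig))
```

If the `plotly` Python package is installed, a dictionary of this shape should be accepted by `plotly.graph_objects.Figure(fig)`, and calling `.show()` on the result renders the interactive graph, for example inside a Jupyter notebook.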

In the next article, I am going to discuss the Big Data Challenges and Requirements. In this article, I have tried to explain Big Data Technologies, and I hope you enjoyed it.
