Introduction to Data Structure
Let us start with an introduction to data structures. Data is an integral part of every application or program. A program is nothing but a set of instructions that perform operations on data to produce some results. So, without data, the instructions would have nothing to work on.
We use the term ‘data’ in many places, such as data structures, databases, data warehouses, and big data. So, what exactly is a data structure? To answer that, it helps to know a little about all of these terms. We will start with data structures and then give a brief introduction to databases, data warehouses, and big data.
What is a Data Structure?
A data structure can be defined as an arrangement of a collection of data items so that they can be used efficiently, i.e., so that operations on the data can be performed efficiently. So, a data structure is all about how data is arranged and how efficiently operations can be performed on it.
But where is this arrangement made? Inside the main memory, during the execution of a program. Every application has a set of instructions that perform operations on data, so data is mandatory, and so is some way of arranging it. That is why there cannot be any application without a data structure.
Where are the data and the program kept?
During execution, both the data and the program are kept inside the main memory. The way a running program manages its data inside the main memory and performs operations on it is called a data structure. Let us look at an example to understand in detail how a program uses data and how that data is placed in the main memory. For this, please have a look at the following diagram.
As you can see in the above diagram, there are three components:
- CPU: CPU stands for Central Processing Unit. This is our microprocessor, or simply the processor. The CPU executes our programs, i.e., it executes their instructions.
- Main Memory: This is also called RAM or primary memory. The main memory is temporary memory, or working memory.
- HDD Storage: This is our hard disk, also called secondary storage or simply storage, where program files and data files are stored. It is also called permanent memory.
Every PC has a processor, RAM, and a hard disk. Similarly, a mobile phone has a processor, some RAM (for example, 2 GB, 3 GB, or 4 GB), and some storage (for example, 16 GB, 32 GB, or 128 GB). So, where do we keep our programs?
When we install a program on a PC or a mobile phone, the program or application is installed on the storage (HDD Storage). In our diagram, it is shown as Program files.
Then where do we keep our data? We also keep our data on the hard disk (HDD Storage). Photos, videos, and documents are all kept on the hard disk of a PC or in the storage of a mobile phone. In our diagram, they are shown as Data files.
Let us understand this with one example.
One of the most commonly used applications today is MS Word. Assume that there is an MS Word program file and one document file, i.e., a .docx file. Through this example, we will see where data structures come into the picture and why MS Word may need them. For better understanding, please have a look at the following diagram.
To run MS Word on a PC or a mobile phone, we tap or double-click the MS Word icon. Once we do, the MS Word program is brought into the main memory, i.e., all the instructions of the MS Word program are loaded there. In the diagram, the instructions are shown as lines, but in reality they are the machine language instructions of the MS Word program. The point you need to remember is that whenever you want to run an application, its program code has to be brought into the main memory.
Once the instruction set (the program code in machine language) is in the main memory, the CPU starts executing the MS Word application. A window appears on the screen, the menu options come up, and we can start using MS Word. For better understanding, please have a look at the below diagram.
Now, suppose we want to open a document file in MS Word. For the MS Word application to access the data in the data file (the document file), this data also has to be brought into the main memory. For better understanding, please have a look at the following image.
From this, we can say that a program has to be brought into the main memory for its execution, and data has to be brought into the main memory for processing. The instructions then perform operations on that data.
A program cannot directly process data that is in storage or on the hard disk; the data must first be brought into the main memory. Every application deals with some data, whether it is MS Word or Notepad. In the case of Notepad, the text file (.txt) has to be brought into the main memory.
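To make this concrete, here is a minimal Python sketch (an illustration only, not how Notepad is actually implemented). The hypothetical helper `load_text_file` copies a text file from storage into main memory and arranges its contents as a list of lines, a simple in-memory data structure. A temporary file stands in for a document saved on the hard disk.

```python
import os
import tempfile


def load_text_file(path):
    """Copy a text file from storage into main memory as a list of lines."""
    # read() copies the file's contents from disk into a string held in RAM
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # splitlines() arranges that text as a list with one element per line
    return text.splitlines()


# A throwaway sample file stands in for a document saved on the hard disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\nthird line\n")

    lines = load_text_file(path)

# The temporary file is deleted now, but its data is still in memory
print(len(lines))  # 3
print(lines[1])    # second line
```

The last two lines still work after the temporary file has been deleted: once the data is in main memory, the program operates on its in-memory copy, not on the disk.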
How can we organize the data inside the main memory so that the application can use it efficiently?
The arrangement or organization of data inside the main memory for efficient use by the application is called a data structure. So, a data structure is formed in the main memory during the execution of a program, because the running program needs its data there.
Now, the question is how the program should arrange the data in the main memory to perform its operations. That arrangement is the data structure, and it is part of the running program. You may already know some data structures, such as arrays, linked lists, trees, and hash tables. The application can use whichever data structure is suitable for arranging its data.
The data may be text, or multimedia such as images and videos; a lot of content can exist in the form of data. For all of that content, we have to design a data structure that organizes it in the main memory so that the application can use the data efficiently and process it faster.
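As a small illustration of why the choice of arrangement matters, the Python sketch below stores the same records (hypothetical product codes and prices) in two ways: a list, which is searched by scanning item by item, and a `dict`, Python's built-in hash table, which looks up a key directly. The helper names are made up for this example.

```python
# The same records arranged two ways (hypothetical sample data)
records = [("P100", 25.0), ("P200", 40.5), ("P300", 12.75)]


# Arrangement 1: a list of tuples. Finding a code means scanning, O(n) steps.
def find_price_in_list(items, code):
    for item_code, price in items:
        if item_code == code:
            return price
    return None  # code not present


# Arrangement 2: a dict (hash table). Lookup by key is O(1) on average.
price_by_code = dict(records)


def find_price_in_dict(table, code):
    return table.get(code)  # returns None if the code is not present


print(find_price_in_list(records, "P300"))        # 12.75
print(find_price_in_dict(price_by_code, "P300"))  # 12.75
```

Both lookups return the same answer. The difference is the work involved: the list scan grows with the number of records, while the dict lookup takes constant time on average.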
Without data structures, we cannot develop any application. Now, let me give you some idea about databases.
Businesses such as banks, retail stores, and manufacturing firms have large amounts of commercial data. They keep this data organized in the form of database tables, or relational data, and all of it is stored on the disk. For better understanding, please have a look at the below diagram.
In the diagram, you can see a database table on the hard disk. We are showing only one table here, but there may be many database tables; the data is organized in the form of tables, and the tables have relationships between them.
Most commercial data is stored in the form of tables. If an application or program uses that database, the data has to be brought into the main memory so that the application can use it. Here again we need a data structure: when data is pulled from the hard disk into the main memory during execution, the arrangement of that data in the main memory is the data structure.
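A minimal sketch of this flow in Python, using the standard-library `sqlite3` module as a stand-in for a commercial database (the `customers` table, its columns, and the rows are hypothetical). The table lives in a database file on disk; the query copies the rows into main memory, where the program arranges them as a `dict` keyed by customer id.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "shop.db")  # the database file lives on disk

    # Set up a small table on disk (hypothetical schema and data)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
        [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meera", "Pune")],
    )
    conn.commit()

    # Bring the rows into main memory and arrange them as a dict: id -> (name, city)
    rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
    conn.close()
    customers = {cid: (name, city) for cid, name, city in rows}

print(customers[2])  # ('Ravi', 'Delhi')

# The in-memory dict can be queried without touching the disk again
pune = sorted(name for name, city in customers.values() if city == "Pune")
print(pune)  # ['Asha', 'Meera']
```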
The organization of data in the form of tables on the disk is a database. This is just a brief introduction to databases. A database arranges data according to some model, such as the relational model, in permanent storage so that applications can easily retrieve and access it. That arrangement on the hard disk, or in permanent storage, is called a database.
Businesses receive a huge amount of new data every day, as many customers make transactions and a lot of goods are manufactured and sold. So, the size of the data grows day by day. Much of this very large data, such as data that is 1 year or 10 years old, may not be used now.
Commercial data can be categorized into two types: operational data and legacy data.
- Operational data: data that is used daily.
- Legacy data: old data.
Legacy data, which we can also call historical data, can be stored somewhere and fetched for use when required. It can be kept on an array of disks. This large body of data, kept on a disk array and serving as the historical data of a commercial firm, is called a data warehouse.
Most commercial firms have a data warehouse, which helps them analyze the business, make policies, start new trends, give offers to customers, and deal with customers. In this way, old data helps the organization make decisions. The algorithms written to analyze the data in a data warehouse are called data mining algorithms.
To summarize: a data structure is the arrangement of data inside the main memory during execution, a database is the arrangement of data on the disk, and a data warehouse is a large body of inactive data that is used when required. For a better understanding of these three terms, please have a look at the following image.
Since the start of the internet, huge amounts of data have been accumulating on it day by day: data about things, people, and places. By analyzing that data, we can make many decisions in management, governance, and business. The study of storing and utilizing such very large data is called Big Data.
Why Data Structure?
In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
In simple words, a data structure is a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently and easily.
Let us understand the above definition with some examples. Suppose you have a scenario where you want to read data sequentially, as shown in the below image. Here, you can create a linked list data structure. In a linked list, each data item points to the next data item.
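Here is a minimal singly linked list in Python, written for illustration (the class and method names are our own, not from any library). Each `Node` holds a value and a reference to the next node, and iterating from `head` reads the data sequentially.

```python
class Node:
    """One element of a singly linked list: a value plus a link to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None  # first node
        self.tail = None  # last node, kept so append() is O(1)

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node  # the old last node now points to the new one
            self.tail = node

    def __iter__(self):
        # Sequential read: start at head and follow each next link until None
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


items = LinkedList()
for v in ["A", "B", "C", "D"]:
    items.append(v)

print(list(items))  # ['A', 'B', 'C', 'D']
```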
In another scenario, you need to display an organization structure: under the CEO there are Managers, under each Manager there are Technicians, and under each Technician there are Helpers, as shown in the below image. To represent such hierarchical data, you can use a tree (or, more generally, a graph) data structure.
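A minimal Python sketch of such a hierarchy as a tree (the `OrgNode` class and the `outline` helper are hypothetical names for this example, and the chart has one Manager with two Technicians for brevity). Each node keeps a list of its direct reports, and a recursive depth-first walk lists the chart with one level of indentation per level of the hierarchy.

```python
class OrgNode:
    """A tree node: a job title plus the people who report to that person."""

    def __init__(self, title):
        self.title = title
        self.reports = []  # child nodes

    def add(self, child):
        self.reports.append(child)
        return child  # returned so the caller can keep building below it


def outline(node, depth=0):
    # Depth-first walk: this node first, then each subtree indented one level deeper
    lines = ["  " * depth + node.title]
    for child in node.reports:
        lines.extend(outline(child, depth + 1))
    return lines


ceo = OrgNode("CEO")
manager = ceo.add(OrgNode("Manager"))
tech1 = manager.add(OrgNode("Technician 1"))
manager.add(OrgNode("Technician 2"))
tech1.add(OrgNode("Helper"))

print("\n".join(outline(ceo)))
```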
Let's say you have another scenario where you want to handle data as a queue: the data must be retrieved from the data structure in the same order in which it was added, as shown in the below image. In such cases, you can create a queue, also called a FIFO (First In, First Out) data structure.
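In Python, one common way to get a FIFO queue is `collections.deque` from the standard library: `append` adds an item at the back and `popleft` removes one from the front, both in O(1) time. This minimal sketch (the job names are made up) shows that items come out in the same order they went in.

```python
from collections import deque

queue = deque()

# Enqueue: add items at the back, in arrival order
for job in ["job1", "job2", "job3"]:
    queue.append(job)

# Dequeue: remove items from the front until the queue is empty
served = []
while queue:
    served.append(queue.popleft())

print(served)  # ['job1', 'job2', 'job3']
```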
So, a data structure is all about organizing and storing data according to the scenario and its needs, so that the data can be accessed and modified efficiently and easily.
Advantages of Data Structures
We need data structures for several reasons; a few of them are as follows:
- They are essential ingredients in creating fast and powerful algorithms.
- They help to manage and organize data.
- They make code cleaner and easier to understand.
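As one illustration of the first point (a sketch, not the only way), keeping data in a sorted list lets a program use binary search, which halves the search range at each step and so needs about log2(n) comparisons instead of up to n. This Python sketch uses the standard `bisect` module; the `contains` helper and the sample ids are hypothetical.

```python
import bisect

# A sorted list: this ordering is what makes binary search possible
sorted_ids = [3, 8, 15, 23, 42, 57, 91]


def contains(sorted_items, target):
    # bisect_left returns the leftmost index where target could be inserted
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


print(contains(sorted_ids, 42))  # True
print(contains(sorted_ids, 40))  # False
```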
In the next article, I am going to discuss Physical vs Logical Data Structures in detail. In this article, I have tried to give a brief introduction to data structures, and I hope you enjoy this Introduction to Data Structure article. I would like to have your feedback. Please post your feedback, questions, or comments about this Introduction to Data Structure article.
About the Author: Pranaya Rout
Pranaya Rout has published more than 3,000 articles in his 11-year career. Pranaya Rout has very good experience with Microsoft Technologies, including C#, VB, ASP.NET MVC, ASP.NET Web API, EF, EF Core, ADO.NET, LINQ, SQL Server, MySQL, Oracle, ASP.NET Core, Cloud Computing, Microservices, and Design Patterns, and is still learning new technologies.