Data Management Approaches | File-based vs Database
In this article, I am going to discuss the different approaches to data management. We will look at how data was stored in earlier days, what problems that caused, and how the database approach overcomes those problems.
In everyday life, we come across data. Data is a raw fact. Every day in our work or profession we gather data and collect information. But what is the difference between data and information? Let us start with data.
All the facts about things are termed data. We always deal with data. All the details around us are data, such as a name, a phone number, or an address. In simple words, data is a raw fact made up of characters, numbers, and special characters. For example, Empid is data, Ename is data, Salary is data, DOJ is data, etc.
Data on its own does not give accurate or meaningful information to users. For example, given only the value Warner, we cannot say whether Warner is the name of an employee, the name of a customer, or the name of a product, because Warner is simply data.
Among all the data, the meaningful data is called information. We fetch only the information from all the facts. In simple words, the result of processing data or raw facts is called information, and information provides meaningful statements.
Note: Information always provides accurate or meaningful data about a particular employee, customer, student, product, etc. For example, from the information, we can say that Warner is the name of an employee, and that 10022 is the Employee Id of the employee whose name is Miler.
Managing data is part of our daily activities, and it involves different kinds of work depending on the requirements. Some of the areas are data modeling, data mining, data integration, data governance, and master data management.
- Data modeling: In this area, data is designed using different models, which portray the relationships between the data and other details.
- Data mining: It is used for discovering patterns and useful information in large amounts of raw data. It is widely used in industry and is a major concept for handling data.
- Data integration: It combines data from different sources so that the combined data can be analyzed and processed into information.
- Data governance: Data handling policies are made under this concept; it also ensures that data is accessed consistently and covers other related issues.
There is another term, data quality management, which deals with fixing errors and other issues in data.
Data Storage | Data Management Approaches:
Data storage is a location where we can store data/information. We have different types of data storage.
- Books & Papers
- Flat file / Text files (File Management System)
- DBMS / Database (Software)
Disadvantages of Books & Papers:
- It is a completely manual process/system.
- It requires more manpower.
- Maintenance is very costly.
- There is no security.
- It can store only a very small amount of data/information.
- Retrieval is very difficult as well as time-consuming.
File-Based Approach for Data Management:
In the file management system, data is stored in files with the help of the operating system. In this conventional method, fetching and modifying data is also done through these files, and each file contains the information along with all of its related records.
Earlier, in any enterprise, data fetching was a big issue. For every incident, one had to go through all the records, which were kept in files. A file is a collection of data.
The system for maintaining and managing files is called a file system. It was used to create and manage all the data, and the conventional file system was an important part of any enterprise.
In a file-based system, all data is stored in the form of files. File-based systems were the predecessor of the database approach. A large number of files are needed to perform various tasks, so an organization's data is spread over a group of files: file 1, file 2, file 3, and so on up to file n. For example, in an organization, the first file may hold employee information, the second file the employees' personal details, the third file their company-related details, and so on.
Each file stores a different type of information and is independent of the other files; a single such file is called a flat file. Each file contains and processes information for one specific task. The programs that work with these files were typically written in languages such as C or C++. So, if all the information is kept in the form of files, what are the drawbacks? Let us see.
What is a File?
A file is a collection of related data stored on a storage device (secondary memory). Each file is used to store different information and is independent of the other files. A single such file is called a flat file.
Drawbacks of File-Based Approach for Data Management:
To retrieve data from flat files, we must develop an application program in a high-level language (HLL) such as C, C++, Java, or a .NET language, whereas to retrieve data from a database, we use the SQL language.
For example, to retrieve data from a database, we use SQL queries such as SELECT * FROM <table name>;
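To make the difference concrete, here is a minimal sketch in Python. It assumes a hypothetical pipe-delimited employee file (the salary values and the name Smith are made up for illustration) and uses Python's built-in sqlite3 module as a stand-in for any DBMS. With the flat file, we have to write the parsing and filtering logic ourselves; with the database, a single SQL query does the work.

```python
import sqlite3

# Hypothetical flat file: one record per line, fields separated by "|".
FLAT_FILE = "10021|Warner|45000\n10022|Miler|52000\n10023|Smith|38000\n"


def high_earners_from_flat_file(text, min_salary):
    """Hand-written retrieval: parse every line and filter in program code."""
    names = []
    for line in text.splitlines():
        empid, ename, salary = line.split("|")
        if int(salary) >= min_salary:
            names.append(ename)
    return names


def high_earners_from_database(conn, min_salary):
    """The same retrieval expressed as one SQL query."""
    rows = conn.execute(
        "SELECT ename FROM employee WHERE salary >= ? ORDER BY empid",
        (min_salary,),
    )
    return [row[0] for row in rows]


# Load the same records into an in-memory database table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid INTEGER PRIMARY KEY, ename TEXT, salary INTEGER)"
)
records = []
for line in FLAT_FILE.splitlines():
    empid, ename, salary = line.split("|")
    records.append((int(empid), ename, int(salary)))
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", records)

print(high_earners_from_flat_file(FLAT_FILE, 40000))
print(high_earners_from_database(conn, 40000))
```

Both functions return the same names, but any change to the file layout forces a change to the flat-file program, while the SQL query only depends on the column names.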
Data Redundancy & Data Inconsistency:
These problems arise when we store data in multiple files, because changes made in one file are not reflected in the other copies. Data redundancy means duplicate data/information, i.e. the same information is stored in multiple files, and data inconsistency means that those copies no longer agree with each other.
In a database, each fact can be stored once and referenced from wherever it is needed, so a change is made in one place and is visible to every user. Where several copies of the same data are maintained, the DBMS can keep them in step by applying the changes inside transactions, which follow the ACID properties.
Data redundancy means duplication of data values, i.e. the same information is duplicated in several files, sometimes in different formats. Maintaining duplicates wastes time, money, and storage space. So, in a file-based system, redundancy is a main drawback.
Data inconsistency means different copies of the same data do not match. For example, in one file employee A's phone number is 9764734221, and in another file the same employee's phone number has a different value or a different meaning (for instance, the number is saved as an ID number). When the same basic data exists in different files with different values or meanings, that is data inconsistency. Example: the phone number of a customer is different in different files.
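The following minimal sketch shows how redundancy leads to inconsistency. It assumes two hypothetical CSV files that both store the same employee's phone number (the file contents and the new phone number are made up for illustration). When only one copy is updated, the two copies stop matching.

```python
import csv
import io


def load(text):
    """Read CSV text into a dict keyed by empid."""
    return {row["empid"]: row for row in csv.DictReader(io.StringIO(text))}


def find_mismatches(file_a, file_b, field):
    """Return the empids whose value for `field` differs between the two files."""
    return [
        empid
        for empid in file_a
        if empid in file_b and file_a[empid][field] != file_b[empid][field]
    ]


# The same phone number is stored redundantly in two independent files.
personal_file = load("empid,ename,phone\n10022,Miler,9764734221\n")
company_file = load("empid,ename,phone\n10022,Miler,9764734221\n")

mismatches_before = find_mismatches(personal_file, company_file, "phone")

# The phone number is changed in one file only; nothing updates the other copy.
personal_file["10022"]["phone"] = "9000000000"

mismatches_after = find_mismatches(personal_file, company_file, "phone")

print(mismatches_before)
print(mismatches_after)
```

Nothing in the file-based approach prevents this situation; detecting it requires an extra comparison program like `find_mismatches`.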
Data isolation means data is scattered across different files, and the files may be in different formats, so writing a new application program to retrieve the data is difficult. When each file is formatted in a different way, retrieving information from these files is very difficult; this is data isolation.
Data integrity means data values may need to satisfy some integrity constraints. For example, suppose you are maintaining a bank database in which balance is one attribute, and the bank requires every customer to keep a minimum balance of Rs. 1000. Then every balance value must be at least Rs. 1000; this rule is an integrity constraint.
Example: When filling in an application form, the age may be required to be at least 18 years. This is also an integrity constraint. So, each data value must satisfy the integrity constraints defined for it.
In the file-based approach, to enforce such a condition we need to write it into the program code of every application that touches the data, whereas in the database approach we can simply declare the integrity constraint along with the table definition in the query language.
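As a minimal sketch of declaring a constraint along with the definition, the example below uses Python's built-in sqlite3 module and a hypothetical `account` table (the table, column names, and amounts are assumptions for illustration). The minimum balance rule is written once as a CHECK constraint, and the database rejects any update that would break it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The minimum balance rule is declared as part of the table definition.
conn.execute(
    """
    CREATE TABLE account (
        customer TEXT PRIMARY KEY,
        balance  INTEGER NOT NULL CHECK (balance >= 1000)
    )
    """
)
conn.execute("INSERT INTO account VALUES ('A', 5000)")
conn.commit()


def try_withdraw(conn, customer, amount):
    """Attempt a withdrawal; return False if the constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE customer = ?",
                (amount, customer),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balance_of(conn, customer):
    row = conn.execute(
        "SELECT balance FROM account WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0]


rejected = try_withdraw(conn, "A", 4500)   # would leave 500, below the minimum
accepted = try_withdraw(conn, "A", 1000)   # leaves 4000, which is allowed

print(rejected, accepted, balance_of(conn, "A"))
```

No application code checks the balance here; the rule lives with the data, so every program that uses the table is covered by it.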
It is difficult to ensure atomicity in a file processing system. For example, suppose A and B are two customers who both have accounts, and A wants to transfer Rs. 100 to B. If Rs. 100 is deducted from A's account but is not credited to B's account due to some failure, the transfer is left half done. This is a violation of atomicity, which requires that either both steps happen or neither does.
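The transfer scenario can be sketched with a database transaction. This is a minimal illustration using Python's built-in sqlite3 module; the `account` table, the starting balances, and the `fail_midway` flag that simulates the failure are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


failed = transfer(conn, "A", "B", 100, fail_midway=True)
after_failure = balances(conn)   # the debit from A has been rolled back

succeeded = transfer(conn, "A", "B", 100)
after_success = balances(conn)

print(after_failure)
print(after_success)
```

After the simulated failure, both balances are unchanged, so the money is never lost between the two accounts.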
Data Concurrent Access Violation:
If multiple users update the same data simultaneously, the data can end up in an inconsistent state. In a file processing system, this is very difficult to handle using programming code.
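One common form of this problem is the lost update. The sketch below is a simplified illustration in Python, not a real file system: the first function replays by hand an interleaving in which two users read the same balance before either writes, and the second function shows the kind of locking code an application would have to add itself. The record layout and the amounts are hypothetical.

```python
import threading


def unsafe_interleaving():
    """Both users read the old balance before either one writes it back."""
    record = {"balance": 1000}
    read_by_user1 = record["balance"]
    read_by_user2 = record["balance"]
    record["balance"] = read_by_user1 + 100  # user 1 deposits 100
    record["balance"] = read_by_user2 + 200  # user 2 overwrites user 1's update
    return record["balance"]


def locked_updates():
    """Each read-modify-write runs under a lock, so no update is lost."""
    record = {"balance": 1000}
    lock = threading.Lock()

    def deposit(amount):
        with lock:
            current = record["balance"]
            record["balance"] = current + amount

    threads = [threading.Thread(target=deposit, args=(amt,)) for amt in (100, 200)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return record["balance"]


print(unsafe_interleaving())  # 1200: the deposit of 100 is lost
print(locked_updates())       # 1300: both deposits are applied
```

In the file-based approach, every application has to get this locking right on its own, whereas a DBMS provides concurrency control as a built-in service.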
Enforcing security constraints in a file processing system is very difficult. For example, in a banking system, payroll personnel need only the part of the database that has information about the bank's employees; they do not need access to information about customer accounts. In the same way, the bank holds every customer's name, age, address, and balance, but if I ask for my details, I should see only my own details. If I am able to see another person's details, then security is not being maintained.
Data is not well protected in books and flat files, whereas databases provide a role-based security mechanism for accessing data in a secure manner with the help of authentication and authorization.
Indexes are used to access the required data much faster. Flat files do not provide any built-in index mechanism, whereas databases do provide an indexing mechanism.
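As a minimal sketch of the indexing mechanism, the example below uses Python's built-in sqlite3 module with a hypothetical `employee` table filled with generated rows. It asks SQLite for its query plan before and after creating an index on the `ename` column (the table, the data, and the index name `idx_employee_ename` are assumptions for illustration). The exact wording of the plan text varies between SQLite versions, so it is printed rather than shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid INTEGER PRIMARY KEY, ename TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"emp{i}", 30000 + i) for i in range(1, 1001)],
)


def query_plan(conn):
    """Return SQLite's description of how it would run the lookup by name."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE ename = ?",
        ("emp500",),
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)

conn.execute("CREATE INDEX idx_employee_ename ON employee (ename)")

plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

Before the index exists, the database has to scan the table to find the name; afterwards the plan refers to the index, and the query itself did not have to change.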
So, organizations suffered from the limitations of flat files for storing data or information. To overcome these problems, organizations introduced special software that stores data permanently on secondary storage devices. This software is called DBMS software.
Database Approach for Data Management:
Considering all the above factors, a need arose for better management of data. At this point, a new technology was introduced: the database.
Storing data in a database, fetching data from it, and updating it are the main operations, and the aim is to keep the data more accurate. The system that manages the database is called a database management system (DBMS). A DBMS removes the main obstacles to handling data: it provides data integrity and data consistency, reduces redundant data, and gives users a hassle-free way to fetch data.
A database is a collection of inter-related data which contains the information of an organization/enterprise. It is obtained by collecting data from all the data sources of an organization. The database is a computer-based record-keeping system whose overall purpose is to record and maintain information. It is a single, large repository of data that can be used simultaneously by many users.
In other words, a database is a collection of interrelated information; using it, we can store, modify, select, and delete data in a secure manner.
Types of Databases:
- OLTP (Online Transaction Processing)
- OLAP (Online Analytical Processing)
OLTP: Organizations maintain OLTP databases for storing day-to-day transaction information, i.e. they are used for running the business. Example: databases managed with SQL Server, Oracle, MySQL, etc.
OLAP: It is used for data analysis, data summaries, and the history of data of a particular business. Example: a data warehouse.
A DBMS is the software that is used to manage and maintain data/information in the database. Using a DBMS, we can create new databases and new tables, and insert, update, delete, and select data in the database.
User -> DBMS -> CRUD operations (create, read, update, delete) -> Database
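The four CRUD operations can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as the DBMS; the `employee` table and the salary values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid INTEGER PRIMARY KEY, ename TEXT, salary INTEGER)"
)

# Create: insert a new record.
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (10022, "Miler", 52000))

# Read: select the record back.
row_after_insert = conn.execute(
    "SELECT ename, salary FROM employee WHERE empid = ?", (10022,)
).fetchone()

# Update: change the salary of the existing record.
conn.execute("UPDATE employee SET salary = ? WHERE empid = ?", (55000, 10022))
salary_after_update = conn.execute(
    "SELECT salary FROM employee WHERE empid = ?", (10022,)
).fetchone()[0]

# Delete: remove the record.
conn.execute("DELETE FROM employee WHERE empid = ?", (10022,))
count_after_delete = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

conn.commit()

print(row_after_insert, salary_after_update, count_after_delete)
```

The user only issues SQL statements; how the rows are laid out and located on disk is handled by the DBMS.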
Advantages of Database Approach for Data Management:
Program Data Independence: If a database approach is used, data is stored in a central location called a repository. This allows an organization's data to change and evolve without modifying the application programs that process the data.
Minimal Data Redundancy: Data redundancy exists when the same data are stored unnecessarily at different places. The database approach does not eliminate redundancy completely, but it provides the facilities to the designer to carefully control the amount of redundancy.
Improved Data Consistency: If the amount of data redundancy is controlled, data inconsistency is reduced as well. It is also highly recommended to maintain the same version of data at all locations.
Improved Data Sharing: A database is designed as a sharable component. A DBMS helps in creating an environment in which end users have better access to more and better-managed data. Users are allowed to use the services of the database through authentication and authorization.
Enforcement of Standards: To support database management, database administrators define procedures and enforce standards. Procedures are the instructions and rules that govern the design and use of a database system.
Improved Quality: The database approach provides tools and processes to improve data quality. A database designer can specify rules called integrity constraints, which users cannot violate.
What are the Advantages of DBMS?
- Reduces data redundancy
- Reduces data inconsistency
- Easy to manipulate data
- Easy to access data
- Supports data integrity rules (data validations)
- Supports an indexing mechanism
- Data retrieval is fast
- Supports transactions with ACID properties
- Supports data sharing
- Provides security for data (Authentication & Authorization)
The main functions of a DBMS are:
- a) Creating the database
- b) Retrieving data from the database
- c) Updating the database
One of the main goals of a DBMS is to maintain the ACID properties of transactions. What does ACID really mean?
- 'A' stands for Atomicity. A transaction is treated as a single unit: either all of its changes are applied to the database or none of them are.
- 'C' stands for Consistency. A transaction takes the database from one valid state to another, so all the defined rules and constraints still hold after it completes.
- 'I' stands for Isolation. Transactions that run at the same time do not interfere with each other; the result is as if they had run one after another.
- 'D' stands for Durability. Once a transaction is committed, the effect of its changes is preserved in the database, even if the system fails afterwards.
In the next article, I am going to discuss commonly used database management terminology. Here, in this article, I tried to explain the different data management approaches and why we should go for the database approach for data management. I hope you enjoy this Data Management Approaches article.
About the Author: Pranaya Rout
Pranaya Rout has published more than 3,000 articles in his 11-year career. Pranaya Rout has very good experience with Microsoft Technologies, Including C#, VB, ASP.NET MVC, ASP.NET Web API, EF, EF Core, ADO.NET, LINQ, SQL Server, MYSQL, Oracle, ASP.NET Core, Cloud Computing, Microservices, Design Patterns and still learning new technologies.