De-normalization in SQL Server

De-normalization in SQL Server with Example

In this article, I am going to discuss De-normalization in SQL Server with Example. Please read our previous article before proceeding to this article where we discussed normalization in SQL Server with one real-time example.

In our previous article, we discussed the advantages of normalization and we also discussed how normalization helps us to avoid redundant or error-prone data to enter into the database system. So, the three normal forms become the golden rules for database design. But the other side of the story that we also need to understand before applying these golden rules for database design. Before understanding the other part of the story, first, we need to understand the larger classification of IT systems from the data perspective.

Classification of Data:

From the data perspective, we can classify the IT system into two classifications. They are as follows:

  1. OLTP (Online Transaction Processing System)
  2. OLAP (Online Analytical Processing System)
OLTP (Online Transaction Processing System):

The OLTP Systems are those systems that deal with regular transactions. In other words, the insert, update and delete kind of queries are more suitable for OLTP systems i.e. all those operations which either bring data into the system or remove data from the system. So, when the data is coming into your system, you would like to avoid redundant data, erroneous data or you like to avoid data that does not have any kind of integrity and how we can achieve by using normalization. So, for OLTP systems definitely normalization i.e. applying 1st normal form, second normal form, and third normal form is the right way to go ahead why because we don’t data to enter into the system which is redundant, error-prone, duplicate, etc.

OLAP (Online Analytical Processing System)

Now, let us consider some different operations, which are not daily transaction operations. For example, the manager wants to take a large number of backdated historical data. He wants to analyze the data and he wants to do some forecasting on the data. That means he wants to so some kind of sales management on the data. In these kinds of operations, the data is not coming into the system, but there are some heavy operations done on the data such as selecting the data, creating a complicated report of data, etc. These kinds of systems are called OLAP systems.

Please have a look at the following image which summarizes the difference between OLTP and OLAP systems.

De-normalization in SQL Server with Example

Why we need de-normalization?

The main goal of the OLTP system is Data Manipulation whereas the main goal of the OLAP system is search and data analysis.

Normalization is not suited for the OLAP system as they bring down search performance. This is because in normalization we break down the table into multiple tables. In the OLAP system, our main goal is a faster search. As the search is now going and pull the data from multiple tables, the search start becoming slower.

So, in the OLAP system, rather than doing normalization we need to do the opposite i.e. de-normalization.

Why normalization is not suited for the OLAP system?

Let us understand why normalization is not suited for the OLAP system with an example. Please execute the following SQL Script to create the required database tables (Customers, Orders, and OrderDetail).

CREATE TABLE Customers
(
  CustomerID INT Primary Key,
  CustomerName Varchar(50)
)
GO

CREATE TABLE Orders
(
       OrderID INT Primary Key,
    CustomerID INT REFERENCES Customers(CustomerID),
    OrderDate DateTime
)
GO

CREATE TABLE OrderDetails
(
       OrderDetailsID INT Primary Key,
    OrderID INT REFERENCES Orders(OrderID),
    ProductID INT,
    UnitPrice INT,
    Quantity INT
)
GO

Please have a look at the following image which shows the database diagram of the three tables we just created. This is a perfect normalized database design example.

Why normalization is not suited for the OLAP system

As you can see in the above image, we have the Customers table which holds the customer name. We have Orders table which tells how much orders a customer has placed. We have one more table called OrderDetails which holds the detail of order like how much products the customer bought and the quantity of each product and its unit price etc.

Filling tables with data:

Let us fill all the three tables with some dummy data. Please execute the below SQL script.

-- Data into Customers table
INSERT INTO Customers VALUES (1, 'James');
INSERT INTO Customers VALUES (2, 'Pam');
INSERT INTO Customers VALUES (3, 'Sara');
INSERT INTO Customers VALUES (4, 'David');
INSERT INTO Customers VALUES (5, 'John');
INSERT INTO Customers VALUES (6, 'Smith');

-- Data into Orders table
INSERT INTO Orders VALUES(101, 1, GETDATE());
INSERT INTO Orders VALUES(102, 6, GETDATE());
INSERT INTO Orders VALUES(103, 5, GETDATE());
INSERT INTO Orders VALUES(104, 2, GETDATE());
INSERT INTO Orders VALUES(105, 4, GETDATE());
INSERT INTO Orders VALUES(106, 3, GETDATE());

-- Data into Order Details table
INSERT INTO OrderDetails VALUES(1, 101, 1001, 100, 2);
INSERT INTO OrderDetails VALUES(2, 101, 1002, 200, 3);
INSERT INTO OrderDetails VALUES(3, 102, 1001, 100, 2);
INSERT INTO OrderDetails VALUES(4, 102, 1002, 200, 3);
INSERT INTO OrderDetails VALUES(5, 102, 1003, 300, 1);
INSERT INTO OrderDetails VALUES(6, 103, 1002, 200, 2);
INSERT INTO OrderDetails VALUES(7, 104, 1001, 100, 2);
INSERT INTO OrderDetails VALUES(8, 104, 1003, 300, 2);
INSERT INTO OrderDetails VALUES(9, 105, 1002, 200, 4);
INSERT INTO OrderDetails VALUES(10, 105, 1003, 300, 5);
INSERT INTO OrderDetails VALUES(11, 106, 1001, 100, 3);
INSERT INTO OrderDetails VALUES(12, 106, 1002, 200, 2);
INSERT INTO OrderDetails VALUES(13, 106, 1003, 300, 2);
Creating Report:

Now, we need to create a very simple report as shown below. We want to show the Customer ID, Name, the unit price, quantity and the total amount (Unit Price * Quantity).

De-normalization example in SQL Server

To achieve the above report, what we need to do is, we need to join the three tables as shown below.

SELECT C.CustomerID, C.CustomerName, OD.UnitPrice, OD.Quantity, (OD.UnitPrice * OD.Quantity) AS TotalAmount
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

Ok. Let us execute the above query using the statistics as shown below.

SET STATISTICS IO ON

SELECT C.CustomerID, C.CustomerName, OD.UnitPrice, OD.Quantity, (OD.UnitPrice * OD.Quantity) AS TotalAmount
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID

Once you execute the above query, open the message window that appears next to the result window as shown below.

Report with normalization in SQL Server

As you can see in the above image, in order to get the data, it makes inner join with the Customers, Orders and OrderDetails tables. As well as you can see there are lots of logical reads and physical reads are also happening.

In OLAP systems, the search operation should be faster. Reading the data from so many tables is not an efficient way, it definitely slowdowns your select query. So, how about rather than keeping the data in all these tables, combine them and put it into one table i.e. some de-normalized table. So, what happens is, rather than going and make a select query into these three tables, we can go and make the select query on that single de-normalized table which can improve the performance of the search operation.

Creating a de-normalized table:

There are multiple ways to create de-normalized tables that we will discuss in our next article. Here, to make the thing simple what I am doing is, I am creating a de-normalized table by executing the following query.

SELECT * INTO DenormalizedCustomer FROM (SELECT C.CustomerID, C.CustomerName, OD.UnitPrice, OD.Quantity, (OD.UnitPrice * OD.Quantity) AS TotalAmount
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN OrderDetails OD ON O.OrderID = OD.OrderID ) Tab1

The above query will create the DenormalizedCustomer table with the required data. Now execute the following query with statistics which will give you the same output as the previous inner join example.

SET STATISTICS IO ON
SELECT * FROM DenormalizedCustomer

Now it will get the data from a single table rather than three tables. Once you execute the query open the message window as shown below.

Report with de-normalization in SQL Server

Now, you can see there is only one logical read which ultimately improves the search performance.

What is De-normalization in SQL Server?

De-normalization is all about combining the data into one big table rather than going and fetching data from multiple tables. In de-normalization, we have duplicate data, redundancy data, aggregated data, etc. i.e. we actually violated the three normal forms.

It does not mean that when you want faster searches you start applying de-normalization i.e. if you have three tables, let’s combine them and make it one table. That is not the way. For de-normalization, there are two great techniques (Star Schema and Snow Flake) which we can apply and makes the OLAP system much better.

In the next article, we are going to discuss Star Schema and Snow Flake Design in detail. Here, in this article, I try to explain database de-normalization in SQL Server with one simple example. I hope you enjoy this article.

Leave a Reply

Your email address will not be published. Required fields are marked *