Database Normalization in SQL Server

Database Normalization in SQL Server

In this article, I am going to discuss Database Normalization in SQL Server with real-time examples. As part of this article, I am going to discuss the following pointers.

  1. Understanding the First, Second, and Third normal forms.
  2. What are the different Database Design Problems developers face?
  3. Understanding Duplicate and Redundancy and how both are different.
  4. Understanding the Atomic and repeating group problem.
  5. What are partial dependency and Transitional dependency Design Issues?
  6. How to Design many to many relationships?

Here we are going to discuss what are the different design problems database developers face and how we can solve them using the first, second, and third normal forms in SQL Server.

Database Normalization in SQL Server

The database is a centralized place to store the data and it should not accept duplicate, bad, and redundant data to store in it. So, that the end-user can trust the data i.e. the Data Integrity should be there. In order to achieve this, the first and most important thing is how you design your database? What kind of database design principles you are following?

Before going to understand the first, second, and third normal forms, let us first understand what are the basic design mistakes database developers normally does. Once you understand the design mistakes, then you can easily understand the first, second and third normal forms.

Design Mistake 1: No Proper Primary and Unique Key

The first database design mistake the developers are doing is they don’t put the proper primary key and unique keys or candidate keys. In order to understand this, please have a look at the following customer table.

CREATE TABLE Customer(
 CustomerId INT IDENTITY (1, 1),
 Name VARCHAR(200),
 Code VARCHAR(200),
 PhoneNumber VARCHAR(200),
 SSN VARCHAR(200),
 Product VARCHAR(200)
)

The customer Code and SSN (Social Security Number) value should be unique for a customer. It should now accept any duplicate values. But here, as you can see, we have not applied any key (primary or unique) on these two columns, so it accepts duplicate values in it as shown in the following image.

Design Mistake 1: No Proper Primary and Unique Key

Now, let us solve the above problem by adding a unique key on Code and SSN column. But before that, we need to truncate the table. So, please execute the following SQL Script.

TRUNCATE TABLE Customer;

ALTER TABLE Customer  
ADD CONSTRAINT UK_Customer_Code UNIQUE (Code);  

ALTER TABLE Customer  
ADD CONSTRAINT UK_Customer_SSN UNIQUE (SSN);

Now, with the above design changes, it will not accept duplicate values in Code and SSN column and when you try to insert duplicate data, it will give you the following error by saying that you are violating the uniqueness of customer code.

Adding primary key and unique constraint in SQL Server

Design Mistake 2: Multiple values into a single column

The second design mistake people do is, they are adding multiple values into a single column. For example, let say a customer buys multiple products, then what people are doing is, they are adding all products in a single column separated them either by comma or pipe symbol as shown below.

Design Mistake 2: Multiple values into a single column

As you can see in the Product column we are adding three values. We need to understand that, it’s a column, not a whole. It should have an atomic value. The problem is that you can put duplicate data.

In other to solve the above problem, what people are doing is, they are adding three columns to the table as shown below.

Database Normalization Multiple values into a single column

And then insert each product into the respective table as shown below.

Database Normalization in SQL Server Multiple values into a single column

This is not a good design approach. The problem here is redundant data. In our example, the Product1, Product2, and Product3 are redundant columns. In the first record, there are three products and in the second record, there are 2 products but the Product3 column also exists and it takes Null value which is nothing but redundant value. Because of our bad database design, we came into redundant problems. This problem also technically termed as repeating group problems.

Design Mistake 3: Repeating Group problems

Repeating group problems means we are creating columns that are exactly the same (Product1, Product2, and Product3). Why these three columns are existing, now one knows.

Again, tomorrow, if you want to add the fourth product, then again you need to add one more column into the table.

In order to solve the above repeating group problem what we need to do is we need to create another table called Product i.e. to resolve this problem we need to split the table. So, what we are going to do is, we are going to create two tables, one table is going to hold Customer data and the other table is going to hold product information.

Creating the Product table:

Here we are creating the Product table with some more columns. Why I am adding additional columns that I will discuss later. Please execute the below SQL Script to create the Product table. Please note, here we created a ProductId as primary key and ModelId as a unique key.

CREATE TABLE Product(
        [ProductId] [int] PRIMARY KEY,
 [ModelId] [int] UNIQUE,
 [ProductName] [nvarchar](50) NULL,
 [ProductCost] [money] NULL,	
 [ModelName] [nvarchar](50) NULL,
 [ManufacturerName] [nvarchar](50) NULL
);
Modifying the Customer table:

From the Customer table, we are going to remove the Product1, Product2, and Product3 columns. Then we will add ProductId and ModelId as foreign key columns which will reference the ProductId and ModelId column of the Product table. To keep things simple, first, we will delete the Customer table and then we will create the customer table with new columns. To do so, please execute the following SQL Script.

DROP TABLE Customer;

CREATE TABLE [Customer](
 [CustomerId] [int] PRIMARY KEY,
 [Name] [varchar](200) NULL,
 [Code] [varchar](200) NULL,
 [PhoneNumber] [varchar](200) NULL,
 [SSN] [varchar](200) NULL,
 [ProductId] INT FOREIGN KEY REFERENCES Product(ProductId),
 [ModelId] INT FOREIGN KEY REFERENCES Product(ModelId)
)

With the above change now, you can insert data into the Product table and then you can insert data into the Customer table as shown below.

Product Table:

Design Mistake 3: Repeating Group problems in SQL Server

Customer Table:

Now you can insert the ProductId and ModelId values into the ProductId and ModelId column of Customer table as shown below.

Database Normalization Repeating Group problems in SQL Server

That’s fine. There is no redundancy data or column. But this design is not what we are expecting. One customer can buy multiple products. But putting the ProductId and ModelId column here means one customer can buy only one product. That is one to one relationship.

So, this design will not fit here. Here, we need to introduce an intermediate table which has reference to both Product (ProductId and ModelId) and Customer (CustomerId) table. That means we need to create a table that has many to many relationships between Customer and Product table i.e. one customer can buy multiple products and multiple products can be bought by multiple customers.

First, delete the ProductId and ModelId column from the Customer table. Once you delete those two columns, your Customer table should look as below.

Database Normalization First Normal Form

Creating ProductCustomerMapping table:

Please execute the below script to create the ProductCustomerMapping table. This table maintains a many to many relationships between the Customer and Product table. It has a reference to the CustomerId column of the Customer table as well as reference to the ProductId and ModelId column of the Product table.

CREATE TABLE [ProductCustomerMapping](
 [ProductCustomerId] INT PRIMARY KEY,
 [CustomerId] INT FOREIGN KEY REFERENCES Customer(CustomerId),
 [ModelId] INT FOREIGN KEY REFERENCES Product(ModelId),
        [ProductId] INT FOREIGN KEY REFERENCES Product(ProductId)
);

Now, you can insert the data into the ProductCustomerMapping table as expected. For better understanding please have a look at the following image.

First Normal Form in Database Normalization

First Normal Form in Database Normalization in SQL Server:
  1. Design Problem 1: Duplicate Data due to no proper primary key and unique keys.
  2. Design Problem 2: One column containing multi-value data. It should atomic value. Atomic value means it should not be divided further.
  3. Design Problem 3: Repeating Groups or redundant columns.

When the above design problems are solved or you can say the above rules are satisfied, that means your database is in First Normal Form.

Design Problem 4: Non-Key columns are fully dependent on the primary key

The fourth design problem that lots of developers do is, they make non-key columns (columns without primary key) in the tables which are not dependent on the primary key column. In order to understand this better, please modify the Product table as shown below. Here, we are adding Country and City columns to the table.

Non-Key columns are fully dependent on the primary key

Now modify the data of the Product table as shown below.

Non-Key columns are fully dependent on the primary key in SQL Server

If you look at the Product table, the Product Name and Product cost columns are related to the ProductId which is the primary key. Model name field is dependent on the ModelId column. And Model has nothing to do with the Product. Along the same line, the non-key column Country and Area has a relation with the ProductId column.

This is a bad design because when we say a table, a table is a logical unit. If you are putting fields which are not related to the unit then you have problems. Tomorrow if you have to add or remove fields then it becomes very difficult to change the structure.

So, we should move the non-key columns which are not related to the unit to separate tables. We need to do lots of changes. Let do one by one.

Step1: Creating Model table:

Please execute the below SQL Query which will create the Model table. The Model table will contain only the model-related information.

CREATE TABLE [Model](
 [ModelId] [int] PRIMARY KEY,
 [ModelName] [nvarchar](50) NULL,
 [ManufacturerName] [nvarchar](50) NULL,
 [Country] [nvarchar](50) NULL,
 [City] [nvarchar](50) NULL
);

Once you create the Model table, and then please fill the table with the following data.

Second Normal Form in Database Normalization

Step2: Modifying the Product table.

From the Product table, delete all the columns except the ProductId, ProductName and ProductCost column. Once you delete the columns, the Product table structure should looks as shown below.

Second Normal Form in SQL Server

Now the Product table will contain only the Product related information. The data in the Product table is shown below.

Database Normalization

Note: We need to design the database in such a way that, all the non-key columns should fully dependent on the primary key column. If you are not following this rule means one table having the data of many entities.

Step3: Modifying the ProductCustomerMapping table.

To make the thing simple, just delete and create the ProductCustomerMapping table. The only change is that the ModelId column now references to the ModelId column of the Model table.

DROP TABLE ProductCustomerMapping;

CREATE TABLE [ProductCustomerMapping](
 [ProductCustomerId] INT PRIMARY KEY,
 [CustomerId] INT FOREIGN KEY REFERENCES Customer(CustomerId),
 [ModelId] INT FOREIGN KEY REFERENCES Model(ModelId),
    [ProductId] INT FOREIGN KEY REFERENCES Product(ProductId)
);

Once you modified the ProductCustomerMapping table then please fill the table with the following data.

Database Normalization Second Normal Form

Now, with the above database design, all the non-key columns are fully dependent on the primary key column.

Second Normal Form in Database Normalization in SQL Server:

Must satisfy the first normal form:

  1. Design Problem 1: Duplicate Data due to no proper primary key and unique keys.
  2. Design Problem 2: One column containing multi-value data. It should atomic value. Atomic value means it should not be divided further.
  3. Design Problem 3: Repeating Groups or redundant columns.

As well as the Design Problem 4: Non-Key columns are fully dependent on Primary key

When the above four design problems are solved or you can say when the database design satisfied the above four rules, it means your database is in Second Normal Form.

Design Problem 5: Transitive Dependencies

If you look at the Model table, then the City column value depends on the Country column value, not on the ModelId column. Putting both the values in the Model column is a bad design. So, what we need to do here is, we need to move these transitive dependencies to other tables.

Step1: Creating the Country table

Please execute the below SQL Script to create the Country table with two columns. Here, we marked the CountryId as the primary key.

CREATE TABLE [Country](
 [CountryId] [int] Primary Key,
 [CountryName] [nvarchar](50)
);

Fill the country with the following data.

Third Normal Form

Step2: Creating City table

Please execute the below SQL Script to create the City table with three columns. Here, we marked CityId as the primary key and CountryId is the foreign key reference to the CountryId column of the Country table.

CREATE TABLE [City](
 [CityId] [int] Primary Key,
 [CityName] [nvarchar](50),
 [CountryId] INT FOREIGN KEY REFERENCES Country(CountryId) 
);

Fill the City table with the following data.

Third Normal Form in Database Normalization

Step3: Modifying the Model table structure

From the Model table, we need to delete the Country and City column. Then we need to add CountryId column which will be a foreign key referencing to the CountryId column of the Country table. In order to do so, please execute the below script.

ALTER TABLE Model DROP COLUMN Country;
ALTER TABLE Model DROP COLUMN City;

ALTER TABLE Model ADD CountryId INT;
ALTER TABLE Model
ADD FOREIGN KEY (CountryId) REFERENCES Country(CountryId);

With the above changes now, modify the data in the Model table as shown below.

Third Normal Form in Database Normalization SQL Server

Third Normal Form in Database Normalization in SQL Server:

The Third Normal Form says that there should not be any transitive data and it should follow the first and second normal forms.

So, Normalization is all about splitting your table into multiple tables or units. In order to remember, the first, second and third normal form, please have a look at the following diagram which shows a summary of what we discussed in this article.

Database Normalization in SQL Server

In the next article, I am going to discuss Database De-normalization in detail. Here, in this article, I try to explain Database Normalization in SQL Server with one real-time example. I hope you enjoy this article. If you have any questions or queries about this article, then please feel free to ask me in the comment section.

Leave a Reply

Your email address will not be published. Required fields are marked *