Change Data Capture in SQL Server
In this article, I am going to discuss what exactly Change Data Capture in SQL Server is and why we need it. Please read our previous article where we discussed How SQL Server Stores and Manages Data Internally in detail.
Why do we need Change Data Capture (CDC) in SQL Server?
Many times in projects, you may want to maintain audit trails for important tables. Let us understand what this means with an example. First, create a database (TestDB) and a table (Employee) by executing the following SQL script.
CREATE DATABASE TestDB
GO
USE TestDB
GO
CREATE TABLE Employee
(
    ID INT,
    Name VARCHAR(50),
    Salary MONEY
)
Once you create the above database and table, insert some data into the Employee table by executing the following INSERT statements.
INSERT INTO Employee VALUES (1, 'James', 100000)
INSERT INTO Employee VALUES (2, 'David', 100000)
INSERT INTO Employee VALUES (3, 'Sara', 100000)
As you can see, we have inserted three records into the Employee table. Let us say someone changes the Employee information (for example, changes the name of the employee whose ID is 2 from David to Smith). If we would like to maintain some kind of audit trail table that keeps both the old values and the new values, then we can use Change Data Capture (CDC) in SQL Server.
Note: CDC helps us to keep track of Insert, Update, and Delete activity on table data.
How to use CDC in SQL Server?
Enabling CDC in SQL Server is a two-step process. The steps are as follows:
Step 1: First, we need to enable CDC at the database level.
Step 2: Then we need to specify the tables for which we want to enable CDC.
Enable CDC on Database Level:
In order to enable CDC at the database level, you need to execute the system stored procedure sys.sp_cdc_enable_db in the TestDB database.
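The original script for this step is not reproduced here, so the following is a minimal sketch of what it most likely looks like, assuming the TestDB database created earlier. The final SELECT is an optional check added for illustration and is not part of enabling CDC.

```sql
USE TestDB;
GO
-- Enable CDC for the current database.
-- Note: this requires membership in the sysadmin fixed server role.
EXEC sys.sp_cdc_enable_db;
GO
-- Optional check (added for illustration): is_cdc_enabled is 1 once CDC is on.
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'TestDB';
GO
```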
Once you execute the above stored procedure, the message window shows Commands completed successfully, which means CDC is now enabled at the database level. Once CDC is enabled, SQL Server itself creates a few tables in the System Tables folder, such as cdc.captured_columns, cdc.change_tables, cdc.ddl_history, cdc.index_columns, and cdc.lsn_time_mapping.
We will discuss some of these tables later in this article. Along with these tables, it also creates a number of stored procedures, which you can find in the System Stored Procedures folder.
Note: These stored procedures and tables are basically used to manage the CDC functionality properly. Once we enable the CDC at the database (TestDB) level, the next step is to enable the CDC at table (Employee) level.
Enable CDC at Table Level:
We created the Employee table in the TestDB database. So, let us enable CDC on this Employee table. In order to enable CDC at the table level, we need to execute the system stored procedure sys.sp_cdc_enable_table as shown below. Here, we are providing the schema, the table name, and the role to the stored procedure.
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'Employee',
@role_name = NULL
For CDC to work, the SQL Server Agent service must be running, because the jobs that capture the changes run under SQL Server Agent. If SQL Server Agent is not running, you will get a message saying so when you execute the above stored procedure.
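If you prefer to check the state of the Agent service from a query window, the following sketch is one way to do it. It assumes a SQL Server version that provides the sys.dm_server_services view and a login with the VIEW SERVER STATE permission; it is an optional check, not a step from the original walkthrough.

```sql
-- List the SQL Server Agent service with its current status and startup type.
-- status_desc reads "Running" when the Agent service is started.
SELECT servicename, status_desc, startup_type_desc
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server Agent%';
```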
How to Start the SQL Server Agent Service?
We have discussed SQL Server Agent in detail in our Job Scheduling article, so please read that article to learn more about it. In order to start SQL Server Agent, go to Object Explorer in SSMS, right-click on SQL Server Agent, and click on the Start option from the context menu.
Once you click on the Start option, a window will ask whether you want to start the SQL Server Agent service. Simply click on the Yes button.
Once you click on the Yes button, the SQL Server Agent service starts running. Once SQL Server Agent is running, go to the System Tables folder and you will find one more table, i.e. cdc.dbo_Employee_CT.
Note: _CT tables are the tables that actually capture the Insert, Update, and Delete activity for a particular table. For example, the dbo_Employee_CT table will capture all the Insert, Update, and Delete activity performed on the Employee table. If you have another table called Department and you enable CDC for it, then you will have a table called dbo_Department_CT to capture all the DML activity performed on the Department table.
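To confirm from a query window that the Employee table is now tracked, something like the following sketch can be used. It assumes the TestDB database and the dbo.Employee table from this article; the check itself is an addition for illustration.

```sql
USE TestDB;
GO
-- is_tracked_by_cdc is 1 for tables that have CDC enabled.
SELECT s.name AS schema_name, t.name AS table_name, t.is_tracked_by_cdc
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.name = 'Employee';
GO
-- Show the CDC configuration (capture instance, captured columns, etc.)
-- for the Employee table.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name = N'Employee';
GO
```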
Example: Updating Employee table data
Let us modify some data in the Employee table and then try to understand what happens in the dbo_Employee_CT table. Please update the name of the employee whose ID is 2 by executing the below SQL statement.
UPDATE Employee SET Name = 'Smith' WHERE ID = 2
Once you execute the above statement, have a look at the dbo_Employee_CT table. Focus on the Name and Salary column values. The update produces two rows: one holds the old value of the Name column (David) and the other holds the new value (Smith). The Salary value is the same in both rows because we did not change it.
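The change table can also be inspected with a plain SELECT. The sketch below assumes the default capture instance name dbo_Employee, which is what SQL Server uses when no name is passed to sys.sp_cdc_enable_table. The capture job reads the transaction log asynchronously, so the rows may take a few seconds to appear.

```sql
USE TestDB;
GO
-- Read the raw change rows captured for dbo.Employee.
-- The __$ columns are CDC metadata; ID, Name and Salary are the captured columns.
SELECT __$start_lsn, __$seqval, __$operation, __$update_mask, ID, Name, Salary
FROM cdc.dbo_Employee_CT
ORDER BY __$start_lsn, __$seqval, __$operation;
GO
```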
Example: Deleting From Employee table
Let us delete one record from the Employee table and see what happens in the dbo_Employee_CT table. Please execute the below SQL statement to delete the employee whose ID is 3.
DELETE FROM Employee WHERE ID = 3
Once you delete the employee, have a look at the dbo_Employee_CT table. You will see that it contains one entry for the deleted employee.
Example: Inserting into the Employee table
Let us insert one record into the Employee table and see what happens in the dbo_Employee_CT table. Please execute the below INSERT statement.
INSERT INTO Employee VALUES (4, 'John', 3300000)
Once you insert the record into the Employee table, have a look at the dbo_Employee_CT table. You will see that it contains one entry for the newly inserted employee.
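Besides reading the _CT table directly, SQL Server generates a query function for each capture instance. The sketch below assumes the default capture instance name dbo_Employee, so the generated function is cdc.fn_cdc_get_all_changes_dbo_Employee. It is an alternative way to read the same changes, not a step from the walkthrough above.

```sql
USE TestDB;
GO
-- Work out the LSN range that is currently available for the capture instance.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Employee');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- 'all update old' returns both the before and after rows for each update;
-- the plain 'all' option returns only the after row.
SELECT __$operation, ID, Name, Salary
FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, N'all update old');
GO
```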
How to identify which row is used for what type of operations (Insert, Delete, or Update)?
We can easily identify this based on the value of the __$operation column of the _CT table.
The __$operation column has four possible values (1, 2, 3, and 4). Let us see the meaning of each possible value.
- Delete Statement = 1
- Insert Statement = 2
- Value before update statement = 3
- Value after update statement = 4
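The mapping above can be applied directly in a query so that each change row shows a readable label. This is a minimal sketch, assuming the cdc.dbo_Employee_CT change table from this article; the operation_name alias and its label text are illustrative choices.

```sql
USE TestDB;
GO
-- Translate the numeric __$operation code into a readable label.
SELECT
    CASE __$operation
        WHEN 1 THEN 'Delete'
        WHEN 2 THEN 'Insert'
        WHEN 3 THEN 'Update (value before)'
        WHEN 4 THEN 'Update (value after)'
    END AS operation_name,
    ID, Name, Salary
FROM cdc.dbo_Employee_CT
ORDER BY __$start_lsn, __$seqval, __$operation;
GO
```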
Understanding some of the important CDC tables:
Let us understand some of the important tables that are created by the CDC in the System Tables folder.
cdc.captured_columns: This table keeps track of the captured columns. Please have a look at the records stored in this table for a better understanding. You will find three records, and each record represents one column of the Employee table, i.e. the name of the column, its data type, etc.
cdc.change_tables: This table keeps track of which tables are enabled for CDC. If you look at the data in this table, you will find one record, which states that CDC is currently enabled for the Employee table.
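The two metadata tables described above can be queried together. The sketch below assumes they are cdc.change_tables and cdc.captured_columns, which is what the descriptions correspond to, and joins them on the object_id of the change table.

```sql
USE TestDB;
GO
-- One row per captured column, together with the source table it belongs to.
SELECT
    OBJECT_NAME(ct.source_object_id) AS source_table,
    ct.capture_instance,
    cc.column_name,
    cc.column_type,
    cc.column_ordinal
FROM cdc.change_tables AS ct
JOIN cdc.captured_columns AS cc ON cc.object_id = ct.object_id
ORDER BY ct.capture_instance, cc.column_ordinal;
GO
```

For the Employee table of this article, this should list the ID, Name, and Salary columns under the dbo_Employee capture instance.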
In the next article, I am going to discuss how to implement PIVOT and UNPIVOT operators in SQL Server with examples. Here, in this article, I tried to explain Change Data Capture in SQL Server with one example. I hope you now understand what exactly CDC is and what its use is in SQL Server.