Phantom Read Problem in SQL Server
In this article, I am going to discuss the Phantom Read Problem in SQL Server with an example. Please read our previous article before proceeding to this article where we discussed the Non-Repeatable Read Concurrency Problem in SQL Server with an example. At the end of this article, you will understand what is phantom read problem and when it occurs in SQL Server as well as you will also understand how to solve the phantom read problem in SQL Server.
What is Phantom Read Problem in SQL Server?
The Phantom Read Problem happens in SQL Server when one transaction executes a query twice and it gets a different number of rows in the result set each time. This generally happens when a second transaction inserts some new rows in between the first and second query execution of the first transaction that matches the WHERE clause of the query executed by the first transaction.
Let’s understand this with an example. We are going to use the following Employees table for this example.
Please use below SQL Script to create and populate the Employees table with the required dummy data.
-- Create Employee table CREATE TABLE Employees ( Id INT PRIMARY KEY, Name VARCHAR(100), Gender VARCHAR(10) ) Go -- Insert some dummy data INSERT INTO Employees VALUES(1001,'Anurag', 'Male') INSERT INTO Employees VALUES(1002,'Priyanka', 'Female') INSERT INTO Employees VALUES(1003,'Pranaya', 'Male') INSERT INTO Employees VALUES(1004,'Hina', 'Female')
The following diagram explains the Phantom Read Concurrency problem in SQL Server:
Let say we have two transactions Transaction 1 and Transaction 2. The Transaction 1 starts first and it reads the data from the Employees table where Gender is Male and it returns 2 rows for the first read and then the Transaction 1 is doing some other work. At this point of time, Transaction 2 started and it inserts a new employee with Gender is Male. Once the Transaction 2 insert the new employee, then Transaction 1 makes a second read and it returns 3 rows, resulting in phantom read concurrency problem.
Phantom Read in SQL Server with an Example:
Open 2 instances of the SQL Server Management Studio. From the first instance execute the Transaction 1 code and from the second instance execute the Transaction 2 code. Notice that when Transaction 1 completed, it gets a different number of rows for reading 1 and reading 2, resulting in phantom read problem.
-- Transaction 1 BEGIN TRANSACTION SELECT * FROM Employees where Gender = 'Male' -- Do Some work WAITFOR DELAY '00:00:15' SELECT * FROM Employees where Gender = 'Male' COMMIT TRANSACTION -- Transaction 2 INSERT into Employees VALUES(1005, 'Sambit', 'Male')
How to solve the Phantom Read issue in SQL Server?
You can use the Serializable or Snapshot Transaction Isolation Level to solve the phantom read issue in SQL Server. Please have a look at the following table which shows which isolation level solve which concurrency problems in SQL Server.
In the above example, to fix the phantom read concurrency problem let set the transaction isolation level of Transaction 1 to serializable. The Serializable Transaction Isolation Level places a range lock on the rows returns by the transaction based on the condition. In our example, it will place a lock where Gender is Male, which prevents any other transaction from inserting new rows within that Gender. This solves the phantom read problem in SQL Server.
When you execute Transaction 1 and 2 from 2 different instances of SQL Server Management Studio. The Transaction 2 is blocked until Transaction 1 completes and at the end of Transaction 1, both the reads get the same number of rows.
-- Transaction 1 SET TRANSACTION ISOLATION LEVEL SERIALIZABLE BEGIN TRANSACTION SELECT * FROM Employees where Gender = 'Male' -- Do Some work WAITFOR DELAY '00:00:15' SELECT * FROM Employees where Gender = 'Male' COMMIT TRANSACTION -- Transaction 2 INSERT into Employees VALUES(1006, 'Tarun', 'Male')
What are the Differences between repeatable read and serializable isolation level?
The Repeatable Read Transaction isolation Level prevents Non-Repeatable Read, Dirty Read, and Lost Update concurrency problems. In order to do this, the Repeatable Read adds additional locks on the data read by the transaction, which ensures that once a transaction read the data, then those data will be prevented from being read, update or delete by any other transaction. But it allows to insert new data into the database, as a result, it does not prevent Phantom Read Concurrency Problem.
On the other hand, the Serializable Isolation Level prevents all sorts of concurrency problems such as Non-Repeatable Read, Dirty Read, Lost Update, and Phantom Read. In order to do this, it places a range lock on the data return by the transaction which ensures that once a transaction read the data, then no other transaction can read, update, delete or insert new data within that range.
In the next article, I am going to discuss the Snapshot isolation level in SQL Server with an example. Here, in this article, I try to explain the Phantom Read Problem in SQL Server with an example. I would like to have your feedback. Please post your feedback, question, or comments about this article.