Have you ever said this or has a developer ever come running
to you in a panic saying what happened? How many times have you heard: “We just
promoted this small report change and now it’s taking twice as long to run”?
Probably more times than you would like to admit. When this happens, where do
you start looking for issues? The answer can likely be found in Execution
Plans.
Execution Plans can be very useful in debugging why a query
is running slowly and can give insight into what SQL Server is doing behind the
scenes to return the data. Depending on
the complexity of the query, Execution Plans will show where the data is being
pulled from, how much data is being pulled, what manipulations are being
performed, and what the cost is for each of those operations. There are many
more things that a complex Execution Plan can, and will, show; but for simple,
one table queries like the ones below, the Execution Plan is pretty straight
forward.
To start, we’ll create a small employee table like so:
CREATE TABLE Employees
(
EmployeeID INT
IDENTITY(1, 1)
,FirstName
VARCHAR(50)
,MiddleInitial
CHAR(1)
,LastName
VARCHAR(50)
,Address1
VARCHAR(50)
,Address2
VARCHAR(50)
,City VARCHAR(50)
,[State]
CHAR(2)
,Zip CHAR(10)
,CONSTRAINT
PK_Employees PRIMARY KEY
CLUSTERED (EmployeeID) WITH FILLFACTOR = 100)
And
add a small nonclustered index like so:
CREATE NONCLUSTERED INDEX
NCL_Emp_FName_LName ON Employees(FirstName, LastName) WITH FILLFACTOR = 100
Next,
I populated the table with about 500,000 employees, each with unique values.
Now, entering in 500,000 employee records isn't really feasible, so I used Red Gate’s SQL Data Generator. After the data has been
loaded, we can start looking at Execution Plans.
This
exercise assumes that we will be using SQL Server Management Studio (SSMS) to
generate and review Execution Plans and that all the statistics on the table are up to date.
There are several ways to extract and view Execution Plans, but we will stick
with the simpliest which is viewing the actual Execution Plan in SSMS after the
query executes. This can be acomplished by keying CTRL+M in the query window.
Now, after the query executes, the actual Execution Plan will be shown beside
the messages tab.
Now
that we have SSMS setup to show the Execution Plan, we can run our query that
was running well prior to the modification.
SELECT FirstName
,LastName
FROM Employees
WHERE LastName = 'Whitehead'
AND FirstName = 'Aisha'
This query returned 3 rows and
generated a nice Index Seek on the nonclustered index that we defined on the
FirstName and LastName column.
Now,
let’s say that a developer gets a request to return FirstName, LastNane, and
MiddleInitial. Using the following query, we
can return that information.
SELECT FirstName
,LastName
,MiddleInitial
FROM Employees
WHERE LastName = 'Whitehead'
AND FirstName = 'Aisha'
The
problem is now, what once was a sub-second query is now taking much longer.
Why? All the developer did is was add 1 column to the query. The answer is in
the Execution Plan.
What we see now is two added steps, a Key Lookup and Nested Loop
Join. In this instance, the Key Lookup is the operator that is causing the
issue and you can see this by looking at the actual cost listed under the
operator, which is 52% of the entire query. This is because the nonclustered
index that we created at the beginning of the project has most, but not all, of
the columns the modified query is looking for. In order the satisfy the
request, the query processor must go outside the index and back to the clustered
index in order to find MiddleInitial and return it as a part of the query. As
you can see, this is an expensive operation and one that could be averted if
only the nonclustered index included the MiddleInitial column.
For this exercise, we will add the MiddleInitial to the index as a
part of the composite index rather than adding it as an Included Column in the
index.
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[Employees]') AND name = N'NCL_Emp_FName_LName')
DROP INDEX [NCL_Emp_FName_LName] ON
[dbo].[Employees]
GO
CREATE NONCLUSTERED INDEX
[NCL_Emp_FName_LName] ON [dbo].[Employees]
(
[FirstName] ASC,
[LastName] ASC,
[MiddleInitial] ASC
)WITH
(FILLFACTOR =
100) ON
[PRIMARY]
GO
Now
when we go back and run our modified query, we will see the following Execution
Plan.
And
our query is back to running like it was before… fast.