Proper indexing can make or break a database. I've seen time and time again one little tweak significantly reduce query execution time e.g. 20 minutes to milliseconds. In this post we will take dive into a simple, yet effective indexing strategy. The demo will make use of the Word Wide Importers Data Warehouse sample database which can be downloaded from github: http://bit.ly/24KiKC6
When in doubt just remember to look for FPOC columns:
F - Filtering (Columns used in joins and the WHERE clause)
P - Partitioning (Columns used for table partitions and partitioning functions)
O- Ordering (Columns in an ORDER BY clause)
C- Covering (Any remaining column in the SELECT clause)
Here's the query will we analyze for FPOC index opportunities:
USE WideWorldImportersDW
GO
SELECT City, RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC)
FROM Dimension.City
WHERE [Sales Territory] = 'Southeast'
The current execution plan performs a clustered index scan to satisfy the query.
Based on the analysis so far we can certainly improve performance. The optimizer automatically recognizes the query might benefit from an index as evident by the green text hint. It produces the following recommendation:
CREATE NONCLUSTERED INDEX GreenHint
ON [Dimension].[City] ([Sales Territory])
INCLUDE ([City],[State Province],[Latest Recorded Population])
Generally I recommend against implementing green query hints without further analysis. The optimizer's recommendation improves performance but we can create an even better one. Let's see if we can beat the optimizer using the FPOC:
F - [Sales Territory]
P - [State Province]
O - [Latest Recorded Population]
C - City
Usually columns in the FPO portion should be defined in the key of the index. "C" is fine remaining in the INCLUDE clause. Here's what the FPOC index looks like:
CREATE NONCLUSTERED INDEX nonClus_FPOC_IX
ON Dimension.City([Sales Territory],[State Province],[Latest Recorded Population] DESC)
INCLUDE ([City])
This index uses the same columns as the optimizer suggestion only we pushed everything except city up to the key level. Now when we re-run the query we get a slightly modified plan.
Notice the Sort operation has been removed. Sorts are usually expensive operations and they scale extra-linearly. Meaning the sort will become more expensive at an increasing rate as they table grows. It's important to note the sort is performed because we use window function in the query i.e. RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC). Because the data is requested in descending order the key must defined in descending order if we intend to remove the sort operation.
We might get better query performance using a filtered index. It's important to note filtered indexes become useless for values outside the WHERE clause. Sometimes this is worth it for the performance gain.
CREATE NONCLUSTERED INDEX nonClus_FPOC_IX_Filter
ON Dimension.City([Sales Territory],[State Province],[Latest Recorded Population] DESC)
INCLUDE ([City])
WHERE [Sales Territory] = 'Southeast'
We have a smaller structure to since all rows not in the Southeast sales territory are excluded.
Now let's put our indexes to the test. We will run the query 10,000 times for each index and capture time elapsed in milliseconds. For this test I recommend discarding results after execution. This can be enabled in SMSS top menu Query > Query Options > Results > Grid > Check "Discard results after execution".
USE [WideWorldImportersDW]
GO
CREATE TABLE [dbo].[Indexlog](
[ExecutionID] [int] IDENTITY(1,1) NOT NULL,
[IndexName] [varchar](50) NULL,
[QueryDurationMS] [int] NULL
)
GO
DBCC DROPCLEANBUFFERS --Run Against cold buffers, Don't Run in Production
GO
DECLARE @@startime DATETIME
DECLARE @@endtime INT
DECLARE @@x SMALLINT = 10000
WHILE @@x > 0
BEGIN
SET @@startime = GETDATE()
SELECT City,[State Province], RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC) AS StateRank
FROM Dimension.City WITH (INDEX(PK_Dimension_City))
WHERE [Sales Territory] = 'Southeast'
SET @@endtime = DATEDIFF(MILLISECOND,@@startime,GETDATE())
INSERT INTO dbo.IndexLog
SELECT 'PK_Dimension_City',@@endtime
SET @@x = @@x-1
END
GO
DBCC DROPCLEANBUFFERS --Run Against cold buffers, Don't Run in Production
GO
DECLARE @@startime DATETIME
DECLARE @@endtime INT
DECLARE @@x SMALLINT = 10000
WHILE @@x > 0
BEGIN
SET @@startime = GETDATE()
SELECT City,[State Province], RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC) AS StateRank
FROM Dimension.City WITH (INDEX(GreenHint))
WHERE [Sales Territory] = 'Southeast'
SET @@endtime = DATEDIFF(MILLISECOND,@@startime,GETDATE())
INSERT INTO dbo.IndexLog
SELECT 'GreenHint',@@endtime
SET @@x = @@x-1
END
GO
DBCC DROPCLEANBUFFERS --Run Against cold buffers, Don't Run in Production
GO
DECLARE @@startime DATETIME
DECLARE @@endtime INT
DECLARE @@x SMALLINT = 10000
WHILE @@x > 0
BEGIN
SET @@startime = GETDATE()
SELECT City,[State Province], RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC) AS StateRank
FROM Dimension.City WITH (INDEX(nonClus_FPOC_IX))
WHERE [Sales Territory] = 'Southeast'
SET @@endtime = DATEDIFF(MILLISECOND,@@startime,GETDATE())
INSERT INTO dbo.IndexLog
SELECT 'nonClus_FPOC_IX',@@endtime
SET @@x = @@x-1
END
GO
DBCC DROPCLEANBUFFERS --Run Against cold buffers, Don't Run in Production
GO
DECLARE @@startime DATETIME
DECLARE @@endtime INT
DECLARE @@x SMALLINT = 10000
WHILE @@x > 0
BEGIN
SET @@startime = GETDATE()
SELECT City,[State Province], RANK() OVER(PARTITION BY [State Province] ORDER BY [Latest Recorded Population] DESC) AS StateRank
FROM Dimension.City WITH (INDEX(nonClus_FPOC_IX_Filter))
WHERE [Sales Territory] = 'Southeast'
SET @@endtime = DATEDIFF(MILLISECOND,@@startime,GETDATE())
INSERT INTO dbo.IndexLog
SELECT 'nonClus_FPOC_IX_Filter',@@endtime
SET @@x = @@x-1
END
The FPOC indexes outperformed the out of box table structure and the green hint put forward by the optimizer. Adding a filter to the FPOC barely improved performance. Because the of the filtered index restrictions we should proceed with adding the unfiltered FPOC.
Having a good index strategy has the potential to speed up query execution as we saw above. This lead to faster applications/ analysis and happy end users. As a closing remark it's important to point a few index design considerations:
- Always check existing indexes. SQL server will not prevent you from defining a duplicate index.
- Non-clustered indexes can add overhead to inserts and updates. It might hurt system performance putting an index on table with frequent updates and infrequent reads.
- Indexes take up disk space. Developers should balance performance with resource consumption during the design phase. Does it make sense to have a 100GB base table with 400GB in Non-clustered indexes? Possibly, but in that scenario there's probably an opportunity to consolidate.
Comments