Anytime a user wants to access a large percent of table rows the query optimizer might use an access method known as a scan. Here's an example of query satisfied with a scan :
Prior to running this query I added an index to the state column as to illustrate when the optimizer might chose a seek over a scan.
CREATE NONCLUSTERED INDEX NonClus_StatePro ON Dimension.City([State Province])
Aside - If the query was in production I would consider covering by adding the required columns in an INCLUDE clause, however it would change the dynamic of this demo.
Let's see when we look at cities in Hawaii instead:
Here the optimizer decided it was more efficient to first seek the non-clustered index and then scan the table. It chose this plan because a small percentage of the table rows contain the value 'Hawaii'. The next natural question is, "What is the tipping point for a scan vs a seek with look-ups?" The correct answer is "it depends". But as a rule of thumb, when more than 1% of the rows are returned by the search criteria the optimizer shifts from seeks to scans.
An analogy can be used to further illustrate the concept of scans vs. seeks. Imagine if someone was tasked with counting the occurrences of the word "the" in a history text book. The fastest way to achieve this would be to scan every page of the book. Now imagine if someone was tasked with counting the occurrences of the word "George Washington". The fastest way to achieve this would be to seek "George Washington" in the back of the book index then find the page numbers and then look up those pages. The word "George Washington" has a lower cardinality then the word "the" thus a seek speeds up the search. Roughly the same concept for a database query.
Just to prove scans are sometimes more optimal then seeks and vice versa we will run both queries above with hints in a batch.
SELECT [City Key] ,[City] ,[State Province] FROM [WideWorldImportersDW].[Dimension].[City] WHERE [State Province] = 'Hawaii' OPTION(RECOMPILE) SELECT [City Key] ,[City] ,[State Province] FROM [WideWorldImportersDW].[Dimension].[City] WITH (FORCESCAN) --Forces SCAN access WHERE [State Province] = 'Hawaii' OPTION(RECOMPILE)
The Seek was apparently only 1/5 the cost of the batch:
I/O and time statistics tell the same story, 660 reads vs 3482 and 5 milliseconds vs 45.
Changing our search value back to Texas and changing the hint to WITH (FORCESEEK) once again confirms the optimizer's natural choice.
We will analyze how the storage engine handles the optimizer's request to perform a scan.
The storage engine has a two options to perform the scan query. It can scan the pages in allocation order or index order. Allocation order scans work by reading the data on IAM pages in file order. Index order scans data by traversing the linked list of the index leaf level. When fragmentation occurs, index order scans might suffer a performance hit because the path of the scan is no longer in order.
The storage engine will always use a allocation order scan when the table is a heap. The storage engine might use allocation order scan under the following criteria:
- the plan chooses an index scan and ordering is not required and the index is greater than 64 pages (512 KB)
- an un-ordered scan occurs on an index greater than 64 pages and the request isolation level is set to Read Uncommitted e.g. NOLOCK hint
- The query is ran with a TABLOCK hint
- the scan occurs on a read-only database or file group
Understanding the internals of scans is important from a query tuning perspective. Systems will benefit immensely when administered and developed by data pros familiar with access methods.