Blogs

.NET to SQL to Project Server and everything in between
Welcome to Blogs Sign in | Join | Help
in Search

Lara's Blog

Microsoft SQL Server

SQL Server 2005 and Substring Searches

SQL Server 2005 statistics are now built on substring character in text, ntext, char, varchar, and nvarchar columns.  This greatly improves string searches using wildcard characters. 

SQL Server 2000 did not always generate the best plan or choose the right index when searching LIKE predicates when the wildcard was at the beginning of the string (LIKE ‘%xyz%’).  Often, these search conditions required SQL Server to use a guessing algorithm to determine if the index was useful.  

 

The enhancements to the statistics in SQL Server 2005 provide a better estimate on index usage and will more accurately choose the best index in string wildcard search situations.  To test this change, create an index on the column CarrierTrackingNumber in the AdventureWorks databases in SQL Server 2000 and SQL Server 2005. 

 

SQL Server 2000
create index ix_CarrierTrackingNumber
on dbo.SalesOrderDetail(CarrierTrackingNumber)

SQL Server 2005
create index ix_CarrierTrackingNumber
on sales.SalesOrderDetail(CarrierTrackingNumber)

 

After the indexes are created, when you run DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', CarrierTrackingNumber) in SQL Server 2005 you will note a new column titled “String Index” in the first output row.  This column indicates if the statistics include a string summary of the substring frequency distribution of the characters.  This is what the optimizer uses to determine if the index is meaningful and useful when developing the query plan. 

 

Note that this string summary is limited to 80 characters.  If your character column is larger than 80 characters, SQL Server will choose the first 40 and last 40 characters of data when building the string summary.  If you search on substrings not represented in these 80 characters, SQL Server may not choose to use the index.  

 

To prove out the effect on this change, when you run the following query in SQL Server 2000 and review the execution plan you will note SQL Server chooses a clustered index scan on the SalesOrderDetail table.

 

In SQL Server 2000 the following will produce a clustered index scan.  The statistics output indicates a total of 1190 logical reads (total number of pages in my table).

 

SQL Server 2000
set statistics io on
select UnitPriceDiscount, CarrierTrackingNumber
from dbo.SalesOrderDetail
where CarrierTrackingNumber like '%111%'

 

The same query executed in SQL Server 2005 will produce an index scan on the newly created index with a total number of 460 logical reads.  This is over a 60% improvement. 

 

SQL Server 2005
set statistics io on
select UnitPriceDiscount, CarrierTrackingNumber
from Sales.SalesOrderDetail
where CarrierTrackingNumber like '%111%'

Published Tuesday, March 07, 2006 6:56 PM by Lara Rubbelke

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Frank said:

Thanks for this post. Keep it up!

December 18, 2007 3:49 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server, by Telligent Systems