SQL Server 2005 statistics are now built on substring character in text, ntext, char, varchar, and nvarchar columns. This greatly improves string searches using wildcard characters.
SQL Server 2000 did not always generate the best plan or choose the right index when searching LIKE predicates when the wildcard was at the beginning of the string (LIKE ‘%xyz%’). Often, these search conditions required SQL Server to use a guessing algorithm to determine if the index was useful.
The enhancements to the statistics in SQL Server 2005 provide a better estimate on index usage and will more accurately choose the best index in string wildcard search situations. To test this change, create an index on the column CarrierTrackingNumber in the AdventureWorks databases in SQL Server 2000 and SQL Server 2005.
SQL Server 2000
create index ix_CarrierTrackingNumber
on dbo.SalesOrderDetail(CarrierTrackingNumber)
SQL Server 2005
create index ix_CarrierTrackingNumber
on sales.SalesOrderDetail(CarrierTrackingNumber)
After the indexes are created, when you run DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', CarrierTrackingNumber) in SQL Server 2005 you will note a new column titled “String Index” in the first output row. This column indicates if the statistics include a string summary of the substring frequency distribution of the characters. This is what the optimizer uses to determine if the index is meaningful and useful when developing the query plan.
Note that this string summary is limited to 80 characters. If your character column is larger than 80 characters, SQL Server will choose the first 40 and last 40 characters of data when building the string summary. If you search on substrings not represented in these 80 characters, SQL Server may not choose to use the index.
To prove out the effect on this change, when you run the following query in SQL Server 2000 and review the execution plan you will note SQL Server chooses a clustered index scan on the SalesOrderDetail table.
In SQL Server 2000 the following will produce a clustered index scan. The statistics output indicates a total of 1190 logical reads (total number of pages in my table).
SQL Server 2000
set statistics io on
select UnitPriceDiscount, CarrierTrackingNumber
from dbo.SalesOrderDetail
where CarrierTrackingNumber like '%111%'
The same query executed in SQL Server 2005 will produce an index scan on the newly created index with a total number of 460 logical reads. This is over a 60% improvement.
SQL Server 2005
set statistics io on
select UnitPriceDiscount, CarrierTrackingNumber
from Sales.SalesOrderDetail
where CarrierTrackingNumber like '%111%'