Thursday, March 28, 2013

Hash Index in SQL Server


Most of the books and references only covers two type of indexes Clustered Index and Non-clustered indexes . But while reading an artical I come to know that we have another kind of Index too that is known as Hash Index.


We shall try to learn this using an example of Post Office address searching utitlity that allow user to search the address on the basis of postal code and vice versan.


The main challenge that these applications face is that the search pattern may be very different depending upon the criteria used by the user - users can query either on the ZIP code (which can easily be indexed) or on the area address that consists at least of the City, State and Country.

To demonstrate how hash-indexes may be of use in such a situation consider the main data source


USE tempdb;
GO
SET NOCOUNT ON;

CREATE TABLE PostalCode 
       (PostalCode NVARCHAR(6),
        PostalArea NVARCHAR(20),
        PostalCity NVARCHAR(28),
        PostalState NVARCHAR(2),
        PRIMARY KEY CLUSTERED (PostalCode, PostalArea)
);
GO


Clustered index on PostalCode and Area is sufficient for Postal Code search. 
However if the user needs to search for the PostalCode based on the Area, City & State information, the story is a bit different. A typical approach to address this challenge involves creation of a non-clustered index on the columns of interest (Area, City & State) and including the PostalCode as an included column is shown below.


--Typical Solution
CREATE NONCLUSTERED INDEX idx_PostalAreaSearchNC 
ON PostalCode (PostalArea, PostalCity, PostalState) 
INCLUDE (PostalCode);
GO


Both the clustered and the non-clustered index on the PostalCode table work well in the given scenario.

Now lets experiments for Hash Index.

Create a computed  PostalCheckSum column on the PostalArea, PostalCity and PostalState columns.

--Create a CHECKSUM column on the table
ALTER TABLE PostalCode
ADD PostalCheckSum AS CHECKSUM(PostalArea, PostalCity, PostalState);
GO


After creating Checksum column add a non-clustered index on the checksum column with the PostalCode as anincluded column.


--Use this checksum value, with the PostalCode as an included column
CREATE NONCLUSTERED INDEX idx_PostalChecksumSearchNC 
ON PostalCode (PostalCheckSum) 
INCLUDE (PostalCode);
GO


Using this new index you can query the same


SELECT * 
FROM dbo.PostalCode 
WHERE PostalCheckSum = CHECKSUM('Sector 126', 'Noida', 'UP');




A CHECKSUM column contains hash values for a given expression, and therefore, the index created on a CHECKSUM column is called a hash index.
The prime benefit of a hash index is that they reduce the amount of space required for index storage by a drastic amount.


We can see that simply using a hash index in such a simple scenario provides a reduction of about 36% [{(69.448-43.793)/69.448}*100 = 36.94%] in the space required by an index to fulfill the same requirement.
Please note that this does imply that all non-clustered indexes be converted to hash indexes. Those indexes which have a wider disk footprint resulting out of a large number of character columns included on the index are ideal candidates for hash indexes. Higher the record size of an index, the more likely is the system going to be benefited by the use of hash indexes.
Another use of Hash Indexes
Microsoft SQL Server has the maximum size of the index keys limited to 900 bytes. To create an index with a key greater than 900 bytes, use a hash index using CHECKSUM!
Please note that creating a hash index materializes the checksum column

Refereces :
This article is taken from SQLServerCentral
Post a Comment