SQL Clustered/NonClustered Index and its use

Most of part of this article is written by : Article by Robert Sheldon
I have tried to make it easy to understand.


What is an index?

Indexes are created on columns in tables or views. The index provides a fast way to search data as your SQL SELECT queries are fired. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server looks for it in the index to quickly locate the entire row of data. Without the index, a complete table scan would have to be performed in order to locate the row, which can have a significant effect on performance.

An index is like a tree of a set of nodes that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown in Figure 1.

Figure 1: B-tree structure of a SQL Server index

When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The search continues down through the index nodes until it reaches the leaf node.

Example : 
If you’re searching for the value 123 in an indexed column, the query engine would first look in the root level to determine which side(left or right) node to refer by comparing. In this example, the first page points the values 1-100, and the second page, the values 101-200, so the query engine would go to the second page on that level-2. The query engine would then determine that it must go to the third page at the next intermediate level. From there, the query engine would navigate to the leaf node for value 123. The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or nonclustered.

Clustered Indexes

A clustered index stores the actual data rows at the leaf level of the index. Returning to the example above, that would mean that the entire row of data associated with the primary key value of 123 would be stored in that leaf node.
An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view.

Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.
Note:  A primary key column has a clustered index by default.

Nonclustered Indexes

Unlike a clustered indexed, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row pointers that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data.

Nonclustered indexes cannot be sorted like clustered indexes; however, you can create more than one nonclustered index per table or view.
SQL Server 2005 supports up to 249 nonclustered indexes, and SQL Server 2008 support up to 999.

Index Types

In addition to an index being clustered or nonclustered, it can be configured in other ways:

  • Composite index:
      •   An index that contains more than one column.
      • In both SQL Server 2005 and 2008, you can include up to 16 columns in an index.
      • Both clustered and nonclustered indexes can be composite indexes.
  • Unique Index:
      • An index that ensures the uniqueness of each value in the indexed column.
      • If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns.
      • For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

A unique index is automatically created when you define a primary key or unique constraint:

    • Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view.
    • Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index.

  • Covering index: A type of index that includes all the columns that are needed to process a particular query. For example, your query might retrieve the FirstName and LastName columns from a table, based on a value in the ContactID column. You can create a covering index that includes all three columns.

Things to remember before creating index

As mentioned above, indexes can enhance performance because they can provide a quick way for the query engine to find data. However, you must also take into account whether and how much you’re going to be inserting, updating, and deleting data. When you modify data, the indexes must also be modified to reflect the changed data, which can significantly affect performance.

You should consider the following guidelines when planning your indexing strategy:

  • For tables that are heavily updated, use as few columns as possible in the index, and don’t over-index the tables.
  • If a table contains a lot of data but data modifications are low, use as many indexes as necessary to improve query performance.
  • For clustered indexes, try to keep the length of the indexed columns as short as possible. Ideally, try to implement your clustered indexes on unique columns that do not permit null values. This is why the primary key is often used for the table’s clustered index.
  • The uniqueness of values in a column affects index performance. In general, the more duplicate values you have in a column, the more poorly the index performs. On the other hand, the more unique each value, the better the performance. When possible, implement unique indexes.
  • For composite indexes, take into consideration the order of the columns in the index definition. Columns that will be used in comparison expressions in the WHERE clause (such as WHERE FirstName = 'Charlie') should be listed first. Subsequent columns should be listed based on the uniqueness of their values, with the most unique listed first. 
  • Try to insert or modify as many rows as possible in a single statement, rather than using multiple queries.


 

Examples
A. Use a simple index

CREATE INDEX index_name
   ON table_name (column_name)

B. Use a unique clustered index

This example creates an index on the employeeID column of the emp_pay table that enforces uniqueness. This index physically orders the data on disk because the CLUSTERED clause is specified.
CREATE TABLE emp_pay 
(    
   employeeID int NOT NULL,    
   base_pay money NOT NULL,    
   commission decimal(2, 2) NOT NULL

)
   


INSERT emp_pay VALUES (1, 500, .10)

INSERT emp_pay VALUES (2, 1000, .05)
INSERT emp_pay VALUES (3, 800, .07)
INSERT emp_pay VALUES (5, 1500, .03)
INSERT emp_pay VALUES (9, 750, .06)
GO







CREATE UNIQUE CLUSTERED INDEX employeeID_ind ON emp_pay (employeeID) GO
C. Use a simple composite index

This example creates an index on the orderID and employeeID columns of the order_emp table.
CREATE TABLE order_emp 
(
orderID int IDENTITY(1000, 1),
employeeID int NOT NULL,
orderdate datetime NOT NULL DEFAULT GETDATE(),
orderamount money NOT NULL
)

INSERT order_emp (employeeID, orderdate, orderamount) VALUES (5, '4/12/98', 315.19)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (5, '5/30/98', 1929.04)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (1, '1/03/98', 2039.82)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (1, '1/22/98', 445.29)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (4, '4/05/98', 689.39)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (7, '3/21/98', 1598.23)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (7, '3/21/98', 445.77)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (7, '3/22/98', 2178.98)

GO

CREATE INDEX emp_order_ind ON order_emp (orderID, employeeID)