Clustered Tables Vs Heap Tables in SQL Server

This article explains the differences between Clustered Tables and Heap Tables in SQL Server.

Introduction Heap table

A heap is a table without a cluster index. Data is stored in a heap without specifying an order. We can create one or more non-Clustered Indexes on this table. Normally data is initially stored in the order in which rows are inserted into the table. The database engine can move the data around the heap to stored data efficiently so there is no guarantee of data order. When we insert data into heap tables, SQL Server tries to fill the pages as much as possible. Before inserting the records, SQL Server does not analyze the actual free space available on a page. It uses the Page Free Space (PFS) allocation map instead.

Heap Structures

The system table "sys.Partitions" contains one row for heaps with index_id=0 for each partition used by heap. A heap has a single partition by default. A heap may have multiple partitions. When a heap has multiple partitions then each partition has a heap structure that contains the data. For example if a heap has two partitions then two heap structures are present in the database, one in each partition.

Each heap structure will have one or more allocation units to store and manage the data within a partition for each data type. So each heap will have at least one IN_ROW_DATA allocation unit per partition, one LOB_DATA allocation unit per partition if it contains a LOB object, one ROW_OVERFLOW_DATA allocation unit per partition if the column length exceeds the 8060 byte row size limit.

The system view "sys.system_internals_allocation_units" has the column "first_iam_page" that points tp the first Index Allocation Map (IAM) in the chain of IAM pages. An IAM represents something.

The following figure shows how the database engine uses IAM pages to retrieve data rows from a heap (in a single partition).



When to use Heap

If we do not create a non-Clustered Index on a heap table then the entire table would be examined to find the row. This is only acceptable when the table contains only a few rows. So it is a good idea to use a heap table when the result table is tiny.

When not to use a heap

  • When the data is frequently returned in a sorted order
  • When the table is large and there is no Clustered Index created on the table
  • When the range of the data is frequently queried on the table
  • When the data is frequently grouped
Introduction to Cluster table

The name suggests itself, these tables have a Clustered Index. Data is stored in a specific order based on a Clustered Index key.

Clustered Index Structures

Indexes are organized as B-trees in SQL Server. Every page in this B-tree is called an index node. The top-most node of this tree is called the "root node" and the bottom level of the nodes is called "leaf nodes" and any index level between the root node and leaf node is called an "intermediate level". The leaf nodes contain the data pages of the table in the case of a cluster index. The root and intermediate nodes contain index pages holding an index row. Each index row contains a key value and pointer to intermediate level pages of the B-tree or leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

The system table "sys.Partitions" contains one row for Clustered Indexes with index_id=1 for each partition used by the Clustered Index. A Clustered Index has a single partition by default. A Clustered Index may have multiple partitions. When a Clustered Index has multiple partitions then each partition has a B-tree structure that contains the partition. For example if a Clustered Index has two partitions then there are two B-tree structures present in the database, one in each partition.

The following figure shows the structure of a Clustered Index in a single partition.
 
 


The same as a heap, each Clustered Index will have one or more allocation units that are used to store and manage the data within the partition. This depends on the datatype in the Clustered Index. So each Clustered Index structure will have at least one IN_ROW_DATA allocation unit per partition, one LOB_DATA allocation unit per partition if it contains a LOB object, one ROW_OVERFLOW_DATA allocation unit per partition if the column length exceeds the 8060 byte row size limit.

Heap Table Vs Cluster Table

Heap Table

Cluster Table

Data is not stored in any specific order

 

Data is stored in in order based on the Clustered Index key

Specific or filter data cannot be retrieved quickly, unless non-Clustered Indexes are created on the table

If the query uses cluster index columns, data can be retrieved faster

Data pages are independent (not linked), so sequential access needs to refer back to Index Allocation Map (IAM) pages.

Data pages are linked with each other for faster sequential access

 

No additional time is required to maintain cluster index

Required additional time to maintain cluster index

 

Not required additional space to store a Clustered Index tree

Requires additional space to store a Clustered Index tree

Sys.indexes catalog view has value 0 in column "index_id"for these tables.

Sys.indexes catalog view has value 1 in column "index_id"for these tables.

Heap is best suited when the table is tiny. For example one organization has 4 to 5 branch, so the branch table can be created as a heap table.

Clustered table can be created for all scenarios. But needs caref to select the index key.

  
Table Fragmentation

Tables becoming fragmented is a common issue with all tables. Depending upon the activity (insert, update and delete) done on the table (either heap or clustered), the table can become fragmented. Much of this depends upon the activity as well as key columns of the Clustered Index key.

If we perform only an insert operation on heap tables then our table never becomes fragmented until only the new data is written. If our Clustered Index is sequential (such as an Identity value) and we have only an insert operation then this table will not become fragmented because new data is always written at the end of the Clustered Index. If we have either a heap table or a clustered table and there are many inserts, updates and deletes then the data page becomes fragmented. This results in additional data pages to read to satisfy the queries.

With a heap table, SQL Server does not force where the new data pages are written. Whenever the new data is written that is always written at the end of the table or on the next available page that is assigned to this table. If the data is deleted then the space becomes free in the data pages, but it is never used because the new data is always written to the next available page (or end of the table). With a clustered table, depending on the Clustered Index key, new records may be written to existing pages if the free space is available on the page or it may need to split a page into multiple pages to insert the new data. If the data is deleted then there are free spaces that may be used again if data needs to be inserted into one of the existing pages. So the conclusion is that a heap table can become more fragmented than a clustered table.

Resolving the fragmentation for the clustered table can be easily done by rebuilding the Clustered Index.

Resolving the fragmentation for the Heap table is not especially easy. There are various options to resolve fragmentation. The following might be an option to resolve fragmentation in a heap table.

  • Create a Clustered Index
  • Export the data, truncate the heap table and import the data back to the table
  • Create a new table; copy all the data from the heap table into the new table
In addition, when we create a new table using Management Studio and specify a primary key for the table, Management Studio automatically creates a Clustered Index on a Primary Key but this can be overridden.

I hope this helps!