The Next Interview Question - Identity Vs Newsequentialid Vs Newid

Tural Suleymani
2y
7k
0
1

Article

T-SQL allows us to use different functions as “a marker” to make our rows unique. The most used functionalities are:

IDENTITY() -> type : Int(tinyint, bigint etc)
NewId() -> type : UniqueIdentifier
NewSequencialID() -> type : UniqueIdentifier
Custom-defined solutions/algorithms
Sequences

In most simple cases, we often use IDENTITY to create simple tables. This allows us to create uniqueness in rows easily.

IDENTITY is one of the most efficient ways to perform inserting performance. It helps us to insert a row at the end of the table, and doesn’t “split” the table when it comes to pages. These two advantages provide the best insert performance in tables. But, unfortunately, IDENTITY “fails” when you need to do a “distributed” insertion.

Say you have different instances of the same application, and they replicate the database/table. So you have the below diagram:

In the above case, unfortunately, every instance/branch will insert the same/duplicate identity key to the table clones, and when syncing/merging it will fail.

Brief for Identity

Pros

Insertion speed/performance is high
Has a sequential insert
Doesn’t split your pages.

Cons

Doesn’t support distributed uniqueness.

NewId() provides different solution for this problem. Using it, we can “bypass” the “distributive uniqueness” problem. But, this function brings another problem into the solution: now we have “page splitting” and insertion performance. Let me explain this in more detail.

When you use identity, it has a number where for the next row, without any effort, the system does n+1 and creates a sequence. But NewId() does not possess the same orderly insert benefits that identity columns do when used as the clustering key.

Long story short, NewId() is a fast insertion, but as data is split between pages, it has slow reading (for only split rows).

The insert algorithm changes from sequential to random access. GUIDs require more Input-Output to locate the insert location in the clustered index.

Fragmentation at the leaf level leads to less dense pages. It paves the way for defragmentation/reindexing.

NewId() is also interesting from the “outside of the box” use. This means that when creating any application for our database, we can directly generate GUIDs on the application side and insert them into our table.

Brief for NewId()

Pros

User can insert data from outside of SQL (API, app, etc)
Supports distributed uniqueness.

Cons

Insertion speed is slow
Doesn’t provide sequential generation
Defragmentation/reindexing
Splits your pages.

NewSequentialID() takes care of both “problems” that IDENTITY and NewId() create. It makes fast insertion (70-90% same as IDENTITY) and doesn’t allow the system to split pages. When it comes to page splitting, NewSequentialId() is smart enough not to split rows between pages. But, as you know, nothing is free. NewSequentialID() has fast reading, but when inserting rows from multiple sessions, unfortunately, there is a “delay” for analyzing and inserting it to a new page.

We can’t use NewSequentialId() like we can NewId(). You can’t call it directly. You need to assign it to a column as a DEFAULT.

NewSequencialId() supports bit shifting. It generates sequential GUID, which degreases IO.

Brief for NewSequentialID()

Pros

Insertion speed/performance is high
Has a sequential insert
Doesn’t split pages
Fixes distributed uniqueness.

Const

When inserting from multiple sessions, it has a “short-delay”
Can’t be used as in arbitrary T-SQL statements.

In the end, let's create three different tables and test the insertion performance of each.

-- IDENTITY
CREATE TABLE DemoForIdentity (
   id int identity(1,1) not null primary key clustered,
   number int not null,
   def_data char(250) not null default 'simple default data');
GO

-- NEWID
CREATE table DemoForIdentityNewId (
   id uniqueidentifier default newid() not null primary key clustered,
    number int not null,
   def_data char(250) not null default 'simple default data');
GO

-- NEWSEQUENTIALID
CREATE table DemoForIdentityNewSequentialId (
   id uniqueidentifier default newsequentialid() not null primary key clustered,
   number int not null,
   def_data char(250) not null default 'simple default data');
 GO

After execution, apply the below code for every table. I used "DemoForIdentity" just for demo purposes.

-- Inserting 70.000 rows.

DECLARE @counter as int = 0;

while @counter < 70000 begin
   insert DemoForIdentity (number)
   values (@counter);

   set @counter = @counter + 1;
end;
GO
WITH CTE AS
(
SELECT reads, writes
FROM sys.dm_exec_sessions
WHERE session_id = @@spid )
SELECT reads, writes,NULL as index_type_desc, NULL as index_depth,NULL as page_count,NULL as avg_page_space_used_in_percent
,NULL as avg_fragmentation_in_percent,NULL as record_count FROM CTE
UNION ALL
select NULL,NULL, index_type_desc, index_depth, page_count, avg_page_space_used_in_percent,
avg_fragmentation_in_percent, record_count
from sys.dm_db_index_physical_stats(db_id(), object_id('DemoForIdentity'), null, null, 'detailed')
where index_level = 0;

Here are the results for each table-based execution:

Response for Identity

Response for NewId()

Response for NewSequentialId()