Binary_Checksum the Alternate of NewID() to Fetch Random Records in SQL Server

Binary_Checksum is the Alternate of NewID() to Fetch Random Records in the SQL Server

Sometimes it is required to fetch random records from a database. The simplest approach is to use NewID() with the order by clause. Well, I have no problem with that, but when you have a large table to query, it'll start showing issues with performance. Why? I'll try to answer this question with an alternative solution.

First, I have the Adventure works database with around 260000 records in a table and a query that will take a random record from it. 

SELECT TOP 1 [BusinessEntityID] FROM HumanResources.EmployeePayHistory
    ORDERBY NEWID()

The problem with this query is when you look at the execution plan; you'll find it performs a sort operation before it picks a top record.

Binary1.gif

This will happen if you use NewID() to pick the random records. It first sorts the records based on NewID(), the worst case for sorting based on GUID values. So 91% of the cost is due to the sort operation.

Now to get rid of the Sort operation of NEWID() by BINARY_CHECKSUM

The best approach is to use the BinaryCheckSum instead of NewID to reduce the sort operation to perform on the NEWID column generated in Memory.

SELECT TOP 1 [BusinessEntityID]
FROM HumanResources.EmployeePayHistory
WHERE (ABS(CAST(
(BINARY_CHECKSUM(*) *
RAND()) as int)) % 100) < 10

The BINARY_CHECKSUM function generates a checksum value based on the values of the columns you specify. If two rows are different, they typically will generate other checksum numbers. The BINARY_CHECKSUM function generally is used to verify whether any of the columns in a row in a table have changed. However, for our purposes, it generates a number that looks like a random number for each row.

Another form of using BinaryCheckSum can be

SELECT TOP 1 [BusinessEntityID]

FROM HumanResources.EmployeePayHistory
WHERE
(ABS(CAST(
(BINARY_CHECKSUM
([BusinessEntityID], NEWID())) as int))
% 100) < 10

Both forms of Binary_Cheksum will have the same query Execution plan with a similar cost.

Now let's look at the Figures analyzed by the Microsoft team to compare the performance of both queries.   

13 million rows          
NEWID query 253 347,420 13,810,132 14,157,552 422,891
BINARY_CHECKSUM query 21 347,420 0 347,420 49,203

And in our case here, I've also done some comparisons based on Query plans of the Old query of NEWID and the new query with BINARY_CHECKSUM.

Binary2.gif

This is another tip for improving performance, but if you have to pick random records from small SQL tables, you can still use NEWID() with no issues.

Let me know your feedback. Please make comments if you like to.


Similar Articles