What Is Date Correlation Optimization?

I was doing some digging and peaking around again in SQL Server and came across a database option called Date Correlation Optimization Enabled = False. Honestly, I had no clue what it did, so I took it as a learning opportunity to look into it and do a little research. Who knows, it may actually help me solve one of the many problems I run into day to day for clients.



So, What Does It Do?

According to MSDN - The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.

How many of you read what MSDN says and thinks “wuuuuuttt, English please”? I do.

In English

Basically, it uses a foreign key relationship key between tables in SQL Server to enhance performance of date and date time queries when the dates fall within a certain defined range of each other (correlates). Ok that’s cool, but what’s the big deal? The power really comes in for things like reporting, validation, and data warehouses. With this option turned on, SQL Server maintains statistics between correlated columns and creates improved execution plans that read less data.

Let’s See It in Action

Consider this, all internet orders that are received have a must send out by date (due date) of 10 days after the order is received. Therefore, the OrderDate and DueDate are correlated, related to each other.

Here is a query you would normally run.

  1. SELECT * FROM InternetOrders AS i INNER JOIN InternetOrderDetail AS d ON i.InternetOrderID = d.InternetOrderID WHERE i.OrderDate BETWEEN '20170801' AND '20170901';  

Without DATE_CORRELATION_OPTIMIZATION turned on, the optimizer would create a plan just like anything else, however with it set to ON the optimizer can make more granular execution plans.

Here’s how

With each INSERT, UPDATE and DELETE between these two tables SQL Server is gathering statistics which help the optimizer infer the query to be more like the one below. This is where the power comes in. The optimizer can better narrow down the records it needs to read and therefore can return faster results.

Here is the way SQL interprets the dates now that correlation is turned on and statistics are being gathered. Based on those statistics it can now infer that each DueDate is exactly 10 days after the OrderDate.

  1. SELECT * FROM InternetOrders AS i,InternetOrderDetail AS d WHERE i.InternetOrderID = d.InternetOrderID AND i.OrderDate BETWEEN '8/1/17'  
  2. AND '8/15/17'  
  3. AND d.DueDate BETWEEN CAST('20170801'  
  4.     AS DATETIME) + 10 AND CAST('20170901'  
  5.     AS DATETIME) + 10; 

Depending on the number of records in the two tables this can be a VERY significant decrease to execution times.

The Caution

You should not enable DATE_CORRELATION_OPTIMIZATION in update-intensive database environments. SQL Server keeps all the correlation information in statistics form, this means with every INSERT, UPDATE and DELETE you gain additional overhead.

As always, be sure to test it before you use it in production.