Optimizing LINQ Queries In C#.NET For MS SQL Server

Introduction

 
LINQ was added to .NET as a new powerful data manipulation language. LINQ to SQL allows you to talk with DBMS quite conveniently by using Entity Framework, for example. But often when using it developers forget to consider what kind of SQL query will be generated by the queryable provider (Entity Framework in our example case). In this article, we’ll look at how exactly you can optimize the performance of LINQ queries.
 

Implementation 

 
Let’s examine the two most important points by using an example.
 
First, we’ll need to create the Test database in SQL Server. In this database, we’ll create two tables by running the following query,
  1. USE [TEST]  
  2. GO  
  3.   
  4. SET ANSI_NULLS ON  
  5. GO  
  6.   
  7. SET QUOTED_IDENTIFIER ON  
  8. GO  
  9.   
  10. CREATE TABLE [dbo].[Ref](  
  11. [ID] [intNOT NULL,  
  12. [ID2] [intNOT NULL,  
  13. [Name] [nvarchar](255) NOT NULL,  
  14. [InsertUTCDate] [datetime] NOT NULL,  
  15.  CONSTRAINT [PK_Ref] PRIMARY KEY CLUSTERED   
  16. (  
  17. [ID] ASC  
  18. )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ONON [PRIMARY]  
  19. ON [PRIMARY]  
  20. GO  
  21.   
  22. ALTER TABLE [dbo].[Ref] ADD  CONSTRAINT [DF_Ref_InsertUTCDate]  DEFAULT (getutcdate()) FOR [InsertUTCDate]  
  23. GO  
  24.   
  25. USE [TEST]  
  26. GO  
  27.   
  28. SET ANSI_NULLS ON  
  29. GO  
  30.   
  31. SET QUOTED_IDENTIFIER ON  
  32. GO  
  33.   
  34. CREATE TABLE [dbo].[Customer](  
  35. [ID] [intNOT NULL,  
  36. [Name] [nvarchar](255) NOT NULL,  
  37. [Ref_ID] [intNOT NULL,  
  38. [InsertUTCDate] [datetime] NOT NULL,  
  39. [Ref_ID2] [intNOT NULL,  
  40.  CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED   
  41. (  
  42. [ID] ASC  
  43. )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ONON [PRIMARY]  
  44. ON [PRIMARY]  
  45. GO  
  46.   
  47. ALTER TABLE [dbo].[Customer] ADD  CONSTRAINT [DF_Customer_Ref_ID] DEFAULT ((0)) FOR [Ref_ID]  
  48. GO  
  49.   
  50. ALTER TABLE [dbo].[Customer] ADD  CONSTRAINT [DF_Customer_InsertUTCDate]  DEFAULT (getutcdate()) FOR [InsertUTCDate]  
  51. GO  
Now, let’s fill the Ref table with the help of the following script. Wait... I just ran a script, but didn’t save it. In such cases, SQL Complete by Devart can be very convenient. It integrates with SSMS and Visual Studio and has the Execution History feature,
 
Fig.1 – Execution History feature
 
This functionality displays the history of queries that were executed in SSMS,
 
 
Fig.2 – History of queries
 
Take note how the window consists of the following elements:
  1. Search box for filtering the results
  2. Date range box for filtering the results
  3. Results presented in a table. You can sort data by this table’s columns (by using the SHIFT key, you can choose a set of columns for sorting)
  4. Code of the selected row
The result table contains a history of scripts executed in SSMS and comprises the following tables,
  1. Status – shows whether the script was executed successfully
  2. Query Text – code of the script
  3. Size (Bytes) – size of the script in bytes
  4. Executed On – date and time at which the script was executed
  5. Duration – how long it took for the script to be executed
  6. File – name of a file or a tab in SSMS followed by the name of the SQL Server instance on which the script was executed
  7. Server - name of the SQL Server instance on which the script was executed
  8. User – the login under which the script was executed
  9. Database – the database context in which the script was executed
We can find the required query in this history table,
  1. USE [TEST]  
  2. GO  
  3.   
  4. DECLARE @ind INT=1;  
  5.   
  6. WHILE(@ind<1200000)  
  7. BEGIN  
  8. INSERT INTO [dbo].[Ref]  
  9.            ([ID]  
  10.            ,[ID2]  
  11.            ,[Name])  
  12.     SELECT  
  13.            @ind  
  14.            ,@ind  
  15.            ,CAST(@ind AS NVARCHAR(255));  
  16.   
  17. SET @ind=@ind+1;  
  18. END   
  19. GO  
 In a similar way, we can populate the Customer table by using the following script,
  1. USE [TEST]  
  2. GO  
  3.   
  4. DECLARE @ind INT=1;  
  5. DECLARE @ind_ref INT=1;  
  6.   
  7. WHILE(@ind<=12000000)  
  8. BEGIN  
  9. IF(@ind%3=0) SET @ind_ref=1;  
  10. ELSE IF (@ind%5=0) SET @ind_ref=2;  
  11. ELSE IF (@ind%7=0) SET @ind_ref=3;  
  12. ELSE IF (@ind%11=0) SET @ind_ref=4;  
  13. ELSE IF (@ind%13=0) SET @ind_ref=5;  
  14. ELSE IF (@ind%17=0) SET @ind_ref=6;  
  15. ELSE IF (@ind%19=0) SET @ind_ref=7;  
  16. ELSE IF (@ind%23=0) SET @ind_ref=8;  
  17. ELSE IF (@ind%29=0) SET @ind_ref=9;  
  18. ELSE IF (@ind%31=0) SET @ind_ref=10;  
  19. ELSE IF (@ind%37=0) SET @ind_ref=11;  
  20. ELSE SET @ind_ref=@ind%1190000;  
  21. INSERT INTO [dbo].[Customer]  
  22.           ([ID]  
  23.           ,[Name]  
  24.           ,[Ref_ID]  
  25.           ,[Ref_ID2])  
  26.     SELECT  
  27.           @ind,  
  28.           CAST(@ind AS NVARCHAR(255)),  
  29.           @ind_ref,  
  30.           @ind_ref;  
  31.   
  32. SET @ind=@ind+1;  
  33. END  
  34. GO  
The SQL Complete tool can help with keeping the code of your scripts neatly formatted.
 
In this way, we created two tables – one of them has over 1 million rows, and the other has more then 10 million.
 
Now we need to create a test project in Visual Studio. This one will be a Visual C# Console App (.NET Framework):
 
Next, we’ll need to add a library for the Entity Framework so that we can interact with the database.
 
To add this library, right-click the project and choose ‘Manage NuGet Packages...’ in the context menu,
 
Fig.3 - ‘Manage NuGet Packages...’ in the context menu
 
In the window that will be opened, let’s enter ‘Entity Framework’ in the search box, choose the Entity Framework package, and install it:
 
Fig.4 - Entity Framework package installation
 
Next, in the App.config file, we’ll add the following block after the configSections element:
  1. <connectionStrings>  
  2.     <add name="DBConnection" connectionString="data source=MSSQL_INSTANCE_NAME;Initial Catalog=TEST;Integrated Security=True;" providerName="System.Data.SqlClient" />  
  3. </connectionStrings>  
Make sure the connection string is entered in connectionString.
 
Now, let’s create 3 interfaces in separate files,
 
IBaseEntityID
  1. namespace TestLINQ    
  2. {    
  3.     public interface IBaseEntityID    
  4.     {    
  5.         int ID { getset; }    
  6.     }    
  7. }
IBaseEntityName
  1. namespace TestLINQ      
  2. {      
  3.     public interface IBaseEntityName      
  4.     {      
  5.         string Name { getset; }      
  6.     }      
  7. }      
IBaseNameInsertUTCDate
  1. namespace TestLINQ        
  2. {        
  3.     public interface IBaseNameInsertUTCDate        
  4.     {        
  5.         DateTime InsertUTCDate { getset; }        
  6.     }        
  7. }    
In a separate file, create a basic class BaseEntity for our two entities, which will contain their common fields,
  1. namespace TestLINQ    
  2. {    
  3.     public class BaseEntity : IBaseEntityID, IBaseEntityName, IBaseNameInsertUTCDate    
  4.     {    
  5.         public int ID { getset; }    
  6.         public string Name { getset; }    
  7.         public DateTime InsertUTCDate { getset; }    
  8.     }    
  9. }    
Next, we’ll create our two entities, each one in a separate file,
 
Ref
  1. using System.ComponentModel.DataAnnotations.Schema;    
  2.     
  3. namespace TestLINQ    
  4. {    
  5.     [Table("Ref")]    
  6.     public class Ref : BaseEntity    
  7.     {    
  8.         public int ID2 { getset; }    
  9.     }    
  10. }    
Customer
  1. using System.ComponentModel.DataAnnotations.Schema;    
  2.     
  3. namespace TestLINQ    
  4. {    
  5.     [Table("Customer")]    
  6.     public class Customer: BaseEntity    
  7.     {    
  8.         public int Ref_ID { getset; }    
  9.         public int Ref_ID2 { getset; }    
  10.     }    
  11. }    
Finally, let’s create a context UserContext in a separate file,
  1. using System.Data.Entity;    
  2.     
  3. namespace TestLINQ    
  4. {    
  5.     public class UserContext : DbContext    
  6.     {    
  7.         public UserContext()    
  8.             : base("DbConnection")    
  9.         {    
  10.             Database.SetInitializer<UserContext>(null);    
  11.         }    
  12.     
  13.         public DbSet<Customer> Customer { getset; }    
  14.         public DbSet<Ref> Ref { getset; }    
  15.     }    
  16. }    

In this way, we get a solution for performing optimization tests with the help of LINQ to SQL through Entity Framework for MS SQL Server:

Fig.5 - Solution for optimization tests
 
Now, let’s enter the following code in the Program.cs file:
  1. using System;  
  2. using System.Collections.Generic;  
  3. using System.Linq;  
  4.   
  5. namespace TestLINQ  
  6. {  
  7.     class Program  
  8.     {  
  9.         static void Main(string[] args)  
  10.         {  
  11.             using (UserContext db = new UserContext())  
  12.             {  
  13.                 var dblog = new List<string>();  
  14.                 db.Database.Log = dblog.Add;  
  15.   
  16.                 var query = from e1 in db.Customer  
  17.                             from e2 in db.Ref  
  18.                             where (e1.Ref_ID == e2.ID)  
  19.                                  && (e1.Ref_ID2 == e2.ID2)  
  20.                             select new { Data1 = e1.Name, Data2 = e2.Name };  
  21.   
  22.                 var result = query.Take(1000).ToList();  
  23.   
  24.                 Console.WriteLine(dblog[1]);  
  25.   
  26.                 Console.ReadKey();  
  27.             }  
  28.         }  
  29.     }  
  30. }  
When we run the project, this is the output we’ll see in the console as a result,
  1. SELECT TOP (1000)   
  2.     [Extent1].[Ref_ID] AS [Ref_ID],   
  3.     [Extent1].[NameAS [Name],   
  4.     [Extent2].[NameAS [Name1]  
  5.     FROM  [dbo].[Customer] AS [Extent1]  
  6.     INNER JOIN [dbo].[Ref] AS [Extent2] ON ([Extent1].[Ref_ID] = [Extent2].[ID]) AND ([Extent1].[Ref_ID2] = [Extent2].[ID2])  
As you can see, a LINQ query has efficiently generated a SQL query to the MS SQL Server DBMS.
 
Now, let’s change the AND condition to OR in the LINQ query,
  1. var query = from e1 in db.Customer  
  2.                             from e2 in db.Ref  
  3.                             where (e1.Ref_ID == e2.ID)  
  4.                                  || (e1.Ref_ID2 == e2.ID2)  
  5.                             select new { Data1 = e1.Name, Data2 = e2.Name };  
Execute the application again.
 
An exception will be thrown. From the description of the error, we’ll see that the operation has timed out after 30 seconds,
 
Fig.6 – Error description
 
This is the query that LINQ has created,
 
Fig.7 – LINQ query
 
We can see that the selection is performed through a cartesian product of two sets(tables),
  1. SELECT TOP (1000)   
  2.     [Extent1].[Ref_ID] AS [Ref_ID],   
  3.     [Extent1].[NameAS [Name],   
  4.     [Extent2].[NameAS [Name1]  
  5.     FROM  [dbo].[Customer] AS [Extent1]  
  6.     CROSS JOIN [dbo].[Ref] AS [Extent2]  
  7.     WHERE [Extent1].[Ref_ID] = [Extent2].[ID] OR [Extent1].[Ref_ID2] = [Extent2].[ID2]  
 Let’s rewrite the LINQ query like this,
  1. var query = (from e1 in db.Customer  
  2.                             join e2 in db.Ref  
  3.                             on e1.Ref_ID equals e2.ID  
  4.                             select new { Data1 = e1.Name, Data2 = e2.Name }).Union(from e1 in db.Customer  
  5.                                                                                     join e2 in db.Ref  
  6.                                                                                     on e1.Ref_ID2 equals e2.ID2  
  7.                                                                                     select new { Data1 = e1.Name, Data2 = e2.Name });  
 This is the SQL query we’ll get as a result,
  1. SELECT   
  2.     [Limit1].[C1] AS [C1],   
  3.     [Limit1].[C2] AS [C2],   
  4.     [Limit1].[C3] AS [C3]  
  5.     FROM ( SELECT DISTINCT TOP (1000)   
  6.         [UnionAll1].[C1] AS [C1],   
  7.         [UnionAll1].[NameAS [C2],   
  8.         [UnionAll1].[Name1] AS [C3]  
  9.         FROM  (SELECT   
  10.             1 AS [C1],   
  11.             [Extent1].[NameAS [Name],   
  12.             [Extent2].[NameAS [Name1]  
  13.             FROM  [dbo].[Customer] AS [Extent1]  
  14.             INNER JOIN [dbo].[Ref] AS [Extent2] ON [Extent1].[Ref_ID] = [Extent2].[ID]  
  15.         UNION ALL  
  16.             SELECT   
  17.             1 AS [C1],   
  18.             [Extent3].[NameAS [Name],   
  19.             [Extent4].[NameAS [Name1]  
  20.             FROM  [dbo].[Customer] AS [Extent3]  
  21.             INNER JOIN [dbo].[Ref] AS [Extent4] ON [Extent3].[Ref_ID2] = [Extent4].[ID2]) AS [UnionAll1]  
  22.     )  AS [Limit1]  
Unfortunately, there can only be one connection condition in one LINQ query, so we can achieve the results we need by creating one query for each of the two conditions and then combining them by using Union to delete repeating lines.
 
Yes, queries will be non-equivalent in most cases, considering that full row duplicates can be returned. However, in real life duplicate rows are not needed and usually they’re something you’ll want to get rid of.
Now let’s compare the execution plans of these two queries:
 
Average execution time for CROSS JOIN is 195s,
 
Fig. 8 – Execution time for CROSS JOIN
  
Average execution time for INNER JOIN-UNION is less than 24s
 
 
Fig. 9 – Execution time for INNER JOIN-UNION
 
As we can see from the results, the optimized LINQ query works several times faster than the unoptimized one in these two tables with millions of records.
 
For the version with the AND condition, a LINQ query will look like this,
  1. var query = from e1 in db.Customer  
  2.                             from e2 in db.Ref  
  3.                             where (e1.Ref_ID == e2.ID)  
  4.                                  && (e1.Ref_ID2 == e2.ID2)  
  5.                             select new { Data1 = e1.Name, Data2 = e2.Name };  
Almost always in this case, a correct SQL query will be generated, with the execution time of approximately 24 seconds:
 
 
Fig. 10 - Correct SQL query
 
Also, for LINQ to Objects operations, instead of a query that looks like this,
  1. var query = from e1 in seq1  
  2.                             from e2 in seq2  
  3.                             where (e1.Key1==e2.Key1)  
  4.                                && (e1.Key2==e2.Key2)  
  5.                             select new { Data1 = e1.Data, Data2 = e2.Data };  
 we can use a query similar to the following,
  1. var query = from e1 in seq1  
  2.                             join e2 in seq2  
  3.                             on new { e1.Key1, e1.Key2 } equals new { e2.Key1, e2.Key2 }  
  4.                             select new { Data1 = e1.Data, Data2 = e2.Data };  
where,
  1. Para[] seq1 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 4, Data = "999" } };  
  2. Para[] seq2 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 5, Data = "999" } };  
The Para type is defined in the following way,
  1. class Para  
  2. {  
  3.         public int Key1, Key2;  
  4.         public string Data;  
  5. }  

Conclusion

 
We looked at some aspects of LINQ query optimization for MS SQL Server. Also, SQL Complete helped us a lot with searching through the query history and with formatting the scripts we used in the course of this article.
 
Unfortunately, even experienced .NET developers often forget that it’s necessary to understand what the instructions they use do in the background. Otherwise, they can become configurators and set a figurative time bomb in the future – both when the solution is scaled and when the environment’s external conditions are slightly changed.
 
Source files for the test – the project itself, table creation in the TEST database, and population of these tables with data can be found here.
 
Also, the Plans folder from this repository contains execution plans of queries with OR conditions.
 
Additionally, there is an excellent solution called dotConnect - it’s a line of data access components from Devart for various DBMS. Among others, dotConnect components support such ORM tools as Entity Framework Core and LinqConnect which allows you to work with LINQ to SQL classes.
 
References 
  1. https://habr.com/ru/post/459716/
  2. https://stackoverflow.com/questions/56940996/how-to-optimally-combine-two-collections-in-linq-with-several-conditions-for-ms
  3. https://github.com/jobgemws/TestLINQ