LINQ to SQL Performance Considerations

This article examines some of the performance issues encountered while working with LINQ to SQL.


Introduction

As developers we have a wide array of tools at our disposal. One of the most time-consuming, but most beneficial, tasks we must undertake is determining which of those tools to use in a given situation.

We must weigh very carefully the trade-offs of using many of these tools. Some tools provide us with an easy-to-use interface, but at a cost in overall performance.

Some tools give us lightning speed, but at a cost in code maintainability.

Is any one tool the best solution for all situations? If you think so, please keep reading.

Nor should we choose a tool simply because it is the current programming rage in the industry, or the latest and greatest cool widget from XYZ Corporation.

Recently I have been able to do some coding using LINQ to SQL. Being an old-school stored procedure and SQL guy, I decided to roll up my sleeves and jump into the O/RM (Object-Relational Mapping) ocean, as it were.

What I found is quite interesting and has allowed me to add some useful new tools to my programming toolbox.

The Battle at Hand

Where I am currently consulting, I have been writing a manufacturing traceability system in ASP.NET that calls a lot of stored procedures.

I have had to make some modifications to the application and decided to incorporate some LINQ to SQL into the code so I could evaluate several factors. 

I wanted to see what impact writing LINQ to SQL had on the overall development and test cycle time, and to evaluate the impact this new code had on the performance and response time of the application.

I decided to take one stored procedure, modify the code to try out various programming approaches, and compare the original code that calls the stored procedure against various LINQ to SQL solutions, benchmarking the results.

Here is a scaled down version of the stored procedure I was originally using:

CREATE PROCEDURE [dbo].[CalcFailedTest]
    -- Parameters for the stored procedure
    @StartDate datetime,
    @EndDate   datetime,
    @Model     varchar(2)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    DECLARE @StartDateTime    datetime
    DECLARE @EndStartDateTime datetime
    DECLARE @EndDateTime      datetime
    DECLARE @StartTime        varchar(12)
    DECLARE @EndTime          varchar(12)
    DECLARE @Hours            int
    DECLARE @TotalCnt         int

    SELECT @StartTime = StartTime
    FROM   Shifts
    WHERE  ShiftName = 'Shift1'

    SELECT @EndTime = StartTime,
           @Hours   = Hours
    FROM   Shifts
    WHERE  ShiftName = 'Shift2'

    -- Set the begin and end dates and times
    SET @StartDateTime    = CONVERT(datetime, @StartDate + @StartTime, 20)
    SET @EndStartDateTime = CONVERT(datetime, @EndDate + @EndTime, 20)
    SET @EndDateTime      = DATEADD(hh, @Hours, @EndStartDateTime)

    SELECT @TotalCnt = COUNT(PartNumber)
    FROM   Inspection
    WHERE  TestTimeStamp > @StartDateTime
      AND  TestTimeStamp < @EndDateTime
      AND  Model = @Model

    SELECT CASE WHEN TestResults = 'F' THEN COUNT(PartNumber) ELSE 0 END AS FailedCount,
           CASE WHEN TestResults = 'C' THEN COUNT(PartNumber) ELSE 0 END AS CancelCount,
           TestResults,
           FailStep,
           CONVERT(decimal(18,2), 1.0 * COUNT(PartNumber) / @TotalCnt * 100) AS PercentFailed
    FROM   Inspection
    WHERE  TestTimeStamp > @StartDateTime
      AND  TestTimeStamp < @EndDateTime
      AND  TestResults != 'P'
      AND  Model = @Model
    GROUP BY TestResults, FailStep
    ORDER BY FailedCount DESC
END

Background

Our production facility operates two shifts per day; our day runs from 03:00 AM on one day to 03:00 AM the next day. So to calculate production metrics properly, we first have to determine the start and end dates and times.
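As a sketch, the day-window arithmetic looks like this in plain C#. The 03:00 day start, the 15:00 second-shift start, and the 12-hour shift length are illustrative values standing in for what the real code reads from the Shifts table:

```csharp
using System;

static class DayWindow
{
    // Returns the production-day window: 03:00 on 'date' to 03:00 the next day.
    // In the real application the shift start times and hours come from the
    // Shifts table; here they are hard-coded for illustration.
    public static (DateTime Start, DateTime End) For(DateTime date)
    {
        DateTime start = date.Date.AddHours(3);         // day starts at 03:00
        DateTime shift2Start = date.Date.AddHours(15);  // Shift2 starts at 15:00
        DateTime end = shift2Start.AddHours(12);        // +12h => 03:00 next day
        return (start, end);
    }

    static void Main()
    {
        var (start, end) = For(new DateTime(2010, 3, 29));
        Console.WriteLine(start + " -> " + end); // 03:00 on 3/29 -> 03:00 on 3/30
    }
}
```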

The stored procedure returns all tests that were failed or canceled, the test step at which each test was failed or canceled, and the percentage failed (meaning that in some cases a tester could fail or cancel a test, for the same reason, more than once a day).

In order to calculate the failed percentage we also have to know the total tests run for the given model; that is the reason for the @TotalCnt SELECT statement.

The engineers want the failed and canceled counts in different columns on the report so they can export it to Excel and perform calculations; this is why I use the CASE statement.

1st Scenario 

In the first scenario I decided to use LINQ to SQL query expressions to re-create the above results. I of course used the MSLinqToSQLGenerator (the designer) to create the object model used by the code below:

Program p = new Program();
SQLDataDataContext sq = new SQLDataDataContext(p.GetConnectionString("Production_Monitoring"));
sq.ObjectTrackingEnabled = false;
string s1StartTime = "";
string s2StartTime = "";
string sDate = "2010-03-29";
int tcount = 0;
int iHours = 0;
var stopwatch = new Stopwatch();
stopwatch.Start();
// Get Shift1 Start Time
var shift1 =
    from sft1 in sq.Shifts
    where sft1.ShiftName == "Shift1"
    select new { sft1.StartTime };
foreach (var sft in shift1)
{
    s1StartTime = sft.StartTime;
}
// Get Shift2 Hours
var shift2 =
    from sft2 in sq.Shifts
    where sft2.ShiftName == "Shift2"
    select new {sft2.StartTime, sft2.Hours };
foreach (var sft in shift2)
{
    s2StartTime = sft.StartTime;
    iHours = Convert.ToInt32(sft.Hours);
}
// Calculate the start and end dates and times
DateTime dStartTime = Convert.ToDateTime(sDate + " " + s1StartTime);
DateTime dEndStartTime = Convert.ToDateTime(sDate + " " + s2StartTime);
DateTime dEndTime = dEndStartTime.AddHours(iHours);
// get the total model count
var icount =
    from insp in sq.Inspections
    where insp.TestTimeStamp > dStartTime && insp.TestTimeStamp < dEndTime && insp.Model == "EP"
    group insp by insp.Model into grp
    select new { Count = grp.Count() };
foreach (var i in icount)
{
    tcount = i.Count;
}
// generate failed step results and calculate percentage failed
var unordered =
    from insp in sq.Inspections
    where insp.TestTimeStamp > dStartTime && insp.TestTimeStamp < dEndTime && insp.Model == "EP" && insp.TestResults != "P"
    group insp by new { insp.TestResults, insp.FailStep } into grp
    select new
    {
        FailedCount = (grp.Key.TestResults == "F" ?
        grp.Count() : 0),
        CancelCount = (grp.Key.TestResults == "C" ?
        grp.Count() : 0),
        grp.Key.TestResults,
        grp.Key.FailStep,
        PercentFailed = Convert.ToDecimal(1.0 * grp.Count() /
                    tcount * 100)
    };
var fStep =
    from selection in unordered
    orderby selection.FailedCount descending,
    selection.CancelCount descending
    select selection;
stopwatch.Stop();
Console.WriteLine("Linq - " +
    stopwatch.ElapsedMilliseconds);
stopwatch.Reset();

Notice that I set sq.ObjectTrackingEnabled = false. Since the application is read-only, we want to turn off the update, create and delete overhead that LINQ provides in the data context.

First I get the shift 1 start time, and then the shift 2 start time and the hours for that shift.

I then calculate the day start and end date and times.

I then get the total count for the Model being tested.

I can now query for the testers that produced failed and canceled results and calculate the percentage failed.

2nd Scenario

In the second scenario we will examine the call to the stored procedure using ADO.NET and my data layer class. Please refer to my previous article, "Optimize your Data Layer for quicker Code Development", for information on how I implement a data layer.

Here is the code:

sqlServer DB = new sqlServer();
SqlConnection DBConnect = null;
SqlDataReader ESDR = null;
string[] strParms = new string[3];
stopwatch.Start();
strParms[0] = "2010-03-29";
strParms[1] = "2010-03-29";
strParms[2] = "EP";
DB.Connect("Production_Monitoring", ref DBConnect);
DB.GetLookUpData("CalcFailedTest", "@StartDate,@EndDate,@Model",
ref ESDR, DBConnect, strParms);
stopwatch.Stop();
Console.WriteLine("DataLayer = " + stopwatch.ElapsedMilliseconds);
stopwatch.Reset();

As you can see, using ADO.NET and a streamlined data layer, there is not much code to implement.

I create an instance of the data layer and create SqlConnection and SqlDataReader objects. I set up the strParms array with the arguments we want to pass to the stored procedure, get the connection string from the app.config file, and call the stored procedure.

3rd Scenario

In this scenario I switched back to LINQ in order to call the above stored procedure. This way I can compare both maintainability and performance between this and the other scenarios.

Here is the LINQ code to call a stored procedure:

Program p = new Program();
SQLDataDataContext sq = new SQLDataDataContext(p.GetConnectionString("Production_Monitoring"));
sq.ObjectTrackingEnabled = false;
stopwatch.Start();
string sDate = "2010-03-29";
DateTime dStart = Convert.ToDateTime(sDate);
DateTime dEnd = Convert.ToDateTime(sDate);
var lsp = sq.CalcFailedTest(dStart, dEnd, "EP").ToList();
stopwatch.Stop();
Console.WriteLine("Linq SP - " + stopwatch.ElapsedMilliseconds);
stopwatch.Reset();

Here I create a new data context and set the ObjectTrackingEnabled property to false. I set up the date parameters and call the stored procedure.

Notice that I call the .ToList() method so all the results will be processed in order to get accurate benchmark times.
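The point about .ToList() is worth illustrating. LINQ queries are deferred: defining a query executes nothing, and the work only happens when the results are enumerated, so a benchmark that never enumerates measures almost nothing. A minimal LINQ to Objects sketch (no database involved) shows the difference:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Shows deferred execution: a query re-executes against the source on each
    // enumeration, while ToList() forces execution and snapshots the results.
    public static (int DeferredCount, int SnapshotCount) Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // Deferred: at this point the query is only a description;
        // nothing has executed yet.
        IEnumerable<int> query = source.Where(n => n > 1);

        // ToList() forces immediate execution and snapshots the results (2, 3).
        List<int> snapshot = source.Where(n => n > 1).ToList();

        // Mutating the source afterwards affects the deferred query
        // (it runs when enumerated, seeing 2, 3, 4) but not the snapshot.
        source.Add(4);
        return (query.Count(), snapshot.Count);
    }

    static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine("deferred: " + deferred + ", snapshot: " + snapshot); // deferred: 3, snapshot: 2
    }
}
```

The same principle is why each scenario calls .ToList() before stopping the stopwatch.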

4th Scenario

In the fourth scenario I will use LINQ to call a user-defined function. User-defined functions are more flexible than stored procedures, since you can call user-defined functions from stored procedures as well as from LINQ query expressions.

Here is the code:

Program p = new Program();
SQLDataDataContext sq = new SQLDataDataContext(p.GetConnectionString("Production_Monitoring"));
sq.ObjectTrackingEnabled = false;
stopwatch.Start();
string sDate = "2010-03-29";
DateTime dStart = Convert.ToDateTime(sDate);
DateTime dEnd = Convert.ToDateTime(sDate);
var lfn = sq.fn_CalcFailedTest(dStart, dEnd, "EP").ToList();
stopwatch.Stop();
Console.WriteLine("Linq FN - " + stopwatch.ElapsedMilliseconds);
stopwatch.Reset();

As you can see, there is no difference between calling a stored procedure and a user-defined function. But as indicated, there is much more flexibility in the kinds of scenarios in which you can use user-defined functions.

Benchmarks

Now that you have viewed all of the scenarios we can review the performance benchmarks:

Currently the Inspection table in our production database contains 182,000 rows. Each of the above queries returns 14 rows.

I ran these queries from a Visual Studio C# console application and benchmarked the running queries using the Stopwatch class, which is located in the System.Diagnostics namespace.

In order to get a good sample size I ran the console application 5 times.

Here are the results:

Scenario      Milliseconds (5 runs)
LINQ          175   171   165   165   164
Data Layer    77    77    77    77    77
LINQ SP       90    88    89    88    89
LINQ FN       100   83    83    83    100

Conclusion

I assumed that the performance of LINQ would be worse than that of the data layer, but what surprised me was the size of the difference.

Complex queries that make multiple connections to the database, and that must translate the LINQ code to SQL for each query, appear to cause some performance issues.
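One way to reduce those round trips is to combine queries. The first scenario, for example, issues one query per shift row; both rows could be fetched in a single query. This LINQ to Objects sketch shows the shape of such a combined lookup, with an in-memory Shift list standing in for the Shifts table:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Shift
{
    public string ShiftName;
    public string StartTime;
    public int Hours;
}

static class CombinedLookup
{
    // One query fetches both shift rows, instead of one round trip per shift.
    public static Dictionary<string, Shift> BothShifts(IEnumerable<Shift> shifts)
    {
        return (from s in shifts
                where s.ShiftName == "Shift1" || s.ShiftName == "Shift2"
                select s).ToDictionary(s => s.ShiftName);
    }

    static void Main()
    {
        // Illustrative data; the real values live in the Shifts table.
        var shifts = new List<Shift>
        {
            new Shift { ShiftName = "Shift1", StartTime = "03:00", Hours = 12 },
            new Shift { ShiftName = "Shift2", StartTime = "15:00", Hours = 12 }
        };

        var both = BothShifts(shifts);
        Console.WriteLine(both["Shift1"].StartTime + " / " + both["Shift2"].StartTime);
    }
}
```

Against a real DataContext the same query shape would translate to a single SELECT rather than two.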

The performance of calling stored procedures and stored functions from LINQ is not that much different from the data layer query.

The benefits of using LINQ to SQL really shorten the coding process once you get past the learning curve. I am still going through this learning curve.

Code maintainability is manageable and is similar to using my data layer. And the O/RM object mappings and type-safe, declarative approach do provide major benefits to the job of developing code.

So it makes a lot of sense to look at which situations call for query expressions to query the database directly, and which call for using LINQ to call stored procedures or user-defined functions.

The moral of the story is to use some judgment and perform some benchmarking and profiling before you choose which tool to use from your tool box.

For me this learning experience has given me a new tool to add to my tool box.