SQL Server Query Optimization - Day 1 (Basics)

Writing better performing queries is an ever more demanding topic. Before starting a series of posts which will help you to find out the secrets behind better performing queries, let’s revisit some basics of SQL Server and TSQL.
 
The most surprising thing in the SQL Server is that TSQL is not a HOW language it’s just a WHAT language. If you need to execute a query to get an employees list with salary and load as advances records with no loan amount in the advance, you can just ask SQL Server through TSQL to provide you a desired list of employees according to your criteria. But how will the query be executed? It is purely the SQL Server's job to find an efficient way to extract information for you.

  1. SELECT  em.*    
  2. FROM dbo.Emp em   
  3.    INNER JOIN dbo.Salary sl ON sl.empid= em.empid  
  4.    INNER JOIN dbo.LoansAdvances la ON la.empid = em.empid  
  5. WHERE em.City = ‘New York’  

SQL Server Query Optimizer is responsible for this tough job of execution, plan creation, and selection. By executing your above query, first it will be parsed and syntax will be verified and if passed, the“Binding” phase will verify the existence of base objects (tables, views, and functions). If everything is good during the first two phases, then the Query Optimizer's job starts. Query Optimizer (based on server and query configurations and table statistics - metadata of user data) will create multiple execution plans and will select an optimum performing plan based on execution cost. Execution cost is the total of CPU cost and IO cost.

Why optimum and why not best
 
Because SQL Server Query Optimizer is bound to create and select a plan out of hundreds or thousands of plans, but within a fixed half second (500 milliseconds), the Optimizer can’t take a whole day to test all expected plans; that’s why the Query Optimizer optimum plans within half milliseconds and sends it for execution.

 

Serial Plan Execution or Parallel Plan Execution

Besides a few configurations on the SQL Server for instance, a query can be executed using a single CPU core, or query work can be divided and resolved through multiple CPU cores on a box. Again, it's the Query Optimizer (and sometimes execution engine) that will decide whether a query will be fast on a single CPU or on multiple CPUs, and if it is good to divide on multiple CPUs, then how many CPUs will be involved. A query can be slow if it’s not written according to Query Optimizer, and Optimizer will be unable to divide work on multiple CPUs and will try to resolve it through only one CPU.

 

Data movement between memory and hard drive (IO)

One more basic concept which is a must to keep in mind is that to resolve a query, the SQL Server has to load complete data pages in memory instead of only the required table records. A data page is a logical boundary of 8KB. If your query needs only one record, butthe page where the  record is located contains hundreds of other records,it still has to move the whole page from hard drive to memory. Not only one page; actually the SQL Server will load the whole extent. That is the reason we will keep eye on the number of page reads or scans during query optimization.

 

Summary

TSQL is just a WHAT query and not a HOW query, and it's the SQL Server Query Optimizer's job to find the most optimum way to execute your query. SQL Server Query Optimizer is just a piece of software and has limitations. It can only work better if you have provided a TSQL query the way it can understand better.