SQL Performance (1), Optimized SELECT Query (A)

This series of articles will discuss SQL server performance.  In the title, I use SQL Performance because the articles could include SQL statement (Query) optimization and also include SQL Server performance issues, and probably include stored procedure performance issues.

A: Introduction[ref]

 
Performance tuning SELECT statements can be a time-consuming task that follows Pareto principle’s 80/20 rule: 20% effort is likely to give you an 80% performance improvement. To get another 20% performance improvement you probably need to spend 80% of the time.
 
In this article, we will emphasize the 20% effort with 80% benefits to improve performance before we move to query plans and some complicated or detailed issues that I leave to the second article, SQL Performance (2), Optimized SELECT Query (B).  In this way, it will be easier for both me and the readers to follow.
 
So, this article will be the easy part, but the important part for tuning the SQL query to improve the performance.
 

B: The Causes affecting query run time

 
Obviously, the factors affecting query run time is the size of the query return and the complexity of the query,
  • Size
    The larger the table (more columns), the longer the records (more rows), the worse the query performance will be.

  • Complexity
    The more complex the query, the worse the query performance will be.  Some major factors for query complexity as,

    • Joins
      If your joins substantially increase the row count of the result set, your query is likely to be slow.

    • Aggregations
      Combining multiple rows to produce a result requires more computation than simply retrieving those rows.

C: The Major Improvement for Tuning Query

 
Obviously reducing the size and avoid complexities of the query will be the solutions,
  • Reduce Size - Limit Size of the query Set

    • Limit Columns: Only Select Fields You Need
    • Limit Rows: Select Reasonable Record length
    • Remove Unnecessary Tables in the query
    • Remove OUTER JOINS

  • Enforce Indexes to Improve Performance

    • Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement
    • Remove Calculated Fields in JOIN and WHERE Clauses

  • Reduce Complexity --- will be discussed in details in SQL Performance (2), Optimized SELECT Query (B)

    • Run your query during off-peak hours

      • Selecting from large tables (>1,000,000 records)
      • Cartesian Joins or CROSS JOINs
      • Looping statements
      • SELECT DISTINCT statements
      • Nested subqueries
      • Wildcard searches in long text or memo fields
      • Multiple schema queries
We will discuss the 20% effort to improve the performance here and leave the 80% work in the next article, SQL Performance (2), Optimized SELECT Query (B).
 

1. Limit Size of the query Set

 
a. Limit Columns: Only Select Fields You Need
 
SELECT fields instead of using SELECT *.
 
Do not use,
  1. SELECT *    
  2. FROM Customers   
Use,
  1. SELECT FirstName, LastName, Address, City, State, Zip    
  2. FROM Customers   
Assume the Customers table has 10 columns with 1 million rows, if the second query used 2 seconds including the network traffic to get UI, the first one probably will cost double time.
 
b. Limit Rows: Select reasonable Record Length
 
For a very long return records, say, thousands of lines, such as,
  1. SELECT Product, Price, Amount, Total    
  2. FROM Orders   
you should get reasonable length by either adding a restricting condition, such as,
  1. SELECT Product, Price, Amount, Total      
  2. FROM Orders   
  3. WHERE OrderDate  >= '2014-03-01'  AND OrderDate <  '2014-04-01'
or by using a LIMIT statement,
  1. SELECT Product, Price, Amount, Total          
  2. FROM Orders      
  3. LIMIT 1000  
TOP in SQL Server,
  1. SELECT TOP 1000 Product, Price, Amount, Total            
  2. FROM Orders       
c. Remove Unnecessary Tables in the query
 
During development, it is possible that you add tables to the query that may not finally have any impact on the data returned. By removing the JOINS to these unnecessary tables reduce the time of processing the extra data, such as,
  1. SELECT c.FirstName, c.LastName, c.Address, c.City, c.State, c.Zip      
  2. FROM Customers c INNER JOIN Orders o  
  3. ON c.ID = o.CustomerID 
The Orders table finally is not used at all, that should be eliminated from the Query.
 

2. Enforce Indexes

 
a. Make indexes on all fields used in the WHERE and JOIN portions of the SQL statement

There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.

Reason

When fields are not indexed, SQL Server will typically do a full table scan and this may reduce performance. Unless the table is very small, a table scan tends to yield the worst performance out of all the types of database reads.
 
b. Remove Calculated Fields in JOIN and WHERE Clauses
 
This can be done by creating a field with the calculated values used in the join on the table. See below,
  1. FROM sales a JOIN budget b
  2. ON ((year(a.sale_date) * 100) + month(a.sale_date)) = b.budget_year_month            
Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows,
  1. FROM sales a JOIN budget b
  2. ON a.sale_year_month = b.budget_year_month

Reason
 
When calculated fields are used, SQL must do an on-the-fly computing of the field before it can do the comparison even if the fields are indexed.
 

Summary

 
According to Pareto principle’s 80/20 rule, we put 20% effort here to discuss the minimum actions we need to do for SQL performance tuning SELECT statements, and hopefully to get 80% performance improvement; while in the next article, SQL Performance (2), Optimized SELECT Query (B), we will discuss the 80% effort to get another 20% performance improvement.

References