SQL Server Query Optimization - Day 2 (Number of Joins Does Matter)

Before reading this article, I highly recommend reading my previous part:

Query Optimizer is designed in a way to keep a balance between the quality of a query and its execution time. Query Optimizer can’t take hours to evaluate all candidate plans in order to select the best one, but rather it is designed to try within a limited time and select optimum plan. 
To select an optimum performing plan, the most important thing for Query Optimizer is “Join Order." At a single time, only two tables can be joined together, and thena  third can be joined with the output from the first two tables, or it is possible to join the two tables separately and then join the result sets. Ordering depends on the nature of join. But which table should be accessed first and which will be joined later -- all these possible patterns are evaluated to get best one.
 
The number of expected plans (candidate plans) is directly proportional to the number of tables joined together in a query. Query Optimizer will always select best possible plan, if expected plans are very few and Query Optimizer is able to evaluate all possible plan within its limited time. On the other hand, if the database is over-normalized and more tables need to be joined, then there are hundreds or thousands of candidate plans and due to limited time Query Optimizer can evaluate a fewer number of plans, and there are possibilities that all evaluated plans were badly performing and Optimizer has selected the best way out of all the bad plans.

How many query plans are possible with a different number of tables in a join?

To join, query tree can be:

  • Left-Deep Tree
  • Right-Deep Tree
  • Bushy Tree
 

The following table shows, how many candidate plans are possible with increasing number of tables in a query (Next time when you write a query, keep these figures in your mind),

Tables Candidate Plans
Left-deep trees
Candidate Plans
Bushy trees
111
222
3612
424120
51201,680
672030,240
75,040665,280
840,32017,297,280
9362,880518,918,400
103,628,80017,643,225,600

Query with just 8 tables can have more then 17 million candidate plans and you will be lucky to get a good performing plan out of these. That is the reason only 4 or 5 tables are recommended in a single query. Having a higher number of tables means a higher number of candidate plans and as a result more possibilities to have a bad plan for query execution.

One more misconception among a small group of developers is that using simple views can resolve this problem. In fact it’s just an illusion.

Suppose that we have a View, which use five tables in its definition. If we join this view within our query then SQL Server will reopen this view with all its five tables and will use these to join with three tables of query and as a result 17,297,280 plans will be there for Query Optimizer to evaluate.

 

Note: Materialized or indexed views acts like a single table as these are stored physically as single object.

Summary

The number of tables in a query does matter. Keep this number as low as possible by dividing work among queries. Avoid over normalization of your data that will push you to have more tables in every other quer, and last but not least, tables within a simple view are added tothe  final query, and simple views are not a solution. Even derived tables are not a solution for this problem.

Queries perform better when they are written according to Query Optimizer.