Comparison Of Subquery And Joins

Here, in this article, we will learn SQL Joins and SQL Subquery. Also, we are going to see the differences between the two. Let's start!

Joins

Definition

A SQL Join combines data from one or more relational sources (tables) and produces the result as a single table. Joins are useful while fetching the data from a normalized source where it is split into multiple tables.

Types of Joins

Inner Join

An Inner Join is a condition where table results are combined using the values of columns equated in the clause. The Inner Join is performed using the keyword "INNER JOIN" and specifying the condition for the join.

This join can be equi- join ie = or can be non equi-join >,< (rarely used if any)

Syntax

SELECT * FROM firstTable INNER JOIN secondTable ON (firstTable.ID = secondTable.ID) 

SQL Server
Natural join

A Natural Join is a type of equi join where values of the columns in different tables are checked for equality condition. The column name, in this, needs to be same in both the tables. 

Natural Join removes the redundant column present in the result. Although it is easy to perform, it reduces understandability of code as one has to check the schema to understand which column is used to perform the join.

Cross Join

Cross Join is performed on any two tables. Each and every record from the first table is joined with each and every record from the second table. Mostly this is generated in all the joins at their intermediate step.

Cross Join will be large in size and will result in memory overload. It is not a good practice to create such a table.

Syntax

SELECT * FROM tableName1, tableName2 

Self Join

A Self Join is a type of join where a table is joined with itself. This happens in the case when a record in a table is dependent on the other record in the table.

This join is created with the help of an alias.

Syntax

SELECT * FROM tableName AS table1, tableName2 AS table2 WHERE table1.column, table2.column

Outer join

An outer join is required when you require all records from either one table or both tables.

The outer join creates a result which is larger than the original source tables, so decide carefully when to create an outer join. Avoid outer join wherever possible.

Left Outer Join
 
Left outer join is a type of join where all the records from the left side table are joined with all the records from the right table. For records where matching right table record is not present, the NULL value is displayed.

Syntax

SELECT  * FROM tableName LEFT OUTER JOIN tableName1

SQL Server
Right Outer Join
 
Right outer join is a type of join where all the records from the right table are joined with the matching records in the left table. Records where matching left table has no record, will be filled with NULL values.

Syntax

SELECT  * FROM tableName RIGHT OUTER JOIN tableName1 
SQL Server
Full Outer Join
 
Records from both, right and left table are displayed. Wherever matching record is not present, a NULL value is displayed. This is equivalent to UNION of left and right outer join.

Syntax

SELECT  * FROM tableName OUTER JOIN tableName1
SQL Server

Subquery

Definition

A subquery is a query which executes within another query. This is needed when a query depends on the result of another query.

Types of subquery

Subquery can be written in,

  1. Where Clause
    Avoid subquery in where clause as it executes multiple times in a query and degrades the performance of the query. The result of the query is not cached so it may perform poorly over time in large tables.

  2. From Clause
    This is acceptable as a query is executed only once. Although no significant performance upgrade is achieved in this execution as the optimizer provides similar execution plan for corresponding join query.

  3. Select Clause
    This should be avoided as a table is scanned multiple times for each record in the outer table. Similar to where clause, alternative join or temporary table is preferable.

Single row subquery

When a subquery returns one record from the inner query, it is said to be single row subquery.

Single row subquery is also called scalar subquery which is used to match with an aggregate, constant, or expression. It only contains one column and a record.

Operators used for single row subquery -

  1. =
  2. >=
  3. <=
  4. <> 

Multiple row subquery

When a subquery returns more than one query from the inner query it is said to be multiple row subquery.

Operators used for multiple row subquery are

  1. IN
  2. ALL
  3. EXISTS
  4. ANY

Correlated subquery

When the subquery refers to outer table record then it is called correlated subquery. This record is executed multiple times. They perform poorly when used in large tables. Avoid performing correlated subquery whenever possible and use alternative temporary table and joins.

Comparison of Subquery and Joins

Indexing subquery and joins

Index in a table results in faster search from tables. Indexing search is preferred when multiple tables are queried as results are fetched faster. An non indexed search is slower and will degrade the performance of database over time.

Joins are usually performed on indexed columns which results in faster search.

Query with Index

  1. SELECT * FROM employee as e INNER JOIN employee as e1 ON e.EmployeeId = e1.EmployeeId  
  2. SELECT * FROM employee WHERE employee.EmployeeId IN(SELECT EmployeeId FROM employee); 

 

Query without Index

  1. SELECT * FROM employee as e INNER JOIN employee as e1 ON e.EmployeeName = e1.EmployeeName  
  2. SELECT * FROM employee WHERE employee.EmployeeName IN(SELECT EmployeeName FROM employee); 

For Subquery

Without Index

SQL Server

SQL Server

With Index

SQL Server

SQL Server

The result above indicates that querying a non-indexed field requires the database engine to create one hash match; i.e., index, before performing the search. This step adds load to the server.

For Join

Without Index

SQL Server

SQL Server

With Index

SQL Server

SQL Server

Similarly, in a join, the tables need to be hashed first before execution.

Note

  • Perform subquery instead of joins when performing the non-indexed search as the performance of join degrades when intermediate tables are generated. In joins, a cross join table is generated which is then searched where the efficiency of query degrades. On top of it, the size of the result generated is far more than the result generated by the subquery.
  • Add an index to columns where joins are possible to occur, this will make the search plan faster.
  • Do not add index to the large columns (data type size) or volatile columns(columns where updates are frequently expected)
  • Do not create unnecessary indexes as it may affect the performance of insert.

Table size

Table size affects the execution of the query in both indexed and non-indexed situation for joins and subquery.

Even though the plan of execution is same for both options table size creates a difference in the size of the intermediate result and the CPU cost required to parse and search the result from that intermediate result.

The above-mentioned results are for a table with 50,000 records.

When the table size is small ie 1000 records only the performance of join and subquery as follows,

For joins

With Index

SQL Server

SQL Server

Without Index

SQL Server

SQL Server

For subquery

With Index

SQL Server
SQL Server

Without Index

SQL Server

SQL Server

The size of result nearly doubles for join as the intermediate hash match generates extra tuple in the execution plan. The search is faster than non-indexed ones but still results in more CPU utilization time than the subquery. This is because subquery has smaller intermediate result generated.

Note

  1. Indexed search differs with table size.
  2. Both subquery and join are optimized in indexed search
  3. Large size table causes subquery to degrade.
  4. In a smaller table, the join may perform slightly poor because it generates a large intermediate result compared to a subquery.

Issues in subquery

  • If the subquery is executed with reference to the outer query, then it executes for each record presented by the outer query. This makes the subquery perform poorly.
  • A subquery is not cached sometimes even when the result of inner query remains the same. Revaluation of the query may result in the execution of query unnecessary amount of time.
  • Avoid using a subquery in where clause. Instead, use alternatives of the temporary table with a simple where clause as it may work faster than the subquery.

Issues in joins

  • IO should be kept at a minimum to achieve the maximum performance.
  • Performs joins on indexed columns
  • May get complex and is difficult to understand and read.

X

Build smarter apps with Machine Learning, Bots, Cognitive Services - Start free.

Start Learning Now