CTE (Common Table Expression) in SQL

Introduction

CTE stands for Common Table Expression, which is a temporary named result set that can be used within the scope of a single SQL statement. CTE is a part of the SQL standard and is supported by most relational database management systems, including SQL Server, Oracle, MySQL, and PostgreSQL.

Common Table Expression in SQL

Common Table Expressions (CTE) are defined using the "WITH" clause and can reference themselves, other CTEs, and tables in the same SQL statement. The syntax for defining a CTE is as follows- Once a CTE is defined, it can be used in subsequent SELECT, INSERT, UPDATE, and DELETE statements, just like a regular table. CTEs are not stored as permanent database objects and are only visible within the scope of the SQL statement that defines them.

CTEs are particularly useful when dealing with complex queries that involve multiple subqueries or self-referencing queries. They can help improve the readability,  maintainability, and performance of SQL code.CTEs are particularly useful for recursive queries and for breaking down complex queries into smaller, more manageable parts. CTEs are a powerful tool for working with SQL queries, and they can help simplify complex queries and make them more manageable.

Advantages of CTE in SQL

  • Improved readability and maintainability - CTEs can help improve the readability of SQL code by breaking down complex queries into smaller, more manageable parts. This can make understanding and modifying the code easier, especially for large and complex queries.
  • Simplify complex queries - CTEs can simplify complex queries by allowing developers to define temporary result sets that can be reused in subsequent queries. This can make it easier to create complex queries without repeating the same code multiple times.
  • Recursive queries - CTEs are particularly useful for recursive queries, which are queries that refer back to themselves to generate the final result set. Recursive queries can be difficult to write in SQL, but CTEs provide an easy and efficient way to create them.
  • Optimization - CTEs can improve query performance by allowing the database engine to optimize the execution plan for the CTE. This can result in faster query execution times and better overall performance.
  • Portability - CTEs are a standard part of SQL supported by most relational database management systems. This makes SQL code that uses CTEs more portable and allows it to be used across multiple database platforms.

Disadvantages of CTE in SQL

  • Performance overhead - While CTEs can improve query performance in some cases, they can also introduce additional overhead, particularly when working with large datasets. This can result in slower query execution times and reduced overall performance.
  • Limited scope - CTEs are only visible within the scope of the SQL statement that defines them and cannot be accessed from outside that statement. This can limit their usefulness in certain situations where data needs to be shared across multiple queries or statements.
  • Readability - While CTEs can improve the readability of complex SQL queries, they can also make the code more difficult to understand if not used properly. This is especially true if the CTE definition is lengthy or includes multiple subqueries.
  • Complexity - CTEs can add complexity to SQL code, particularly if the query involves multiple levels of nesting or if the CTE references itself or other CTEs. This can make the code more difficult to debug and maintain over time.
  • Portability - While CTEs are a standard part of SQL, the specific syntax and functionality may differ slightly across different database platforms. This can make it difficult to write SQL code that works consistently across multiple database systems.

CTE  Example in SQL

Suppose we have a table called sales that contains information about the sales transactions of a company, including the date of the sale, the product sold, and the total amount of the sale. We want to find the total sales for each month of the year and then calculate the percentage of each month's sales relative to the total sales for the entire year. To accomplish this, we can use a CTE to calculate the total sales for each month and then join the CTE back to the original sales table to calculate the percentage of each month's sales.

WITH monthly_sales AS (
  SELECT MONTH(date) AS month, SUM(amount) AS total_sales
  FROM sales
  GROUP BY MONTH(date)
)
SELECT 
  MONTH(sales.date) AS month, 
  SUM(sales.amount) AS monthly_total, 
  monthly_sales.total_sales, 
  (SUM(sales.amount) / monthly_sales.total_sales) * 100 AS percent_total
FROM sales
JOIN monthly_sales ON MONTH(sales.date) = monthly_sales.month
GROUP BY MONTH(sales.date), monthly_sales.total_sales
ORDER BY month ASC;
--Common Table Expression (CTE) to find the first order date for each customer in the "orders"
--table, and then joins this information with the original "orders" table to calculate
-- the total amount spent by each customer on their first order



WITH first_orders AS (
  SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders
  GROUP BY customer_id
)
SELECT o.customer_id, SUM(o.order_amount) AS total_spent
FROM orders o
JOIN first_orders fo ON o.customer_id = fo.customer_id AND o.order_date = fo.first_order_date
GROUP BY o.customer_id;
--(CTE) to find the most popular products in the "order_items" table by calculating the total
 --quantity sold for each product, and then joins this information with the "products" and
 --"categories" tables to retrieve the category name, product name, and total quantity sold for each --popular product.

WITH popular_products AS (
  SELECT product_id, SUM(quantity) AS total_sold
  FROM order_items
  GROUP BY product_id
  HAVING SUM(quantity) > 100
)
SELECT c.category_name, p.product_name, pp.total_sold
FROM products p
JOIN categories c ON p.category_id = c.category_id
JOIN popular_products pp ON p.product_id = pp.product_id
GROUP BY c.category_name, p.product_name, pp.total_sold
ORDER BY c.category_name, pp.total_sold DESC;
--(CTE) to calculate the total amount spent by each customer in the "orders" table,
-- and then joins this information with the "customers" table to retrieve the customer
-- name, email, and total amount spent.


WITH customer_orders AS (
  SELECT customer_id, SUM(order_amount) AS total_spent
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_name, c.customer_email, co.total_spent
FROM customer_orders co
JOIN customers c ON co.customer_id = c.customer_id;
--(CTE) to rank employees in the "employees" table by salary, and then selects
-- the top 3 highest paid employees.

WITH ranked_employees AS (
  SELECT employee_id, employee_name, hire_date, salary,
         ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
  FROM employees
)
SELECT employee_name, hire_date, salary
FROM ranked_employees
WHERE rank <= 3;
--CTE) to rank customers in the "orders" table by the number of orders they have placed, 
--and then selects the first order count for each customer.

WITH ranked_customers AS (
  SELECT customer_id, COUNT(*) AS num_orders,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY customer_id) AS row_num
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, num_orders
FROM ranked_customers
WHERE row_num = 1;
WITH employees_over_40 AS (
  SELECT *
  FROM employees
  WHERE age > 40
)
UPDATE employees
SET salary = salary * 1.1
FROM employees_over_40
WHERE employees.id = employees_over_40.id


---Note that the CTE is only used to filter the data and create a temporary table,
-- and is not actually stored in the database. The update is performed on the original
-- table using the temporary table as a filter.
WITH failed_students AS (
  SELECT *
  FROM students
  WHERE grade < 60
)
DELETE FROM students
WHERE id IN (SELECT id FROM failed_students)



--CTE is only used to filter the data and create a temporary table,
-- and is not actually stored in the database. The deletion is performed on
-- the original table using the temporary table as a filter.
WITH department_lookup AS (
  SELECT department
  FROM employees
  WHERE name = 'John Doe'
)
INSERT INTO employees (name, age, department)
VALUES ('Jane Doe', 25, (SELECT department FROM department_lookup))


--CTE is only used to find the department of the existing employee 
--and is not actually stored in the database. The insertion is performed on 
--the original table using the information obtained from the CTE.

Summary

CTEs can be a valuable tool for SQL developers, but they should be used judiciously and with a clear understanding of their potential advantages and disadvantages. Careful consideration of the specific requirements of the query, the size of the dataset, and the database platform being used can help minimize any potential drawbacks of using CTEs.