Introduction
SQL provides several analytic functions that allow us to perform complex calculations and data analysis tasks. LEAD and LAG functions are window functions in SQL that allow us to access data from other rows within the same resulting row. They are often used with the OVER clause, which defines the partition and ordering of the result set. LEAD function gets a value from a row that comes after the current one, while the LAG function gets a value from a row that comes before the current one.
Both functions accept an offset parameter that specifies the number of rows to move forward or backward from the current row. Let's explore how LEAD and LAG functions with their syntax, use cases, and examples.
LEAD and LAG Functions Syntax
--LAEAD Function Syntax:
LEAD(column_name, offset, default_value) OVER (
PARTITION BY partition_expression
ORDER BY order_expression
)
--LAG Function Syntax:
LAG(column_name, offset, default_value) OVER (
PARTITION BY partition_expression
ORDER BY order_expression
)
- column_name: Name of the column from which you want to retrieve the value.
- offset: Number of rows to move forward (LEAD Function) or backward (LAG Function) from the current row. The default value is 1 and should be positive.
- default_value (optional): Default value to return if the lead or lag value is NULL or if there is no subsequent row within the partition.
- PARTITION BY (optional): Divides the result set into partitions based on the specified expression.
- ORDER BY: Specifies the ordering of rows within each partition.
Lets take an example for LEAD and LAG functions.create a table named sales with the following structure:
-- Create salas table
CREATE TABLE sales (
product VARCHAR(50),
year INT,
sales_amount DECIMAL(10,2)
);
-- Add some dummy data into the table
INSERT INTO sales (product, year, sales_amount) VALUES
('Apples', 2021, 1500.50),
('Bananas', 2021, 2500.75),
('Carrots', 2021, 3200.00),
('Apples', 2022, 1700.30),
('Bananas', 2022, 2900.20),
('Carrots', 2022, 3400.60),
('Apples', 2023, 1800.00),
('Bananas', 2023, 3100.45),
('Carrots', 2023, 3600.80),
('Oranges', 2021, 1100.25),
('Oranges', 2022, 1300.50),
('Oranges', 2023, 1400.75),
('Tomatoes', 2021, 1200.00),
('Tomatoes', 2022, 1500.35),
('Tomatoes', 2023, 1600.90);
Retrieving the next product's sales amount using LEAD function
SELECT product,year,sales_amount,
LEAD(sales_amount, 1, 0) OVER (
ORDER BY year, product
) AS next_product_sales
FROM
sales ;
--Sample O/P
+----------+------+---------------+--------------------+
| product | year | sales_amount | next_product_sales |
+----------+------+---------------+--------------------+
| Apples | 2021 | 1500.5 | 2500.75 |
| Bananas | 2021 | 2500.75| 3200.0 |
| Carrots | 2021 | 3200.0| 1100.25 |
| Oranges | 2021 | 1100.25| 1200.0 |
| Tomatoes | 2021 | 1200.0| 1700.3 |
| Apples | 2022 | 1700.3| 2900.2 |
| Bananas | 2022 | 2900.2| 3400.6 |
|....... ..... ....... ......
+----------+------+---------------+--------------------+
In above Query , we use the LEAD function to retrieve the sales amount of the next product in the same year. If there is no subsequent product, the default_value of 0 is returned.
Retrieving the prev product's sales amount using LAG function
SELECT product, year,sales_amount,
LAG(sales_amount, 1, 0) OVER (
ORDER BY year, product
) AS prev_product_sales
FROM sales;
--Sample O/P
+----------+------+---------------+--------------------+
| product | year | sales_amount | next_product_sales |
+----------+------+---------------+--------------------+
| Apples | 2021 | 1500.5 | 0.0 |
| Bananas | 2021 | 2500.75| 1500.5 |
| Carrots | 2021 | 3200.0| 2500.75|
| Oranges | 2021 | 1100.25| 3200.0|
| Tomatoes | 2021 | 1200.0| 1100.25|
| Apples | 2022 | 1700.3| 1200.0|
| ..... .... ..... ......
+----------+------+---------------+--------------------+
In above Query , we use the LEAD function to retrieve the sales amount of the next product in the same year. If there is no subsequent product, the default_value of 0 is returned.
Use Case
Suppose we want to analyze the sales data for each product, not only by comparing the current year's sales with the previous year but also by looking forward to the next year's sales. We can achieve this by combining the LEAD and LAG functions in a single query.
SELECT product, year, sales_amount,
sales_amount - LAG(sales_amount, 1) OVER (
PARTITION BY product
ORDER BY year
) AS year_over_year_diff,
LEAD(sales_amount, 1, 0) OVER (PARTITION BY product
ORDER BY year) - sales_amount AS next_year_diff
FROM sales ORDER BY product, year;
-- Sample O/P
+----------+------+---------------+------------------+----------------+
| product | year | sales_amount | year_over_year_diff | next_year_diff |
+----------+------+---------------+------------------+----------------+
| Apples | 2021 | 1500.5 | NULL | 199.8 |
| Apples | 2022 | 1700.3 | 199.8 | 99.7 |
| Apples | 2023 | 1800.0| 99.7 | -1800.0 |
| Bananas | 2021 | 2500.75| NULL | 399.45|
| Bananas | 2022 | 2900.2 | 399.45| 200.25|
| Bananas | 2023 | 3100.45| 200.25| -3100.45|
| Carrots | 2021 | 3200.0| NULL | 200.6 |
| ..... ... .... ..... ....... ...
+----------+------+---------------+------------------+----------------+
In the LAG Functions we are calculating the year-over-year difference in sales by subtracting the previous year's sales amount from the current year's sales amount. In the LEAD Functions we are calculating the difference between the next year's sales amount and the current year's sales amount. By including both expressions in the same query, we can analyze the sales data for each product by looking at the year-over-year difference as well as the projected difference for the next year.
Summary
This is just a one example of how LEAD and LAG functions can be used in SQL. They are powerful tools for working with ordered data and retrieving values, enabling you to perform complex calculations and data analysis tasks.
Remember, when using these functions, it is essential to understand the ordering and partitioning of your data to ensure accurate results. Also you can combine LEAD and LAG with other window functions and clauses to perform even more advanced data manipulation and analysis tasks. You can find above example and use-case on my Github account.