Group By Aggregate Functions In SQL

Usama Shahid
Dec 19, 2018

38.1k
0
3
- facebook
- twitter
- linkedIn
- Reddit
- WhatsApp
- Email
- Print
- Other Artcile

Aggregate Functions/Group Functions

Aggregate functions are actually the built-in functions in SQL. They are used for some kind of specific operations, like to compute the average of numbers, the total count of the records, the total sum of the numbers etc. These are also called Group functions because these functions apply on the group of data. For example - if you want to compute the maximum salary among different records, then obviously, there will be more than 1 record from which you’re finding the maximum salary. Similarly, if you want to compute the sum of the salary, you’ll have a collection of salaries where you’ll apply this function to get the output.

An example of Aggregate Functions is given below.

Sum(), Min(), Max(), Count(), Avg()
SELECT COUNT(*) AS InvoiceCount
, SUM(Total) AS TotalAllInvoices
, AVG(Total) AS AverageTotal
, MAX(Total) AS MaxInvoices
, MIN(Total) AS MinInvoices
FROM Invoice
WHERE BillingCountry = 'USA'

Let me tell you how this query works. First of all, the SQL Engine extracts the record from the Invoice table on the basis of WHERE clause and then, these above GROUP functions will apply to the extracted group of records. Here, we’re getting the results after applying the group functions on the complete filtered table.

Now, let’s say we want to get the results on some groups of records of a relation (whether it is filtered or non-filtered). So, we use GROUP BY clause in SQL.

SELECT COUNT(*) as Records
FROM Invoice
WHERE Total > 2
GROUP BY BillingCountry

In this query, we’re getting the total counts of records on the basis of same BillingCountry attribute values. And when we run this query, we’ll get this result.

As you can see, we can’t understand the result just by output values so it is a best practice to also print the attribute on the basis of what we’re getting so as to make the result more meaningful.

SELECT BillingCountry, COUNT(*) as Records
FROM Invoice
WHERE Total > 2
GROUP BY BillingCountry

And with the help of this query, we’ll get some result.

Now, let’s see one more example.

SELECT CustomerId
, AVG(Total)
FROM Invoice
GROUP BY CustomerId

Look, we put Non-Aggregated attributes with GROUP BY. Now, when this query runs, it will collect the same number of Customer IDs and then take the average of all the column values of Customer Id 2 in the Invoice table. This is how it works.

Now, let’s suppose we’ve another attribute name in the script.

SELECT CustomerId
, InvoiceDate
, AVG(Total)
FROM Invoice
GROUP BY CustomerId

When you run this statement, you’ll see the error in the Messages window stating - ‘Column 'Invoice.InvoiceDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.’

Because Avg() is an aggregate function which applies on the collection and InvoiceDate is a DateTime field whose value is always different, so if we’ve just applied the Group By on CustomerID, it is not enough because we need to cover InvoiceDate as well. So,

SELECT CustomerId
, InvoiceDate
, AVG(Total)
FROM Invoice
-- Just for better understandability
WHERE CustomerId = 2
GROUP BY CustomerId, InvoiceDate

Look what we get.

Before InvoiceDate, we were getting only 1 record of CustomerId 2 because 2 is the discrete value. And after putting InvoiceDate in the query we’re getting nearly all the records because it is continuous. Only that InvoiceDate record is skipped here when InvoiceDate value matches with other any record of InvoiceDate plus both records contain CustomerId 2. And we’re getting the Average value of all those matching records.

Let’s see one more example.

SELECT BillingCountry, BillingCity, COUNT(*) as Records
FROM Invoice
WHERE Total > 2
GROUP BY BillingCountry, BillingCity
ORDER BY BillingCountry

Now, SQL Engine first of all gets the collection of records after applying [FROM and WHERE clause] and then we group the records on the basis of BillingCountry and then it further group the records on the basis of BillingCity which means we’ll get the number of records [COUNT(*) as Records] of each country and of each city [GROUP BY BillingCountry, BillingCity].

Having

Actually, we can’t apply the condition to Aggregate functions with the help of WHERE clause. So, we need Having here.

SELECT CustomerId
, AVG(Total)
FROM Invoice
GROUP BY CustomerId
HAVING AVG(Total) > 6

And if we try to do this with WHERE clause,

‘Incorrect syntax near the keyword 'WHERE'.’

This is what we get.

SELECT [State]
, City
, AVG(Total) AS CityAverage
, SUM(Total) AS CityTotal
FROM Customer
JOIN Invoice
ON Customer.CustomerId = Invoice.CustomerId
WHERE [State] <> NULL
GROUP BY [State], City
HAVING SUM(Total) > 40
ORDER BY [State], City

You might be wondering why we put State column name in square brackets. It is because sometimes these words are reserved in SQL Server. And if you try to run the query without putting State attribute name in square brackets, you’ll get the error. Because SQL Server became confused about whether it is your table attribute name or its reserved State keyword. So we put it into square brackets.

HAVING & BETWEEN

We already know that we use BETWEEN to retrieve the records within a specific range. So,

SELECT InvoiceDate
, AVG(Total) AS DateAverage
, SUM(Total) AS DateTotal
FROM Invoice
GROUP BY InvoiceDate
HAVING InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
ORDER BY InvoiceDate DESC

Now you might be thinking InvoiceDate is obviously not an aggregate function and it is working. Yes, it also works with non-aggregate but it would be better to use WHERE clause for better performance.

SELECT InvoiceDate
, AVG(Total) AS DateAverage
, SUM(Total) AS DateTotal
FROM Invoice
WHERE InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
GROUP BY InvoiceDate
ORDER BY InvoiceDate DESC

Always put WHERE clause before GROUP BY and HAVING clause after GROUP BY otherwise it will not accept and show an error in the Messages window.

Reporting

Mostly, when we’re doing some kind of reporting in SQL Server and we want to get the results on the basis of collections of records we’ve some functions in SQL Server to summarize the results.

GROUPING SETS
ROLLUP
CUBE
OVER

These functions are really very important when we’re especially working with Reports.

GROUPING SETS

To understand the GROUPING SETS, let’s create a new table of Employees.

Create Table Employees
(
Id int primary key identity(1, 1),
Name nvarchar(50),
Gender nvarchar(10),
Salary int,
Country nvarchar(10)
)

And now, we put the values in this Employees table.

Insert Into Employees Values ('Usama', 'Male', 5000, 'USA')
Insert Into Employees Values ('Safwan', 'Male', 4500, 'India')
Insert Into Employees Values ('Gulraiz', 'Female', 5500, 'USA')
Insert Into Employees Values ('Ayesha', 'Female', 4000, 'India')
Insert Into Employees Values ('Anas', 'Male', 3500, 'India')
Insert Into Employees Values ('Areeha', 'Female', 5000, 'UK')
Insert Into Employees Values ('Raza', 'Male', 6500, 'UK')
Insert Into Employees Values ('Eeman', 'Female', 7000, 'USA')
Insert Into Employees Values ('Faseeh', 'Male', 5500, 'UK')
Insert Into Employees Values ('Hassan', 'Male', 5000, 'USA')

Now let’s generate the Gender and the Sum of Salaries. So what will we do?

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender

And it will shows us the result.

Now we want to show the total salaries of individual countries as well. So,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender
UNION ALL
SELECT Country, NULL, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country

And it shows us this result,

Now we also want total salaries by Gender. So,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender
UNION ALL
SELECT Country, NULL, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country
UNION ALL
SELECT NULL, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Gender

And now if we watch the results in Output,

Look we’re making some kind of reports with all statistics. Now let’s we get the final total salaries without any kind of classification. So,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender
UNION ALL
SELECT Country, NULL, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country
UNION ALL
SELECT NULL, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Gender
UNION ALL
SELECT NULL, NULL, SUM(Salary) as TotalSalary
FROM Employees

And the results we get now,

Now with this final query, we have 2 problems.

The query is getting so huge and complex. And if we need some more kind of reporting feature then we’ll also need some more UNION ALL.
We’re request again and again to the Employees table which is really very bad.

So the solution to this problem is,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY
GROUPING SETS (
(Country, Gender),
(Country),
(Gender),
()
)

Now let’s take a look on the results,

Now this is the time to order the results.

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY
GROUPING SETS (
(Country, Gender),
(Country),
(Gender),
()
)
ORDER BY GROUPING(Country), GROUPING(Gender)

And now our results are in orders. And the concept of GROUPING SETS is clear.

ROLLUP

Let’s rewind once again GROUP BY clause

SELECT Country, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country

If we run this statement, looks what we get,

Now let’s use ROLLUP and see what happens.

SELECT Country, SUM(Salary) as TotalSalary, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY ROLLUP(Country)

And this is the result we get,

It means ROLLUP adds a single row if we’re Grouping one column. And obviously we can also achieve the same results with GROUPING SETS

SELECT Country, SUM(Salary) as TotalSalary, AVG(Salary) AS AverageSalary
FROM Employees
GROUP BY
GROUPING SETS (
(Country),
()
)

Yes we can also apply any number of arguments inside ROLLUP()

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY ROLLUP(Country, Gender)

And this query is equal to,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY
GROUPING SETS (
(Country, Gender),
(Country),
()
)

Here we’re grouping the Salary by Country and Gender and it will also display the Subtotal of Country salaries and then the Grand Total.

The classical way of writing this rollup query is,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender WITH ROLLUP

So don’t confuse this.

CUBE

This is used to get the Sum of Salaries grouped by all the combinations of Country and Gender. So,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY CUBE(Country, Gender)

And it will show the result in output window,

And now we need to order the results again,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY CUBE(Country, Gender)
ORDER BY GROUPING(Country), GROUPING(Gender)

Now you can guess from the results that where we get these results before. Yes it is equivalent to this query.

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY
GROUPING SETS (
(Country, Gender),
(Country),
(Gender),
()
)
ORDER BY GROUPING(Country), GROUPING(Gender)

The classical way of this CUBE sql statement is,

SELECT Country, Gender, SUM(Salary) as TotalSalary
FROM Employees
GROUP BY Country, Gender WITH CUBE

And we’re using the latest approach above.

Note
If you’re applying ROLLUP & CUBE on the single column, you won’t see any difference. Both queries will show you the same results. ROLLUP and CUBE are both performance tools actually. If you want to display the data hierarchically then you should use ROLLUP and if you want to display all the combinations of data then you should use CUBE.

For example,

Suppose we want to get the total population of a country, state and city. So if we use ROLLUP here, it will calculate the population of country, state and city first and then it comes to country and state and then it calculates the population of country and then sum all the grand total population. On the other hand, if we’re working with CUBE. It will calculate all the combinations of data and then sum it like,

Country, State, City
Country, State
Country, City
Country
State, City
State
City
(ALL) GrandTotal

So the conclusion is it all depends upon your data. If you have hierarchical data (Country > State > City or Department > Manager > Salesman) then obviously you’ll use the hierarchical results in most of the cases. And for hierarchical data we need ROLLUP. And if we have non-hierarchical data like (City, Gender, Nationality) then we’ll use non-hierarchical results and for this we’ll use CUBE.

OVER

Over clause actually allows us to use the aggregate functions without GROUP BY.

SELECT Name, Salary, Gender,
GenderTotal = COUNT(Gender) OVER (),
Average = AVG(Salary) OVER (),
MinimumSalary = MIN(Salary) OVER (),
MaxmimumSalary = MAX(Salary) OVER ()
FROM Employees

Now we want to get all these aggregate functions results for each Employee with respect to its Gender. Like this:

SELECT Name, Salary, Gender,
GenderTotal = COUNT(Gender) OVER (PARTITION BY Gender),
Average = AVG(Salary) OVER (PARTITION BY Gender),
MinimumSalary = MIN(Salary) OVER (PARTITION BY Gender),
MaxmimumSalary = MAX(Salary) OVER (PARTITION BY Gender)
FROM Employees

Recommended Free Ebook

Basics of SQL Server

Download Now!