Group By Aggregate Functions In SQL

Aggregate functions are built-in SQL functions that compute a single value from a set of rows, such as the average of a column, the total count of records, or the sum of a column. They are also called group functions because they operate on a group of rows.

Aggregate Functions/Group Functions

An aggregate function always works on a collection of values. For example, to compute the maximum salary there must be more than one record to compare, and to compute the sum of salaries you apply the function to a collection of salary values to get a single output.

Common aggregate functions are SUM(), MIN(), MAX(), COUNT(), and AVG(). An example that uses all of them is given below.

    SELECT COUNT(*) AS InvoiceCount
         , SUM(Total) AS TotalAllInvoices
         , AVG(Total) AS AverageTotal
         , MAX(Total) AS MaxInvoices
         , MIN(Total) AS MinInvoices
    FROM Invoice
    WHERE BillingCountry = 'USA'

Here is how this query works. The SQL engine first selects the rows of the Invoice table that satisfy the WHERE clause, and then applies the aggregate functions to that filtered set of rows. Because there is no GROUP BY, the whole filtered set is treated as a single group and the query returns one row.
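To see the filter-then-aggregate order in action, here is a minimal sketch that runs the same kind of query with Python's built-in sqlite3 module as a stand-in for SQL Server. The cut-down Invoice table and its four rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample data: a cut-down Invoice table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT, Total REAL)"
)
conn.executemany(
    "INSERT INTO Invoice (BillingCountry, Total) VALUES (?, ?)",
    [("USA", 4.0), ("USA", 10.0), ("Canada", 8.0), ("USA", 1.0)],
)

# WHERE runs first, so the aggregates only see the three USA rows.
summary = conn.execute(
    """
    SELECT COUNT(*), SUM(Total), AVG(Total), MAX(Total), MIN(Total)
    FROM Invoice
    WHERE BillingCountry = 'USA'
    """
).fetchone()
print(summary)  # (3, 15.0, 5.0, 10.0, 1.0)
```

The Canada row never reaches the aggregate functions, which is why the count is 3 and not 4.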

Now suppose we want results for separate groups of rows within a relation (whether it is filtered or not). For that, we use the GROUP BY clause.

    SELECT COUNT(*) as Records
    FROM Invoice
    WHERE Total > 2
    GROUP BY BillingCountry

This query counts the records that share the same BillingCountry value, so it returns one count per country.

The counts alone are hard to interpret, because nothing in the output says which country each count belongs to. It is good practice to also select the column we group by, which makes the result more meaningful.

    SELECT BillingCountry, COUNT(*) as Records
    FROM Invoice
    WHERE Total > 2
    GROUP BY BillingCountry

Now every count in the result is labeled with its country.

Now, let’s see one more example.

    SELECT CustomerId
         , AVG(Total)
    FROM Invoice
    GROUP BY CustomerId

Notice that the non-aggregated column, CustomerId, appears in GROUP BY. When this query runs, it collects the rows that share the same CustomerId and takes the average of Total within each group. For example, all the invoices of CustomerId 2 are averaged into a single row. This is how it works.
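The grouping step can be checked on a few rows. This is a small sketch using Python's sqlite3 module in place of SQL Server; the five invoice rows are invented for the example.

```python
import sqlite3

# Hypothetical sample data: two customers with a handful of invoices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (CustomerId INTEGER, Total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?)",
    [(1, 2.0), (2, 4.0), (2, 8.0), (1, 6.0), (2, 6.0)],
)

# One output row per distinct CustomerId; AVG is computed inside each group.
averages = conn.execute(
    "SELECT CustomerId, AVG(Total) FROM Invoice GROUP BY CustomerId ORDER BY CustomerId"
).fetchall()
print(averages)  # [(1, 4.0), (2, 6.0)]
```

The three invoices of customer 2 (4, 8, and 6) collapse into one row whose average is 6.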

Now suppose we add another column to the SELECT list.

    SELECT CustomerId
         , InvoiceDate
         , AVG(Total)
    FROM Invoice
    GROUP BY CustomerId

When you run this statement, SQL Server reports an error in the Messages window: ‘Column 'Invoice.InvoiceDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.’

AVG() is an aggregate function that returns one value per group, whereas InvoiceDate is a DateTime column that can hold a different value in every row of the group, so SQL Server cannot tell which InvoiceDate to display. Grouping by CustomerId alone is therefore not enough; InvoiceDate must be added to GROUP BY as well:

    SELECT CustomerId
         , InvoiceDate
         , AVG(Total)
    FROM Invoice
    -- Just for better understandability
    WHERE CustomerId = 2
    GROUP BY CustomerId, InvoiceDate

Before adding InvoiceDate, we got a single row for CustomerId 2, because all of its invoices fell into one group. After adding InvoiceDate, we get nearly one row per invoice, because almost every invoice has a different date. Rows are merged only when two invoices share both the same CustomerId and the same InvoiceDate, and in that case AVG(Total) is the average over those matching rows.
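The effect of the extra grouping column is easy to reproduce. The sketch below uses Python's sqlite3 module and four invented invoices for CustomerId 2, two of which share a date; the table layout is an assumption for illustration only.

```python
import sqlite3

# Hypothetical sample data: four invoices of customer 2, two on the same date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (CustomerId INTEGER, InvoiceDate TEXT, Total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?)",
    [
        (2, "2012-01-01", 4.0),
        (2, "2012-01-01", 8.0),
        (2, "2012-03-05", 1.0),
        (2, "2012-07-11", 3.0),
    ],
)

# Grouping by CustomerId only: every invoice lands in the same group.
one_group = conn.execute(
    "SELECT CustomerId, AVG(Total) FROM Invoice WHERE CustomerId = 2 GROUP BY CustomerId"
).fetchall()

# Grouping by CustomerId and InvoiceDate: only rows with equal dates are merged.
by_date = conn.execute(
    """
    SELECT CustomerId, InvoiceDate, AVG(Total)
    FROM Invoice
    WHERE CustomerId = 2
    GROUP BY CustomerId, InvoiceDate
    ORDER BY InvoiceDate
    """
).fetchall()

print(one_group)  # [(2, 4.0)]
print(by_date)    # [(2, '2012-01-01', 6.0), (2, '2012-03-05', 1.0), (2, '2012-07-11', 3.0)]
```

Only the two invoices dated 2012-01-01 are combined, and their average (6.0) replaces the two individual totals.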

Let’s see one more example.

    SELECT BillingCountry, BillingCity, COUNT(*) as Records
    FROM Invoice
    WHERE Total > 2
    GROUP BY BillingCountry, BillingCity
    ORDER BY BillingCountry

The SQL engine first builds the set of rows from the FROM and WHERE clauses, then groups them by BillingCountry and, within each country, by BillingCity. COUNT(*) therefore returns the number of records for each combination of country and city.

Having

A WHERE clause cannot hold a condition on an aggregate function, because WHERE is evaluated before the rows are grouped. For such conditions we need HAVING.

    SELECT CustomerId
         , AVG(Total)
    FROM Invoice
    GROUP BY CustomerId
    HAVING AVG(Total) > 6

If we write this condition with WHERE in place of HAVING, SQL Server rejects the query with the message:

‘Incorrect syntax near the keyword 'WHERE'.’
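The difference between the two clauses can be tried out with a short sketch in Python's sqlite3 module. The sample rows are invented, and SQLite is only a stand-in here: it also refuses an aggregate inside WHERE, but its error text differs from the SQL Server message quoted above.

```python
import sqlite3

# Hypothetical sample data: three customers with two invoices each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (CustomerId INTEGER, Total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 8.0), (2, 10.0), (3, 6.0), (3, 6.0)],
)

# HAVING filters groups after AVG has been computed for each of them.
high_average = conn.execute(
    "SELECT CustomerId, AVG(Total) FROM Invoice GROUP BY CustomerId HAVING AVG(Total) > 6"
).fetchall()
print(high_average)  # [(2, 9.0)]

# The same condition inside WHERE is rejected, because WHERE runs before grouping.
try:
    conn.execute(
        "SELECT CustomerId, AVG(Total) FROM Invoice WHERE AVG(Total) > 6 GROUP BY CustomerId"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True
print(where_rejected)  # True
```

Customer 3 has an average of exactly 6, so it is excluded by the strict comparison.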

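The next query combines HAVING with a join, a WHERE filter, and ORDER BY.

    SELECT [State]
         , City
         , AVG(Total) AS CityAverage
         , SUM(Total) AS CityTotal
    FROM Customer
    JOIN Invoice
    ON Customer.CustomerId = Invoice.CustomerId
    -- NULL must be tested with IS NOT NULL; a comparison such as <> NULL never evaluates to true
    WHERE [State] IS NOT NULL
    GROUP BY [State], City
    HAVING SUM(Total) > 40
    ORDER BY [State], City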

You might be wondering why the State column name is in square brackets. Square brackets delimit an identifier, so SQL Server always reads the bracketed word as a column name and never as a keyword. When a column name is a reserved keyword, the query fails with a syntax error unless the name is bracketed, so it is a safe habit to bracket any column name that looks like a keyword, such as State.

HAVING & BETWEEN

We already know that BETWEEN retrieves the records within a specific range. It can be used in HAVING as well:

    SELECT InvoiceDate
         , AVG(Total) AS DateAverage
         , SUM(Total) AS DateTotal
    FROM Invoice
    GROUP BY InvoiceDate
    HAVING InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
    ORDER BY InvoiceDate DESC

You might be thinking that InvoiceDate is not an aggregate function, yet the query works. HAVING also accepts conditions on non-aggregated columns that appear in GROUP BY, but it is better to put such conditions in the WHERE clause, which filters the rows before they are grouped and can perform better.

    SELECT InvoiceDate
         , AVG(Total) AS DateAverage
         , SUM(Total) AS DateTotal
    FROM Invoice
    WHERE InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
    GROUP BY InvoiceDate
    ORDER BY InvoiceDate DESC

Always put the WHERE clause before GROUP BY and the HAVING clause after GROUP BY; otherwise SQL Server rejects the query and shows an error in the Messages window.
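That the HAVING and WHERE versions of the date filter return the same rows can be checked with a small sketch in Python's sqlite3 module. The five invoice rows are made up, and dates are stored as ISO text, which is an assumption of this example and not how the original Invoice table stores them.

```python
import sqlite3

# Hypothetical sample data: invoices before, inside, and after the year 2012.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceDate TEXT, Total REAL)")
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?)",
    [
        ("2011-12-30", 5.0),
        ("2012-02-01", 2.0),
        ("2012-02-01", 4.0),
        ("2012-06-15", 9.0),
        ("2013-01-02", 7.0),
    ],
)

# Filter applied to the groups, after grouping.
with_having = conn.execute(
    """
    SELECT InvoiceDate, AVG(Total), SUM(Total)
    FROM Invoice
    GROUP BY InvoiceDate
    HAVING InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
    ORDER BY InvoiceDate DESC
    """
).fetchall()

# Filter applied to the rows, before grouping.
with_where = conn.execute(
    """
    SELECT InvoiceDate, AVG(Total), SUM(Total)
    FROM Invoice
    WHERE InvoiceDate BETWEEN '2012-01-01' AND '2012-12-31'
    GROUP BY InvoiceDate
    ORDER BY InvoiceDate DESC
    """
).fetchall()

print(with_where)  # [('2012-06-15', 9.0, 9.0), ('2012-02-01', 3.0, 6.0)]
print(with_having == with_where)  # True
```

The results match, but the WHERE version discards the 2011 and 2013 rows before any grouping work is done.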

Reporting

When we build reports in SQL Server and want results summarized over collections of records, SQL Server offers several features for summarizing the results.

  • GROUPING SETS
  • ROLLUP
  • CUBE
  • OVER

These features are especially important when working with reports.

GROUPING SETS

To understand GROUPING SETS, let’s create a new Employees table.

    Create Table Employees
    (
        Id int primary key identity(1, 1),
        Name nvarchar(50),
        Gender nvarchar(10),
        Salary int,
        Country nvarchar(10)
    )

Now we insert some rows into the Employees table.

    Insert Into Employees Values ('Usama', 'Male', 5000, 'USA')
    Insert Into Employees Values ('Safwan', 'Male', 4500, 'India')
    Insert Into Employees Values ('Gulraiz', 'Female', 5500, 'USA')
    Insert Into Employees Values ('Ayesha', 'Female', 4000, 'India')
    Insert Into Employees Values ('Anas', 'Male', 3500, 'India')
    Insert Into Employees Values ('Areeha', 'Female', 5000, 'UK')
    Insert Into Employees Values ('Raza', 'Male', 6500, 'UK')
    Insert Into Employees Values ('Eeman', 'Female', 7000, 'USA')
    Insert Into Employees Values ('Faseeh', 'Male', 5500, 'UK')
    Insert Into Employees Values ('Hassan', 'Male', 5000, 'USA')

Now let’s compute the sum of salaries by Country and Gender.

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender

This returns one row for each combination of Country and Gender, which is six rows for our data.

Now we want to show the total salaries of individual countries as well. So,

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender

    UNION ALL

    SELECT Country, NULL, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country

The result now also contains one subtotal row per country, with NULL in the Gender column.

Now we also want total salaries by Gender. So,

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender

    UNION ALL

    SELECT Country, NULL, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country

    UNION ALL

    SELECT NULL, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Gender

The output now also includes one subtotal row per gender, with NULL in the Country column.

We are gradually building a report with all the statistics. Finally, let’s add the grand total of salaries without any classification. So,

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender

    UNION ALL

    SELECT Country, NULL, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country

    UNION ALL

    SELECT NULL, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Gender

    UNION ALL

    SELECT NULL, NULL, SUM(Salary) as TotalSalary
    FROM Employees

The result now also contains a single grand total row, with NULL in both the Country and Gender columns.
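As a cross-check on the sample rows, the four-part UNION ALL report can be run with Python's sqlite3 module. This is only a sketch: SQLite stands in for SQL Server, and the column types are simplified (INTEGER and TEXT in place of int, identity, and nvarchar).

```python
import sqlite3

# The ten Employees rows from the article, loaded into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT, Gender TEXT, Salary INTEGER, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (Name, Gender, Salary, Country) VALUES (?, ?, ?, ?)",
    [
        ("Usama", "Male", 5000, "USA"), ("Safwan", "Male", 4500, "India"),
        ("Gulraiz", "Female", 5500, "USA"), ("Ayesha", "Female", 4000, "India"),
        ("Anas", "Male", 3500, "India"), ("Areeha", "Female", 5000, "UK"),
        ("Raza", "Male", 6500, "UK"), ("Eeman", "Female", 7000, "USA"),
        ("Faseeh", "Male", 5500, "UK"), ("Hassan", "Male", 5000, "USA"),
    ],
)

# Detail rows, country subtotals, gender subtotals, and the grand total.
report = conn.execute(
    """
    SELECT Country, Gender, SUM(Salary) FROM Employees GROUP BY Country, Gender
    UNION ALL
    SELECT Country, NULL, SUM(Salary) FROM Employees GROUP BY Country
    UNION ALL
    SELECT NULL, Gender, SUM(Salary) FROM Employees GROUP BY Gender
    UNION ALL
    SELECT NULL, NULL, SUM(Salary) FROM Employees
    """
).fetchall()

print(len(report))                      # 12 rows: 6 + 3 + 2 + 1
print(("USA", None, 22500) in report)   # True: subtotal for USA
print((None, "Female", 21500) in report)  # True: subtotal for Female
print((None, None, 51500) in report)    # True: grand total
```

NULL values come back as None in Python, which marks the subtotal and total rows.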

The final UNION ALL query above has two problems.

  • The query is getting large and complex, and every additional level of reporting needs another UNION ALL.
  • We query the Employees table again and again, which is inefficient.

The solution to both problems is GROUPING SETS:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY
        GROUPING SETS (
            (Country, Gender),
            (Country),
            (Gender),
            ()
        )

This returns the same rows as the UNION ALL query, using a single SELECT.

Now it is time to order the results.

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY
        GROUPING SETS (
            (Country, Gender),
            (Country),
            (Gender),
            ()
        )
    ORDER BY GROUPING(Country), GROUPING(Gender)

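Now the results are in order. GROUPING(column) returns 1 on rows where that column has been aggregated away (the subtotal and total rows) and 0 otherwise, so the detail rows come first and the grand total comes last. That covers the concept of GROUPING SETS.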

ROLLUP

Let’s revisit the GROUP BY clause once more.

    SELECT Country, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country

This statement returns one row per country with its total salary.

Now let’s use ROLLUP and see what happens.

    SELECT Country, SUM(Salary) as TotalSalary, AVG(Salary) AS AverageSalary
    FROM Employees
    GROUP BY ROLLUP(Country)

The result contains the same row per country plus one extra row with NULL in the Country column, which holds the total and the average over all employees. In other words, ROLLUP adds a single summary row when we group by one column. We can achieve the same result with GROUPING SETS:

    SELECT Country, SUM(Salary) as TotalSalary, AVG(Salary) AS AverageSalary
    FROM Employees
    GROUP BY
        GROUPING SETS (
            (Country),
            ()
        )

ROLLUP() also accepts more than one column:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY ROLLUP(Country, Gender)

This query is equivalent to:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY
        GROUPING SETS (
            (Country, Gender),
            (Country),
            ()
        )

Here we total the salaries by Country and Gender, and the query also returns the subtotal for each country and then the grand total.

The older syntax for this ROLLUP query is:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender WITH ROLLUP

Both forms return the same result, so don’t let the two syntaxes confuse you.

CUBE

CUBE returns the sum of salaries grouped by every combination of Country and Gender. So,

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY CUBE(Country, Gender)

As before, we order the results:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY CUBE(Country, Gender)
    ORDER BY GROUPING(Country), GROUPING(Gender)

You may recognize these results from earlier. This CUBE query is equivalent to the following GROUPING SETS query.

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY
        GROUPING SETS (
            (Country, Gender),
            (Country),
            (Gender),
            ()
        )
    ORDER BY GROUPING(Country), GROUPING(Gender)

The older syntax for this CUBE statement is:

    SELECT Country, Gender, SUM(Salary) as TotalSalary
    FROM Employees
    GROUP BY Country, Gender WITH CUBE

The earlier examples use the newer syntax.

Note
If you apply ROLLUP or CUBE to a single column, you won’t see any difference; both queries return the same results. Both compute several levels of aggregation in one query. If you want to display the data hierarchically, use ROLLUP, and if you want to display all the combinations of the data, use CUBE.

For example,

Suppose we want the total population by country, state, and city. ROLLUP first aggregates by country, state, and city, then by country and state, then by country alone, and finally computes the grand total. CUBE, on the other hand, aggregates every combination of the columns:

  1. Country, State, City  
  2. Country, State  
  3. Country, City  
  4. Country  
  5. State, City  
  6. State  
  7. City  
  8. (ALL) GrandTotal  

In conclusion, the choice depends on your data. If you have hierarchical data (Country > State > City, or Department > Manager > Salesman), you will want hierarchical subtotals in most cases, and for those you need ROLLUP. If you have non-hierarchical data such as (City, Gender, Nationality), you will want every combination, and for that you use CUBE.
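The difference between the two can be made concrete by listing the grouping sets each one expands to. The following Python sketch is only an illustration of that expansion; rollup_sets and cube_sets are hypothetical helper names, not SQL Server functions.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets of ROLLUP(columns): each leading prefix, ending with the empty set."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets of CUBE(columns): every subset of the columns."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


cols = ["Country", "State", "City"]
print(rollup_sets(cols))
# [('Country', 'State', 'City'), ('Country', 'State'), ('Country',), ()]
print(len(cube_sets(cols)))  # 8
```

For n columns, ROLLUP produces n + 1 grouping sets and CUBE produces 2^n, and with a single column both lists are identical, which is why the two give the same result in that case.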

OVER

The OVER clause lets us use aggregate functions without GROUP BY, so every row is returned together with the aggregate values.

    SELECT Name, Salary, Gender,
        GenderTotal = COUNT(Gender) OVER (),
        Average = AVG(Salary) OVER (),
        MinimumSalary = MIN(Salary) OVER (),
        MaximumSalary = MAX(Salary) OVER ()
    FROM Employees

Now we want these aggregate results for each employee, computed within that employee’s gender. Like this:

    SELECT Name, Salary, Gender,
        GenderTotal = COUNT(Gender) OVER (PARTITION BY Gender),
        Average = AVG(Salary) OVER (PARTITION BY Gender),
        MinimumSalary = MIN(Salary) OVER (PARTITION BY Gender),
        MaximumSalary = MAX(Salary) OVER (PARTITION BY Gender)
    FROM Employees
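The partitioned query can be tried on the sample rows with Python's sqlite3 module. This is a sketch under a few assumptions: it needs a SQLite build with window function support (3.25 or later), it uses AS aliases because SQLite does not accept the alias = expression form, and SQLite's AVG returns a floating point value where SQL Server would return an integer for an int column.

```python
import sqlite3

# The ten Employees rows from the article, with simplified column types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT, Gender TEXT, Salary INTEGER, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (Name, Gender, Salary, Country) VALUES (?, ?, ?, ?)",
    [
        ("Usama", "Male", 5000, "USA"), ("Safwan", "Male", 4500, "India"),
        ("Gulraiz", "Female", 5500, "USA"), ("Ayesha", "Female", 4000, "India"),
        ("Anas", "Male", 3500, "India"), ("Areeha", "Female", 5000, "UK"),
        ("Raza", "Male", 6500, "UK"), ("Eeman", "Female", 7000, "USA"),
        ("Faseeh", "Male", 5500, "UK"), ("Hassan", "Male", 5000, "USA"),
    ],
)

# Every employee row is kept; the aggregates are computed per gender partition.
rows = conn.execute(
    """
    SELECT Name, Salary, Gender,
           COUNT(Gender) OVER (PARTITION BY Gender) AS GenderTotal,
           AVG(Salary)   OVER (PARTITION BY Gender) AS Average,
           MIN(Salary)   OVER (PARTITION BY Gender) AS MinimumSalary,
           MAX(Salary)   OVER (PARTITION BY Gender) AS MaximumSalary
    FROM Employees
    ORDER BY Id
    """
).fetchall()

print(len(rows))  # 10
print(rows[0])    # ('Usama', 5000, 'Male', 6, 5000.0, 3500, 6500)
print(rows[2])    # ('Gulraiz', 5500, 'Female', 4, 5375.0, 4000, 7000)
```

Unlike GROUP BY, which would collapse the table to two rows, all ten employees remain in the output and each carries the statistics of its own gender.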