Hello, welcome to Appsloveworld Technologies, In in theis post will talk about how the group rows using group by, filtering groups, and the difference between where and having clauses.

What is group by clause?

Now, if you look at the definition, the group by clause is used to group a selected set of rows into a set of summary rows by the values of one or more columns or expressions. It is always used in conjunction with one or more aggregate functions.Source

There are many aggregate function available in the SQL server. For example, we have Sum(),Average(), count(), Min(), Max(), etc.

Let’s look at them by example. So I have TblOrganizationEmployee which has got EmployeeName, EmployeeSalary, Country, Adress, and DepartmentId columns. And then I have the Department table which has got the Id and Department Name.

TblOrganizationEmployee and Department are related using foreign key DepartmentId.

if you need sql script of tables you find at the end of post.

what is the use of group by in sql server

If you look at the TblOrganizationEmployee table I have got the EmployeeSalary column.

Let’s say I want the total salary that I am paying within my organization. So obviously, what will you do if you have to do it manually? You will take each row and add them together, but in the SQL server, we have an aggregate function called sum(), which we can use. Like the below query.

select SUM(EmployeeSalary) from [dbo].[TblOrganizationEmployee]

that gives us the total salary.

sql aggregate functions

Similarly, if you want to say let’s say, for example, I want to find out total employees in my organization.

select count(*) from [dbo].[TblOrganizationEmployee]

I want the total salary that I’m paying to my employees by country, how do we achieve this? Using group by.
sql aggregate functions group by

if you have to do it manually, what will you do?. You will take each Australia record and then add the total salary.

Similarly, you take each India record and add their salaries. so basically what are you doing? You first group the records by country and then sum the salary column across that group.

So that’s why we use group by clause here. So let’s see how to use that.

Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
Group by Country

Query Explantion

if you look at the query that we have written, we are using this Sum() aggregate function to sum up the entire salary that we are paying to our employees.

In our select list, we need to have the country as well as the totalsalary. Now, if you look at for the totalsalary, we didn’t have a column name, so we give a column name by using ‘as’ and we will say totalsalary.

So now let’s look at query select country, the sum of salary from TblOrganizationEmployee.

you are saying, I want the country and the total salary, but you should also tell I want to group by country, so group by country and then calculate the total salaries across that country.so when we execute query you should see the same output.

Let’s see what’s going to happen if I remove this group by clause and then execute this query.

Select Country, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee

this is going to give an error because you’re saying, select country column, and select the sum of salaries.

so you’re saying you want country column and the total salary, but you are not telling the query to group them by country and then total salaries. if I don’t use the salary country column then it makes sense.

I want the total salaries in the entire table, but when I use the country column you should tell it, group by country as well. So obviously if I execute this query you will get below error
“Msg 8120, Level 16, State 1, Line 4
Column ‘TblOrganizationEmployee.Country’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

Now let’s remove the sum aggregate function in the select list and execute the query.

Select Country, EmployeeSalary as TotalSalary
from TblOrganizationEmployee
Group by Country

“Msg 8120, Level 16, State 1, Line 8you will get below error
Column ‘TblOrganizationEmployee.EmployeeSalary’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

so the columns that you use in the select list, they have to satisfy one of these two conditions. You should either have an aggregate function applied on that column or that column should be part of the group by clause. If none of these conditions are met, then you get the above error.

Group by based on multiple columns

Let’s take this one more level, not only by country. I want to break it down further by DepartmentId.
For example, in Australia, how much is the total salary that I’m paying for each department Marketing,Research and Development,  Accounting and Finance employees?.

So we are grouping by multiple columns. we are grouping this total salary first by country and then by DepartmentId. so how do we do that?
sql sum group by multiple columns

It’s pretty simple, in the select list we should now have three columns. We already have a country and TotalSalary columns.

We want the departmentId column as well also specified in the group by clause as well, to tell the SQL server engine, I want you to total the salaries first by country and then by departmentId because we want that breakdown in that order

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId

when we execute this query, you should see the output.But if you look at our output, it’s not sort
So obviously, we used the order by clause to order the result.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary

Now we have seen we can group based on multiple columns.

How to use multiple aggregate functions?

if it’s possible to use multiple aggregate functions let’s understand with example .we are showing the totalsalary by country and DepartmentId. Let’s say I want the total number of employees as well.

Now, if you remember, if you want to find out the total number of employees we can use Count(),

if you look at TblOrganizationEmployee table, we’ve got 11 rows there.

And if I want to find the total number of employees within my organization, I can say count(*) from TblOrganizationEmployee and we get the total count, which is 11.
For performance reasons, instead of using * you as a specific column name, you know, maybe EmployeeName so I get the same result.

select count(EmployeeName) from [dbo].[TblOrganizationEmployee]

So what we want? We not only want total salaries, buy a country, and DepartmentId, we also want the total number of employees. So if you want the total number of employees, we use the count() aggregate function.
sql count group by multiple columns

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
order by TotalSalary

All right, so we have seen how to use multiple aggregate functions as well.

How to filter the groups?

But let’s say I want to see the Marketing(i.e DepartmentId=1) department salaries. Now usually to filter the rows, we can use the where clause, you know, where you can say DepartmentId is equal to 1.
sql group by multiple columns having count

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
where DepartmentId=1
group by Country, DepartmentId
order by TotalSalary

The other way to get the same result is instead of using the where clause, we can use the having clause, but the having clause should come after the group by.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having DepartmentId=1
order by TotalSalary

Difference between having and where clause

Now let’s see what’s the difference between these two queries? when you use where clause, only DepartmentId=1 records are retrieved, and then they are grouped.

But whereas in this when you use the having clause, all the rows from the table TblOrganizationEmployee are retrieved, they are grouped by Country,DepartmentId and then only DepartmentId=1 groups are shown.

Where clause filters row before aggregations, before groupings are performed, whereas having clause filters groups after the aggregations are performed.

Another difference is that you can use having clause only with the select statement, whereas where clause can be used with other than select statements as well, like insert and updates as well.

Aggregate functions cannot be used in the where clause what do we mean by aggregate functions cannot be used in the where clause.Let’s understand with an example.

select * from [dbo].[TblOrganizationEmployee] where count(id)>5

Excute above query you will get the error
“Msg 147, Level 15, State 1, Line 11
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.”

you cannot use aggregate functions with the where clause, whereas aggregate functions can be used in the having clause.

Select Country, DepartmentId, SUM(EmployeeSalary) as TotalSalary,COUNT(EmployeeName) as TotalEmployees
from TblOrganizationEmployee
group by Country, DepartmentId
having COUNT(EmployeeName)=1
order by TotalSalary

result

Table Script

/****** Object:  Table [dbo].[Department]    Script Date: 09/22/2020 6:06:10 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Department](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Department] [nvarchar](max) NULL,
 CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)
)
GO
/****** Object:  Table [dbo].[TblOrganizationInfo]    Script Date: 09/22/2020 6:06:11 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TblOrganizationInfo](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Location] [nvarchar](50) NULL,
    [TravelAgent] [nvarchar](50) NULL,
    [BookingAmount] [float] NULL,
 CONSTRAINT [PK_TblOrganizationInfo] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)
)
GO
SET IDENTITY_INSERT [dbo].[Department] ON 
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (1, 'Marketing')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (2, 'Research and Development')
GO
INSERT [dbo].[Department] ([Id], [Department]) VALUES (3, 'Accounting and Finance')
GO
SET IDENTITY_INSERT [dbo].[Department] OFF
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] ON 
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (1, 'France', 'Michael', 2000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (2, 'USA', 'Jackson', 2200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (3, 'UK', 'Jacob', 3000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (4, 'India', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (5, 'USA', 'Michael', 1500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (6, 'France', 'Jacob', 1000)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (7, 'India', 'Luke', 1200)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (8, 'USA', 'Jacob', 1700)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (9, 'UK', 'Michael', 1900)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (10, 'France', 'Jacob', 2100)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (11, 'USA', 'Luke', 2500)
GO
INSERT [dbo].[TblOrganizationInfo] ([Id], [Location], [TravelAgent], [BookingAmount]) VALUES (12, 'India', 'Michael', 3000)
GO
SET IDENTITY_INSERT [dbo].[TblOrganizationInfo] OFF
GO

The post How to use group by in SQL Server query? appeared first on Software Development | Programming Tutorials.



Read More Articles