The GROUP BY clause in the SELECT statement can help to divide the rows that are being returned from the SELECT statement into specific groups. We can then apply an aggregate function to each group. So far, we have worked with aggregate functions that return a single aggregate value across the result set. Using the GROUP BY clause, we can calculate aggregates within a group of rows.
The structure of a statement that uses the GROUP BY clause looks like this:
SELECT <column1>…, <aggregate function>
FROM <tablename>
GROUP BY <column1>…;
We have the columns that we want to group in both the SELECT clause and the GROUP BY clause.
For example, if we wanted to know the number of customers that we have in each country, we can use the GROUP BY clause like this:
SELECT country, COUNT(*)
FROM customer
GROUP BY country;
It is important that the column is listed in both the SELECT and GROUP BY clause. If we do not include it in the GROUP BY clause:
SELECT country, COUNT(*)
FROM customer;
We will get an error:
If we did not have the column in the SELECT clause:
SELECT COUNT(*)
FROM customer
GROUP BY country;
There will not be an error. But the result set isn’t useful at all, because it doesn’t indicate what the aggregate function is in relation to:
As you see in the result set above, even though we grouped based on country, we do not know which country each of the counts is for.
We can run a similar query to find the total amount of orders by country from the invoice table:
SELECT billing_country, SUM(total)
FROM invoice
GROUP BY billing_country;
We can also add more complexity by grouping by multiple columns and returning more than one aggregate value. For example, if we wanted to find the total and average of each invoice based on the state and country, we can do the following:
SELECT billing_country, billing_state, SUM(total), ROUND(AVG(total),2)
FROM invoice
GROUP BY billing_country, billing_state
ORDER BY billing_country, billing_state;
Notice above, we have added the ORDER BY clause so that the results are sorted in an order that makes logical sense:
The possibilities are endless for us to group the data and run aggregate functions on those subsets of data.
[MUSIC PLAYING] The group by clause within a select statement helps us to be able to divide different rows into various groups and then do some aggregate functions associated with those different items. So let's go ahead and take a look at an example in this case here.
So if we want to take a look at the number of customers within different countries, one approach that we could take is looking at the country and then identifying those. However, if we want to be able to break things down a little bit further, what we want to do is utilize the group by clause. So the group by clause allows us to be able to filter based on specific columns. So in this case here, being that we want to utilize the country as a choice, we want to include country.
One thing to note, though, is that when we are having anything that's in the group by clause, it should also appear within select. And only items that are in the group by can appear in the select. Otherwise, it'll generate an error. For example, we'd run it as is, it'll identify that the customer ID. In this case here, is the first column, has to appear in the group by clause. So if we choose country in this case here, it would be very similar to running a unique clause in this case to display every single one of different countries only one time.
However, the power of the GROUP BY clause is when we're able to utilize different aggregate functions associated with these components. So for example, in this case here, if you want to take a look at the number of customers that are within each country, we can go ahead and make a change with the count in this case here.
So we have the country and then the count of the number of customers within each one. So as we kind of slow down, we'll see that USA has 13, Brazil has five and so forth. We can utilize any option in this case here with as part of the aggregates as long as there are specific items associated with it. As a reminder when it comes to group by clause, you can only have items that are in the group by clause within the select statement. Otherwise everything else has to be aggregate functions.
[MUSIC PLAYING]
Source: Authored by Vincent Tran