The complexity of the SELECT statements can increase based on the criteria being asked. In some cases, we may make use of both the WHERE and HAVING clauses. Recall that the WHERE clause is used as the filter for individual rows and the HAVING clause is for groups of rows.
Anything that we have in the aggregate function results are items that we can use in the HAVING clause. As an example, if we were asked to get all of the invoices and cost from the invoice_line table of those that had the cost as greater than 1, ordered by the invoice_id, the query would look like the following:
SELECT invoice_id, SUM(unit_price * quantity)
FROM invoice_line
GROUP BY invoice_id
HAVING SUM(unit_price * quantity) > 1
ORDER BY invoice_id;
Remember that the WHERE clause sees one row at a time, so we would not be able to evaluate the SUM across all of the invoice_id values. The HAVING clause is executed after the groups have been created.
If we were asked to expand on this to find invoice_id values that were greater than 100, we could add this to the HAVING clause to act similar to the WHERE clause:
SELECT invoice_id, SUM(unit_price * quantity)
FROM invoice_line
GROUP BY invoice_id
HAVING SUM(unit_price * quantity) > 1
AND invoice_id > 100
ORDER BY invoice_id;
The reason we could do this is due to the fact that the invoice_id is part of the GROUP BY clause. However, if we needed to filter based on another column like the unit_price to check if it was more than 1 before we grouped them, and added it to the HAVING clause:
SELECT invoice_id, SUM(unit_price * quantity)
FROM invoice_line
GROUP BY invoice_id
HAVING SUM(unit_price * quantity) > 1
AND unit_price > 1
ORDER BY invoice_id;
We would have an error generated:
This is due to the fact that the unit_price column is not part of the GROUP BY field nor a result of an aggregate function. To be valid in the HAVING clause, we can only compare the aggregate functions or the column part of the GROUP BY. For it to be a valid query, the check on the unit_price needs to be moved to the WHERE clause:
SELECT invoice_id, SUM(unit_price * quantity)
FROM invoice_line
WHERE unit_price > 1
GROUP BY invoice_id
HAVING SUM(unit_price * quantity) > 1
ORDER BY invoice_id;
We should see now that the results look quite different:
This is because we are filtering out the rows that have the unit_price being greater than 1 before we combine each into groups.
Let's look at another scenario where we are interested in invoices for a set of customers (customer_id between 20 and 30) that have their billing country in the USA. We want to find those that have had at least one invoice that has a total larger than 15. We want to also get the total amount they have ordered at all times. This may seem like a very complex query, but we will want to break the query down first.
First, we know that we are looking for data using the invoice table.
SELECT *
FROM invoice;
Exploring this data, we know that we want to look at a specific set of customers. We can identify two criteria without aggregate conditions that we can add to a WHERE clause:
SELECT *
FROM invoice
WHERE billing_country = 'USA'
AND customer_id BETWEEN 20 AND 30;
This gives us 63 records that fit these criteria:
However, we need only the customer_id to be returned along with the SUM of the total and the MAX of the total. The customer_id should be what we are grouping by as well. This would change our SELECT statement to look like:
SELECT customer_id, SUM(total),MAX(total)
FROM invoice
WHERE billing_country = 'USA'
AND customer_id BETWEEN 20 AND 30
GROUP BY customer_id;
The next step is to find the groups that have the maximum of the total being greater than 15. As this is looking at an aggregate function, it has to go into the HAVING clause.
SELECT customer_id, SUM(total),MAX(total)
FROM invoice
WHERE billing_country = 'USA'
AND customer_id BETWEEN 20 AND 30
GROUP BY customer_id
HAVING MAX(total) > 15;
In looking at the query, one thing to note is that the customer_id is in the GROUP BY clause, so the comparison for the customer_id could have also appeared in the HAVING clause as well, like this:
SELECT customer_id, SUM(total),MAX(total)
FROM invoice
WHERE billing_country = 'USA'
GROUP BY customer_id
HAVING MAX(total) > 15
AND customer_id BETWEEN 20 AND 30;
This would deliver the same result set:
Queries can get quite complex, but it is important to note that you can just break it down to separate pieces, to have the where clause to be on separate rows, whereas the group by and the having clauses are based on actual aggregate values. So for example, if we're trying to find out from this invoice table the customer's IDs, and then along with, the total amount that they've actually ordered, the maximum in terms of the order that they've actually made, of all the individuals that are in USA. And then we're going to group that based on the customer ID and then having that total of the max value being greater than 5.
Let's go ahead and take a look at how we can actually break that down. So as this first starting point, it's always a good idea to first, define out what your items are going to be. So in this case here, we're going to go ahead and do, select customer_id. And then we want to look at the sum of the total as well as the max of the total.
Then we wanted to identify that, because we're looking at individuals that are in USA, we're looking at billing_country, in this case here. That's equal to USA. Now it's important to note that we couldn't have this as part of the having clause, being that we're not really grouping by the country at all, grouping only by the customer_id. And then what we also want to do is having, such that we're looking at the maximum of the total is at least $15.
So this will result in three customers that all live in the USA based on the where clause. We're going to group based on that. The total sum of the orders for each one of them is going to be 48, 44, and 43, while the maximum total is going to be 24, 16, and 19.
If we try to remove one of these-- like for example, we just change this to 1, this should result in pretty much all of them that's going to be displayed, that live in the USA. In this case here, there are going to be 13 of them. And then we'll be able to see, in some of these cases, many of them are actually at 14, which is why they were eliminated.
Once we've actually made that shift, in this case here to 15, if we remove the where clause, we'll see that we're actually returning 11. We don't know what those countries are going to be, being that we'd have to include that in. However, that should give you a good starting point to be able to make those comparisons and then add in some of those complexities in this case here.
Source: Authored by Vincent Tran