Like most other relational database products, PostgreSQL supports aggregate functions.
An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.
I am using the database available in my public GitHub repo: https://github.com/ydchauh/yogeshchauhan.com-public
Say we want to find the highest unit price among all products. We can query like this:
SELECT MAX(unit_price) FROM products;
-- Output
-- max
-- 263.5
What if we also want the name of the product with that price? A first attempt might look like this:
SELECT product_name FROM products WHERE unit_price = MAX(unit_price); -- THIS IS WRONG
This will not work since the aggregate max cannot be used in the WHERE clause.
This restriction exists because the WHERE clause determines the rows that will go into the aggregation stage; so it has to be evaluated before aggregate functions are computed.
However, as is often the case, the query can be restated to accomplish the intended result, here by using a subquery:
SELECT product_name FROM products WHERE unit_price = (SELECT MAX(unit_price) FROM products);
-- Output
-- product_name
-- "Côte de Blaye"
This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query.
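To see both behaviours side by side, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL and a made-up three-row products table (only the names and prices quoted in this post are reused), so treat it as an illustration rather than the exact behaviour of the real database. SQLite also rejects an aggregate in WHERE, though its error message differs from the one PostgreSQL prints.

```python
import sqlite3

# Toy stand-in for the products table (in-memory SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, category_id INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Côte de Blaye", 1, 263.5),
        ("Mascarpone Fabioli", 4, 32.0),
        ("Maxilaku", 3, 20.0),
    ],
)

# An aggregate used directly in WHERE is rejected by the engine.
try:
    conn.execute(
        "SELECT product_name FROM products WHERE unit_price = MAX(unit_price)"
    )
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# The subquery computes its own aggregate; the outer WHERE only compares
# each row's price against that single value.
top = conn.execute(
    "SELECT product_name FROM products "
    "WHERE unit_price = (SELECT MAX(unit_price) FROM products)"
).fetchall()

print(where_failed)
print(top)  # [('Côte de Blaye',)]
```

If several products shared the highest price, the subquery version would return all of them, which is usually what we want.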
We saw GROUP BY clause examples in this post: Order By And Group By In Postgres
Aggregates are also very useful in combination with GROUP BY clauses.
For example, we can get the maximum unit_price observed in each category_id with:
SELECT category_id, MAX(unit_price) FROM products GROUP BY category_id;
-- Output
-- category_id | max
-- 8 | 62.5
-- 7 | 53
-- 1 | 263.5
-- 5 | 38
-- ...
Of course, the rows come back in no particular order because I haven't specified an ORDER BY clause.
You can learn about the ORDER BY clause in the same blog post: Order By And Group By In Postgres
Each aggregate result is computed over the table rows matching that category_id.
We can filter these grouped rows using HAVING:
SELECT category_id, MAX(unit_price) FROM products GROUP BY category_id HAVING MAX(unit_price) < 40;
-- Output
-- category_id | max
-- 5 | 38
We could get the category_name with a JOIN, since the category details live in a different table, but that is beside the point for now.
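The grouping and filtering steps above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module in place of PostgreSQL and a hypothetical six-row products table; the product names (item_a and so on) and the lower prices are invented, and only the per-category maxima mirror the output shown in this post.

```python
import sqlite3

# Hypothetical toy data: two products in each of three categories.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, category_id INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("item_a", 1, 263.5), ("item_b", 1, 18.0),
        ("item_c", 5, 38.0), ("item_d", 5, 7.0),
        ("item_e", 8, 62.5), ("item_f", 8, 6.0),
    ],
)

# One output row per category_id, holding that category's highest price.
# ORDER BY is added here only to make the result order predictable.
per_category = conn.execute(
    "SELECT category_id, MAX(unit_price) FROM products "
    "GROUP BY category_id ORDER BY category_id"
).fetchall()

# HAVING filters the grouped rows after the aggregates are computed.
under_40 = conn.execute(
    "SELECT category_id, MAX(unit_price) FROM products "
    "GROUP BY category_id HAVING MAX(unit_price) < 40"
).fetchall()

print(per_category)  # [(1, 263.5), (5, 38.0), (8, 62.5)]
print(under_40)      # [(5, 38.0)]
```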
We can also use WHERE and HAVING together:
Let's say we only care about products whose names start with 'Ma'. We can write the query like this:
SELECT product_name, category_id, MAX(unit_price) FROM products WHERE product_name LIKE 'Ma%' GROUP BY category_id, product_name HAVING MAX(unit_price) < 40;
-- Output
-- product_name | category_id | max
-- "Maxilaku" | 3 | 20
-- "Mascarpone Fabioli" | 4 | 32
Note that if you want product_name in the output and it is not wrapped in an aggregate function, you must list it in the GROUP BY clause.
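Here is a minimal sketch of how the two filters cooperate in that query. It assumes Python's built-in sqlite3 module instead of PostgreSQL and a hypothetical four-row table: "Maxilaku" and "Mascarpone Fabioli" come from the output above, while "Mango Chutney" and "Chai" and their prices are invented just to give each filter a row to drop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, category_id INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Maxilaku", 3, 20.0),
        ("Mascarpone Fabioli", 4, 32.0),
        ("Mango Chutney", 2, 55.0),  # hypothetical: passes WHERE, dropped by HAVING
        ("Chai", 1, 18.0),           # hypothetical: dropped by WHERE
    ],
)

# WHERE trims the input rows first, GROUP BY forms the groups,
# and HAVING then discards groups whose maximum price is 40 or more.
rows = conn.execute(
    "SELECT product_name, category_id, MAX(unit_price) FROM products "
    "WHERE product_name LIKE 'Ma%' "
    "GROUP BY category_id, product_name "
    "HAVING MAX(unit_price) < 40 "
    "ORDER BY product_name"
).fetchall()

print(rows)  # [('Mascarpone Fabioli', 4, 32.0), ('Maxilaku', 3, 20.0)]
```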
It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses.
The fundamental difference between WHERE and HAVING is this:
WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates.
On the other hand, the HAVING clause almost always contains aggregate functions.
Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently at the WHERE stage.
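As a quick check of that last point, the sketch below runs the same non-aggregate condition once in HAVING and once in WHERE and compares the results. It assumes Python's built-in sqlite3 module rather than PostgreSQL, a hypothetical four-row table, and an arbitrary condition (category_id < 5) chosen only for illustration. It shows that the two forms return the same rows; it does not measure the efficiency difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, category_id INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("item_a", 1, 263.5),
        ("item_b", 3, 20.0),
        ("item_c", 5, 38.0),
        ("item_d", 8, 62.5),
    ],
)

# Condition without an aggregate, applied after grouping.
with_having = conn.execute(
    "SELECT category_id, MAX(unit_price) FROM products "
    "GROUP BY category_id HAVING category_id < 5 ORDER BY category_id"
).fetchall()

# Same condition applied before grouping, so fewer rows reach the aggregate.
with_where = conn.execute(
    "SELECT category_id, MAX(unit_price) FROM products "
    "WHERE category_id < 5 GROUP BY category_id ORDER BY category_id"
).fetchall()

print(with_having == with_where)  # True
print(with_where)                 # [(1, 263.5), (3, 20.0)]
```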