I will be using this database for all examples which is available in my Github public repo: https://github.com/ydchauh/yogeshchauhan.com-public

PostgreSQL GROUPING SETS

More complex grouping operations are possible using the concept of grouping sets.

There are 3 steps.

1. The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set,

2. aggregates computed for each group just as for simple GROUP BY clauses,

3. and then the results returned. 

I will go though each step in separate queries later on but for now, let's go through GROUPING SET syntax and example.

PostgreSQL GROUPING SET Syntax


SELECT
    column_1,
    column_2,
    aggregate_function(column_3)
FROM
    table
GROUP BY
    GROUPING SETS (
        (column_1, column_2),
        (column_1),
        (column_2),
        ()
);

The above syntax has 4 GROUPING SETS but we can add or remove as per our requirement. 

PostgreSQL GROUPING SET Example

Note: References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear.


SELECT
    ship_city,
    ship_country,
    SUM (freight)
FROM
    orders
GROUP BY
    GROUPING SETS (
        (ship_city, ship_country),
        (ship_city),
        (ship_country),
        ()
    );

//Output
ship_city,    ship_country     sum
null          null            64942.74
"Seattle"     "USA"           1353.0599
"Kirkland"    "USA"           70.009995
"Paris"       "France"        108.28
"Genève"      "Switzerland"   1001.2899
"Campinas"    "Brazil"        322.38004
....
....
162 rows

I know that for the first time, it does seem confusing. But I am going to break it down in small chunks of query.

In this part we are just telling Postgres to display these 3 columns from the table order:


SELECT
    ship_city,
    ship_country,
    SUM (freight)
FROM
    orders

And then we are telling postgres to GROUP BY using different GROUPING SETS.

The first one is :


(ship_city, ship_country),

which is going to group by the query results using ship_city and ship_country and the next two will separately GROUP BY the query results using ship_city and ship_country.

The last one, where we don't specify anything in the brackets (), will returns the SUM of freight shipped to all cities and countries. It defines an empty grouping set.

Now, how GROUPING SETS make it easier to write the query?

Let's write down the query for the same results that we got using GROUPING SETS in the example above.


SELECT
    NULL,
    NULL,
    SUM (freight)
FROM
    orders
	
UNION ALL
	
SELECT
    ship_city,
    ship_country,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_city,
    ship_country

UNION ALL

SELECT
    ship_city,
    NULL,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_city

UNION ALL

SELECT
    NULL,
    ship_country,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_country

//Output

ship_city,    ship_country     sum
null          null            64942.74
"Seattle"     "USA"           1353.0599
"Kirkland"    "USA"           70.009995
"Paris"       "France"        108.28
"Genève"      "Switzerland"   1001.2899
"Campinas"    "Brazil"        322.38004
....
....
162 rows

Let's break down this monstrous query.

Because the UNION ALL requires all result sets to have the same number of columns with compatible data types, you need to adjust the queries by adding NULL to the selection list of each.

In all the following queries you can remove the null. I have kept it to make it easy to understand

1st chunk


SELECT
    NULL,
    NULL,
    SUM (freight)
FROM
    orders

//Output

column?  column?    sum

null     null	    64942.74. 

This query finds the total freight shipped to all cities and countries. It defines an empty grouping set.

2nd chunk


SELECT
    ship_city,
    ship_country,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_city,
    ship_country

//Output

ship_city     ship_country     sum
"Seattle"	  "USA"	           1353.0599
"Kirkland"    "USA"	           70.009995
"Paris"	      "France"	       108.28
...
...
70 rows

This query defines a grouping set of the ship_city and ship_country. It returns the freight shipped to each city with it's country.

3rd chunk


SELECT
    ship_city,
    NULL,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_city

//Output

ship_city   column?           sum
"Butte"		null              129.96
"Paris"		null              108.28
"Cowes"		null              363.65
...
...
70 rows

This query finds the freight shipped to ship_city. It defines a grouping set of the city.

4th chunk


SELECT
    NULL,
    ship_country,
    SUM (freight)
FROM
    orders
GROUP BY
    ship_country

//Output

column?    ship_country     sum
null       "Argentina"	    598.58
null       "Spain"	        861.88995
null	   "Switzerland"    1368.5298
...
...
21 rows

This query finds the freight shipped to ship_country. It defines a grouping set of the country.

All those queries combined make the unified result set we saw earlier with the aggregated data for all grouping sets.

Summary

Even though the query gives us the result set we want, it is really really lengthy. And because of that it has a performance issue as PostgreSQL has to scan the orders table separately for each chunk of small queries.

That is why PostgreSQL has the GROUPING SETS which is the sub clause of the GROUP BY clause.

The GROUPING SETS allows you to define multiple grouping sets in the same query.

0 Comments

Leave a reply

Your email address will not be published. required fields are marked *