YogeshChauhan.com
Recursive WITH Queries in Postgres (Common Table Expressions)
June 20, 2020

SELECT in WITH RECURSIVE

We saw the use of WITH in Common Table Expressions in this post: Common Table Expressions (CTE) In PostgreSQL

A CTE is essentially a named, temporary result set that exists only for the duration of the query that defines it.

The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL.

Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

//Output

5050

In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE clause, and so the query terminates.

Generalizing from the query above, the syntax looks like this:


WITH RECURSIVE CTE_name AS(
    CTE_query -- non-recursive
    UNION [ALL]
    CTE_query  -- recursive
) SELECT * FROM CTE_name;

The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output.

Such a query is executed as follows:

Recursive Query Evaluation

1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

2. So long as the working table is not empty, repeat these steps:

   a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

   b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
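
To make the steps above concrete, here is a small sketch that shrinks the earlier sum query to an upper bound of 5 and traces the working table in comments. The bound and the trace are only illustrative.

```sql
-- Contents of the working table after each step:
--   non-recursive term : {1}
--   recursive step 1   : {2}
--   recursive step 2   : {3}
--   recursive step 3   : {4}
--   recursive step 4   : {5}
--   recursive step 5   : {}   (WHERE n < 5 rejects 5, so evaluation stops)
WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n + 1 FROM t WHERE n < 5
)
SELECT n FROM t ORDER BY n;
-- returns the five rows 1, 2, 3, 4, 5
```

Every row that passes through the working table also ends up in the final result, which is why the outer SELECT sees all five values and not just the last one.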

Example

Recursive queries are typically used to deal with hierarchical or tree-structured data.

I am using this database for the next example; it is available in my public GitHub repo.

The following query returns the list of all employees who report, directly or indirectly, to the employee with id 5.


WITH RECURSIVE employee_list AS (
	SELECT
		employee_id,
		reports_to,
		first_name,
		title
	FROM
		employees
	WHERE
		reports_to = 5
	UNION
		SELECT
			e.employee_id,
			e.reports_to,
			e.first_name, 
			e.title
		FROM
			employees e
		INNER JOIN employee_list s ON s.employee_id = e.reports_to
) SELECT
	*
FROM
	employee_list;

//Output

employee_id   reports_to   first_name   title
6             5            "Michael"    "Sales Representative"
7             5            "Robert"     "Sales Representative"
9             5            "Anne"       "Sales Representative"

End of results
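
If you also want to know how far below employee 5 each person sits, one possible variation is to carry a counter through the recursion. This is only a sketch: the depth column is my own addition, and it assumes the reports_to hierarchy contains no cycles, which is why UNION ALL is safe here.

```sql
WITH RECURSIVE employee_list AS (
    -- direct reports of employee 5 are at depth 1
    SELECT employee_id, reports_to, first_name, title, 1 AS depth
    FROM employees
    WHERE reports_to = 5
  UNION ALL
    -- each further level of reports is one step deeper
    SELECT e.employee_id, e.reports_to, e.first_name, e.title, s.depth + 1
    FROM employees e
    INNER JOIN employee_list s ON s.employee_id = e.reports_to
)
SELECT * FROM employee_list
ORDER BY depth, employee_id;
```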

When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely.

Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows.
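
As a minimal sketch of that effect, the recursive term below has no WHERE clause at all and just cycles through the values 0, 1, 2. The query and its modulo arithmetic are made up for illustration.

```sql
-- With UNION, the step that produces 0 again is discarded as a duplicate
-- of an earlier result row, the working table becomes empty, and the
-- query stops. With UNION ALL the same query would never terminate.
WITH RECURSIVE t(n) AS (
    VALUES (0)
  UNION
    SELECT (n + 1) % 3 FROM t
)
SELECT n FROM t ORDER BY n;
-- returns 0, 1, 2
```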

However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before.

The standard method for handling such situations is to compute an array of the already-visited values. 

For example, consider the following query that searches a table graph using a link field:


WITH RECURSIVE search_graph(id, link, data, depth) AS (
        SELECT g.id, g.link, g.data, 1
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1
        FROM graph g, search_graph sg
        WHERE g.id = sg.link
)
SELECT * FROM search_graph;

This query will loop if the link relationships contain cycles.

Because we require a "depth" output, just changing UNION ALL to UNION would not eliminate the looping.

Instead we need to recognize whether we have reached the same row again while following a particular path of links.

We add two columns, path and cycle, to the loop-prone query:


WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
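
To try this out, here is a sketch with a tiny three-row graph that forms a loop (1 -> 2 -> 3 -> 1). The table definition, the sample rows, and the extra WHERE g.id = 1 that starts the search from a single row are all assumptions made for the demo.

```sql
CREATE TEMP TABLE graph (id int PRIMARY KEY, link int, data text);
INSERT INTO graph VALUES (1, 2, 'a'), (2, 3, 'b'), (3, 1, 'c');

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[g.id],
          false
        FROM graph g
        WHERE g.id = 1          -- start from one row only (demo assumption)
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || g.id,
          g.id = ANY(path)      -- true once a row is reached a second time
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT id, depth, path, cycle FROM search_graph ORDER BY depth;
-- returns four rows; the last one has path {1,2,3,1} and cycle = true
```

The row flagged with cycle = true is still returned, but it is not expanded any further, so the recursion ends there.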

Aside from preventing cycles, the array value is often useful in its own right as representing the "path" taken to reach any particular row.

In the general case where more than one field needs to be checked to recognize a cycle, use an array of rows.

For example, if we needed to compare fields f1 and f2:


WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
        SELECT g.id, g.link, g.data, 1,
          ARRAY[ROW(g.f1, g.f2)],
          false
        FROM graph g
      UNION ALL
        SELECT g.id, g.link, g.data, sg.depth + 1,
          path || ROW(g.f1, g.f2),
          ROW(g.f1, g.f2) = ANY(path)
        FROM graph g, search_graph sg
        WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Tip: Omit the ROW() syntax in the common case where only one field needs to be checked to recognize a cycle. This allows a simple array rather than a composite-type array to be used, gaining efficiency.

Tip: The recursive query evaluation algorithm produces its output in breadth-first search order. You can display the results in depth-first search order by making the outer query ORDER BY a "path" column constructed in this way.
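
A small sketch of that ordering trick, using an inline five-node tree so the query runs without any tables. The names tree, walk, and parent_id are hypothetical and exist only for this example.

```sql
WITH RECURSIVE tree(id, parent_id) AS (
    -- node 1 is the root; 2 and 3 are its children; 4 is under 2, 5 under 3
    VALUES (1, NULL::int), (2, 1), (3, 1), (4, 2), (5, 3)
), walk(id, depth, path) AS (
    SELECT id, 1, ARRAY[id]
    FROM tree
    WHERE parent_id IS NULL
  UNION ALL
    SELECT t.id, w.depth + 1, w.path || t.id
    FROM tree t
    JOIN walk w ON t.parent_id = w.id
)
SELECT id, depth, path
FROM walk
ORDER BY path;
-- ids come back as 1, 2, 4, 3, 5 (depth-first)
```

Arrays compare element by element, so each parent's path sorts immediately before the paths of its descendants.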

A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT in the parent query.

For example, this query would loop forever without the LIMIT:


WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;

This works because PostgreSQL's implementation evaluates only as many rows of a WITH query as are actually fetched by the parent query.

Using this trick in production is not recommended, because other systems might work differently.

Also, it usually won't work if you make the outer query sort the recursive query's results or join them to some other table, because in such cases the outer query will usually try to fetch all of the WITH query's output anyway.
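
When the outer query does need to sort, a safer pattern is to put an explicit bound inside the recursive term, so termination does not depend on how many rows the parent fetches. A sketch, with the bound of 100 chosen arbitrarily:

```sql
WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n + 1 FROM t WHERE n < 100   -- the recursion ends on its own
)
SELECT n FROM t
ORDER BY n DESC
LIMIT 5;
-- returns 100, 99, 98, 97, 96
```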

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.

Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work.
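
As a rough illustration, the WITH query below is referenced twice by the parent query: once for the rows themselves and once for a grand total. The name squares and the generate_series input are placeholders standing in for a genuinely expensive calculation.

```sql
WITH squares AS (
    SELECT g AS n, g * g AS sq
    FROM generate_series(1, 5) AS g
)
SELECT s.n,
       s.sq,
       (SELECT sum(sq) FROM squares) AS total_sq   -- second reference
FROM squares s
ORDER BY s.n;
-- five rows; total_sq is 55 on every row
```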

Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.

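However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than into an ordinary sub-query.

A recursive WITH query, or one that is referenced more than once, will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.) Since PostgreSQL 12, a non-recursive, side-effect-free WITH query that is referenced only once can instead be folded into the parent query, unless it is marked MATERIALIZED.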

The examples above only show WITH being used with SELECT, but it can be attached in the same way to INSERT, UPDATE, or DELETE.

In each case it effectively provides temporary table(s) that can be referred to in the main command.
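
For instance, here is a sketch of a recursive WITH attached to a DELETE. The numbers table and the evens query are invented for the demo.

```sql
CREATE TEMP TABLE numbers (n int);
INSERT INTO numbers SELECT generate_series(1, 10);

-- evens produces 2, 4, 6, 8, 10; the DELETE then removes those rows
WITH RECURSIVE evens(n) AS (
    VALUES (2)
  UNION ALL
    SELECT n + 2 FROM evens WHERE n < 10
)
DELETE FROM numbers
WHERE n IN (SELECT n FROM evens);

SELECT n FROM numbers ORDER BY n;
-- returns 1, 3, 5, 7, 9
```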
