# Do subqueries add expressive power to SQL queries?

Does SQL need subqueries?

Imagine a sufficiently generalized implementation of the structured query language for relation databases. Since the structure of the canonical SQL SELECT statement is actually pretty important for this to make sense, I don’t appeal directly to relational algebra, but you could frame this in those terms by making appropriate restrictions on the form of expressions.

An SQL SELECT query generally consists of a projection (the SELECT part) some number of JOIN operations (the JOIN part), some number of SELECTION operations (in SQL, the WHERE clauses), and then set-wise operations (UNION, EXCEPT, INTERSECT, etc.), followed by another SQL SELECT query.

Tables being joined can be the computed results of expressions; in other words, we can have a statement such as:

SELECT t1.name, t2.address
FROM table1 AS t1
FROM table2 AS t3
WHERE t3.id = t1.id) AS t2
WHERE t1.salary > 50,000;


We will refer to the use of a computed table as part of an SQL query as a subquery. In the example above, the second (indented) SELECT is a subquery.

Can all SQL queries be written in such a way as to not use subqueries? The example above can:

SELECT t1.name, t2.address
FROM table1 AS t1
JOIN table2 AS t2
ON t1.id = t2.id
WHERE t1.salary > 50,000;


This example is somewhat spurious, or trivial, but one can imagine instances where considerably more effort might be required to recover an equivalent expression. In other words, is it the case that for every SQL query $q$ with subqueries, there exists a query $q'$ without subqueries such that $q$ and $q'$ are guaranteed to produce the same results for the same underlying tables? Let us limit SQL queries to the following form:

SELECT <attribute>,
...,
<attribute>
FROM <a table, not a subquery>
JOIN <a table, not a subquery>
...
JOIN <a table, not a subquery>
WHERE <condition>
AND <condition>
...
AND <condition>

UNION
-or-
EXCEPT
-or-
<similar>

SELECT ...


And so on. I think left and right outer joins don’t add much, but if I am mistaken, please feel free to point that out… in any event, they are fair game as well. As far as set operations go, I guess any of them are fine… union, difference, symmetric difference, intersection, etc… anything that is helpful. Are there any known forms to which all SQL queries can be reduced? Do any of these eliminate subqueries? Or are there some instances where no equivalent, subquery-free query exists? References are appreciated… or a demonstration (by proof) that they are or aren’t required would be fantastic. Thanks, and sorry if this is a celebrated (or trivial) result of which I am painfully ignorant.

There is some terminology confusion; the query block within parenthesis

SELECT t1.name, t2.address
FROM table1
FROM table2 AS t3
WHERE t3.id = t1.id)


is called inner view. A subquery is query block within either WHERE or SELECT clause, e.g.

select deptno from dept
where 3 < (select count(1) from emp
where dept.deptno=emp.deptno)


In either case, inner view or subquery can be unnested into “flat” project-restrict-join. Correlated subquery with aggregation unnests into inner views with grouping, which then unnests into flat query.

select deptno from dept d
where 3 < (select avg(sal) from emp e
where d.deptno=e.deptno)

select d.deptno from dept d, (
select deptno from emp e
group by deptno
having avg(sal) > 3
) where d.deptno=e.deptno

select d.deptno from dept d, emp e
where d.deptno=e.deptno
group by d.deptno
having avg(sal) > 3


As for algebraic rules for query optimization, relational algebra is known to be axiomatized into Relational Lattice which simplifies query transformations as demonstrated here and there.