Does SQL need subqueries?
Imagine a sufficiently generalized implementation of the structured query language for relation databases. Since the structure of the canonical SQL
SELECTstatement is actually pretty important for this to make sense, I don’t appeal directly to relational algebra, but you could frame this in those terms by making appropriate restrictions on the form of expressions.
SELECTquery generally consists of a projection (the
SELECTpart) some number of
JOINpart), some number of
SELECTIONoperations (in SQL, the
WHEREclauses), and then set-wise operations (
INTERSECT, etc.), followed by another SQL
Tables being joined can be the computed results of expressions; in other words, we can have a statement such as:
SELECT t1.name, t2.address FROM table1 AS t1 JOIN (SELECT id, address FROM table2 AS t3 WHERE t3.id = t1.id) AS t2 WHERE t1.salary > 50,000;
We will refer to the use of a computed table as part of an SQL query as a subquery. In the example above, the second (indented)
SELECTis a subquery.
Can all SQL queries be written in such a way as to not use subqueries? The example above can:
SELECT t1.name, t2.address FROM table1 AS t1 JOIN table2 AS t2 ON t1.id = t2.id WHERE t1.salary > 50,000;
This example is somewhat spurious, or trivial, but one can imagine instances where considerably more effort might be required to recover an equivalent expression. In other words, is it the case that for every SQL query q with subqueries, there exists a query q′ without subqueries such that q and q′ are guaranteed to produce the same results for the same underlying tables? Let us limit SQL queries to the following form:
SELECT <attribute>, ..., <attribute> FROM <a table, not a subquery> JOIN <a table, not a subquery> ... JOIN <a table, not a subquery> WHERE <condition> AND <condition> ... AND <condition> UNION -or- EXCEPT -or- <similar> SELECT ...
And so on. I think left and right outer joins don’t add much, but if I am mistaken, please feel free to point that out… in any event, they are fair game as well. As far as set operations go, I guess any of them are fine… union, difference, symmetric difference, intersection, etc… anything that is helpful. Are there any known forms to which all SQL queries can be reduced? Do any of these eliminate subqueries? Or are there some instances where no equivalent, subquery-free query exists? References are appreciated… or a demonstration (by proof) that they are or aren’t required would be fantastic. Thanks, and sorry if this is a celebrated (or trivial) result of which I am painfully ignorant.
There is some terminology confusion; the query block within parenthesis
SELECT t1.name, t2.address FROM table1 JOIN (SELECT id, address FROM table2 AS t3 WHERE t3.id = t1.id)
is called inner view. A subquery is query block within either WHERE or SELECT clause, e.g.
select deptno from dept where 3 < (select count(1) from emp where dept.deptno=emp.deptno)
In either case, inner view or subquery can be unnested into “flat” project-restrict-join. Correlated subquery with aggregation unnests into inner views with grouping, which then unnests into flat query.
select deptno from dept d where 3 < (select avg(sal) from emp e where d.deptno=e.deptno) select d.deptno from dept d, ( select deptno from emp e group by deptno having avg(sal) > 3 ) where d.deptno=e.deptno select d.deptno from dept d, emp e where d.deptno=e.deptno group by d.deptno having avg(sal) > 3