Real Life SQL and .NET : SQL Subqueries: Part VII


SQL Subqueries

Subqueries are extremely useful, particularly for web-based database applications where you need to take two queries and manually put them together to reach a desired result -- subqueries allow SQL to do all of the heavy lifting! Subqueries can also be used in many cases to replace a self-join. A SQL join is usually quicker but, as we've discussed many times before, there is usually more than one way to perform any given SQL task.

Is there a query in a query?

The subquery is fairly straightforward part of the SQL specification. In a nutshell, a subquery is a SQL SELECT statement that is placed in the predicate of any other SQL statement we've explored -- SELECT, INSERT, UPDATE, or DELETE. You're quite smart enough on your own to figure them out without my intervention, but we'll cover them here to make sure we've hit everything!

A subquery can be used in a number of scenarios:

  • SELECT/UPDATE/DELETE... WHERE (SELECT ...) which can be used to filter data before an action is applied to the results of that filter;

  • INSERT INTO.... SELECT.... which can be used to copy tables or portions of tables into a new table for further manipulation;

  • Another subquery which can then be nested again up to the limits of your database platform -- or your sanity and understanding.

You've probably found yourself at various points in SQL development mentally creating subqueries in your head -- things like finding all of your high-volume customers who are also the ones that pay on time or maybe updating information about all the employees that are also managers in the company. Every major RDBMS lets you do at least some level of subquerying to address that exact issue.

Using a subquery

Let's say we want to find the names of all of the managers in the Employees table. Starting with the following table

we want to first select all of the values for ManagerID and then associate them with a name. We can do that with the following set of queries

SELECT EmployeeName AS Employee
FROM Employees
WHERE EmployeeID IN (SELECT DISTINCT ManagerID FROM Employees)

The queries are addressed from the inside out, so the first step is to perform the statement which returns the result set (61, 63).

SELECT ManagerID FROM Employees

This means the outer query becomes which then gives us the record set ("Edward Appleton", "Troy Parker").

SELECT EmployeeName AS Employee
FROM Employees
WHERE EmployeeID IN (61,63)

The only caveat with subqueries is that you must be aware of exactly what the query will return as far as fields, field types, and values. Our subquery in this example returns a single column of values which are valid values for the WHERE EmployeeID IN clause. If the subquery returned the employee name, for example, you'd get a data type mismatch because "Edward Appleton" is a string and EmployeeID is a number field -- WHERE EmployeeID IN ('Edward Appleton'....) makes no sense. This caveat is especially true when you are using a SELECT subquery inside of an INSERT INTO statement -- the number of fields, their order, and the data types must match up or the INSERT will fail.

The SQL outer join

Every type of SQL join effectively multiplies the row in one table by the rows in the other table that is participating in the join (and which can be extended to include as many tables are participating in the JOIN operation).

When we discussed the Cartesian product (the "cross join"), we saw the raw results of a JOIN operation. The INNER JOIN statement provided a useful filter to that raw result by picking out only rows where the key fields have the same value. The result of the INNER JOIN operation is one row for each key value that exists in both joined tables. But what about situations where there are rows in one table that do not have a corresponding match in the other table? An inner join ignores these rows.

An outer join is used to include the rows that are "missing" in an inner join. Using our ongoing BOOK database as an example may make things clearer. So far, we've created joins that pull our Author data and BOOK data together to create a catalog of the BOOK collection. But let's say I've heard a new Author on the radio and immediately decide to put them in the database -- Antony Gray, for example.

If I create an inner join between the Authors and BOOK tables, I get the following result.

SELECT * FROM Authors, BOOKs WHERE Authors.AuthorID *= BOOKs.AuthorID

These results accurately describe my catalog. But what about Antony Gray and all the others I'm planning on adding to the catalog? They completely disappear from the result because there is no matching row where BOOKs.AuthorID=5. Big deal you may say, but what if the two tables were Customers and Orders instead? Or maybe Orders and Items? In the first case, only customers with orders would show up in the report; in the second case, only items that had actually been ordered would show up.

This sort of "missing" information could completely ruin some types of calculations! An "outer join" operation will fix this problem.

Using the Outer Join

Outer joins come in three distinct flavors:

  • LEFT OUTER JOIN (*=)
  • RIGHT OUTER JOIN (=*)
  • FULL OUTER JOIN

Knowing that the purpose of an outer join is to include the "missing" or unmatched rows, you can probably figure out what each of these flavors means. The LEFT, RIGHT, and FULL syntax all describe which of the table's unmatched columns to include. If we wanted to fix the BOOK collection example, we could change the inner join to the following outer join:

SELECT * FROM Authors, BOOKs WHERE Authors.AuthorID *= BOOKs.AuthorID

The LEFT OUTER JOIN operator ensures that all rows on the "left" side of the join, in this case the Authors table, will be included. It is important to note that "left" and "right" are completely dependent on the order of the tables in the SQL statement. The following SQL statements are all identical:

SELECT * FROM Authors, BOOKs WHERE Authors.AuthorID *= BOOKs.AuthorID
SELECT * FROM BOOKs, Authors WHERE BOOKs.AuthorID =* Authors.AuthorID
SELECT * FROM Authors LEFT OUTER JOIN BOOKs ON Authors.AuthorID = BOOKs.AuthorID
SELECT * FROM BOOKs RIGHT OUTER JOIN Authors ON BOOKs.AuthorID = Authors.AuthorID

And as you can probably gather, a "full outer join" includes all unmatched rows from both tables in the result. The outer join is particularly useful for creating aggregate data reports. You could count the orders placed for each item in an inventory table using a SQL statement something like the following:

SELECT Inventory.InventoryID, COUNT(Orders.OrderID)
FROM Inventory LEFT OUTER JOIN Orders
ON Inventory.InventoryID = Orders.InventoryID

GROUP BY Inventory.InventoryID

This would count all of the orders for each inventory row, even those with no matching orders (this example assumes that the records in the Orders table contains orders for single inventory items). You could also use this technique to count the orders per customer, inventory per vendor, or any other type of relationship where it is useful to know that some rows in one table have no corresponding values.

Using the Inner Join

Inner Join combines records from two tables whenever there are matching values in a common field. Here is the syntax of inner join:

SELECT * FROM table1
INNER JOIN table2 ON table1.field1 compopr table2.field2

The INNER JOIN operation has these parts:

You can use an INNER JOIN operation in any FROM clause. This is the most common type of join. Inner joins combine records from two tables whenever there are matching values in a field common to both tables.

You can use INNER JOIN with the Departments and Employees tables to select all the employees in each department. In contrast, to select all departments (even if some have no employees assigned to them) or all employees (even if some aren't assigned to a department), you can use a LEFT JOIN or RIGHT JOIN operation to create an outer join.

If you try to join fields containing Memo or OLE Object data, an error occurs. You can join any two number fields of like types. For example, you can join on AutoNumber and Long fields because they are like types. However, you cannot join Single and Double types of fields. The following example shows how you could join the Categories and Products tables on the CategoryID field:

SELECT CategoryName, ProductName
FROM Categories INNER JOIN Products
ON Categories.CategoryID = Products.CategoryID;

In the preceding example, CategoryID is the joined field, but it isn't included in the query output because it isn't included in the SELECT statement. To include the joined field, include the field name in the SELECT statement in this case, Categories.CategoryID.

You can also link several ON clauses in a JOIN statement, using the following syntax:

SELECT fields
FROM table1 INNER JOIN table2
ON table1.field1 compopr table2.field1 AND
ON table1.field2 compopr table2.field2) OR
ON table1.field3 compopr table2.field3)];

You can also nest JOIN statements using the following syntax:

SELECT fields
FROM table1 INNER JOIN
(table2 INNER JOIN [( ]table3
[INNER JOIN [( ]tablex [INNER JOIN ...)]
ON table3.field3 compopr tablex.fieldx)]
ON table2.field2 compopr table3.field3)
ON table1.field1 compopr table2.field2;

A LEFT JOIN or a RIGHT JOIN may be nested inside an INNER JOIN, but an INNER JOIN may not be nested inside a LEFT JOIN or a RIGHT JOIN.

The LEFT and RIGHT JOIN

The Left Join is one of the two outer joins in the SQL syntax. Although Inner Join returns only those rows that have corresponding values in both tables, the outer joins return all the records from one side of the join, whether or not there is a corresponding match on the other side of the join. The Left Join clause returns all the records from the first table on the list (the leftmost table) and any records on the right side of the table that have a matching column value.

SELECT Publishers.Name,[Publisher Comments].Comments
FROM Publishers LEFT JOIN [Publisher Comments]
ON Publishers.PubID = [Publisher Comments].PubID

The Right Join works the same as the Left Join except that the result set is based on the second (right-hand' table in the JOIN statement. You can use the Right Join in the same manner you would use the Left Join.

We have basic information about SQL to use in our applications. Now, we will have basic knowledge about how we can use SQL in C# environment. First we will take a look namespaces, objects, and classes.

continue article