In short, the filter placement could change the query results if the query uses an outer join. Left (right) outer joins add back non-matching rows from the left (right) table after the ON filter. A row might match the ON criteria but still return to the final result set. The following examples demonstrate this behavior.
USE Northwind
GO
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
LEFT JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
WHERE c.Country = 'France' AND c.City = 'Paris'

The query begins logical processing at the JOIN.
The join first inserts the cartesian of Customers and Orders into a virtual table (VT1). Next, the join inserts rows from VT1 into a new virtual table (VT2) where o.CustomerID = c.CustomerID. The last step of the join inserts VT2 and non-matching rows from customers into a new virtual table (VT3). The query then advances to the WHERE clause. Here are the steps summarized:
1. Cartesian product between Customers and Orders inserted into VT1
2. VT1 rows inserted into VT2 where o.CustomerID = c.CustomerID
3. VT2 rows and non-matching Customers rows inserted into VT3
4. WHERE clause filters VT3
Now suppose the query needs to filter by the CompanyName column. The user only wants to see rows for Spécialités du monde. Let's add the filter to the ON clause first.
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
LEFT JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
AND c.CompanyName = 'Spécialités du monde'
WHERE c.Country = 'France' AND c.City = 'Paris'

The result set is identical to the previous query. These results might surprise the user if they expect to see rows for Spécialités du monde only. So we must move the filter from the ON to the WHERE.
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
LEFT JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
WHERE c.Country = 'France' AND c.City = 'Paris'
AND c.CompanyName = 'Spécialités du monde'

But what about Performance?
Let's forget query accuracy for a moment to discuss performance. Filter placement in outer joins can have a performance impact. SQL creates two distinct plans for the queries above.
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
LEFT JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
WHERE c.Country = 'France' AND c.City = 'Paris'
AND c.CompanyName = 'Spécialités du monde'
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
LEFT JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
AND c.CompanyName = 'Spécialités du monde'
WHERE c.Country = 'France' AND c.City = 'Paris'

The second query gets a lower estimated cost because it performs fewer IO than the first. The startup expression saves reads on the Orders table. Reading the query plan from a logical perspective (left to right) helps illustrate. The rightmost nested loop seeks into orders for each row from the index scan on customers. Before the seek on orders occurs, the row goes through the filter iterator. The startup expression (CompanyName = 'Spécialités du monde') filters rows before they seek into Orders. Fewer seeks into Orders saves this query a few IO. Despite this observation, we can't always assume ON filters perform better than WHERE filters. Many factors can sway performance one way or another, and each query requires testing.
Up to this point, we've focused on outer joins. We've neglected inner joins because filter placement doesn't matter in our examples. We get the same results, same execution plan, same IO, and same CPU time.
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
INNER JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
AND c.CompanyName = 'Spécialités du monde'
WHERE c.Country = 'France' AND c.City = 'Paris'
SELECT c.CustomerID,c.CompanyName,C.City,C.Country
,o.OrderID,o.CustomerID,o.OrderDate
FROM dbo.Customers c
INNER JOIN dbo.Orders o
ON o.CustomerID = c.CustomerID
WHERE c.Country = 'France' AND c.City = 'Paris'
AND c.CompanyName = 'Spécialités du monde'

In summary, predicate placement can impact query results and performance in outer joins. I remember when I learned the ON clause could have more than one filter. I moved a filter from the WHERE to an ON and "voilà" the query performance improved. I didn't stop to question how the change might have altered the query results. In retrospect, I should have focused on query accuracy before the performance.
Comments