Ramblings of a SQL Hillbilly

Or, the mad ravings of a data warehouse forklift operator

T-SQL Trickery: An Alternative to Or

One of the less pleasant aspects of doing ETL coding is dealing with requirements that don’t allow for straightforward, well-performing code. One of the trickier aspects of pulling from multiple source tables is determining whether those tables have changed. Many times, you have to make use of staging tables to avoid having to string together several performance-killing OR conditions, but sometimes there is an easier way…

Let’s say you have the following predicates in your WHERE clause:

1
2
3
4
5
AND (A.ETL_LOAD_TIMESTAMP > ETL.LAST_LOADED_TIMESTAMP
  OR B.ETL_LOAD_TIMESTAMP > ETL.LAST_LOADED_TIMESTAMP
  OR C.ETL_LOAD_TIMESTAMP > ETL.LAST_LOADED_TIMESTAMP
  OR D.ETL_LOAD_TIMESTAMP > ETL.LAST_LOADED_TIMESTAMP
  OR E.ETL_LOAD_TIMESTAMP > ETL.LAST_LOADED_TIMESTAMP)

Depending on indexing and the query plan SQL Server chooses, this can perform pretty badly. I dealt with one query this summer that did this for fourteen tables – and the “developer lead” I was working with couldn’t figure out why it ran so slow!

Anyway, after a little bit of deliberation, I came up with an alternative approach which took our runtime down from 20 minutes to about 10:

1
2
3
4
5
6
7
8
9
10
11
AND ETL.LAST_LOADED_TIMESTAMP <
    (SELECT MAX(ETL_LOAD_TIMESTAMP) FROM (
            SELECT A.ETL_LOAD_TIMESTAMP
             UNION ALL
            SELECT B.ETL_LOAD_TIMESTAMP
             UNION ALL
            SELECT C.ETL_LOAD_TIMESTAMP
             UNION ALL
            SELECT D.ETL_LOAD_TIMESTAMP
             UNION ALL
            SELECT E.ETL_LOAD_TIMESTAMP) CANDIDATE_TIMESTAMPS )

This is a T-SQL design pattern which I’ve found handy time and again. If you don’t care if all the timestamps are greater as long as one or more is greater, then this is the ticket. It’s also handy for working with date overlaps – you can take the highest begin and the lowest end from all choices, which gives you the full overlap as long as your joins are correct. You can use this in WHERE clauses or in SELECTs. And note that I used UNION ALL instead of UNION – in this case, we’d lose more from the SORT and the DISTINCT functionality of UNION than we’d gain from not dealing with duplicate values.

Hope it helps!