Ramblings of a SQL Hillbilly

Or, the mad ravings of a data warehouse forklift operator

Troubleshooting Empty String Comparison Issues

While testing an SSIS package today, I discovered that I was missing some rows I expected to build. Digging in revealed that the missing rows had a column in the key that looked like an empty string to me, but that didn’t seem to be comparing correctly. SQL Server is usually pretty good about this, so I had a mystery to solve!

The first thing to realize about SQL Server is that an empty string and a string composed of all spaces are considered to be equivalent. That is, the following code will return Yes:

1
2
3
4
5
SELECT CASE
       WHEN '' = '    '
       THEN 'Yes'
       ELSE 'No'
       END as Are_empty_strings_equivalent_to_spaces

Note that the empty string ‘’ and a string with a NULL value are not the same thing! An empty string is just a string with a length of zero. So, if I compare what appears to be two CHAR(22) fields that are composed of spaces for equality, I should get a value of TRUE returned. The fact that I wasn’t seeing this was an indicator that something weirder was going on.

The first tool I pulled out of the box was the trusty pair of LTRIM() and RTRIM(). LTRIM() removes leading spaces from the passed string, and RTRIM() removes trailing spaces. I really didn’t expect this to work, but gave it a try anyway because I tend to stumble upon weird stuff like this from time to time where SQL Server doesn’t behave as I expect. That’s not to say that it behaves incorrectly – I’ll believe my understanding is at fault first! Anyway, I did check my strings for their LEN() attributes when wrapped in LTRIM() and RTRIM(), and discovered that one version of the string showed length 0 (the empty string), and the other showed length 20! Curiouser and curiouser.

My next stop on this train was to make use of the ASCII function to examine what was actually in those strings. This function tells you the ASCII code of the first character in the string you pass it. When combined with SUBSTRING, you can examine any character in your string to see what it is. I was looking for ASCII codes of 32 (space), but to my surprise the source column had 0 (NUL) in 20 of the 22 characters! As it turns out, at some point in the past, our DB2 source-side system put those NUL terminators into the first 20 bytes of that column, and our previous ETL solution maintained the NUL bytes into the data warehouse source layer. However, SSIS quite helpfully substituted space characters for the NUL characters, meaning that when the time came to join back to the original table, the comparison failed and we lost rows.

In our case, this is old data so we’re just going to plug it. However, I could have stuck a REPLACE(colname,CHAR(0),CHAR(32)) in the staging query and solved the problem as well. If you run into a similar problem, that may be the solution you want, though you may be dealing with different characters. Hope it helps!