Finding and removing hidden characters in your data using dynamic T-SQL
Today I was struggling with one issue which keeps my SSIS package failing in one of the data imports.And came across one of the blog. The problem was a conversion problem from varchar type to float and I kept getting the following error:
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.
However, my data was numeric I checked it many times… or was it really 100% numeric?
Apparently there are some crazy hidden characters that you cannot see when you just glimpse at data. One of the best ways to find out if your data has some non-numeric characters in it is by running a simple query.
SELECT ColumnName FROM TableName WHERE ISNUMERIC(ColumnName) = 0
Pretty straight forward. But which character in my data is non-numeric? Well you could use the CHARINDEXfunction to figure out where a certain character set exists in your data but you would have to know which code to use. The link below has the ASCII Character set:
I didn’t have time to test all 255 characters to see what was in my numeric field. So I had to write some dynamic t-sql to help me out. In the code below, just insert your own table name (with schema if you want) and column name.
DECLARE @tablename VARCHAR(1000) ='Schema.Table' DECLARE @columnname VARCHAR(100)='ColumnName' DECLARE @counter INT = 0 DECLARE @sql VARCHAR(MAX) WHILE @counter <=255 BEGIN SET @sql= 'SELECT TOP 10 '+@columnname+','+CAST(@counter AS VARCHAR(3))+' as CharacterSet, CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') as LocationOfChar FROM '+@tablename+' WHERE CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') <> 0' PRINT (@sql) EXEC (@sql) SET @counter = @counter + 1 END
After running the code, you will see in the results table which character yielded results and you can go ahead and update your data accordingly. It will display the ASCII character code number and the first position of that character in your column. In my case, I had a problem with character #13, which is a carriage return that was added to each row of data in my flat file and somehow made it to my staging table during import. So I ran the following update to fix the problem:
UPDATE Schema.Table SET ColumnName= REPLACE(Columnname,CHAR(13),'')
Previously I was cleansing data with Char 32, which is a space and cannot really be seen by just glancing at the data. So now I have a quick way to go through and figure out what characters exist in any column on my table. The code can be used for finding those annoying hidden non-alpha numeric characters or if you are just interested to see what type of characters exist in your data.
You are a very clever person!