Friday, March 23, 2012

excel data problem

I have a Foreach Loop that loops through each Excel file and loads the data into a staging table. Let's say I have files A.xls and B.xls. When loading data from A.xls to the staging table the data looks fine; however, when loading data from B.xls, one of the columns did not come through as a string, and instead some of the numbers were converted to 1.000e+7. In the Advanced Editor I've set the output as a Unicode string, and my Excel connection string is appended with IMEX=1. Still, some numbers convert correctly while others come through as x.xxxe+x. Is there any way I can force the data to convert "as is"?
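
For reference, the connection string I'm talking about is along these lines (the file path here is just a placeholder):

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Files\B.xls;
Extended Properties="Excel 8.0;HDR=YES;IMEX=1";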

Here's another interesting thing. When I have the Excel files open and run the load process, the data is converted correctly. OR, when I format the column as text and the green triangle appears in the upper left of each cell, then the data is converted correctly (without the file open). I just can't figure out what's going on with the Excel source. I have many of these files, and this problem needs to be fixed through SSIS.

Thanks.

|||

Personally I think the Excel import process sucks and should be avoided at all costs. I was using it for a bunch of packages and just gave up. I find that even if it comes in as a string you still need to perform a bunch of "derived column" transformations in order for the database to accept it.
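
For example, I usually end up with Derived Column expressions along these lines (the column name and length here are just made up):

(DT_WSTR,50)Amount
ISNULL(Amount) ? "" : TRIM((DT_WSTR,50)Amount)

The first just casts the column to a 50-character Unicode string; the second also guards against NULLs and stray whitespace before the database sees it.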

I THINK that some of the problems stem from what seems to be Excel ignoring any column settings you set and instead "checking" some of the data before deciding what data type to use. So, if your column has a mix of numbers and text, and enough numbers come first, then the type is set to numeric. If enough text comes first, you get strings. It does not seem to matter what you set when you define the package.
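
If it helps, that sampling is (as far as I know) driven by the TypeGuessRows registry value for the driver; the default is 8 rows, and 0 means scan a much larger sample (roughly the first 16,000 rows). The exact key depends on which driver you use, e.g.:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

IMEX=1 just tells the driver to honour the ImportMixedTypes=Text setting in that same key, so mixed columns come through as text instead of whichever type won the guess.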

|||

I understand that Excel samples the first 8 rows to determine whether the column is numeric or a string. This can be fixed by adding IMEX=1 to the connection string. However, the problem I stated above seems to happen if I don't format the column as text (with the green triangle appearing in each cell). Thanks for your input.

|||

OK, the problem is still not solved, but here is what I've found out so far.

This happens when I have IMEX=1 in the connection string (for a single column, let's say column A):

1. If the first 8 rows are all numeric, then rows 9-65535 will come through as NULL wherever a cell has a string value in it.

2. If the first 8 rows have at least one string value in a cell, then rows 9-65535 will be recognized "as is". However, if a numeric value is large, say greater than 12 digits, then the value is converted to 1.xxxxxxxxxxe+xx WHEN VIEWING THE EXCEL DATA IN SSIS WITH THE EXCEL FILE OPEN. WHEN THE EXCEL FILE IS CLOSED AND VIEWING THE DATA IN SSIS, the large numeric value comes through "as is".

Now, if a cell is formatted as text (without re-entering the value, so no green triangle) and the value is numeric and less than 12 digits, then the value comes through "as is" WHEN VIEWING THE EXCEL DATA IN SSIS WITH THE EXCEL FILE OPEN. WHEN THE EXCEL FILE IS CLOSED AND VIEWING THE DATA IN SSIS, the numeric value that's less than 12 digits is converted to 1.xxxxxxxe+xx.

Blah blah blah, too many combinations, this sucks!!!

Just format the column as numeric in Excel, keep IMEX=1 in the connection string, and you'll be fine.

|||

I had the inverse of this problem, where I wanted 13-digit numbers with leading zeroes (e.g. 0001234567891) to be processed through as text, but they were coming through as null values. It didn't matter that I had IMEX=1 set in the connection string, or what I formatted the column as in Excel. Yay. Anyway, in my analysis I found something interesting: the metadata that is assigned to the Excel sheet is not editable and automatically assigns the data type based on the values, as mentioned above. It kept telling me my column was numeric, even though I formatted it as text. You can see the metadata in the properties of the data flow path.

Anyway, the only way I was able to get around the problem was this: when designing the package, I created a single-line dummy file with a text value in each field, so that the metadata comes through as text (DT_WSTR type). The dummy file acts like a configuration template for the Excel source data types.
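
Just to illustrate the idea, the dummy sheet is nothing more than a header row plus one row of alphabetic placeholders (the column names here are made up), so the driver has no choice but to type every column as text:

AccountNumber    Amount    Description
aaaaaaaaaaaaa    aaaa      aaaa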

In my case I'm using a file loop and variables with the Excel connection and source objects, so I just renamed the dummy file to .dat after designing the package. When the file loop runs (for .xls only) it picks up my real files and applies the data types from the dummy file, in this case DT_WSTR, to the real file(s). The problem remains that if the 13-digit numbers without leading zeroes are formatted as text, Excel will apply scientific formatting to them. Looks like I'm going to have to write a script block to open Excel and format everything as General, or make a rule that the files can't be formatted as text :(
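
That script would live in an SSIS Script Task in my case, but just to sketch the idea outside of SSIS, something along these lines should do it via Excel COM automation (the folder, sheet index and column are placeholders, and it assumes Excel and pywin32 are installed):

import glob
import win32com.client as win32   # pywin32; drives the installed Excel via COM

excel = win32.Dispatch("Excel.Application")
excel.Visible = False
excel.DisplayAlerts = False        # suppress "keep this format?" style prompts

try:
    for path in glob.glob(r"C:\Files\*.xls"):      # placeholder folder
        wb = excel.Workbooks.Open(path)
        ws = wb.Worksheets(1)                      # placeholder: first sheet
        ws.Range("A:A").NumberFormat = "General"   # placeholder: the problem column
        wb.Close(True)                             # True = save changes
finally:
    excel.Quit()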

At this stage I would suggest avoiding Excel if you have the choice, which I don't...

|||

I also have the same problem, even though I'm using IMEX=1 in my connection strings.

Is there any official workaround from Microsoft on this?
