Converting data and splitting data are at the core of almost any type of data application. This tutorial presents an example that details how to perform data conversion and conditional data splitting using SQL Server 2005 Integration Services (SSIS).
The exampleI will begin this example where I left off in my article about executing SQL statements inside an SSIS package. I created three tables using TSQL and placed them inside of a Sequence Container object to ensure the statements execute together. Building on that example, drag a Data Flow task object onto the Control Flow pane and rename it Import CSV. Next, drag the green success flow arrow from the Sequence Container to the Data Flow task (Figure A). Figure A
I use the Data Flow task to import a comma separate values (CSV) file, and based upon the values from the file, place the records into the separate tables I created in the previous article. To start, I drag a Flat File Source task item from the Data Sources tasks onto the Control Flow window (Figure B). Figure B
Double-click the Flat File Source task to open the Flat File Source Editor. I will use this Editor to specify the CSV location and the columns contained in the CSV file. To do so, I need to create a new Flat File connection manager, so click the New button on the Flat File Source Editor screen (Figure C). Figure C
This opens the Flat File Connection Manager Editor, where I can specify the location of the CSV file, along with any additional file details, such as how the file is delimited, row headers, etc. I also click the Columns option from the left pane on the editor screen to define the columns I want to return from the CSV file (Figure D). Figure D
Because of the manner in which the CSV file will return some of the data, I tell Integration Services to convert the data so that I can insert it into my database tables. To do this, I drag a Data Conversion task from the Data Flow Transformation pane onto the Data Flow tab.
Once I add the Data Conversion task item, I open it and select the data conversions that I want to occur. The example in Figure F lists the data as it initially occurs in the transformation. Notice that the SaleID and the SalePrice fields are listed as string (DT_STR) data types. This will not work for entering this data into the database tables. Figure F
Data conversions are relatively simple. I am switching the data type of the SaleID field to an unsigned integer (DT_I4) and the data type of the SalePrice to currency (DT_CY). I am also renaming the fields to SaleIDConverted and SalePriceConverted. This conversion adds these two fields to the current data set, which allows me to use these fields in later tasks.The data types in this example are not what you typically see in SQL Server; these data types are native to Integration Services, so it may take you a little while to get used to them. See Figure G. Figure G
Now that my data is converted to the data types I will need, I specify how I want to direct the rows of data from the CSV file. To do this, I use a Conditional Split data flow task. This task item allows you to break up your data set into smaller data sets based upon field criteria specified in the task. This means that once my data set from the CSV and Data Conversion task hits the Conditional Split task, I will have smaller subset data sets to work with. Drag the Conditional Split task item from the Data Flow Transformations pane onto the Data Flow tab (Figure H). Figure H
Open the Conditional Split transformation to edit the task. In the exhibit in Figure I, I have dragged the Product column from the Columns list onto the grid three times. The grid represents the different data splits that I want to occur. As you can see, I want to create three separate data sets based upon the BigScreen, PoolTable, and Computer products. If necessary, I can specify what needs to occur if I encounter data that I am not expecting (which may occur often). Note that you can see the SaleIDConverted and the SalePriceConverted fields from the Columns list. As mentioned below, these fields were added from the Data Conversion task and are available to any subsequent data flow task. Figure I
Now that I have my Conditional Split task defined, I need to specify the three different tables where I want my data to eventually be inserted. To do this, I add three new SQL Server Destination tasks onto the Data Flow tab (Figure J). These three destinations represent the three tables that I previously created. Figure J
The next task is to drag the green arrow from the Conditional Split data flow task to each of the SQL Server Destination tasks. When I do, I am presented with an Input Output Selection, which allows me to specify which condition from the Conditional Split task I want to use as a data source to the SQL Server Destination. I need to add this information for each of the three destinations, specifying a different output in each step. An example of this is Figure K. Figure K
All that is left is to open each SQL Server Destination task item and specify which SQL Server table the data set will be inserted into (Figure L). These three tables will be the tables I defined earlier. Figure L
I also need to specify the mappings to be inserted. Because I converted data from two fields in the Data Conversion task, I want to use these fields instead of the fields originally imported. See Figure M. Figure M
Figure N is an example of my completed Data Flow tab. Figure N
Switch back to the Control Flow tab and execute the package. A successful Control Flow example is listed in Figure O. Figure O
If I switch to the Data Flow tab, I can see a successful completion. At each step of execution, SSIS lists how many records were transferred for that step. After the Conditional Split task occurs, I can see that much smaller subsets are transferred to their respective tables. See Figure P. Figure P
More coverage about SSIS
SSIS is a very powerful tool, and I've barely grazed the feature surface so far. In future articles, I will tackle more advanced subjects, such as the use of variables, scheduled package execution, fuzzy lookups, and deployment.
Tim Chapman a SQL Server database administrator and consultant who works for a bank in Louisville, KY. Tim has more than eight years of IT experience, and he is a Microsoft certified Database Developer and Administrator. If you would like to contact Tim, please e-mail him at firstname.lastname@example.org.
————————————————————————————————————————————-Get SQL tips in your inbox
TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!
Tim Chapman is a SQL Server MVP, a database architect, and an administrator who works as an independent consultant in Raleigh, NC, and has more than nine years of IT experience.