Everybody talks about Big Data, like it’s either a monolithic thing like Ayres Rock in Australia that has been around since before there were people. You can almost imagine companies springing up around deposits of Big Data like gold miners along a stream. Could that be why so many people talk so much about data mining? I think not, because Big Data is almost an organic thing, like a little acorn that grows into a might oak, and like a seed it needs to be fed and nurtured.

While there are multiple methods of feeding a database, one of the more popular is SSIS (SQL Server Integration Services). If you’re unfamiliar with SSIS, it’s a data migration tool that is part of SQL Server Standard Edition or above, which, unfortunately, means that it isn’t part of the free Express Edition of SQL Server.

In this blog, I’ll walk you along the creation of a data project using SSIS. To play along at home, in addition to SSIS, you’ll need Microsoft’s Adventure Works Sample database and Microsoft Excel.

In the Windows start menu, SSIS can be found under Microsoft SQL Server 2012 as SQL Server Data Tools. Clicking it and selecting new project brings up the New Project dialog shown in Figure A.

Figure A

Click on figures to enlarge.

When a new project is created, you’re presented with the IDE shown in Figure B. Notice various tabs along the top, the SSIS Toolbox with its tasks on the left and the Connection Managers along the bottom.

Figure B

Looks kind of familiar, doesn’t it–like Visual Studio?  That’s because it is Visual Studio. But don’t let that scare you, if you’re not really a .Net programmer. With the exception of Connection Managers, this IDE is basically drag and drop.

Now about those Connection Managers: they identify what kind of inputs and outputs are available for processing, as listed in Table A.

Table A

Type

Description

ADO Connection manager for ADO connections
ADO.NET Connection manager for ADO.NET connections
CACHE Connection manager for cache
DQS Connection manager for DQS server
EXCEL Connection manager for Excel files
FILE Connection manager for files
FLATFILE Connection manager for flat files
FTP Connection manager for FTP connections
HTTP Connection manager for HTTP connections
MSMQ Connection manager for Message Queue task
MSOLAP100 Connection manager for Analysis Services connections
MULTIFILE Connection manager for multiple files
MULITFLATFILE Connection manager for multiple flat files
ODBC Connection manager for ODBC connections
OLEDB Connection manager for OLE DB connections
SMOServer Connection manager for SQL Server transfer tasks
SMTP Connection manager for SQL Server Compact connections
WMI Connection manager for WMI connections

Using the list above, let’s say, rather than feed the beast, we’d like to drag some information like addresses and put them in an Excel spreadsheet. So, we’ll need a Database Connection Manager, like the OLE Connection Manager shown in Figure C. For the Excel spreadsheet, use an Excel Connection Manager, an example of which is displayed in Figure D.

Figure C

Figure D

There is one caveat: If you’re using a 64-Bit edition of Windows, you’ll need to make sure that you do not use the 64-Bit runtime. The package will execute badly. Here’s how to make sure this doesn’t happen. Open up the project’s property pages (it’s the last item under the Project tab). Next, click Configuration Properties and Debugging. Under Debugging Options, set Run64BitRuntime to False as shown in Figure E, and you’re good to go.

Figure E

There is one more item to remember. All of the record-level processing is done in Data Flow tasks, not in the Control Flow. The Control Flow is where you  control the flow using the For Loop Container, the Foreach Loop Container, or the Sequence Container. But since our project is so simple, all we need to do is drag a Data Flow Task onto the Control Flow, giving us something that looks like Figure F.

Figure F

Now it’s time to click on the Data Flow Task or the Data Flow Tab, and open up the Data Flow. While it’s similar to the Control Flow, the SSIS toolbox offers some different tools. The two that we need are under Other Sources and Other Destinations, namely the OLE DB Source and the Excel Destination. Figure G shows what this looks like.

Figure G

Now drag an OLE DB Source onto the page and double-click on it. Up comes the OLE DB Source Editor. Here is where you use the OLE DB Connection Manager and select which table or stored procedure or SQL to use in order to retrieve data, like Figure H shows. This is also where you can pick the columns that you want to retrieve. Finally, here is where the error row handling can be defined.

Figure H

Once the source is defined drag an Excel Destination onto the Data Flow, then click on the OLE DB Source and drag the arrow to the Excel Destination, getting something that looks like Figure I.

Figure I

This last step is where the Excel Destination is defined using the Excel Connection Manager. It looks a lot like setting up the OLE DB Source as Figure J and Figure K show.

Figure J

Figure K

Now all that is left is to run the package by clicking the obligatory green triangle and, if you don’t see any red, looking at the result in Excel, Figure L.

Figure L

Now, this really isn’t a fair test of everything that is possible with SSIS; that would be somewhat longer than an article. It does show something of what is possible. Consider, for a moment, what if the input was Excel and the output was a database table, or a flat file and a table, or an FTP and a table, or what if both were tables?

The Data Flow doesn’t have to be a straight move; there are a bunch of tools that we never even looked at. You can sum, split, sort, fuzzy group, lookup, pivot, unpivot and even add your own code in C# or Visual Basic .Net.