Everybody talks about Big Data, like it's either a monolithic thing like Ayres Rock in Australia that has been around since before there were people. You can almost imagine companies springing up around deposits of Big Data like gold miners along a stream. Could that be why so many people talk so much about data mining? I think not, because Big Data is almost an organic thing, like a little acorn that grows into a might oak, and like a seed it needs to be fed and nurtured.
While there are multiple methods of feeding a database, one of the more popular is SSIS (SQL Server Integration Services). If you're unfamiliar with SSIS, it's a data migration tool that is part of SQL Server Standard Edition or above, which, unfortunately, means that it isn't part of the free Express Edition of SQL Server.
Click on figures to enlarge.When a new project is created, you're presented with the IDE shown in Figure B. Notice various tabs along the top, the SSIS Toolbox with its tasks on the left and the Connection Managers along the bottom.
Looks kind of familiar, doesn't it—like Visual Studio? That's because it is Visual Studio. But don't let that scare you, if you're not really a .Net programmer. With the exception of Connection Managers, this IDE is basically drag and drop.Now about those Connection Managers: they identify what kind of inputs and outputs are available for processing, as listed in Table A.
|ADO||Connection manager for ADO connections|
|ADO.NET||Connection manager for ADO.NET connections|
|CACHE||Connection manager for cache|
|DQS||Connection manager for DQS server|
|EXCEL||Connection manager for Excel files|
|FILE||Connection manager for files|
|FLATFILE||Connection manager for flat files|
|FTP||Connection manager for FTP connections|
|HTTP||Connection manager for HTTP connections|
|MSMQ||Connection manager for Message Queue task|
|MSOLAP100||Connection manager for Analysis Services connections|
|MULTIFILE||Connection manager for multiple files|
|MULITFLATFILE||Connection manager for multiple flat files|
|ODBC||Connection manager for ODBC connections|
|OLEDB||Connection manager for OLE DB connections|
|SMOServer||Connection manager for SQL Server transfer tasks|
|SMTP||Connection manager for SQL Server Compact connections|
|WMI||Connection manager for WMI connections|
Now it's time to click on the Data Flow Task or the Data Flow Tab, and open up the Data Flow. While it's similar to the Control Flow, the SSIS toolbox offers some different tools. The two that we need are under Other Sources and Other Destinations, namely the OLE DB Source and the Excel Destination. Figure G shows what this looks like.
Now drag an OLE DB Source onto the page and double-click on it. Up comes the OLE DB Source Editor. Here is where you use the OLE DB Connection Manager and select which table or stored procedure or SQL to use in order to retrieve data, like Figure H shows. This is also where you can pick the columns that you want to retrieve. Finally, here is where the error row handling can be defined.
Now all that is left is to run the package by clicking the obligatory green triangle and, if you don't see any red, looking at the result in Excel, Figure L.
Now, this really isn't a fair test of everything that is possible with SSIS; that would be somewhat longer than an article. It does show something of what is possible. Consider, for a moment, what if the input was Excel and the output was a database table, or a flat file and a table, or an FTP and a table, or what if both were tables?
The Data Flow doesn't have to be a straight move; there are a bunch of tools that we never even looked at. You can sum, split, sort, fuzzy group, lookup, pivot, unpivot and even add your own code in C# or Visual Basic .Net.