For this edition of Five Apps, we take a look at five tools to help you analyze your big data.
1. Datameer
Datameer is, on its surface, a basic analysis tool. It has a spreadsheet-like interface and contains many of the same charts and graphs. However, it surpasses Excel and other spreadsheet programs by allowing the user to link to active data sources as well as import flat files, and to join two tabs together into a third, much like you join tables in a database.
It is also much more column-focused than a spreadsheet: the tasks you perform, such as Group Bys, are all done with reference to a column and occupy a column of their own on the destination sheet. Because it is so columnar, you can also drag and drop columns into charts and graphs easily instead of having to specify ranges as in Excel. Charts and graphs come with many configuration options, including manual colors, font sizes, layout, and positioning.
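To make the database comparison concrete, here is a minimal sketch in plain Python of joining two sheets on a shared column and then running a Group By on the result. It is not Datameer's own syntax or API; the sheet contents, the column names, and the helper functions `join_sheets` and `group_by_sum` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical "sheets": each one is a list of rows keyed by column name.
orders = [
    {"order_id": 1, "store_id": "S1", "amount": 120.0},
    {"order_id": 2, "store_id": "S2", "amount": 80.0},
    {"order_id": 3, "store_id": "S1", "amount": 50.0},
]
stores = [
    {"store_id": "S1", "country": "US"},
    {"store_id": "S2", "country": "DE"},
]


def join_sheets(left, right, key):
    """Inner-join two sheets on a shared key column, producing a third sheet."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def group_by_sum(rows, group_col, value_col):
    """Sum value_col for each distinct value of group_col."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


joined = join_sheets(orders, stores, "store_id")
totals = group_by_sum(joined, "country", "amount")
print(totals)  # {'US': 170.0, 'DE': 80.0}
```

Note that both operations are expressed purely in terms of column names, which is the same column-first way of working the tool encourages.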
The final feature of Datameer is Smart Analytics, which includes tools for Clustering, Decision Trees, Recommendations (Heat Charts), and Column Dependencies. Datameer starts at $299/year for a single user and has Workgroup and Enterprise licensing available.
2. Jaspersoft
Jaspersoft is a drag-and-drop GUI that allows you to combine your data in various ways using the built-in charts, graphs, and crosstab views. You can see various types of data side by side by dropping them in as Columns and break them down by various categories as Rows. One of the nicest features of Jaspersoft is the Data Level filter at the top right. It allows you to scale back your Rows or Columns to a lower level of detail (such as viewing sales by Country instead of by Country and then by Store Type) without having to remove those data points from the graph altogether. Jaspersoft offers several editions of its software, from the free Community Edition to various on-site versions licensed by server processor to an AWS-based version licensed per hour. Pricing info is available from the sales team.
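The idea behind a level filter like this can be sketched in a few lines of plain Python: the same records feed a crosstab at either level of detail, and only the row key changes. This is an illustration of the concept, not Jaspersoft's implementation; the sales figures and the `crosstab` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (country, store_type, year, amount).
sales = [
    ("US", "Outlet", 2013, 100),
    ("US", "Flagship", 2013, 250),
    ("DE", "Outlet", 2013, 75),
    ("US", "Outlet", 2014, 125),
    ("DE", "Flagship", 2014, 200),
]


def crosstab(records, row_level):
    """Build a rows-by-year crosstab.

    row_level is the number of row fields to keep:
    1 = Country only, 2 = Country and then Store Type.
    """
    table = defaultdict(lambda: defaultdict(int))
    for country, store_type, year, amount in records:
        row_key = (country, store_type)[:row_level]
        table[row_key][year] += amount
    return {key: dict(columns) for key, columns in table.items()}


detailed = crosstab(sales, 2)    # rows are (country, store_type)
by_country = crosstab(sales, 1)  # same data, rolled up to country

for row_key, columns in by_country.items():
    print(row_key, columns)
```

Because the roll-up is computed from the untouched records, switching back to the detailed view loses nothing, which is what makes a level filter more convenient than deleting fields from the report.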
3. Pentaho
Instead of being a dynamic reporting tool, Pentaho allows you to create fixed-structure reports and dashboards which are then tied to a dynamic data source. This is great for companies whose users do not have the skill or are unwilling to take the time to create their own visualizations. Pentaho has the typical charts and graphs, such as pie, bar, and line, as well as crosstab views. It also has heat grid reports to compare performance among various measures. Like the other systems on this list, Pentaho can link up with various source databases. Pricing is available from the sales team.
4. SAS Visual Analytics
Easily recognized as the biggest name on this list, SAS has entered the big data fray with its Visual Analytics software. For the most part, however, it is roughly equal to the other products here. Data is brought into the system either by flat file or database links, and various charts, graphs, and visualizations are easily created.
It stands out, however, in the way that it displays that information. Where the other products were somewhat vague as to what the data values were, SAS Visual Analytics always seems to provide a legend, especially in geographical visualizations, heat maps, and the like. One visualization I did not see in its set was the pie chart. SAS seems to have replaced it with a treemap, which can have the same effect but may be harder for some to understand.
The other standout feature, to me, was the quick-glance view when selecting data filters. You can easily see the relative size of the data in each data point, so you have some idea of what you're getting into. SAS Visual Analytics pricing is available from the sales team.
5. Splunk
While it connects to traditional data sources like the other systems on the list, Splunk is the only product here that can connect to system event logs, system performance monitors, directory trees, TCP/UDP connections, and Active Directory systems. Given that vast array of non-traditional data sources, Splunk is a great solution for monitoring big data that, on the surface, doesn't seem like big data. Event log monitoring alone, though, can generate as much raw data as enterprise EHR and CRM systems.
While it provides the common charts and graphs, Splunk also has its own query language, which makes it difficult to jump right into. Anything beyond basic charts requires knowledge of the query language. Pricing is simple: you pay by the gigabyte indexed per day, whether the system is on-site or cloud-based.
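Because pricing follows daily indexed volume, it helps to estimate that volume before talking to sales. The back-of-the-envelope sketch below does the arithmetic in Python. Every number in it is a made-up assumption (event rates, average event sizes, and the source names), and it counts a gigabyte as 10^9 bytes; measure your own sources rather than relying on these figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_index_gb(events_per_second, avg_event_bytes):
    """Estimate raw data generated per day, in gigabytes (1 GB = 10**9 bytes)."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / 1e9


# Hypothetical sources: name -> (events per second, average bytes per event).
sources = {
    "windows_event_logs": (500, 400),
    "firewall_syslog": (2000, 250),
    "app_logs": (300, 600),
}

estimates = {
    name: daily_index_gb(rate, size) for name, (rate, size) in sources.items()
}
total_gb_per_day = sum(estimates.values())

for name, gb in estimates.items():
    print(f"{name}: {gb:.2f} GB/day")
print(f"total: {total_gb_per_day:.2f} GB/day")
```

Even modest per-second rates add up to tens of gigabytes a day under these assumptions, which is why log data that doesn't look like big data can still behave like it.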
There are many more products available for analyzing your own big data; these are just a handful offering necessary features. Has your organization delved into analyzing its big data? If so, have you used any of the tools above, or different ones? Share your thoughts in the comments below.