Over the last few years, big data has become a big deal. Between
sites like Data.gov, the massive amounts of
data each person generates both privately and on social media, and every
organization’s rapidly increasing databases, big data is one of the most
important things IT professionals need to understand and deal with. Reports and
dashboards are just the beginning with big data – requirements now include
predictive analyses and other more advanced tools.
For this edition of Five Apps, we take a look at five tools
to help you analyze your big data.
1. Datameer
Datameer is, on its
surface, a basic analysis tool. It has a spreadsheet-like interface and
contains many of the same charts and graphs. However, it surpasses Excel and
other spreadsheet programs by allowing the user to link to active data sources
as well as import flat files, and to join two tabs together into a third, much
like you join tables in a database.
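To picture what joining two tabs into a third means, here is a minimal sketch in plain Python. It is not Datameer's implementation or API; the tab names, column names, and the `inner_join` helper are all hypothetical and exist only to illustrate a database-style join on a shared column.

```python
# Two hypothetical "tabs", each a list of rows. All names are illustrative.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 75.5},
    {"order_id": 3, "customer_id": 10, "amount": 120.0},
]
customers = [
    {"customer_id": 10, "name": "Acme"},
    {"customer_id": 11, "name": "Globex"},
]


def inner_join(left, right, key):
    """Return new rows that combine left and right rows sharing the same `key`."""
    # Index the right-hand tab by the join column for quick lookups.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        # Rows with no match on the right are dropped (inner join).
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# The "third tab": every order row extended with its customer's name.
joined_tab = inner_join(orders, customers, "customer_id")
for row in joined_tab:
    print(row)
```

The two source tabs are left unchanged; the joined rows live in a new list, which mirrors the idea of the result landing on its own sheet.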
Datameer is also much more column-focused than a spreadsheet – the
tasks you perform, such as Group Bys, are all done with reference to a column
and occupy a column of their own on the destination sheet. Because it is so
columnar, you can also drag and drop columns into charts and graphs easily
instead of having to specify ranges as in Excel. Charts and graphs come with many
configuration options including manual colors, font sizes, layout, and
The final feature of Datameer is Smart Analytics, which
includes Clustering, Decision Trees, Recommendations (Heat Charts), and Column
Dependencies tools. Datameer starts at $299/year for a single user and has
Workgroup and Enterprise licensing available.
2. Jaspersoft
Jaspersoft is a
drag-and-drop GUI that allows you to combine your data in various ways using
the built-in charts, graphs, and crosstab views. You can see various types of
data side by side by dropping them in as Columns and break them down by various
categories as Rows. One of the nicest features of Jaspersoft is the Data Level
filter at the top right. It allows you to scale back your Rows or Columns to a
lower level of detail (such as viewing sales by Country instead of by Country
and then by Store Type) without having to remove those data points from the
graph altogether. Jaspersoft offers several editions of its software, from the
free Community Edition to various on-site versions licensed by server processor
to an AWS-based version licensed per hour. Pricing info is
available from the sales team.
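The Data Level filter described above can be pictured as changing how many of the row fields are used when summarizing, without discarding any of them. The following is a rough sketch in plain Python, not Jaspersoft's API; the sample data, the `ROW_FIELDS` list, and the `summarize` function are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows; the figures are made up for illustration.
sales = [
    {"country": "US", "store_type": "Mall", "amount": 100},
    {"country": "US", "store_type": "Outlet", "amount": 50},
    {"country": "CA", "store_type": "Mall", "amount": 70},
    {"country": "CA", "store_type": "Mall", "amount": 30},
]

# The full row hierarchy, most general field first.
ROW_FIELDS = ["country", "store_type"]


def summarize(rows, level):
    """Sum `amount`, grouping by only the first `level` fields of ROW_FIELDS."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in ROW_FIELDS[:level])
        totals[key] += row["amount"]
    return dict(totals)


# Level 2: sales by Country and then by Store Type.
detail = summarize(sales, level=2)
# Level 1: the same data scaled back to Country only.
by_country = summarize(sales, level=1)

print(detail)
print(by_country)
```

Because `ROW_FIELDS` itself never changes, switching back to the more detailed view is just a matter of raising `level` again.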
3. Pentaho
Instead of being a dynamic reporting tool, Pentaho allows you to create
fixed-structure reports and dashboards, which are then tied to a dynamic data
source. This is great for companies whose users do not have the skill or are
unwilling to take the time to create their own visualizations. Pentaho has the
typical charts and graphs, such as pie, bar, and line, as well as crosstab
views. It also has heat grid reports to compare performance among various
measures. Like the other systems on this list, Pentaho can link up with various
source databases. Pricing is available from the sales team.
4. SAS Visual Analytics
Easily recognized as the biggest name on this list, SAS has
entered the big data fray with its Visual Analytics
software. For the most part, it is roughly equal to the other
products here. Data is brought into the system either by flat file or database
links, and various charts, graphs, and visualizations are easily created.
It stands out, however, in the way that it displays that
information. Where the other products were somewhat vague as to what the data
values were, SAS Visual Analytics always seems to provide a legend, especially
in geographical visualizations, heat maps, and the like. One visualization I
did not see in its set was the pie chart. SAS seems to have replaced it with a
treemap, which can have the same effect, although it may be
harder for some to understand.
The other standout feature, to me, was the quick glance
feature when selecting data filters. You can easily see the relative size of
the data in each data point, so you have some idea of what you’re getting into.
SAS Visual Analytics pricing is available from the sales team.
5. Splunk
While it connects to traditional data sources like the other
systems on the list, Splunk is the only
product here that can connect to system event logs, system performance monitors,
directory trees, TCP/UDP connections, and Active Directory systems. Given that
vast array of non-traditional data sources, Splunk is a great solution for
monitoring big data that, on the surface, doesn’t seem like big data. However,
event log monitoring alone can generate as much raw data as enterprise EHR and
While it provides the common charts and graphs, Splunk also
has its own query language, which makes it difficult to jump right in. Anything
beyond basic charts requires knowledge of the query language. Pricing is
simple: you pay by the gigabyte indexed per day, whether the system is
on-site or cloud-based.
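Since pricing is tied to the gigabytes indexed per day, it helps to estimate that volume before talking to sales. The sketch below is back-of-the-envelope arithmetic in Python, not a Splunk tool or API; the function name and the sample event rate and event size are hypothetical, and no actual price is assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 1_000_000_000  # decimal gigabytes; an assumption for this sketch


def estimate_daily_gb(events_per_second, avg_event_bytes):
    """Estimate raw data volume per day, in GB, from a steady event rate."""
    bytes_per_day = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return bytes_per_day / BYTES_PER_GB


# Hypothetical inputs: 2,000 log events per second at about 300 bytes each.
daily_gb = estimate_daily_gb(events_per_second=2000, avg_event_bytes=300)
print(f"Estimated volume: {daily_gb:.2f} GB/day")
```

Real log traffic is rarely steady, so an estimate like this is only a starting point for sizing.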
There are many more products available for analyzing your
own big data; these are just a handful that offer the necessary features. Has
your organization delved into analyzing its big data? If so, have you used any
of the tools above or different tools? Share your thoughts in the comments below.
Going Deep on Big Data
Big data is transitioning from one of the most hyped and
anticipated tech trends of recent years into one of the biggest challenges that
IT is now trying to wrestle with and harness. We examine the
technologies and best practices for taking advantage of big data and
provide a look at organizations that are putting it to good use.