Sometimes trying to choose among so many options is more crippling than it should be. Here are some things to consider when shopping for big data tools.
A plethora of big data tools exists to help IT do its job. But at what point are there so many tools that it becomes difficult to decide which to use?
"The presumption is, self-determination is a good thing and choice is essential to self-determination," said Barry Schwartz, Ph.D., a Swarthmore College psychologist and author of The Paradox of Choice: Why More is Less. "But there's a point where all of this choice starts to be not only unproductive, but counterproductive—a source of pain, regret, worry about missed opportunities and unrealistically high expectations."
Choosing the right big data tools from many different choices can be a formidable task. Those tasked with it do the research and review many "best of" lists in an effort to decide.
One common technique for hedging bets on new software tools is to arrange "try and buy" pilot projects with prospective vendors, which leave you with no obligation to buy if you don't like the results. But this approach is difficult with big data because of the volume and velocity of the data involved, and because of the technical complications of setting up a test bed to try the tool in your own processing environment with your own data, data models, and algorithms.
This is why companies turn to experienced CTOs, technology forums, or other companies in their industries to see which tools are best.
What other steps can you take to ensure that you get the best tools without experiencing overload from too many choices? Here are some things to consider when doing the research.
Set up your tool workbench
One reason there are so many big data tool choices in the open market is the number of organizations that use Hadoop as a big data platform. Hadoop is an open source platform, and a large ecosystem of open source big data tools has grown up around it.
Open source tools have the advantage of carrying no license fees. On the other hand, there is still the problem of too many choices.
There is a good analogy in carpentry that might help. You can either carry six different screwdrivers with Phillips, flat, star, hex, and other head styles to fit the variety of fasteners you might have to work with, or you can purchase a single screwdriver with interchangeable heads that can handle all of those different fasteners.
Tool choices like these exist in the big data world.
You can purchase a single-purpose, highly specialized tool, or you can invest in a tool or tool platform that is diversified and can handle a variety of different functions and scenarios. Usually, the multi-purpose option is best, unless a particular single-purpose tool brings so many advantages in your situation that you feel you can't do without it.
Check for interoperability
Health care offers an example: to overcome interoperability challenges with big data, tool providers in that industry must address issues in data sharing, visualization, security, updates, reports, queries, cleaning, capture, stewardship, and storage.
These challenges are present in every industry and make it imperative that companies carefully examine a product's interoperability with existing IT data and infrastructure before investing in a tool.
One good piece of news is NIST's Final Big Data Interoperability Framework, which was released in 2019. This framework sets standards for big data tools and should make interoperability easier to achieve going forward.
Verify vendor support
All big data tools have a learning curve, but some tools are supported better than others.
During the RFP process, thoroughly look into your vendor's support options. Check with vendor clients to see what their experiences with the vendor have been in product support and problem resolution. If you choose an open source solution, vendor support should be a major concern, as open source tool support frequently underperforms the support offered by more traditional software vendors.