Managing big data can introduce a host of issues, but not when you follow the tips below.
Data oversight can be challenging since it involves everything from security and privacy to meeting compliance standards and the ethical use of data. When it comes to big data, management problems grow even bigger because the data is unstructured and unpredictable.
Below are three common big data management challenges and three solutions.
Challenge 1: Data quality
Big data must be cleaned, prepped, secured, vetted for compliance and continuously maintained.
The problem is that data arrives so fast that companies find it difficult to complete every preparation step needed to ensure data quality. In some cases, organizations simply store all of their incoming big data without doing much to it.
This creates data pollution, and inaccurate data raises the risk that business decisions will be based on erroneous information.
First, define your business rules for data cleaning and preparation, and seek out automation tools that can perform data prep tasks for you. Second, determine which data you absolutely don't need, and establish automated data purging at the front of your data collection processes to jettison this data before it ever hits your network.
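As a rough illustration of rule-driven cleaning and purging at the point of intake, here is a minimal Python sketch. The field names (source, customer_id, email) and the rules themselves are hypothetical, chosen only for the example; real business rules would come out of your own data policy, and a commercial data prep tool would replace most of this code.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Hypothetical purge rules: a record matching ANY rule is dropped at intake,
# before it is stored anywhere.
PURGE_RULES: List[Callable[[Record], bool]] = [
    lambda r: r.get("source") == "debug",   # assumption: debug traffic is never needed
    lambda r: not r.get("customer_id"),     # assumption: records without an ID are unusable
]

# Hypothetical cleaning rules: each takes a record and returns a cleaned copy.
CLEAN_RULES: List[Callable[[Record], Record]] = [
    lambda r: {**r, "email": r.get("email", "").strip().lower()},
]


def prepare(records: Iterable[Record]) -> Iterator[Record]:
    """Purge unwanted records first, then apply every cleaning rule in order."""
    for record in records:
        if any(rule(record) for rule in PURGE_RULES):
            continue  # jettisoned before it reaches storage
        for clean in CLEAN_RULES:
            record = clean(record)
        yield record


if __name__ == "__main__":
    incoming = [
        {"source": "web", "customer_id": "c1", "email": " A@Example.COM "},
        {"source": "debug", "customer_id": "c2", "email": "x@example.com"},
    ]
    for row in prepare(incoming):
        print(row)
```

Keeping the rules in plain lists means the business side can review them as policy while IT automates how they are applied.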
Challenge 2: Platform integration
Big data integration often centers on combining data from different business departments into a "single version of the truth" that everyone in the business can use. However, it is just as challenging for IT to manage big data that comes in all flavors and on many different hardware and software platforms.
"There are a plethora of backend distributed data stores," said Mansour Raad, senior software architect at ESRI. "Some of these distributed data stores are not natively supported by [our] platform.... Depending on the data store, I will have to use a different API, mostly Python-based, to handle these situations. It's not optimal. Accessing and storing data in unsupported data stores requires developers to constantly change their program for each data store. This slows development cycles and makes it much longer for customers to get insights from the data."
In short, the variety of big data processing platforms makes it difficult to simplify IT infrastructure for easier data management and smoother big data process flows. This is an enormous challenge for IT.
Software automation tools are available with hundreds of pre-developed APIs for a wide spectrum of data, databases, and files. You might still find yourself hand-developing an API on a case-by-case basis, but these tools can do most of the work.
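To make the per-data-store API problem concrete, here is a minimal Python sketch of one common way to contain it: a single shared interface with a thin adapter per store, so application code does not change when the backend does. This is an illustration only, not how any particular vendor tool works. The class names (DataStore, MemoryStore, JsonLinesStore) and the copy_all helper are hypothetical, and the two stores are deliberately simple stand-ins for real distributed data stores.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Iterator, List

Record = Dict[str, object]


class DataStore(ABC):
    """Hypothetical common interface that every store adapter implements."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...

    @abstractmethod
    def write(self, record: Record) -> None:
        ...


class MemoryStore(DataStore):
    """Stand-in for one backend: records kept in a Python list."""

    def __init__(self) -> None:
        self._rows: List[Record] = []

    def read(self) -> Iterator[Record]:
        return iter(list(self._rows))

    def write(self, record: Record) -> None:
        self._rows.append(record)


class JsonLinesStore(DataStore):
    """Stand-in for a second backend: one JSON document per line in a file."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)

    def read(self) -> Iterator[Record]:
        if not self._path.exists():
            return
        with self._path.open() as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)

    def write(self, record: Record) -> None:
        with self._path.open("a") as fh:
            fh.write(json.dumps(record) + "\n")


def copy_all(source: DataStore, target: DataStore) -> int:
    """Application code depends only on DataStore, never on a specific backend."""
    count = 0
    for record in source.read():
        target.write(record)
        count += 1
    return count
```

Supporting a new data store then means writing one more adapter rather than changing every program that touches the data.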
Challenge 3: Access and security
Who gets access to which data, and at what level of permission? For example, a document management system contains text-based documents, photos, images, drawings, and videos. Who has access to which documents, and who has the right to modify them?
This is a policy (and sometimes political) question. It must be resolved in a sit-down meeting between IT and end users to determine who should gain access. No one likes these meetings, and they are frequently neglected for years. At a minimum, they should be held annually, especially with so much big data coming in. IT needs to make this happen.
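Once IT and end users have agreed on who gets what, the outcome can be recorded as a simple policy table that software enforces. The Python sketch below is one possible shape for such a table, assuming three permission levels and a deny-by-default rule. The roles, document types, and grants are hypothetical examples, not recommendations.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Permission(IntEnum):
    """Permission levels, ordered so that a higher level includes the lower ones."""
    NONE = 0
    READ = 1
    MODIFY = 2


# Hypothetical outcome of a policy meeting: (role, document type) -> level granted.
POLICY: Dict[Tuple[str, str], Permission] = {
    ("engineering", "drawing"): Permission.MODIFY,
    ("engineering", "video"): Permission.READ,
    ("marketing", "photo"): Permission.MODIFY,
    ("marketing", "drawing"): Permission.READ,
}


def is_allowed(role: str, doc_type: str, needed: Permission) -> bool:
    """Deny by default: anything not listed in POLICY gets Permission.NONE."""
    granted = POLICY.get((role, doc_type), Permission.NONE)
    return granted >= needed


if __name__ == "__main__":
    print(is_allowed("marketing", "drawing", Permission.READ))
    print(is_allowed("marketing", "drawing", Permission.MODIFY))
```

Because the table is plain data, it can be reviewed line by line in the annual meeting and updated without touching the enforcement code.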