Image: Nuthawut Somsuk, Getty Images/iStockphoto

I visited the Athenaeum Library in Providence, RI, a couple of years ago. Started in 1831, the Athenaeum houses a fine collection of rare and very brittle manuscripts dating back to the 18th century. The library is superbly curated. Much love and care has gone into preserving invaluable books and treatises that the public can enjoy.

In the big data world, curation is important, too. It is a necessary part of data management, with an end goal very similar to the Athenaeum’s: to render information useful and to take the steps needed to preserve it.

I first wrote about the importance of curation for big data in August 2016.

IDC now expects worldwide big data analytics revenue to reach $274.3 billion by 2022, but 2018 research reveals that as much as 73% of corporate data goes unused.

In short, at most organizations, it seems that no one is actively curating data, determining which data and data combinations are most useful, and getting the most value out of data for analytics.

Kelly Stirman, vice president of strategy and chief marketing officer at Dremio, which delivers data query tools for data lakes, wrote about this last year. He said data curation is a gap in data management because it’s not regularly performed by the business analysts, IT engineers, and data scientists who manage data.

“Data curators fill this gap and streamline the process of sourcing, organizing, and accelerating data for analysis,” Stirman said. “They know the data and understand the analytics workloads better than data engineers because they are closer to the business units. Data curators also have a good understanding of the types of systems that store the data and the types of tools that can be used for processing the data, even if they are not practitioners of these technologies themselves. They have up-to-date knowledge about data sets, their provenance, and what data curation is needed.”

Where does data curation fit into organizations? Stirman said being a data curator is a new role and that curation could be performed by a business analyst, a data engineer, or even a data scientist.

I argue that a good place for data curation is in the database group, since this group is all about the data, has a thorough technical understanding of where the data resides, and knows the relationships between different datasets and databases.

What a data curator adds to the database function is a contextual understanding of the business and of the data that matters most. This enriches the business knowledge of data and data combinations in the database group, and for the company overall. Data curation can also identify data that is of relatively little use and eliminate it.

For the data curation function to succeed in organizations, CIOs and database administrators must actively support both the function and the person in it, which is an initial challenge for any new role that is trying to establish itself. But if they do, the quality of their big data analytics will improve.

“Few (organizations) curate their data wisely, identifying only the best data and communicating it effectively, so all stakeholders can move forward with confidence,” said Divya Singh, a big data developer. “When you take this approach, your data becomes more than just data. It becomes a powerful tool that leads to better outcomes.”