Data curation is the art of maintaining the value of data. A data curator does this by collecting data from many different sources and then aggregating and integrating it into an information source that is many times more valuable than its independent parts. During this process, data might be annotated, tagged, presented, and published for various purposes. The goal is to keep the data valuable so it can be reused in as many business applications as possible.

“Through the curation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future,” according to ICPSR, which provides data stewardship and rich data resources to the scientific and academic communities. “With the modern Web, it’s increasingly easy to post and share data. Without curation, however, data can be difficult to find, use, and interpret.”

Data curation in IT: Then and now

Data curation is just now starting to enter corporate parlance because of big data and the need to aggregate many different types of data from diverse sources to form a unique picture of a business situation.

In the not-so-distant days of storing and maintaining data that came only from transactional systems of record (SOR), IT performed rudimentary data curation through the processes of data retention and archiving. Decisions on which data to keep were driven largely by regulators and by how far back end-user departments felt data needed to be stored. Little effort went into assessing the inherent value that might be locked in the data, or how data could be transformed into something larger and more useful.

Over the last 24 months, these historical approaches to data retention and valuation have started to shift, for two reasons.

  • Industry prognosticators and companies are beginning to think about their data as a corporate asset that even has a physical presence on corporate balance sheets.
  • Companies are beginning to understand that they can’t just continue to blindly “store up” the vast piles of data streaming into them without developing a way to value this data and to determine which data has present or potential value, and which will virtually always remain useless.

Two examples of data curation

In addition, organizations are thinking of compelling use cases in data curation, and how the inherent value of each data element can be enriched by uniquely combining it with other elements to yield a breakthrough business application. One of these applications involves mapping, document integration, and 3D simulations that attorneys are starting to use in courtrooms to demonstrate a point.

“Ultimately, the goal of any attorney is to get the jury to understand the case facts as they see them, so anything you can do to educate the jury to the forensics is extremely helpful,” said Jason Fries, CEO of 3D-Forensic, which provides 3D forensics video and simulation solutions. “Attorneys tend to use this integrated evidence as a two-stage system. In stage one, they educate the jury on the forensic process performed to create the analysis. In stage two, they use the integrated 3D product to help explain the opinion of other experts involved in the case.”

In another example, the Geological Survey of Alabama (GSA) is in the process of reviving decades of dark (dead) data that could provide value. The GSA’s mission is to explore, characterize, and report Alabama’s mineral, energy, water, and biological resources in support of economic development, conservation, management, and public policy for the betterment of the state’s citizens, communities, and businesses. As part of that effort, the GSA is undertaking data curation to discover which of this data has locked-in value, even if it is old, that can be redirected to the benefit of users.

In both cases, the goal is to extract maximum value from data by examining it for both present and future usefulness, and then to recombine it with other data into integrated views that take data value to a new level.

How to get started with data curation

First, when reviewing data with end users, companies can add assessments that evaluate how the data can be used or redirected. One way to do this is to make data retention reviews a collaborative process across business functions. The collaboration lets users who ordinarily wouldn’t be exposed to some types of data evaluate whether that data can be plugged into their own departmental analytics processes.

Second, IT and the business should articulate rules governing data purges. Presently, there is a fear of discarding any data, no matter how useless. Rules for data purges are needed so organizations don’t clog up on-premises and cloud storage without even knowing what types of data they are squirreling away.
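
To make the idea of purge rules concrete, here is a minimal Python sketch of how such rules might be written down and applied. Everything in it is a hypothetical illustration rather than a recommended policy: the category names, the retention periods, the legal-hold flag, and the sample datasets are all assumptions. The one design choice worth noting is that data with no matching rule is kept rather than purged, so unclassified data gets reviewed instead of silently deleted.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PurgeRule:
    """Hypothetical rule: how long a category of data may sit unused."""
    category: str
    retention_days: int
    legal_hold: bool = False  # assumed flag: held data is never purged


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry for a stored dataset."""
    name: str
    category: str
    last_used: date


def purge_candidates(datasets, rules, today):
    """Return names of datasets that the rules allow to be purged."""
    rule_by_category = {rule.category: rule for rule in rules}
    candidates = []
    for ds in datasets:
        rule = rule_by_category.get(ds.category)
        # No rule or a legal hold: keep the data and leave it for review.
        if rule is None or rule.legal_hold:
            continue
        if today - ds.last_used > timedelta(days=rule.retention_days):
            candidates.append(ds.name)
    return candidates


# Illustrative values only; real rules would come from IT and the business.
rules = [
    PurgeRule("web_logs", retention_days=365),
    PurgeRule("transactions", retention_days=2555, legal_hold=True),
]
datasets = [
    Dataset("clickstream_2014", "web_logs", date(2014, 6, 1)),
    Dataset("orders_2008", "transactions", date(2008, 1, 1)),
    Dataset("sensor_dump", "unclassified", date(2010, 1, 1)),
]

print(purge_candidates(datasets, rules, today=date(2016, 1, 1)))
```

In this sketch only the stale web logs are flagged. The transaction data is protected by its hold, and the unclassified dataset is left alone until someone assigns it a category.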

Third, companies should consider adding a data curator, a librarian-like role, to their big data and analytics staffs. Sensing the need, some universities are beginning to offer programs and courses in data curation.