Are companies linking document management into their big data strategies? They should be

Many big data implementations are leaving document management systems behind, but DMSes house major stores of unstructured data. Should data analysts think again?

Written By

Mary Shacklett

Feb 8, 2022

The earliest document management systems (DMSes) appeared in the 1980s. They moved beyond physical file cabinets and PC server storage and appeared on networks where multiple people and departments within a single company could gain access to a trove of documents in electronic form.

Since then, document management systems have been the primary movers and shakers behind companies’ efforts to digitalize. These systems scan, index, store, retrieve and transform documents. They have been instrumental in moving paper-based documents and images out of file cabinets and storage rooms and onto widely distributed networks that everyone uses.

The question is: Are companies linking document management with their big data strategies?

In many cases, companies are lagging.

The big data repositories now being built combine system-of-record data with unstructured data arriving from the Internet of Things and outside sources. Document management systems play a part in this process, but big data strategies don't necessarily make a concerted effort to use all of the data in a DMS.

On the DMS side, users search, digitize and organize data, but big data techniques such as data cleaning and normalization, artificial intelligence, machine learning and more advanced algorithm development aren't yet broadly used.

Of course, there are niche exceptions.

One of these exceptions is the legal discovery process, which pores through reams of documents that are often housed in corporate document management systems. The goal of legal discovery software is to analyze unstructured documents and to use AI and machine learning to determine which documents (out of thousands) are most relevant to a potential upcoming legal case, and which are not.

In this instance, there are no lengthy corporate arguments about whether it’s necessary to import documents into a big data repository from a DMS. The use case stands by itself.
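To make the idea of relevance ranking concrete, here is a minimal sketch in Python. It is not how any particular discovery product works: it swaps the AI and machine learning models for plain TF-IDF weighting and cosine similarity, and the document IDs, texts and query are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_relevance(query, documents):
    """Return (doc_id, score) pairs, most relevant first.

    Scores are cosine similarities between TF-IDF vectors. This is a
    lexical stand-in for the trained models real discovery tools use.
    """
    doc_counts = {doc_id: Counter(tokenize(text))
                  for doc_id, text in documents.items()}
    n_docs = len(doc_counts)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vector(counts):
        return {term: count * idf(term) for term, count in counts.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vector(Counter(tokenize(query)))
    scores = [(doc_id, cosine(query_vec, vector(counts)))
              for doc_id, counts in doc_counts.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, as if pulled from a DMS.
documents = {
    "memo-1": "Quarterly cafeteria menu and parking updates.",
    "contract-7": "Supply contract with Acme covering delivery penalties and breach terms.",
    "email-3": "Acme missed the delivery date; discuss breach of contract penalties.",
}

ranking = rank_by_relevance("Acme contract breach delivery penalties", documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The cafeteria memo shares no terms with the query, so it scores zero and falls to the bottom, while the two Acme documents rise to the top. Production discovery software would add trained classifiers, reviewer feedback and much more, but the input is the same: unstructured text that already sits in the DMS.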

However, in other cases, a compelling reason to mesh a DMS with a big data repository might not be there. For instance, does a genome sequencing experiment really rely on what a DMS would typically include?

The takeaway is not that every big data repository needs DMS data, but that the DMS should at least be considered as a source. It so often ends up outside the big data strategy because data scientists and IT data analysts tend to overlook it.

What should companies do to ensure that their DMSes are included as potential sources of information flowing into a big data repository? Here are four steps.

1. Document the types of data that are in the DMS so they can be evaluated for inclusion in big data repositories.
2. Verify that the DMSes the company uses have a full set of APIs (application programming interfaces) that make data transfers into big data repositories easy.
3. Develop a standard extract, transform and load (ETL) methodology that can take incoming data from a DMS and prepare it for use in a big data repository.
4. Determine which results from big data analytics should be exported back to the DMSes for user access.
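
As a rough illustration of the first and third steps, the Python sketch below inventories document types and then runs a small extract, transform and load pass. Everything specific in it is an assumption: the sample records stand in for whatever a DMS export or API would actually return, the field names ("Doc ID", "Type", "Created", "Body") are hypothetical, and JSON Lines is just one plausible landing format for a big data repository.

```python
import json
import re
from collections import Counter
from datetime import datetime

# Hypothetical records standing in for a DMS export or API response.
SAMPLE_DMS_EXPORT = [
    {"Doc ID": "A-100", "Type": "Contract", "Created": "02/08/2022",
     "Body": "  Master   services\nagreement with  Acme. "},
    {"Doc ID": "A-101", "Type": "invoice", "Created": "2022-01-15",
     "Body": "Invoice 4471 for consulting hours."},
    {"Doc ID": "A-102", "Type": "CONTRACT", "Created": "not recorded",
     "Body": "Renewal terms for Acme agreement."},
]

# Assumed date formats; a real DMS may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def inventory(records):
    """Count documents per normalized type so each type can be evaluated."""
    return Counter(r["Type"].strip().lower() for r in records)


def parse_date(raw):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(record):
    """Normalize one DMS record into a flat, consistent shape."""
    return {
        "doc_id": record["Doc ID"].strip(),
        "doc_type": record["Type"].strip().lower(),
        "created": parse_date(record["Created"]),
        # Collapse runs of whitespace left over from scanning or export.
        "text": re.sub(r"\s+", " ", record["Body"]).strip(),
    }


def load(rows):
    """Serialize rows as JSON Lines, one document per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


rows = [transform(record) for record in SAMPLE_DMS_EXPORT]
jsonl = load(rows)

print(inventory(SAMPLE_DMS_EXPORT))
print(jsonl)
```

The point of standardizing the transform step is that every DMS feeding the repository produces the same fields, types and date format, so downstream analytics don't need per-source special cases. Dates that cannot be parsed are kept as missing values rather than guessed.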

Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.