Unstructured data: The smart person's guide

Unstructured data makes up approximately 80% of the data that organizations process daily. This primer covers what unstructured data is, why it enriches business data, and how it speeds up decision making.

IDC projects that there will be 163 zettabytes of data in the world by 2025, and estimates indicate that 80% of this data is unstructured.

With structured data, data fields are aligned side-by-side in fixed record lengths, with specific data fields appearing at static locations within each record. Unstructured data does not contain a set record format—it can come in any shape or form. Unstructured data comes from documents, social media feeds, digital pictures and videos, audio transmissions, sensors used to gather climate information, and unstructured content from the web.

Learn more about unstructured data by reading this smart person's guide. We will update this resource periodically with the latest information and tips about unstructured data.

Executive summary

  • What is unstructured data? Unstructured data is data that isn't stored in a fixed-record-length format. Examples include documents, social media feeds, and digital pictures and videos.
  • Why does unstructured data matter? Since 80% of data that organizations see and process daily is unstructured, businesses must adapt to handle the increasing stores of unstructured data.
  • Who does unstructured data affect? Internally, almost every corporate department uses unstructured data in some form; externally, unstructured data is used to monitor and report on movements of shipments and/or assets with sensors and more.
  • When will businesses use unstructured data? Unstructured data is used in every company and organization.
  • How can companies use unstructured data? The tools to prepare and process unstructured data, and to append it to systems of record or store it for future use, are available both on premises and in the cloud.

What is unstructured data?

Unstructured data is any data that isn't stored in a fixed-record-length format; data that is stored that way is known as structured, or transactional, data. Examples of unstructured data include:

  • A paper-based vendor invoice that comes into your accounting department;
  • A product photo that gets associated with an order;
  • A bar code that is assigned to an item in your warehouse;
  • The x-rays, MRIs, and other imagery that a doctor needs, beyond a patient's written records, to treat that patient;
  • The photo of a possible suspect that a police officer wants to call up alongside the suspect's driving record; and
  • The satellite and topographical imagery that a surveyor combines with his own measurements when mapping a section of land.

The records in your accounting, inventory management, or order systems are not unstructured data because those records are all structured in the same, uniform way. Every record consists of a series of contiguous data fields, and one of these fields is the access point, or key, into the records (e.g., the key to an order record might be the order number).
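As a minimal sketch of what "structured in the same, uniform way" can look like in practice, the Python example below packs order records into fixed-length byte strings and finds one by its key. The field widths, order numbers, and customer names are assumptions made up for illustration; real systems of record use their own layouts.

```python
import struct

# Hypothetical fixed-length order record: every field sits at a static offset.
# 8-byte order number (the key), 20-byte customer name, 4-byte quantity.
ORDER_FORMAT = "<8s20sI"
RECORD_SIZE = struct.calcsize(ORDER_FORMAT)  # same size for every record


def pack_order(order_number: str, customer: str, quantity: int) -> bytes:
    """Serialize one order into a fixed-length record (short text is null-padded)."""
    return struct.pack(
        ORDER_FORMAT,
        order_number.encode("ascii"),
        customer.encode("ascii"),
        quantity,
    )


def find_order(table: bytes, order_number: str):
    """Step through the records one RECORD_SIZE at a time and match on the key field."""
    for offset in range(0, len(table), RECORD_SIZE):
        key, customer, quantity = struct.unpack_from(ORDER_FORMAT, table, offset)
        if key.rstrip(b"\x00").decode("ascii") == order_number:
            return {
                "order_number": order_number,
                "customer": customer.rstrip(b"\x00").decode("ascii"),
                "quantity": quantity,
            }
    return None


# Two made-up records stored back to back.
table = pack_order("ORD-0001", "Jim Smith", 3) + pack_order("ORD-0002", "Ann Lee", 12)
print(find_order(table, "ORD-0002"))
```

Because every record has the same length and field positions, the lookup never has to interpret the data to find where one record ends and the next begins.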

To process unstructured data, your systems and databases need to be able to read this data by looking at a key or a reference point; for example, the key to a stored photo of Jim Smith would likely be Jim Smith. After that, the system must access the entire data object (i.e., the photo of Jim Smith).

The difference between an unstructured object and a structured record of Jim Smith is that the unstructured object would be Jim's photo, which requires quite a bit of storage, while the structured record would hold information about him, such as his address or phone number. A full photo of Jim contains much more data than the small, fixed string of data elements (address, phone number, and so on) found in a structured record. For this reason, unstructured data, with the large objects that are assigned to its keys, requires more processing and more storage.
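The contrast between the two can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a measurement: the field widths, the address and phone number, and the 2 MB photo size are all invented, and the photo is a block of placeholder bytes rather than a real image.

```python
import struct

# Structured record: name, address, and phone in fixed-width fields (assumed widths).
RECORD_FORMAT = "<30s60s15s"
structured_record = struct.pack(
    RECORD_FORMAT, b"Jim Smith", b"12 Example Street, Springfield", b"555-0100"
)

# Unstructured object: placeholder bytes standing in for a photo.
# The 2 MB size is an assumption for illustration only.
photo_bytes = bytes(2 * 1024 * 1024)

# A toy object store: the key is the reference point, the value is the whole object.
object_store = {"Jim Smith": photo_bytes}


def fetch_object(key: str) -> bytes:
    """Look up by key, then return the entire data object."""
    return object_store[key]


photo = fetch_object("Jim Smith")
print(f"structured record: {len(structured_record)} bytes")
print(f"photo object:      {len(photo)} bytes")
```

Both lookups start from the same key, but the object behind the photo's key is orders of magnitude larger than the fixed-width record.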

Why does unstructured data matter?

Because 80% of the data in companies is unstructured, organizations need to understand the types of unstructured data they are accumulating and the best ways to process and store this data for business advantages. Without data management strategies and guidance in these areas, companies run the risks of not capitalizing on unstructured data, failing to keep up with competitors, or storing more unstructured data than they really need, thereby running up data center costs.

In a majority of cases, unstructured data is ultimately related back to the company's structured data records. As an example, every x-ray or MRI image for a patient is related back to the patient's record in the hospital's record system. The patient record in the record system is enriched with unstructured data that is linked to it, and the doctor gets a more complete picture of the patient.
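One way to picture this linkage is a structured patient record that carries references to its images. The Python sketch below is hypothetical throughout: the class names, the patient ID, and the storage paths are invented for illustration and do not describe any particular hospital system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageObject:
    """An unstructured object, referenced by where it is stored."""
    kind: str         # for example "x-ray" or "MRI"
    storage_uri: str  # hypothetical location of the image file


@dataclass
class PatientRecord:
    """A structured record; patient_id is the key."""
    patient_id: str
    name: str
    images: List[ImageObject] = field(default_factory=list)


# A toy record system keyed by patient ID.
records = {"P-1001": PatientRecord("P-1001", "Jim Smith")}


def attach_image(patient_id: str, image: ImageObject) -> None:
    """Relate an unstructured object back to the structured record."""
    records[patient_id].images.append(image)


attach_image("P-1001", ImageObject("x-ray", "objects/p-1001/xray-01.dcm"))
attach_image("P-1001", ImageObject("MRI", "objects/p-1001/mri-01.dcm"))

patient = records["P-1001"]
print(patient.name, [img.kind for img in patient.images])
```

The record itself stays small and uniform; only the references are stored with it, while the large image objects live elsewhere.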

This is the value of unstructured data: It enriches corporate data and enables leaders to work smarter.

Who does unstructured data affect?

Unstructured data can affect everyone at the company, from the entry-level staffer to the CEO.

Internally, almost every corporate department uses unstructured data in some form, from engineering with its raster drawings to marketing with its social media engagements and photo imagery, to financial and office operations with scanned documents.

Externally, unstructured data is used to monitor and report on movements of shipments and/or assets with sensors, to monitor school campuses with security cameras, and to exchange videos, photos, images, audio transmissions, etc. with suppliers and other business partners.

All of these unstructured data users need access, policies, and guidelines for using the data.

The organizations that do the best job of using unstructured data assess their business and determine where they need unstructured data most. Then they find ways to blend this data with systems of record transactional data so that employees have more complete information at their fingertips.

This unstructured data could be strategic, such as a map that integrates sales and demographics information with warehouse locations so a company can plan its next facility moves.

On the operational side, unstructured data can be appended to manufacturing rework orders to show how and where parts failed, so problems can be anticipated and eliminated in the future.

In both cases, leaders have access to more information than ever before, which speeds up decision making.

When will businesses use unstructured data?

In 2016, the Economist Intelligence Unit surveyed 476 executives around the world. According to Forbes contributor Bernard Marr, the resulting report concluded that: "Big Data analysis, or the mining of extremely large data sets to identify trends and patterns, is fast becoming standard business practice. Global technology infrastructure, too, has matured to an extent that reliability, speed, and security are all typically robust enough to support the seamless flow of massive volumes of data, and consequently encourage adoption."

The same report stated that 58% of companies in the US, 56% of companies in Europe, and 60% of companies in Asia were deriving high business value from their use of data.

How can companies use unstructured data?

Every business has unstructured data. The key is knowing how to use and process it to the best business advantage, and which unstructured data to store and which to discard. It is very expensive to store and maintain unstructured data because of its size, the processing power it requires, and the fact that not all of it is useful.

Organizations use many types of unstructured data at face value, such as photographs, documents, audio and video recordings, and web content. The next stage is figuring out how to get at the data so that all of the information it contains is fully used. In the case of a photograph, the photo can be linked with spatial events through GPS technology that links it to a specific location; or, the photo can be enriched by being linked to contextual and associative data, such as who is in the photo, the year the photo was taken, and so on.
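The photo example can be sketched as a small metadata layer on top of the image files. In the Python sketch below, the `Photo` class, the filenames, the coordinates, and the search tolerance are all assumptions chosen for illustration; a production system would more likely read GPS tags from the image files and use a proper geospatial index.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Photo:
    """A photo plus the metadata that makes its content searchable."""
    filename: str
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude) from GPS
    year: Optional[int] = None
    people: List[str] = field(default_factory=list)


def photos_of(photos: List[Photo], person: str) -> List[str]:
    """Return filenames of photos tagged with the given person."""
    return [p.filename for p in photos if person in p.people]


def photos_near(photos: List[Photo], lat: float, lon: float, tolerance: float = 0.01) -> List[str]:
    """Return filenames of photos whose GPS tag falls within a small lat/lon box."""
    return [
        p.filename
        for p in photos
        if p.location is not None
        and abs(p.location[0] - lat) <= tolerance
        and abs(p.location[1] - lon) <= tolerance
    ]


# Made-up library: one fully tagged photo, one with location only, one untagged.
library = [
    Photo("site-visit.jpg", location=(40.7128, -74.0060), year=2017, people=["Jim Smith"]),
    Photo("warehouse.jpg", location=(34.0522, -118.2437), year=2016),
    Photo("untagged.jpg"),
]

print(photos_of(library, "Jim Smith"))
print(photos_near(library, 40.71, -74.00))
```

The untagged photo is still stored, but it cannot be found by person or place, which is the gap that enrichment closes.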

The tools to prepare and process unstructured data, and to append it to systems of record or store it for future use, are available both on premises and in the cloud. In both cases, data analysts must groom the unstructured data so it can work hand-in-hand with other types of unstructured and structured data. Adequate storage and processing must also be provisioned.

The ways that companies use unstructured data vary widely. Larger enterprises often have their IT staffs and experts prepare and process unstructured data in house. Small, midsize, and even some large businesses use cloud-based unstructured data preparation and processing. Cloud-based options such as Amazon Web Services (AWS), Microsoft Azure, and IBM Cloud are services that help make big data affordable and attainable to companies of any size.

About Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...
