A study conducted by Dell EMC in 2014 estimated that by 2020, 1.7 megabytes of data would be produced every second for every person. This is a daunting amount of data for companies to manage, let alone aggregate into a meaningful data mart or report that can be used for analytics.
“Aggregating big data in a way that provides results for an organization is hard,” said Helene Servillon, head of partner marketing at Voicebase, a speech analytics company. “A lot of companies are currently doing it, but few are doing it well. While 85 percent of companies have a goal to become more data-driven, only 37 percent feel that their efforts have been successful.”
One data management challenge is ensuring that you are working with a “single version of the truth,” which enterprises can accomplish by normalizing data and/or eliminating redundant data. However, when you begin to aggregate data from disparate sources, you also need a methodology for data aggregation.
Here are four best practices for data aggregation:
1. Understand your company’s short- and long-term analytics objectives
Your goal today might be getting to know your customers’ buying preferences, but tomorrow you might want to aggregate data from new sources to determine customers’ hobbies and interests so you can predictively sell to them. This data could be in structured or unstructured form. Alternatively, your company’s goals could be to improve customer experience personalization or to learn more about your product manufacturing and engineering processes in order to improve product quality. In either case, there are likely to be both immediate and longer-term goals that will change your data aggregation requirements.
Your data aggregation strategy should reflect this. You might not need customer lifestyle or defective product return data today, but you might need aggregation of outside data from new sources in the future.
2. If you purchase data from outside partners, ensure that their governance and privacy standards are compatible with your own
Healthcare data is a prime example. If you are acquiring data from an outside source on the genetic makeups of patients with certain types of diseases for purposes of analysis and treatment, the data most likely will need to be anonymized to protect the privacy of patients. There is an even greater need for anonymization of data when you are promising your own patients that their data will remain private.
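As a rough illustration of the kind of protection described above, here is a minimal Python sketch that drops direct identifiers from a record and replaces the patient ID with a keyed hash. The field names, the sample record, and the key are all hypothetical. Note that this is pseudonymization, which is weaker than full anonymization: remaining fields such as birth dates or postal codes can still re-identify people, so treat it as one step rather than a complete privacy control.

```python
import hashlib
import hmac

# Hypothetical field names; a real partner feed will use its own schema.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Return a copy of `record` without direct identifiers and with a keyed-hash ID."""
    # Drop fields that identify a person directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the ID with an HMAC-SHA256 digest so the same patient maps to the
    # same token, but the original ID cannot be read back without the key.
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned[id_field] = digest.hexdigest()
    return cleaned


# Illustrative record only; not real data.
record = {
    "patient_id": "P-1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "type 2 diabetes",
}
safe = pseudonymize(record, secret_key=b"replace-with-a-managed-secret")
print(safe)
```

Because the hash is keyed, two data sets processed with the same key can still be joined on the tokenized ID, which is often what an aggregation project needs.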
3. Determine how data will be stored and how users will access it
Is your intent to deliver aggregated data to users in a specific functional area in your company, or to departments across the board? This will dictate whether you choose to aggregate and store data in a large data repository with many different access choices or in a smaller data mart that is tailored to the needs of a specific user set.
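To make the distinction concrete, the sketch below derives a small, department-specific data mart from a larger repository by filtering and summing. It is only an assumption-laden example: the `repository` rows, the field names, and the `build_data_mart` helper are invented for illustration, and a real deployment would more likely do this in a database or analytics platform.

```python
from collections import defaultdict

# Hypothetical rows standing in for a central data repository.
repository = [
    {"department": "sales", "region": "east", "revenue": 1200.0},
    {"department": "sales", "region": "west", "revenue": 800.0},
    {"department": "sales", "region": "east", "revenue": 300.0},
    {"department": "support", "region": "east", "revenue": 0.0},
]


def build_data_mart(rows, department, group_by, measure):
    """Keep one department's rows and sum `measure` for each `group_by` value."""
    totals = defaultdict(float)
    for row in rows:
        if row["department"] == department:
            totals[row[group_by]] += row[measure]
    return dict(totals)


# A mart tailored to one user set: sales revenue by region.
sales_mart = build_data_mart(repository, "sales", "region", "revenue")
print(sales_mart)  # {'east': 1500.0, 'west': 800.0}
```

The repository keeps every row for users across the company, while the mart holds only the aggregate that one group of users needs.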
4. Automate data integration as much as possible
Whether you want to aggregate data from your call center audio, your website text messages, or from outside pay-for-data sources, you’ll need an easy way to vet and integrate this data into your target data repository or data mart.
What you want to avoid is having to hand-code every data integration interface. The preferred methods of integration for data aggregation are standard APIs or automated integration tools that can perform much of the data integration for you.
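One way to picture the difference between hand-coding and automation is a single generic routine driven by a mapping table, so that adding a source means adding configuration rather than writing a new interface. The Python sketch below assumes two hypothetical sources and invented field names; commercial integration tools work at a much larger scale, but the idea is similar.

```python
# Hypothetical sources and field mappings: source field name -> unified field name.
FIELD_MAPS = {
    "call_center": {"caller_id": "customer_id", "transcript": "text"},
    "website": {"user": "customer_id", "message": "text"},
}

# Fields every unified record must contain before it is loaded.
REQUIRED = ("customer_id", "text")


def integrate(source: str, raw: dict) -> dict:
    """Rename a raw record's fields using the source's mapping, then vet it."""
    mapping = FIELD_MAPS[source]
    unified = {target: raw[src] for src, target in mapping.items() if src in raw}
    # Vet the record: reject it if a required field is absent or empty.
    missing = [field for field in REQUIRED if not unified.get(field)]
    if missing:
        raise ValueError(f"{source} record missing fields: {missing}")
    unified["source"] = source
    return unified


print(integrate("call_center", {"caller_id": "C1", "transcript": "hello"}))
print(integrate("website", {"user": "C2", "message": "where is my order?"}))
```

Supporting a new pay-for-data feed in this sketch would mean adding one entry to `FIELD_MAPS`, with no change to `integrate` itself.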