Expert: Data integrity should be part of the return-to-work conversation

Consult several models that forecast COVID-19 case scenarios before making decisions about reopening offices and resuming services.

As data models predicting the course of the coronavirus continue to guide life-and-death decisions, data quality and governance should be part of the discussion, one expert says. Before making more return-to-work decisions, company executives and government officials should give the source of COVID-19 data as much importance as the analysis of the data.

Karen Way, managing director of the Health Plan Data & Intelligence unit at NTT Data Services, said that every discussion of COVID-19 predictive models should include caveats that describe the limitations of the data. She works with health insurance companies to analyze data and develop new strategies and data management solutions.

"Data governance and data quality are very often the first things to get forgotten, especially in a pandemic where there is an urgency to get the information out," she said.

Way said that one of the first data quality issues she noticed early on in the epidemic was that most people who had been tested were already in the hospital.

 "With the lack of testing, the only thing we could use to build models was information on people who were tested because they were already in the hospital," she said. "This skews the data as well as how accurately we measure the problem."

Way recommended that companies consult several models that forecast COVID-19 case scenarios, rather than rely on a single predictive model, when deciding whether to reopen offices and services.
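As a rough illustration of comparing several models rather than relying on one, the Python sketch below summarizes point forecasts from multiple models for a single date. The function name `summarize_forecasts`, the model names, and the case counts are hypothetical and are not drawn from any real forecast.

```python
from statistics import median


def summarize_forecasts(forecasts):
    """Summarize point forecasts from several models for one date.

    `forecasts` maps a model name to its projected case count.
    Returns the lowest, median, and highest projection, plus the
    gap between the extremes relative to the median.
    """
    if not forecasts:
        raise ValueError("at least one forecast is required")
    values = sorted(forecasts.values())
    mid = median(values)
    # A large relative spread signals that the models disagree.
    spread = (values[-1] - values[0]) / mid if mid else float("inf")
    return {
        "low": values[0],
        "median": mid,
        "high": values[-1],
        "relative_spread": spread,
    }


# Hypothetical projections, not real model output.
example = {"model_a": 1200, "model_b": 1500, "model_c": 2400}
summary = summarize_forecasts(example)
print(summary)
```

A wide gap between the low and high projections is itself useful information: it suggests a decision should not hinge on any single model's number.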

 "Our best defense is data and understanding that data and using it to make the best decisions possible even though we know the data is limited," she said.

The rules set for counting and analyzing data directly affect the quality of the analysis. Way used the example of how coronavirus deaths have been counted differently in Italy and New York. Italy counted almost every death as virus-related during the peak days of the epidemic, while New York took a more conservative approach. Both are revising their total counts.

"Maybe Italy counted deaths that weren't related to the virus but in New York City, the numbers are being revised upward," she said.

When looking at a collection of data or the results of an analysis, Way recommends examining how the data was collected as well as the characteristics of the data set, such as demographics, including age, gender, and race.

"Sometimes unconsciously you select data that supports preconceived notions, and I see that happening with COVID-19," she said.
 
Way suggested that managers err on the side of caution when making workforce decisions around COVID-19 and always account for the potential for error in the data.

"If you've only looked at men in the ICU who tested positive, you should caveat your decision making by saying, 'We don't know how this will affect women,'" she said.

The best approach is to be honest about the limits of the data and include appropriate context about the data set, such as how it was collected. Way said that she has less confidence in models that don't include this kind of context.
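One way to make such caveats routine is to check which groups a data set actually covers before drawing conclusions from it. The following is a minimal Python sketch under stated assumptions: the function `coverage_caveats`, the field names, the sample records, and the 10% threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def coverage_caveats(records, field, expected_groups, min_share=0.10):
    """Return caveat strings for groups that are missing or under-represented.

    `records` is a list of dicts, `field` is the attribute to inspect,
    `expected_groups` lists the groups the decision will apply to, and
    `min_share` is an illustrative threshold, not an established standard.
    """
    counts = Counter(r.get(field) for r in records)
    total = len(records)
    caveats = []
    for group in expected_groups:
        share = counts.get(group, 0) / total if total else 0.0
        if share == 0:
            caveats.append(
                f"No data for {field}={group}; effect on this group is unknown."
            )
        elif share < min_share:
            caveats.append(
                f"Only {share:.0%} of records have {field}={group}; "
                "estimates may be unreliable."
            )
    return caveats


# Hypothetical sample: ICU patients who tested positive, all male.
sample = [
    {"sex": "male", "age": 61},
    {"sex": "male", "age": 54},
    {"sex": "male", "age": 70},
]
caveats = coverage_caveats(sample, "sex", ["male", "female"])
for line in caveats:
    print(line)
```

Attaching the resulting caveats to any report built on the data set keeps the collection context visible to the people making decisions.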

The other complicating factor is that no single generic data model can reflect the conditions in a meatpacking plant vs. an office building vs. a nursing home. Also, implementing data governance rules and standardization takes time.

Way is working with healthcare clients to understand the financial impact of the coronavirus epidemic on the industry in the short term and the long term.

"Anyone who is hospitalized is going to become high cost and high risk, and not everybody is going to have access to remdesivir," she said. "Most of the major health plans have waived copayments relative to hospital stays, they realize people are out of work."

Way's data analysis will also help clients figure out how to pay for future treatments related to COVID-19.

"Will a coronavirus vaccine be treated like the flu vaccine and people will get it for free or will there be a co-pay?" she said.

Implementing data governance

The goal of data governance is to manage data provenance and access rights at the organizational level, according to Matthew Carroll, co-founder and CEO of data governance firm Immuta.

"Everyone in an organization can rapidly gain access to data in a compliant way that is recognized and monitored to help ensure success no matter the role (compliance, data scientists, leadership)," he told TechRepublic's sister site ZDNet.

Data management company Collibra recommends these best practices for guiding data governance:

  1. Focus on the operating model

  2. Identify data domains

  3. Identify critical data elements within the data domains

  4. Define control measurements


In 2019, data research firm Forrester released a study on the privacy laws of 61 countries, finding that many nations spent 2018 passing stringent privacy regulations and more would continue to do so in 2019.

The California Consumer Privacy Act and other new privacy laws have added financial penalties for companies that don't have a plan for disposing of data after a certain amount of time. Data governance policies could address these "secure disposal laws" and help companies avoid penalties for holding on to data for longer than they should.
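A data governance policy of this kind might include a periodic check for records held past a retention window. The Python sketch below is only an illustration: the function `records_due_for_disposal`, the record layout, and the 365-day retention period are assumptions, not requirements taken from any specific law.

```python
from datetime import date, timedelta


def records_due_for_disposal(records, retention_days, today):
    """Return the IDs of records held longer than the retention period.

    Each record is a dict with an `id` and a `collected_on` date.
    `retention_days` is a policy choice and is assumed here.
    """
    cutoff = today - timedelta(days=retention_days)
    return [r["id"] for r in records if r["collected_on"] < cutoff]


# Hypothetical records for illustration only.
records = [
    {"id": "a1", "collected_on": date(2018, 1, 15)},
    {"id": "b2", "collected_on": date(2020, 3, 1)},
]
overdue = records_due_for_disposal(
    records, retention_days=365, today=date(2020, 5, 1)
)
print(overdue)
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check repeatable when it is audited later.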
