As the escalating spread of the deadly COVID-19 coronavirus in China became more urgent in late December 2019, employees of data analysis software vendor Tableau combed the headlines for data that could help the company determine how the virus was affecting its own personnel and operations in China.
That’s when Tableau workers discovered raw data from the outbreak that was being gathered and publicized by Johns Hopkins University, which was collecting information about the escalation and spread of COVID-19 cases from government and other sources around the world, including the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC). But all that data, which was being entered into a web dashboard created by the Center for Systems Science and Engineering at Johns Hopkins, was messy and hard to use for most people because it was in multiple formats and required a lot of clean-up work to make it understandable.
“What Johns Hopkins has been doing with that data set is collecting day-over-day counts from situation reports” around the world, including confirmed cases, recoveries, and active cases, said Steve Schwartz, Tableau’s director of public affairs. “As you can imagine, when you bring in data from some 190 countries, you also bring in data from sub-agencies and more. It has a lot of potential for double-counting, different naming conventions, and language differences” when it is all brought into a single data set.
That meant the data would need a lot of cleansing through extract, transform, and load (ETL) procedures before anyone who wanted to review it could readily use it and see what was happening, Schwartz said.
For example, the data in the Johns Hopkins dashboard listed the Vatican and The Holy See as separate geographic designations, when they are actually the same place and needed to be brought together to ensure accurate data. “The data needs to be standardized so people can work with that data,” Schwartz said.
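The kind of standardization Schwartz describes can be illustrated with a minimal Python sketch. This is not Tableau's or Johns Hopkins' actual code: the alias table, the function names, and the sample numbers are all hypothetical, chosen only to show how two labels for the same place might be folded into one before counts are combined.

```python
from collections import defaultdict

# Hypothetical alias table: maps variant spellings to one canonical name.
# A real cleanup would need a far longer, carefully reviewed list.
ALIASES = {
    "vatican": "Holy See",
    "vatican city": "Holy See",
    "the holy see": "Holy See",
    "holy see": "Holy See",
}


def canonical_name(raw):
    """Return the canonical place name for a raw label."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return ALIASES.get(key, raw.strip())


def merge_counts(rows):
    """Sum case counts per canonical place name.

    rows is an iterable of (place_label, confirmed_cases) pairs.
    """
    totals = defaultdict(int)
    for place, confirmed in rows:
        totals[canonical_name(place)] += confirmed
    return dict(totals)


# Made-up numbers, for illustration only.
sample_rows = [("Vatican", 1), ("The Holy See", 3), ("Italy", 10)]
merged = merge_counts(sample_rows)
```

Summing is only correct if the two labels cover different cases. If both rows report the same cases, which is the double-counting risk Schwartz mentions, the merge step would have to deduplicate instead of add.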
By early February, Tableau, which builds data visualization software, began hearing from some of its enterprise customers who were having the same problems making use of the Johns Hopkins data in its raw forms. The customers were becoming inundated by the “dirty” data and asked Tableau for help in sorting it all out. It was a problem that Tableau was hearing about across its user community.
In mid-February, members of the Tableau online community began working on a way to fix the data and make it more usable. They soon created a Python script to do this, but quickly learned that it could not keep up accurately with the constant flow of new data without being recoded many times a day. Four Tableau “Zen Masters,” members of a select group of top Tableau users around the world, made the effort possible through their work to clean, shape, and transfer the Johns Hopkins data, according to the company. They are Anya A’Hearn, Tamas Foldi, Allan Walker, and Jonathan Drummey, and their work led to the next stage of the overall project.
“That’s when we switched up our approach, bringing in our Tableau Prep data management software to take the role that the Python script was playing,” Schwartz said. Tableau Prep uses visualizations to combine, shape, and clean data, making it easier to see and use the data.
And with that step, Tableau’s own “starter dashboard” was created, giving any user anywhere a place to begin finding the coronavirus statistics and information they are seeking. Here users can find details about the number of confirmed cases and deaths from COVID-19 in a wide range of countries, as well as related metrics. All of this makes things much easier for users, who otherwise might have tried to pull the data for analysis themselves from the Johns Hopkins GitHub repo page, a task that requires its own level of expertise and is not easy for most people.
The starter dashboard is a simple, austere tool that people and organizations can use or download to bring in their own data for their own analyses, said Schwartz.
Also created to help provide information during the crisis is a Tableau COVID-19 Data Hub, which provides even more data from a wide range of other sources on the effectiveness of social distancing, the effects of the pandemic on restaurants, detailed country and state maps, and much more through a COVID-19 Data Visualization Gallery. Those additional links to other credible data sets from other sources are designed to help provide even more information during the pandemic.
Inside the Data Hub, users will find that the consolidated data is also available directly in Tableau’s own .hyper and .tde formats, as well as in Google Sheet and CSV formats, so it can be used with other data analysis tools from other vendors. The .hyper, .tde, and CSV versions of the datasets are also available through online data catalog platform data.world, which also allows users to view and collaborate with their data in new ways.
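For readers who take the CSV route with a tool other than Tableau, the following sketch shows one way such a file might be read with Python's standard library. The column names (`Country_Region`, `Date`, `Cases`) and the sample rows are assumptions made up for illustration. The real files may use different headers, so check them before adapting this.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV file.
# The column names are assumptions, not the published schema.
SAMPLE_CSV = """Country_Region,Date,Cases
Italy,2020-03-26,100
Italy,2020-03-27,120
Spain,2020-03-27,80
"""


def latest_cases_by_country(csv_file):
    """Return the most recent case count for each country.

    Assumes dates are ISO formatted (YYYY-MM-DD), so comparing them
    as plain strings orders them chronologically.
    """
    latest = {}
    for row in csv.DictReader(csv_file):
        country = row["Country_Region"]
        date = row["Date"]
        cases = int(row["Cases"])
        if country not in latest or date > latest[country][0]:
            latest[country] = (date, cases)
    return {country: cases for country, (_, cases) in latest.items()}


result = latest_cases_by_country(io.StringIO(SAMPLE_CSV))
```

To run it against a downloaded file, pass an open file handle, for example `open("covid.csv", newline="")` with a file name of your choosing, in place of the `io.StringIO` wrapper.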
Some real-world uses so far for the Tableau resources include a healthcare company that’s using some of the Johns Hopkins data to manage its supply chains, as well as other companies that are using the resources to help manage their human resources issues using regularly updated information on the spread of the disease so it can be blended in with their own data, Schwartz said. “It is becoming a helpful resource for them. And there’s a company that’s involved in COVID-19 testing that’s making decisions on where to move supplies for testing based on the data. By using this data, organizations can contextualize it and make decisions for their own environments.”
All of these efforts are continuing in order to make the voluminous Johns Hopkins data more accessible to a wider group of people and organizations, including ordinary citizens, who can use it to help in the global fight against this dangerous and frightening virus, Schwartz said.
“This is a very uncertain situation,” he said. “We’ve all never been through anything like this before, and data can help with public understanding. Right now every business decision-maker is facing an unprecedented situation. So, we are taking the view of providing what is useful to help get our country back to functioning.”
A wide range of other companies have been helping Tableau with these efforts, including Mapbox, Path, Snowflake, DataBlick and Starschema, Schwartz stressed. “We already have a coalition of technology partners. They are all providing really valuable resources, and it is a Tableau-driven effort where we’re all doing this together.”
As of noon Eastern time Friday, the Johns Hopkins figures showed 581,502 cases of COVID-19 in some 176 countries, with 25,336 deaths so far around the globe. In the US, there are 86,012 confirmed cases, and there have been 1,301 deaths so far from the disease.
COVID-19 has quickly become an international public health emergency that is bigger than the SARS outbreak of 2003 that caused havoc around the world. Unlike during SARS, though, scientists now have better genome sequencing, machine learning, and predictive analysis tools to understand and monitor outbreaks as they occur. In addition, they have social media tools like Facebook and Twitter, which they can use along with a wide range of other resources to track the spread of diseases.