Researchers tested the quality of synthetic health data generated by the MDClone platform and shared the results in “Spot the difference: Comparing results of analysis from real patient data and synthetic derivatives,” published by Oxford University Press on behalf of the American Medical Informatics Association in December 2020.
Image: Randi E Foraker and Sean C Yu in JAMIA Open

MDClone, a synthetic data company, has a new partnership with the Veterans Health Administration that it says will make it easier to customize healthcare for patients.

MDClone’s health data platform uses actual patient data to create synthetic data sets. The new data set matches the original without sharing private health data from patients. The company says this will make it easier for researchers and doctors in the VHA to use data that they might not have had access to before. Doctors and researchers at the VHA will use the MDClone platform to analyze data and determine what treatment plans work best for certain groups of patients.

With most healthcare research projects, anyone who has access to patient data has to complete privacy and data management training and be formally approved as part of the institutional review board (IRB) process. This review protects patient data and individual privacy. It also limits the use of data from these projects to a small number of people.

Using the original data to create a matching synthetic data set opens up these data sets to all researchers at the VA as well as other experts around the world. Josh Rubel, MDClone’s chief commercial officer, said that this new access will reduce innovation time at an organizational level from years to weeks.

“Our job is to make sure whatever data we generate has the same math as the original data so that it tells the same story,” he said.

The project at the VA will focus on behavioral health issues such as suicide as well as COVID-19 treatment plans. Doctors and data scientists will use the MDClone platform to query data sets and test treatment plans. The platform can also help healthcare providers prioritize limited resources for the people who need them the most at a given time.

“If a vet has called a suicide hotline and they have a social history that includes a leading indicator for potential suicide risk, where should you slot that patient for outreach?” he said. “The VA wants to get in front of bad things happening to those veterans and they need to know where to aim their efforts.”

The MDClone platform organizes healthcare data in a longitudinal structure to make it easier for non-data scientists to run queries. The platform has access control features so that approved users can see both the original data set and the synthetic data.

“This gives researchers the chance to iterate and test more hypotheses,” he said. “They could start with the synthetic data and then switch over to the real data as they refine the query.”

Researchers at Washington University in St. Louis published a paper in JAMIA Open in December 2020 analyzing the synthetic data produced by the MDClone platform. To test the quality of the synthetic data, researchers ran three queries against actual patient data and against synthetic data generated from the original data. The researchers found that the results from each data set were similar enough to draw the same conclusions. The authors concluded that synthetic data “will accelerate the conduct of data-driven research studies” and “reduce barriers to data sharing.”
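The comparison described in the study can be illustrated with a toy example. The sketch below is not MDClone’s method, which the article does not describe, and it does not reproduce the paper’s queries. It assumes a made-up list of patient ages, generates a stand-in synthetic copy by sampling from a normal distribution fitted to the originals, and checks whether both versions give roughly the same summary statistics. The function names, the 5% tolerance, and the data are all hypothetical.

```python
import random
import statistics


def make_synthetic(values, rng):
    """Toy stand-in for a synthetic data generator (an assumption, not MDClone's algorithm).

    Draws one new value per original value from a normal distribution
    fitted to the original mean and standard deviation.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in values]


def summarize(values):
    """Return the summary statistics used for the comparison."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def same_story(real, synthetic, rel_tol=0.05):
    """True if every statistic differs by at most rel_tol relative to the real value."""
    r, s = summarize(real), summarize(synthetic)
    return all(abs(r[k] - s[k]) <= rel_tol * abs(r[k]) for k in r)


rng = random.Random(0)  # fixed seed so the run is repeatable
real_ages = [rng.gauss(62, 12) for _ in range(5000)]  # hypothetical patient ages
synthetic_ages = make_synthetic(real_ages, rng)

print(summarize(real_ages))
print(summarize(synthetic_ages))
print(same_story(real_ages, synthetic_ages))
```

A real evaluation would compare the results of complete analyses, as the Washington University researchers did, rather than two summary statistics on a single column.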

Amanda Purnell, a senior innovation fellow at the VHA Innovation Ecosystem, will lead the project on the VHA side. This program is part of the Care and Transformational Initiatives (CTI) at the VHA and will test innovative care models that can be scaled to impact veteran care.

Doctors and researchers have used MDClone data to develop treatment plans for chronic kidney disease, build a machine learning model for predicting sepsis, track side effects of cancer treatments, and optimize insulin dosing for people with diabetes.

The company is also working with the National Institutes of Health on a COVID-19 project. MDClone is helping to build out a dataset from clinical data as part of the NIH’s National COVID Cohort Collaborative. Approved researchers can get access to this dataset, which includes both data from actual patients and synthetic data created by MDClone.

MDClone announced in July 2020 the launch of The Global Network, which includes doctors and hospitals in the US, Canada, and Israel, where the company was founded. Partners in this work include Intermountain Healthcare as well as Jefferson Health and Thomas Jefferson University in the US, and Rambam Health Care Campus in Israel.