Everyone has a hunch about what keeps them healthy or makes them sick, whether or not there is evidence to back up those theories. It’s a complicated calculation because multiple factors interact to influence a person’s health: genetics, living conditions, environmental factors, diet, family history, exercise and financial status. IBM’s new causal inference toolkit aims to analyze the multiple factors in similarly complex situations and determine what makes a difference and what doesn’t. The idea is to replace a best guess with a decision backed up by data.
Causal inference is a method of analysis that considers the assumptions, study designs and estimation strategies that allow researchers to draw causal conclusions based on data. IBM’s goal for the toolkit’s website is to help data scientists quantify cause-and-effect relationships in data. Tutorials on the site include:
- How does smoking cessation affect weight loss?
- Do agricultural techniques affect water pollution?
- How do marketing campaigns affect long-term bank deposits and purchases?
- Does job training increase earnings for underprivileged individuals?
The Causal Inference 360 Open Source Toolkit includes tutorials, background information and demonstrations. The analysis is relevant for many sectors including healthcare, agriculture and marketing in finance and banking.
IBM built the open source site to bring “long-standing machine-learning methodologies to the field of causal inference.” The resource includes methods for training causal models, along with evaluation methods for selecting the most appropriate causal method, underlying model and parameter tuning tactics.
The company has been using the toolkit to study new uses for existing prescription drugs at a research lab in Haifa, Israel. IBM researcher Michal Rosen-Zvi explained in a blog post that the team found that a drug used to treat insomnia may be able to treat dementia that often develops with Parkinson’s.
The researchers created virtual clinical trials with simulated patients and assessed the drugs’ effectiveness according to outcomes documented in electronic health records (EHR) and insurance claims. The team looked for drugs that showed a statistically significant effect in both EHR and claims data to consider for repurposing. As Rosen-Zvi explained, the “analysis unraveled therapeutic benefits of two drugs in decreasing the population-level incidence of dementia associated with Parkinson’s.”
She concluded, “While our research is an important use case, there is tremendous potential to repurpose other drugs for a range of neurodegenerative and infectious diseases. And AI can be of huge help.”
Rosen-Zvi was also one of the researchers who analyzed healthcare data to understand why women skipped appointments for breast cancer screenings during 2020. The team applied advanced machine learning methods to known predictors of this common problem in healthcare as well as new factors that could be influencing the behavior. The team used causal inference methodology to infer the effect of pandemic-related closures on no-shows, after accounting for confounding biases. In a pre-publication research paper, the researchers state that the results “imply that a patient’s perceived risk of cancer and the COVID-19 time-based factors are major predictors.”
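To make "accounting for confounding biases" concrete, here is a minimal sketch of one classic adjustment, standardization over strata of a single confounder. It is not the method used in the research described above, and the records are hypothetical toy numbers made up for illustration: a confounder `z` raises both the chance of treatment and the outcome, so the naive comparison overstates an effect that is the same within each stratum.

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration:
# (confounder z, treatment a, outcome y).
# Units with z == 1 are treated more often AND have higher outcomes.
records = (
    [(0, 1, 3.0)] * 2 + [(0, 0, 2.0)] * 6
    + [(1, 1, 7.0)] * 6 + [(1, 0, 6.0)] * 2
)


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Treated mean minus control mean, ignoring the confounder."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return mean(treated) - mean(control)


def standardized_difference(rows):
    """Average the within-stratum differences, weighted by stratum size.

    Assumes every stratum contains both treated and control units.
    """
    strata = defaultdict(list)
    for z, a, y in rows:
        strata[z].append((a, y))

    effect = 0.0
    for members in strata.values():
        treated = [y for a, y in members if a == 1]
        control = [y for a, y in members if a == 0]
        stratum_effect = mean(treated) - mean(control)
        effect += stratum_effect * len(members) / len(rows)
    return effect


naive = naive_difference(records)
adjusted = standardized_difference(records)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this toy data the naive difference is three times the adjusted one, because treated units are concentrated in the high-outcome stratum. Real analyses face many confounders at once, which is where model-based estimators come in.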
As part of the toolkit release, IBM also updated its open-source Python library this week. The new functionalities include:
- New models: Matching (estimator and preprocessing transformer); Overlap Weights; HEMM
- Weight models now have the same fit() API as outcome models
- Updated dependencies: Dropped seaborn; pandas at 0.25; scikit-learn at 0.25
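As a rough illustration of what a weight model does, the sketch below fits a propensity score (the probability of treatment given a covariate) and turns it into either inverse probability weights or overlap weights. The class `StratumWeightModel`, its method names and the toy data are all hypothetical and written in plain Python; this is not the IBM library's API or implementation, only the general idea under the assumption of a single discrete covariate.

```python
from collections import defaultdict


class StratumWeightModel:
    """Hypothetical weight model for one discrete covariate.

    The propensity score is estimated as the treated fraction per stratum.
    """

    def __init__(self, scheme="ipw"):
        if scheme not in ("ipw", "overlap"):
            raise ValueError("scheme must be 'ipw' or 'overlap'")
        self.scheme = scheme

    def fit(self, X, a):
        counts = defaultdict(lambda: [0, 0])  # stratum -> [n, n_treated]
        for x, t in zip(X, a):
            counts[x][0] += 1
            counts[x][1] += t
        self.propensity_ = {x: nt / n for x, (n, nt) in counts.items()}
        return self

    def compute_weights(self, X, a):
        weights = []
        for x, t in zip(X, a):
            e = self.propensity_[x]
            if self.scheme == "ipw":
                # Inverse probability weights; these blow up (or divide
                # by zero) when e approaches 0 or 1.
                w = 1 / e if t == 1 else 1 / (1 - e)
            else:
                # Overlap weights: treated get 1 - e, controls get e,
                # so the weights stay bounded between 0 and 1.
                w = 1 - e if t == 1 else e
            weights.append(w)
        return weights


def weighted_effect(weights, a, y):
    """Difference of weighted outcome means, treated minus control."""
    def weighted_mean(group):
        num = sum(w * yi for w, ai, yi in zip(weights, a, y) if ai == group)
        den = sum(w for w, ai in zip(weights, a) if ai == group)
        return num / den
    return weighted_mean(1) - weighted_mean(0)


# Hypothetical toy data: covariate X, treatment a, outcome y.
X = [0] * 8 + [1] * 8
a = [1] * 2 + [0] * 6 + [1] * 6 + [0] * 2
y = [3.0] * 2 + [2.0] * 6 + [7.0] * 6 + [6.0] * 2

effects = {}
for scheme in ("ipw", "overlap"):
    model = StratumWeightModel(scheme).fit(X, a)
    weights = model.compute_weights(X, a)
    effects[scheme] = weighted_effect(weights, a, y)
    print(scheme, round(effects[scheme], 2))
```

Both schemes agree on this toy data because the effect is identical in every stratum. In general they target different populations: overlap weights emphasize units whose treatment assignment was most uncertain.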
This latest toolkit is part of a collection of open source AI tools that IBM has released over the years to build trustworthy AI, including AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, AI FactSheets 360 and Uncertainty Quantification 360.