Artificial intelligence framework reveals nuance in performance of multimodal AI for health care
What defines a good tool? Let’s suppose that a good tool is one that excels at its main purpose, whereas a mediocre tool breaks easily or performs poorly at its intended use. But multimodal artificial intelligence could change how we think about improving the performance of AI tools.
MIT researchers have developed a multimodal framework for health-care analytics called Holistic AI in Medicine (HAIM), recently described in Nature’s NPJ Digital Medicine, which leverages multiple data sources to build forecast models more easily in health care settings: from identifying a number of chest pathologies like lung lesions and edema, to predicting 48-hour mortality risk and patient length-of-stay. And to do it, they created over 14,000 AI models for testing.
In AI, most tools are single-modality tools, meaning that they synthesize one category of information to generate results — for example, feeding a machine learning model thousands of lung cancer CT scans so that it learns how to correctly identify lung cancer from medical images.
Moreover, most multimodality tools rely heavily on medical imaging and tend to give other factors less weight, even though doctors have a number of ways to determine whether someone has lung cancer or is at risk of developing it: a persistent cough, chest pains, loss of appetite, family history, genetics, etc. If an AI tool is supplemented with a more complete picture of a patient’s other symptoms and health history, could it identify lung cancer or other diseases even earlier and with more precision?
“This idea of using single data to drive important clinical decisions didn’t make sense to us,” says Abdul Latif Jameel Clinic for Machine Learning in Health postdoc and lead co-author of the study Luis R. Soenksen. “Most physicians in the world work in fundamentally multimodal ways, and would never present recommendations based on narrow single-modality interpretations of the state of their patients.”
Over two years ago, the field of AI in health care exploded. Funding for AI-enabled digital health startups doubled from the previous year to $4.8 billion, then doubled again in 2021 to $10 billion.
At the time, Soenksen, Jameel Clinic executive director Ignacio Fuentes, and Dimitris Bertsimas, the Boeing Leaders for Global Operations Professor of Management at the MIT Sloan School of Management and Jameel Clinic faculty lead, decided to take a step back to reflect on what was missing from the field.
“Things were coming out left and right, but it was also a time when people were getting disenchanted because the promise of AI in health care hadn’t been fulfilled,” Soenksen recalls. “We basically realized that we wanted to bring something new to the table, but we needed to do it systematically and providing the necessary nuance for people to appreciate the benefits and downfalls of multimodality in health care.”
The novel idea they cooked up was seemingly common sense: building a framework to easily generate machine learning models capable of processing various combinations of multiple data inputs, the way a doctor might take into account a patient’s symptoms and health history before making a diagnosis. But there was a marked absence of multimodal framework models in the health field, with only a few papers published about them, and those were more conceptual than concrete. Furthermore, when it came to developing a unified and scalable framework that could be applied consistently to train any multimodal model, single-modality models often outperformed their multimodal counterparts.
Looking at this gap, they decided it was time to assemble a team of experienced AI researchers at MIT and began to build HAIM.
“I was very fascinated by the whole idea of [HAIM] because of its potential to greatly impact our current health-care system’s infrastructure to bridge the gap between academia and industry,” Yu Ma, a PhD student advised by Bertsimas and co-author of the paper, says. “When [Bertsimas] asked me if I would be interested to contribute I was immediately on board.”
Although large quantities of data are typically viewed as a boon in machine learning, the team realized that this wasn’t always true of multimodal systems; there was a need for a more subtle approach to evaluating data inputs and modalities.
“A lot of people do multimodality learning, but it’s rare to have a study of every single possible combination of the model, data sources, all of the hyperparameter combinations,” Ma says. “We were really trying to rigorously understand exactly how multimodality performs under different scenarios.”
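To give a sense of what studying "every single possible combination" involves, here is a minimal Python sketch of how such an experiment grid could be enumerated. It is not the HAIM team's code: the modality names, the hyperparameter names and values, and the helper functions are hypothetical placeholders chosen only for illustration.

```python
from itertools import combinations, product


def modality_subsets(modalities):
    """Yield every non-empty combination of the given modalities."""
    for size in range(1, len(modalities) + 1):
        yield from combinations(modalities, size)


def enumerate_experiments(modalities, param_grid):
    """Cross every modality subset with every hyperparameter setting."""
    names = sorted(param_grid)
    settings = [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]
    return [
        {"modalities": subset, "params": params}
        for subset in modality_subsets(modalities)
        for params in settings
    ]


# Hypothetical inputs, assumed for this example only.
MODALITIES = ["tabular", "notes", "images", "time_series"]
PARAM_GRID = {"max_depth": [3, 6], "learning_rate": [0.1, 0.3]}

experiments = enumerate_experiments(MODALITIES, PARAM_GRID)
print(len(experiments))  # 15 modality subsets x 4 settings = 60
```

Each additional modality roughly doubles the number of subsets, and each additional prediction task or hyperparameter value multiplies the total again, which is how a systematic study of this kind can quickly reach thousands of models.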
According to Fuentes, the framework “opens an interesting path for future work, but we need to understand that multimodal AI tools in clinical settings face multiple data challenges.”
Bertsimas’ plans for HAIM 2.0 are already in the works. Under consideration is the inclusion of more modalities (e.g., signal data from electrocardiograms and genomics data) and ways to assist medical professionals with decision-making, rather than predicting the likelihood of certain outcomes.
The acronym HAIM, which Bertsimas came up with, also happens to be the Hebrew word for “life.”
This work was supported by the Abdul Latif Jameel Clinic for Machine Learning in Health and the National Science Foundation Graduate Research Fellowship.