Do You Trust AI Doctors?

December 24, 2020 - 7 minutes read

Emerging technologies like IoT, artificial intelligence (AI), and wearables are changing the patient and provider experience for the better. Ever since AI applications, like identifying cancer from medical images or predicting disease from a patient’s medical history, were introduced to healthcare, we’ve seen studies in which AI matches or beats providers’ performance on specific tasks.

So where do we draw the line on how deeply we let AI reach into our healthcare system? And can we fully trust the results of an AI’s calculations, especially when those calculations are black-boxed (hidden) from anyone who wants to understand the reasons behind the AI’s final decision? A few months ago, academics debated these questions with researchers from Google Health.

The Root Problem

AI-enabled medical applications, like those we mentioned earlier, are being built across a variety of fields and organizations, from academia to industry enterprises like Amazon, Google, and Apple. Everyone involved has different incentives, ideas for the future, and thoughts about where the technology stands today. Two groups engaged in a heated discussion in Nature, one of the most well-known science journals. On one side were AI researchers from the University of Toronto, Stanford, Johns Hopkins, MIT, Harvard, and the Princess Margaret Cancer Centre (PMCC); on the other were researchers from Google Health.

It began when Google Health published a study about its AI’s phenomenal results in screening for breast cancer. The company said that its AI vastly outperformed professional radiologists in finding and diagnosing breast cancer. It added that the technology could generalize beyond the populations it was trained on, which is a very confident statement, given that large datasets of high-quality medical images are incredibly rare.

The academics argued that the study could not be replicated. Replicability is a must for published studies, because it lets other researchers either bolster the claim or warn others that the authors are just blowing hot air. Google Health didn’t include sufficient descriptions of its model and code, requiring readers to blindly trust the results and the model’s performance. In its rebuttal, Google Health stated that it withheld that information in order to protect patients and the AI.

In science, discourse like this is necessary to push the boundaries of ethics, morality, and regulations surrounding emerging technologies. When big-name companies and schools argue, even the public takes notice.

Academics’ Arguments

Without a replicable method and enough details, published scientific studies are at risk of losing credibility and trustworthiness. Anyone could publish something, make up stellar results, and get praise and credit from the media. But science is built on replicated experiments that return the same results, follow-up experiments that explore a slightly different question, and novel ideas on how to apply the study. That is why a well-reported study tells you the exact number of participants, how they were split up, what technologies and tools were used, what experimental procedures were carried out and when, and much more.

But with AI mixed in, it’s difficult to publish replicable studies. Dr. Benjamin Haibe-Kains is a senior scientist at the Toronto-based PMCC. He explains, “In computational research, it’s not yet a widespread criterion for the details of an AI study to be fully accessible. This is detrimental to our progress.” As any developer knows, asking five developers to write code to solve one problem will likely result in five different programs. So without the exact program, a developer trying to reproduce the work from the few published details may implement it differently, which in turn could change the performance of the AI.

The academics said Google Health’s study is just one example of a major, far-reaching problem. The study lacked transparency, not just about the AI algorithm’s code but also about the dataset used to train it. Many medical datasets are under license and raise patient privacy concerns, but the academics did not consider that an excuse, since datasets can be anonymized. Ultimately, the authors wrote, “such resources can considerably accelerate model development, validation and transition into production and clinical [implementation].”

Google Health’s Rebuttal

The rebuttal from Google Health’s team, led by Dr. Scott McKinney, centered on protecting patients and the AI’s code. The team also said that regulations may require the AI algorithm to remain private, for example, because it could be classified as “medical device software”. Without the proper permissions and a regulatory body overseeing the release, developers, patients, and providers could be put at risk. Google Health also mentioned that the largest dataset they used is accessible online by putting in an application to Google. But the other datasets cannot be shared at all.

The Next Big Debate

This is just the start of a series of debates that multiple stakeholders will have over new AI algorithms, their performance, their underlying black box calculations, and the effects on patients and providers. These arguments could rage on for decades, as the line between “right” and “wrong” is still unclear and can vary from stakeholder to stakeholder. One thing is for sure: because this debate was held publicly, it opened up the possibility of hosting these discussions with input from patients. If nothing else, the researchers on both sides piqued the public’s interest in the pitfalls of AI, publishing AI research, and medical experimentation.

Would you trust an AI to diagnose you from your medical scans, or would you prefer that a physician make a decision before looking at what the AI thinks? Let us know in the comments below!