Many studies claiming that artificial intelligence (AI) is as good as, or better than, human experts at interpreting medical images are of poor quality and potentially exaggerated.
The claim, published in The BMJ, comes from a paper arguing that this could pose a risk to the “safety of millions of patients”.
The findings raise concerns about the quality of evidence underpinning many of these studies and highlight the need to improve design and reporting standards.
AI is an innovative and fast-moving field with the potential to improve patient care and relieve overburdened health services.
Deep learning is a branch of AI that has shown particular promise in medical imaging.
The volume of published research on deep learning is growing, and some media headlines that claim superior performance to doctors have fuelled hype for rapid implementation.
However, the researchers argue that the methods and risk of bias of the studies behind these headlines have not been examined in detail.
To address this, a team of researchers reviewed studies published over the past 10 years that compared the performance of deep learning algorithms in medical imaging with that of expert clinicians.
They found just two eligible randomised clinical trials and 81 non-randomised studies. Of the non-randomised studies, only nine were prospective (tracking and collecting information about individuals over time) and just six were tested in a “real world” clinical setting.
The average number of human experts in the comparator group was just four, while access to raw data and code (for independent scrutiny of results) was limited.