Mammogram AI shows more promise, performing as well as radiologists in breast cancer screening study

A study of mammograms taken of more than 55,000 women found that an artificial intelligence program could serve as a backup set of computerized eyes when screening for signs of breast cancer.

Developed by the South Korea-based company Lunit, the Insight MMG program has previously collected green lights for use in the U.S. and Europe. The AI-powered algorithm highlights nodules that may be malignant while automatically estimating breast tissue density.

The prospective study, conducted in Stockholm, Sweden, put Lunit’s AI up against double reading, the approach recommended by European guidelines in which two trained radiologists review each exam, and explored the different permutations of combining one or two specialists with AI.

Researchers found not only that pairing a single human radiologist with the AI software delivered results effectively similar to standard double reading, if not slightly better, including a 4% boost in detection rates, but also that the use of AI alone did not turn up significantly fewer confirmed breast cancers.

Reading alone, the AI spotted 98% as many confirmed cancers as two radiologists, a performance the researchers described as non-inferior. At the same time, they found that adding AI to the traditional two-radiologist team was superior to double reading by itself. The study’s results were published last week in The Lancet Digital Health.

“While double readings by two radiologists have been established as the common practice across Europe and Australia, many countries are experiencing great difficulties due to the shortage of radiologists,” study leader Fredrik Strand, M.D., Ph.D., an associate professor at the Karolinska Institutet, said in the company’s announcement.

“This prospective study lays the groundwork for the widespread adoption of AI in breast cancer screening by filling the role of one radiologist, which in turn can reduce medical costs and lead to healthcare reimbursement,” Strand added.

The study enlisted 11 breast radiologists at Stockholm’s Capio Sankt Göran Hospital with a median of 17 years of experience, with some reaching as high as three decades. The trial was not randomized: Lunit’s AI program worked in the background during each patient’s examination, with no changes to the hospital’s standard workflow, and radiologists were blinded to the AI’s conclusions before making their own.

“This study signifies a milestone in healthcare, ushering in an era where AI seamlessly complements and elevates the standards of breast cancer screening. AI is redefining cancer screening standards,” said Lunit CEO Brandon Suh.

A separate commentary published alongside the study in The Lancet Digital Health said that while AI appears poised to improve the efficiency of breast cancer screening, its health impacts are still unknown, and it’s unlikely researchers will be able to directly measure its effect on long-term cancer death rates.

“However, surrogate endpoints, such as interval cancer rates, will be crucial to understanding potential health impact (if any) and, importantly, to reassure those providing screening programmes about the safety of substituting AI for human readers,” wrote Nehmat Houssami, Ph.D., and Luke Marinovich, Ph.D., both of the University of Sydney in Australia.

“With the emergence of new evidence from prospective trials of AI for breast screening, population screening programmes and imaging services will also need to consider participants' views and expectations of the performance of AI before it can be implemented in the screening process. Maintaining public trust in cancer screening programmes is crucial to ensuring that potential benefits from AI screening are fully realised,” they added.

This week, Lunit also spotlighted a separate study of Insight MMG, conducted in the U.K. and published in the journal Radiology.

This retrospective research applied the AI to two test sets of mammogram data, each containing what researchers described as challenging cases collected from 60 patients, cases that had previously been reviewed by a total of 552 human readers.

The study came to a conclusion similar to that of The Lancet Digital Health paper: The results pointed to no significant difference between the AI program’s accuracy and that of a two-person team of trained specialists.

“There are no other studies to date that have compared such a large number of human reader performance in routine quality assurance test sets to AI, so this study may provide a model for assessing AI performance in a real-world setting,” said Yan Chen, a professor of digital screening at the University of Nottingham.