Small nodules, big problems: AI's role in thyroid nodule diagnosis

November 11, 2019 | Federal Practitioner

REPORTING FROM ATA 2019

CHICAGO – A new image-analysis algorithm that uses a technique similar to facial recognition to identify benign thyroid nodules showed good sensitivity and specificity, with the potential to reduce biopsies by more than 50%.

The negative predictive value of the ultrasound analysis algorithm was 93.2%, said Johnson Thomas, MD, at the annual meeting of the American Thyroid Association. Put another way, about 6.8% of nodules that the algorithm labeled benign were malignant, a figure approximating the false-negative rate of about 5% that is seen in fine-needle aspiration of thyroid nodules.

“Millions of people have thyroid nodules,” many of which are detected incidentally, said Dr. Thomas, an endocrinologist with the Mercy health care system in Springfield, Mo. Fewer than 10% of thyroid nodules turn out to be malignant, but each year, millions of patients undergo biopsies to determine the status of their thyroid nodules.

Faced with evaluating a thyroid nodule, an endocrinologist can currently turn to a risk-stratification scheme, such as those developed by the American College of Radiology and the American Thyroid Association. However, there’s a big subjective component to risk stratification – significant inter- and intraobserver variation has been observed, said Dr. Thomas, and not all nodules are classifiable. The result is a system that still has low specificity and positive predictive value, he said.

Even after a decision to proceed to biopsy, one in seven thyroid nodule biopsies will not produce a definitive diagnosis, he said.

“We are doing millions of thyroid biopsies based on very subjective criteria to find thyroid cancer in a very small percentage of the population, with an invasive technique that may not be diagnostic one out of seven times,” Dr. Thomas said in summing up the current medical situation as he sees it.

Dr. Thomas, who writes his own computer code, said he was searching for a reliable, explainable, and noninvasive technique that left no room for subjective error to address the thyroid nodule problem.

The question was whether an artificial intelligence (AI) algorithm could match radiologist performance in classifying thyroid nodules according to the characteristics of their ultrasound images.

Other algorithms use AI to predict which nodules are malignant, but they function as “black boxes” – a common criticism of AI. The outside observer cannot ordinarily see how the AI algorithm “knows” what it knows. This characteristic of AI poses at least a theoretical problem when such algorithms are used for diagnosis or medical decision making.

Dr. Thomas’s* approach was to train the algorithm he constructed on 2,025 images from a total of 482 nodules. The thyroid nodules used for training had been biopsied or surgically excised, so each had a definitive status of benign or malignant.

Then, after the algorithm was refined, a set of 103 nodules with known malignancy status was used to test the algorithm’s sensitivity and specificity.

The algorithm, dubbed AiBx, used a convolutional neural network to build a unique image vector for each nodule. The AiBx algorithm then looked at the training database to find the “nearest neighbors,” or the images it found to be the most similar to those of the nodule being examined.

For example, said Dr. Thomas, a test image of a benign nodule would have an output from the AiBx analysis of three similar images from the database – all benign. Hence, rather than making a black-box call of whether a nodule is benign or malignant, the algorithm merely says: “This nodule resembles a benign nodule in our database.” The interpreting physician can then use the algorithm as a decision aid with confidence.
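The retrieval step can be illustrated with a minimal Python sketch. It is not the AiBx code: the three-number vectors are made-up stand-ins for the image vectors a convolutional neural network would produce, the function names are hypothetical, and cosine similarity is an assumed similarity measure, since the presentation as reported here did not specify one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, database, k=3):
    """Return the k (similarity, label) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), label) for vec, label in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical toy vectors, NOT study data. Each entry pairs an image
# vector with the nodule's known status from biopsy or surgery.
database = [
    ([0.9, 0.1, 0.0], "benign"),
    ([0.8, 0.2, 0.1], "benign"),
    ([0.7, 0.3, 0.0], "benign"),
    ([0.1, 0.9, 0.2], "malignant"),
    ([0.0, 0.8, 0.3], "malignant"),
]

# Vector for the nodule being examined (also made up).
query = [0.85, 0.15, 0.05]

matches = nearest_neighbors(query, database, k=3)
for similarity, label in matches:
    print(f"{label}: similarity {similarity:.3f}")
```

The output is a short list of look-alike nodules and their known status rather than a single verdict, which is what lets the physician see what the comparison was based on.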

The overall accuracy of AiBx was 81.5%, sensitivity was 87.8%, and specificity was 78.5%. Positive predictive value was 65.9%.
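For readers who want to see how these measures relate to one another, the short Python sketch below computes them from a two-by-two table of results. The counts are an assumption: they were back-calculated here as one set of whole numbers that fits a 103-nodule test set and comes within about 0.1 percentage point of each reported figure. They were not reported in the presentation.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard test-performance measures from a 2x2 confusion table."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),   # benign nodules correctly cleared
        "ppv": tp / (tp + fp),           # flagged nodules that are malignant
        "npv": tn / (tn + fn),           # cleared nodules that are benign
    }

# ASSUMPTION: inferred counts, not published data.
# tp/fn = malignant nodules called malignant/benign;
# tn/fp = benign nodules called benign/malignant.
tp, fn, tn, fp = 29, 4, 55, 15

metrics = diagnostic_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

# Share of all nodules the algorithm would call benign under these counts.
benign_call_rate = (tn + fn) / (tp + fn + tn + fp)
print(f"called benign: {benign_call_rate:.1%}")
```

Under these inferred counts, 59 of the 103 nodules would be called benign, which is in line with the reported potential to reduce biopsies by more than 50%.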

As more images are added to the database, AiBx can easily be retrained and refined, said Dr. Thomas.

“It’s intuitive and explainable,” he added, noting that the algorithm is also a good teaching tool for residents and fellows.

“This AI model can be deployed as an app, integrated with [medical imaging systems] or hosted as a website. By using image-similarity AI models we can eliminate subjectivity and decrease the number of unnecessary biopsies,” he explained in the abstract accompanying the presentation.

However, he said that the algorithm as it currently stands has limitations: It has been tested on only 103 nodules thus far, and there’s the potential for selection bias.

Dr. Thomas* reported that, although he developed the AiBx algorithm, he has not drawn income or royalties from it. He reported no other relevant conflicts of interest.

SOURCE: Thomas* J et al. ATA 2019, Oral Abstract 27.

*Correction, November 21, 2019: An earlier version of this story misstated Dr. Thomas’s last name.