Performance of Machine-Learning Classifiers in the Diagnosis of Pigmented Skin Lesions

Key Points

  • The highest-performing human readers (27 experts with more than 10 years of experience) diagnosed a mean of 18.78 out of 30 cases correctly, while the top three machine-learning algorithms achieved a mean of 25.43 correct diagnoses.
  • The performance gap between human experts and the top three algorithms was significantly smaller for test-set images collected from sources not included in the training set.

In a study conducted under the supervision of researchers from the MedUni Vienna, human experts competed against computer algorithms in diagnosing pigmented skin lesions. The algorithms achieved better overall diagnostic accuracy but showed decreased performance on out-of-distribution images (images from sources not represented in the training data). These findings were published by Tschandl et al in The Lancet Oncology.

Methods

The International Skin Imaging Collaboration (ISIC) and the MedUni Vienna organized an international challenge to compare the diagnostic skills of 511 physicians with those of 139 computer algorithms from 77 machine-learning labs. A database of more than 10,000 images served as the training set for the algorithms. The database included benign pigmented lesions (moles, sun spots, senile warts, angiomas, and dermatofibromas) as well as malignant ones (melanomas, basal cell carcinomas, and pigmented squamous cell carcinomas).

Of the 511 human readers, 283 were board-certified dermatologists, 118 were dermatology residents, and 83 were general practitioners; the remaining 27 readers came from other professional backgrounds. Each participant had to diagnose 30 randomly selected images out of a test set of 1,511 images.
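
Although the article does not describe the algorithms' internals, the sketch below illustrates the general shape of the task: fine-tuning a pretrained image classifier on labeled lesion photos. It is a minimal, hypothetical example in PyTorch, not one of the 139 challenge entries; the directory layout, class folder names, and hyperparameters are all assumptions.

```python
# Minimal transfer-learning sketch for pigmented-lesion classification.
# Illustrative only: not one of the challenge entries. The folder layout
# (train/<class_name>/*.jpg) and hyperparameters are hypothetical.
# The article lists eight diagnostic categories: moles, sun spots,
# senile warts, angiomas, dermatofibromas, melanomas, basal cell
# carcinomas, and pigmented squamous cell carcinomas.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # Normalize with ImageNet statistics to match the pretrained backbone
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# One subfolder per diagnostic category, e.g. train/melanoma/, train/mole/
train_set = datasets.ImageFolder("train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Replace the final layer of an ImageNet-pretrained ResNet with one
# sized for the lesion classes, then fine-tune the whole network.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```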

Results

The highest-performing human readers (27 experts with more than 10 years of experience) diagnosed a mean of 18.78 out of 30 cases correctly, while the top three machine-learning algorithms achieved a mean of 25.43 correct diagnoses.

This did not surprise first author Philipp Tschandl, PhD, of the MedUni Vienna, who said in a press release, “Two-thirds of all participating machines were better than humans; this result had been evident in similar trials during the past years.”

However, the performance gap between human experts and the top three algorithms was significantly smaller for test-set images collected from sources not included in the training set (human underperformance of 11.4% [95% confidence interval (CI) = 9.9%–12.9%] vs 3.6% [95% CI = 0.8%–6.3%]; P < .0001).
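
The underperformance figures above are differences in the proportion of correct diagnoses between the algorithms and the human readers. As a rough illustration of how a 95% confidence interval for such a difference in proportions can be computed (a generic normal-approximation formula, not necessarily the statistical method used in the study), consider the following sketch with hypothetical counts:

```python
# Normal-approximation 95% CI for a difference in proportions.
# Illustrative only; the counts below are hypothetical, not study data.
import math

def diff_ci(correct_a, n_a, correct_b, n_b, z=1.96):
    """Difference p_a - p_b with a 95% confidence interval."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical example: algorithms correct on 850 of 1,000 ratings,
# humans correct on 740 of 1,000 ratings
diff, lo, hi = diff_ci(850, 1000, 740, 1000)
print(f"human underperformance = {diff:.1%}, 95% CI = {lo:.1%} to {hi:.1%}")
```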

“The computer only analyzes an optical snapshot and is really good at it. In real life, however, the diagnosis is a complex task. Physicians usually examine the entire patient and not just single lesions. When humans make a diagnosis, they also take additional information into account, such as the duration of the disease, whether the patient is at high or low risk, and the age of the patient, which was not provided in this study,” explained Dr. Tschandl.

The authors concluded, “State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research.”

Disclosure: For full disclosures of the study authors, visit thelancet.com.

The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.
