Tackling the Problem of Data Imbalancing for Melanoma Classification
Malignant melanoma is the most dangerous type of skin cancer, yet it is the most treatable kind of
cancer when diagnosed at an early stage. For this reason, Computer-Aided Diagnosis systems based on machine
learning have been developed to discern melanoma lesions from benign and dysplastic nevi in dermoscopic
images. Like many real-world machine learning applications, melanoma classification
faces the challenge of imbalanced data: melanoma cases make up a far smaller share of the data
than benign and dysplastic cases. This article analyzes the impact of data balancing strategies at
the training step. Over-Sampling (OS) and Under-Sampling (US) methods are extensively compared in
both feature and data space, revealing that NearMiss-2 (NM2) outperforms the other methods, achieving a Sensitivity
(SE) and Specificity (SP) of 91.2% and 81.7%, respectively. More generally, the reported results highlight that
methods based on US, or on a combination of OS and US, in feature space outperform the others.
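
To make the terms above concrete, the following is a minimal sketch of the NearMiss-2 selection rule and of how SE and SP are computed. It is not the article's implementation. NearMiss-2 is taken here in its usual formulation: keep the majority-class samples whose average distance to their k farthest minority-class samples is smallest. The function names, the value of k, the Euclidean distance, and the toy feature vectors and labels are all assumptions made for illustration.

```python
import math


def nearmiss2(majority, minority, n_keep, k=3):
    """Under-sample `majority` with the NearMiss-2 rule (illustrative helper).

    Keeps the n_keep majority points whose mean Euclidean distance to
    their k farthest minority points is smallest.
    """
    k = min(k, len(minority))

    def score(point):
        # Distances to every minority point, largest first.
        dists = sorted((math.dist(point, q) for q in minority), reverse=True)
        return sum(dists[:k]) / k

    return sorted(majority, key=score)[:n_keep]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP): the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy 2-D feature vectors, invented for this sketch (not data from the article).
melanoma = [(0.0, 0.0), (1.0, 0.0)]                        # minority class
benign = [(0.5, 0.0), (5.0, 5.0), (0.5, 1.0), (9.0, 9.0)]  # majority class

# Balance the classes by keeping as many benign samples as melanoma samples.
balanced_benign = nearmiss2(benign, melanoma, n_keep=len(melanoma), k=2)

# Toy labels and predictions (1 = melanoma), also invented.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
se, sp = sensitivity_specificity(y_true, y_pred)
```

In this toy case the two benign samples lying closest to the melanoma cluster are retained, which reflects the intent of NearMiss-2: keep majority samples near the class boundary. In practice a maintained library implementation would normally be preferred over a hand-written one.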
All rights reserved