Mitigating skin color bias in dermatology AI using CycleGAN-based data augmentation

Dermatological AI systems frequently underperform on darker skin tones because training data for these tones are scarce: publicly available datasets are overwhelmingly skewed toward light skin. This underrepresentation produces diagnostic inaccuracies that contribute to healthcare disparities. In this study, we investigated whether augmenting a dermatology dataset with synthetic dark skin images, generated from light skin counterparts by a cycle-consistent generative adversarial network (CycleGAN), could improve classification performance on real dark skin lesions. We hypothesized that a model trained on both real light skin images and synthetic dark skin images would achieve higher diagnostic accuracy on dark skin than a model trained solely on light skin data. The CycleGAN was trained on images from the Stanford Diverse Dermatology Images (DDI) and Fitzpatrick17k datasets. To test the hypothesis, we compared two ResNet50 classifiers, ResNet A and ResNet B, on the task of distinguishing benign from malignant lesions. ResNet A was trained only on real light skin images from the Stanford DDI and Fitzpatrick17k datasets, while ResNet B was trained on a balanced dataset combining real light skin images from these datasets with CycleGAN-generated dark skin images. When evaluated on real dark skin images, ResNet B achieved higher classification accuracy (71.95%) than ResNet A (65.16%), along with higher precision, recall, and F1 scores. These results support our hypothesis that training on synthetic dark skin images can improve dermatological classifier performance on underrepresented skin tones. This approach offers a promising way to reduce bias in dermatological AI systems when real-world datasets lack diversity.
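
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed for a binary benign/malignant classifier. It is not the evaluation code used in this study. The function name `binary_metrics`, the label encoding (1 = malignant, 0 = benign), and the example labels are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: `positive` marks the malignant class (hypothetical encoding).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for eight dark skin test images (not data from the study).
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_metrics = binary_metrics(example_true, example_pred)
print(example_metrics)
```

In a comparison such as the one described above, the same function would be applied to the predictions of each classifier on the identical set of real dark skin test images, so that differences in the metrics reflect the training data rather than the test set.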