Performance Across Skin Types
Dermatology AI must perform reliably across the full spectrum of skin tones. This page documents how Legit.Health addresses skin type diversity in training data, validation, and ongoing monitoring.
Training data diversity
The AI models are trained on 280,000+ clinical images drawn from 55+ curated datasets, including collections specifically designed for skin tone diversity:
| Dataset | Focus |
|---|---|
| Black & Brown Skin | Clinical images of dermatological conditions on darker skin tones |
| Diverse Dermatology Images (DDI) | Stanford dataset with balanced representation across Fitzpatrick types |
| Full Spectrum Dermatology | Wide skin tone coverage across common conditions |
| SkinDeep | Crowdsourced diverse dermatology images |
Stratified performance by Fitzpatrick type
The following metrics are from the production model's bias analysis and fairness evaluation, stratified by Fitzpatrick skin type (FST) groupings. Each value is a point estimate followed by its 95% confidence interval in parentheses.
Diagnostic accuracy
| Metric | Overall | FST I–II | FST III–IV | FST V–VI |
|---|---|---|---|---|
| Top-1 accuracy | 0.658 (0.654–0.663) | 0.686 (0.680–0.691) | 0.615 (0.606–0.624) | 0.535 (0.514–0.557) |
| Top-3 accuracy | 0.821 (0.817–0.825) | 0.850 (0.846–0.855) | 0.774 (0.766–0.782) | 0.694 (0.674–0.714) |
| Top-5 accuracy | 0.864 (0.861–0.868) | 0.891 (0.887–0.895) | 0.822 (0.815–0.830) | 0.746 (0.726–0.765) |
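As an illustration of how a stratified Top-k figure with an interval could be produced, here is a minimal sketch. It assumes each evaluation record carries a Fitzpatrick group, a ground-truth label, and a ranked list of predictions, and it uses a Wilson score interval. The source does not state which interval method was used for the table above, so the Wilson choice, the record layout, and the function names are all assumptions made for this example.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def stratified_top_k(records, k):
    """Top-k accuracy per group.

    records: iterable of (group, true_label, ranked_predictions) tuples
             (hypothetical layout, not the platform's data model).
    Returns {group: (accuracy, ci_lower, ci_upper)}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, truth, ranked in records:
        totals[group] += 1
        if truth in ranked[:k]:
            hits[group] += 1
    return {
        g: (hits[g] / totals[g], *wilson_interval(hits[g], totals[g]))
        for g in totals
    }


# Toy records for illustration only; these are not real evaluation data.
toy_records = [
    ("I-II", "psoriasis", ["psoriasis", "eczema", "acne"]),
    ("I-II", "acne", ["eczema", "acne", "rosacea"]),
    ("V-VI", "eczema", ["acne", "rosacea", "eczema"]),
    ("V-VI", "acne", ["acne", "eczema", "rosacea"]),
]
top1 = stratified_top_k(toy_records, k=1)
top3 = stratified_top_k(toy_records, k=3)
```

With only a handful of records per group the interval is very wide, which is the same effect that widens the FST V–VI intervals in the table.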
Clinical safety metrics
| Metric | Overall | FST I–II | FST III–IV | FST V–VI |
|---|---|---|---|---|
| AUC malignant | 0.918 (0.914–0.922) | 0.918 (0.913–0.923) | 0.919 (0.910–0.928) | 0.836 (0.794–0.877) |
| AUC pre-malignant | 0.878 (0.872–0.884) | 0.882 (0.875–0.889) | 0.879 (0.868–0.890) | 0.801 (0.763–0.840) |
| AUC urgent referral | 0.900 (0.889–0.911) | 0.913 (0.899–0.926) | 0.884 (0.868–0.900) | 0.827 (0.785–0.865) |
| AUC high-priority referral | 0.888 (0.884–0.892) | 0.890 (0.885–0.895) | 0.883 (0.876–0.891) | 0.855 (0.833–0.877) |
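For readers who want to reproduce this kind of figure on their own data, the sketch below computes an AUC and a percentile-bootstrap 95% interval for one subgroup. It assumes binary labels (1 for the positive class, such as malignant) and a model score per image. The bootstrap method, the function names, and the toy inputs are assumptions for illustration; the source does not describe how the published intervals were derived.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as 0.5. Pairwise comparison is O(n^2), which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC of one subgroup."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy inputs for illustration only; these are not real model outputs.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.60]
point = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

Running the same function separately on each Fitzpatrick group yields one interval per column; groups with fewer positive cases produce wider intervals.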
How to read these results
- FST I–II: Performance is highest for lighter skin types, consistent with the larger training set representation
- FST III–IV: AUC values stay within 0.03 of FST I–II on every safety metric; diagnostic accuracy is moderately lower (Top-1 0.615 vs 0.686)
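- FST V–VI: Performance is lower, particularly for diagnostic accuracy (Top-1 0.535 vs 0.686 for FST I–II). Confidence intervals are wider due to smaller sample sizes. Point estimates for all four safety metrics remain above 0.80, although the lower confidence bounds for AUC malignant (0.794), pre-malignant (0.763), and urgent referral (0.785) fall below it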
- All Fitzpatrick groups exceed minimum acceptance thresholds: the system meets performance criteria across the full spectrum
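The gaps and interval widths discussed in these points can be read directly off the published tables. The short sketch below does that arithmetic for a few rows; the numbers are copied from the tables on this page, while the dictionary layout and helper names are hypothetical.

```python
# (estimate, ci_lower, ci_upper) copied from the tables on this page.
published = {
    "Top-1 accuracy": {
        "I-II": (0.686, 0.680, 0.691),
        "III-IV": (0.615, 0.606, 0.624),
        "V-VI": (0.535, 0.514, 0.557),
    },
    "AUC malignant": {
        "I-II": (0.918, 0.913, 0.923),
        "III-IV": (0.919, 0.910, 0.928),
        "V-VI": (0.836, 0.794, 0.877),
    },
    "AUC high-priority referral": {
        "I-II": (0.890, 0.885, 0.895),
        "III-IV": (0.883, 0.876, 0.891),
        "V-VI": (0.855, 0.833, 0.877),
    },
}


def gap_and_width(metric, group, reference="I-II"):
    """Return (gap to the reference group, width of the group's interval)."""
    est, lower, upper = published[metric][group]
    ref_est = published[metric][reference][0]
    return round(ref_est - est, 3), round(upper - lower, 3)


def lower_bound_below(metric, group, level):
    """True if the group's lower confidence bound falls below `level`."""
    return published[metric][group][1] < level


top1_gap, top1_width = gap_and_width("Top-1 accuracy", "V-VI")
malignant_flag = lower_bound_below("AUC malignant", "V-VI", 0.80)
```

For Top-1 accuracy, FST V–VI sits 0.151 below FST I–II and has an interval width of 0.043, compared with 0.011 for FST I–II.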
These metrics are from the platform's diagnostic classification model, which processes the broadest range of conditions and skin types. The clinical trial scoring models (ALADIN for acne, APASI for psoriasis) inherit the same image processing pipeline and benefit from the same training data diversity. Performance monitoring is active across all model families.
Known limitations and mitigation
FST V–VI representation: Darker skin types are underrepresented in dermatology image datasets globally. This is an industry-wide challenge, not specific to Legit.Health. The impact is visible in wider confidence intervals and lower absolute accuracy for FST V–VI.
Mitigation strategy:
- Active sourcing of diverse datasets (DDI, SkinDeep, Full Spectrum Dermatology)
- Ongoing post-market clinical follow-up (PMCF) monitoring performance by skin type
- Transparent reporting of stratified metrics, including limitations
For clinical trial sponsors: The scoring models used in clinical trials are validated against expert inter-rater variability as the acceptance criterion. This non-inferiority approach ensures that the AI is at least as consistent as dermatologists, regardless of skin type.
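To make the idea of using inter-rater variability as an acceptance criterion concrete, the following is a simplified sketch. It assumes severity scores on a numeric scale, measures disagreement as mean absolute difference, and checks whether the AI's disagreement with the raters is no larger than the raters' disagreement with each other plus a margin. The metric, the margin, and all names are assumptions for this example; the source does not specify the statistical procedure, and a formal non-inferiority analysis would use a pre-specified margin together with a confidence interval rather than a point comparison.

```python
from statistics import mean


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally long score lists."""
    return mean(abs(x - y) for x, y in zip(a, b))


def inter_rater_variability(ratings):
    """Average pairwise disagreement between human raters.

    ratings: list of per-rater score lists, one score per case.
    """
    pairs = [
        (i, j)
        for i in range(len(ratings))
        for j in range(i + 1, len(ratings))
    ]
    return mean(mean_abs_diff(ratings[i], ratings[j]) for i, j in pairs)


def ai_vs_raters(ai_scores, ratings):
    """Average disagreement between the AI and each human rater."""
    return mean(mean_abs_diff(ai_scores, r) for r in ratings)


def is_non_inferior(ai_scores, ratings, margin=0.0):
    """Point-estimate check only; `margin` is a hypothetical tolerance."""
    return ai_vs_raters(ai_scores, ratings) <= inter_rater_variability(ratings) + margin


# Toy scores for three raters on three cases; illustration only.
toy_ratings = [[2, 4, 6], [3, 5, 5], [2, 5, 7]]
toy_ai = [2, 5, 6]
accepted = is_non_inferior(toy_ai, toy_ratings)
```

Applying the same check within each Fitzpatrick group separately would show whether the criterion holds for every group rather than only on average.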
Ongoing monitoring
Performance across skin types is monitored as part of the post-market clinical follow-up (PMCF) programme under MDR Annex XIV. Any performance degradation detected for specific Fitzpatrick groups triggers a formal risk assessment per ISO 14971 and, if necessary, targeted model retraining.
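One way such a trigger could be expressed in code is sketched below. It flags a Fitzpatrick group when the upper confidence bound of a metric in the current monitoring window falls below the baseline value minus a tolerance. The rule, the tolerance, the class and function names, and the monitoring-window numbers are all hypothetical; the source does not describe the actual trigger logic.

```python
from dataclasses import dataclass


@dataclass
class GroupMetric:
    group: str        # Fitzpatrick grouping, e.g. "V-VI"
    metric: str       # metric name, e.g. "AUC malignant"
    baseline: float   # value recorded at release
    ci_upper: float   # upper 95% bound observed in the current window


def degraded(m, tolerance=0.0):
    """True when even the optimistic end of the interval is below baseline."""
    return m.ci_upper < m.baseline - tolerance


def groups_needing_review(metrics, tolerance=0.0):
    """Sorted list of groups with at least one degraded metric."""
    return sorted({m.group for m in metrics if degraded(m, tolerance)})


# Baselines are taken from the tables on this page; the ci_upper values are
# invented monitoring-window figures used purely for illustration.
window = [
    GroupMetric("I-II", "AUC malignant", baseline=0.918, ci_upper=0.925),
    GroupMetric("III-IV", "AUC malignant", baseline=0.919, ci_upper=0.921),
    GroupMetric("V-VI", "AUC malignant", baseline=0.836, ci_upper=0.820),
]
to_review = groups_needing_review(window)
```

In this toy window only the FST V–VI entry is flagged, which in the process described above would prompt the formal risk assessment.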