Known Limitations

Clinical trial sponsors need to understand not just what an AI scoring system can do, but where its boundaries are. This page documents the known limitations of the ASALT alopecia severity scoring technology, explains why each limitation exists, and describes how it is managed. Transparency about limitations is a regulatory expectation (ISO 14971, EU AI Act Article 13) and a prerequisite for informed protocol design.

Shadow and parting misclassification

Natural scalp partings, shadows from overhead lighting, and thin or light-coloured hair can be misclassified as areas of hair loss. The segmentation model classifies scalp pixels as hair-bearing or non-hair-bearing, and shadows and partings share visual features with genuine alopecic patches.

Impact: This can lead to overestimation of hair loss, particularly in patients with fine or sparse hair who do not have alopecia, or in images captured under uneven lighting.

Mitigation:

  • Confidence thresholding is calibrated to balance sensitivity (detecting real hair loss) against specificity (not misclassifying partings or shadows)
  • The imaging protocol requires consistent, even lighting and specifies that hair should be in a natural position without styling to cover loss
  • The DIQA quality gate rejects images with severe lighting issues before they reach the segmentation model
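The first mitigation above can be made concrete with a small sketch. The threshold value, the probability map, and the function name below are all invented for illustration; the calibrated production threshold is not public.

```python
import numpy as np

def hair_loss_fraction(prob_map: np.ndarray, threshold: float = 0.6) -> float:
    """Fraction of scalp pixels classified as hair loss.

    Pixels whose "hair loss" probability falls below the threshold
    (e.g. a faint shadow scoring 0.55) are NOT counted, trading a
    little sensitivity for specificity against partings and shadows.
    """
    return float((prob_map >= threshold).mean())

# Toy 2x3 probability map: two confident patches (0.9), one
# shadow-like ambiguity (0.55), the rest clearly hair-bearing.
probs = np.array([[0.9, 0.55, 0.1],
                  [0.9, 0.2, 0.1]])
print(hair_loss_fraction(probs))  # 2 of 6 pixels exceed 0.6
```

Raising the threshold suppresses shadow and parting false positives at the cost of missing faint genuine patches; this is the sensitivity/specificity balance the calibration targets.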

No prospective clinical agreement data yet

ASALT is deployed in a Phase 3 clinical trial for adverse event monitoring, but prospective agreement metrics (ICC, MAE, Pearson r) between ASALT scores and investigator SALT assessments have not yet been published.

The system is validated at the model level — the hair loss surface quantification model achieves RMAE 7.08% (95% CI: 5.63%–8.93%) on 800 independent test images. However, end-to-end clinical agreement (AI score vs. dermatologist score on the same patient at the same visit) requires prospective data from the ongoing deployment.

Mitigation: The acceptance criterion for the ongoing clinical investigation is ICC > 0.75. Data will be published as it becomes available. In the interim, the model-level validation provides confidence that the segmentation is accurate, and the SALT formula aggregation is deterministic (weighted sum of quadrant percentages).
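The deterministic aggregation mentioned above can be sketched directly from the published SALT formula, in which each quadrant's hair-loss percentage is weighted by that quadrant's share of total scalp area (top 40%, posterior 24%, each side 18%). The function and dictionary names are illustrative, not ASALT's actual API.

```python
# Published SALT quadrant weights (fractions of total scalp area)
SALT_WEIGHTS = {"top": 0.40, "posterior": 0.24, "left": 0.18, "right": 0.18}

def salt_score(quadrant_loss_pct: dict) -> float:
    """Weighted sum of per-quadrant hair-loss percentages (0-100)."""
    return sum(SALT_WEIGHTS[q] * quadrant_loss_pct[q] for q in SALT_WEIGHTS)

# Example: 50% loss on top, 25% posterior, 10% on each side
print(salt_score({"top": 50, "posterior": 25, "left": 10, "right": 10}))
# 0.40*50 + 0.24*25 + 0.18*10 + 0.18*10 = 29.6
```

Because this step is a fixed arithmetic formula, any uncertainty in the final SALT score comes from the upstream segmentation, not from the aggregation.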

What the model-level validation tells you

An RMAE of 7.08% means the model's hair loss percentage estimate differs from the expert consensus by roughly 7 SALT points on average, on a 0–100 scale. For context, expert dermatologists typically estimate SALT in 5–10% increments, and their inter-rater ICC ranges from 0.80 to 0.90. The AI provides continuous estimates with sub-percentage granularity.
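The interpretation above can be illustrated with a small calculation, assuming (as the text does) that RMAE is the mean absolute error expressed as a percentage of the 0–100 scale. The scores below are invented examples, not validation data.

```python
def relative_mae(pred, truth, scale=100.0):
    """Mean absolute error as a percentage of the scale range."""
    mae = sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)
    return 100.0 * mae / scale

# Invented AI scores vs. expert-consensus SALT scores for 4 patients
ai_scores = [12.0, 47.5, 80.0, 33.0]
consensus = [10.0, 50.0, 75.0, 30.0]
print(relative_mae(ai_scores, consensus))  # mean |error| in SALT points
# errors 2.0, 2.5, 5.0, 3.0 -> RMAE 3.125
```

On a 0–100 scale the percentage and the point value coincide, which is why an RMAE of 7.08% reads directly as about 7 SALT points.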

Alopecia areata specific

The model is trained to distinguish alopecia areata patches from androgenetic alopecia patterns. It may not be appropriate for other types of hair loss, including:

  • Cicatricial (scarring) alopecia — where the scalp texture is permanently altered
  • Telogen effluvium — diffuse thinning without discrete patches
  • Traction alopecia — hair loss from sustained tension on hair follicles

Different aetiologies of hair loss have different visual presentations and clinical significance. Using ASALT for conditions it was not designed for may produce unreliable results.

Mitigation: The current deployment for adverse event monitoring (drug-induced alopecia in a MASH trial) validates the model's generalisability beyond pure alopecia areata. Protocol design should specify which types of hair loss are in scope, and the model's applicability should be confirmed during the protocol design phase.

Hair texture, colour, and density variation

Performance may vary with:

  • Very light-coloured hair (blonde, white, grey) on pale scalp — low contrast between hair and skin
  • Very dark dense hair on dark skin — low contrast in the opposite direction
  • Tightly coiled hair textures — different visual presentation of hair density and scalp visibility

The contrast between hair and scalp is the primary visual cue for the segmentation model. Low-contrast scenarios are inherently harder for any image-based assessment method, including human visual estimation.

Mitigation: The validation dataset (800 images) includes multiple hair types and colours; the reported RMAE of 7.08% accounts for this variability across the dataset. However, the trichoscopy dataset for hair follicle detection is currently 100% Fitzpatrick I–II — this is acknowledged as a known limitation with active remediation through targeted data collection from FST III–VI patients.

Non-scalp hair excluded

SALT measures scalp hair loss only. Eyebrow hair loss (assessed by ClinRO Measure for Eyebrow Hair Loss), eyelash loss, and body hair loss are not evaluated by ASALT and require separate assessment instruments.

Mitigation: If your protocol requires non-scalp hair assessment, this must be handled separately — either manually by the investigator or with dedicated models if available. This should be discussed during protocol design.

Cross-cutting limitations

The following limitations apply to all indications scored by the platform, not just alopecia.

Photograph-based assessment

The AI analyses clinical photographs, not live patients. Certain clinical features that require palpation (e.g., induration or plaque thickness) or observation under specific conditions are estimated from visual cues only. This is an inherent limitation of any remote or image-based assessment method.

Mitigation: The imaging protocol standardises capture conditions (lighting, distance, angle), and the DIQA quality gate rejects images that do not meet minimum quality standards for focus, lighting, framing, and resolution. The acceptance criterion for each AI model is non-inferiority to expert inter-rater variability on the same photographs, ensuring the AI is at least as consistent as dermatologists working from the same modality.
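To make the quality-gate idea concrete, here is a minimal sketch in the spirit of the DIQA gate described above. The specific metrics (mean brightness, gradient energy as a sharpness proxy) and every threshold are invented for illustration; the real gate's criteria are not public.

```python
import numpy as np

def passes_quality_gate(img: np.ndarray) -> bool:
    """Reject images that are too dark, too bright, or too flat (blurry).

    img: 2-D grayscale array with values in [0, 255].
    Thresholds below are illustrative, not DIQA's actual criteria.
    """
    brightness = img.mean()
    if not (40.0 <= brightness <= 220.0):  # extreme / uneven lighting
        return False
    # Gradient energy as a crude sharpness proxy: blurred images
    # have weak local gradients.
    gy, gx = np.gradient(img.astype(float))
    sharpness = float(np.mean(gx ** 2 + gy ** 2))
    return sharpness >= 25.0               # invented sharpness floor

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(64, 64)).astype(float)  # high-frequency detail
flat = np.full((64, 64), 128.0)                            # featureless, "blurry"
print(passes_quality_gate(sharp), passes_quality_gate(flat))  # True False
```

Gating before segmentation means the downstream model only ever sees images on which its validated performance figures are meaningful.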

Fitzpatrick skin type V–VI performance

Performance is lower for darker skin types due to the global underrepresentation of Fitzpatrick V–VI skin in dermatology image datasets. This is an industry-wide challenge that affects both AI systems and human assessors.

Mitigation: Stratified performance metrics are published transparently (see Performance Across Skin Types). Active dataset diversification is ongoing through targeted data sourcing (DDI, SkinDeep, Full Spectrum Dermatology) and post-market clinical follow-up (PMCF) monitoring. All Fitzpatrick groups currently exceed minimum acceptance thresholds.
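Stratified reporting as described above means publishing a per-group metric rather than a single pooled number. The sketch below shows the shape of such a computation; the records, field names, and values are invented for illustration.

```python
from collections import defaultdict

# Invented per-image records: AI score vs. expert score, tagged by
# Fitzpatrick group. Real stratified metrics are published separately.
records = [
    {"fitzpatrick": "I-II", "ai": 12.0, "expert": 10.0},
    {"fitzpatrick": "I-II", "ai": 48.0, "expert": 50.0},
    {"fitzpatrick": "V-VI", "ai": 30.0, "expert": 25.0},
    {"fitzpatrick": "V-VI", "ai": 70.0, "expert": 67.0},
]

def stratified_mae(records):
    """Mean absolute error computed separately per Fitzpatrick group."""
    errs = defaultdict(list)
    for r in records:
        errs[r["fitzpatrick"]].append(abs(r["ai"] - r["expert"]))
    return {group: sum(e) / len(e) for group, e in errs.items()}

print(stratified_mae(records))  # {'I-II': 2.0, 'V-VI': 4.0}
```

A pooled MAE over these four records would be 3.0 and would hide the gap between groups, which is exactly why the stratified view is the one published.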

Subjective ground truth

The reference standard for severity scoring is the mathematical consensus of multiple expert dermatologists — not an objective measurement. The AI cannot be “more correct” than the experts it was trained against.

Mitigation: This is not a limitation of the AI specifically, but of the clinical assessment itself; the same constraint applies to any human rater. The consensus of 2–3 independent experts is the best available approximation of truth and is the reference standard accepted by the FDA and EMA in dermatology clinical trials. The AI matching this consensus represents the realistic ceiling of performance.

Model version specificity

All performance metrics reported in this documentation apply to a specific validated model version. Model updates — including retraining, architecture changes, or threshold adjustments — require full re-validation per IEC 62304 before deployment.

Mitigation: The model version is locked at study initiation. No mid-study model updates occur. This ensures that every patient in a trial is scored by the same model, preserving endpoint integrity throughout the study.
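One common way to enforce such a lock, sketched below, is to record a cryptographic digest of the model artefact at study initiation and refuse to score with any file whose digest differs. The file handling and function names here are illustrative, not ASALT's actual mechanism.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    """SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model_lock(path: str, locked_digest: str) -> bool:
    """True only if the artefact matches the digest recorded at lock time."""
    return sha256_of(path) == locked_digest

# Demo with a throwaway stand-in for a model file
with tempfile.NamedTemporaryFile(delete=False, suffix=".onnx") as f:
    f.write(b"model-weights-v1")
    path = f.name
locked = sha256_of(path)                  # recorded at study initiation
print(verify_model_lock(path, locked))    # True: artefact unchanged
with open(path, "ab") as f:
    f.write(b"tampered")
print(verify_model_lock(path, locked))    # False after any modification
os.remove(path)
```

Any retraining, threshold change, or even a single-byte difference fails the check, so every patient in the study is demonstrably scored by the same validated artefact.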

Decision support, not autonomous diagnosis

The system provides severity scoring to support clinical decisions. It does not replace clinical judgement, and it does not make autonomous diagnostic or treatment decisions. All AI-generated scores should be interpreted by qualified healthcare professionals within the context of the patient’s overall clinical presentation.

How limitations are managed

All limitations documented on this page are tracked within the formal risk management process (ISO 14971) and the software development lifecycle (IEC 62304). Each limitation has been assessed for clinical risk, and risk controls have been implemented wherever needed to bring residual risk to an acceptable level.

The post-market clinical follow-up (PMCF) programme under MDR Annex XIV continuously monitors real-world performance. ASALT is currently deployed in a Phase 3 trial, providing continuous real-world validation data. Any new limitation identified through post-market surveillance triggers a formal risk assessment and, if necessary, corrective action.