Clinical Evidence and Validation

This page explains how the alopecia severity scoring technology is validated, what the performance metrics mean, and how the evidence pathway supports regulatory-grade use.

How the ground truth is established

Before looking at any performance numbers, it is critical to understand what the numbers are compared against.

The ground truth is built from expert SALT assessments performed independently by 2–3 dermatologists specialising in hair disorders. Each dermatologist evaluates the same set of scalp photographs and assigns a SALT score per quadrant and globally. The consensus (ground truth) is computed as the mathematical average of their scores.
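As a minimal sketch of the averaging step, the snippet below computes a per-quadrant consensus from three raters. The rater values and the four-view ordering are hypothetical; it assumes each quadrant subscore is already weighted so that the global SALT score is their sum, as in the standard SALT definition.

```python
# Hypothetical per-rater, per-quadrant SALT subscores: [left, right, top, back].
# The consensus (ground truth) is the column-wise average across raters.

def consensus_scores(ratings):
    """Average per-quadrant SALT subscores across raters.

    ratings: list of per-rater lists, one subscore per quadrant.
    Returns (per-quadrant consensus, global consensus). The global score is
    the sum of the quadrant subscores, assuming each is already weighted.
    """
    n_raters = len(ratings)
    per_quadrant = [sum(rater[q] for rater in ratings) / n_raters
                    for q in range(len(ratings[0]))]
    return per_quadrant, sum(per_quadrant)
```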

Why this approach: SALT scoring, while more objective than many dermatological scales due to its percentage-based approach, still involves subjective estimation of hair loss extent. The best approximation of truth is the consensus of multiple experts.

Why SALT is more objective than other dermatological scales

Unlike scales such as IGA (which rely on qualitative descriptors like "mild" or "moderate"), SALT measures a quantitative outcome: the percentage of scalp affected by hair loss. This makes SALT inherently more objective and less susceptible to inter-rater variability than most dermatological severity scales.

However, variability still exists because dermatologists must visually estimate the percentage of hair loss from photographs or direct inspection. Studies report inter-rater ICC values of 0.80–0.90 for SALT, which is good but not perfect. This residual variability is exactly what automated scoring eliminates.

The advantage of automation for SALT

Unlike subjective scales where the AI must replicate clinical judgement, SALT scoring is fundamentally a measurement task: what percentage of this area lacks hair? This is precisely the kind of task where pixel-level image analysis excels. The AI computes exact percentages from segmentation rather than estimating them visually, providing higher granularity (continuous percentages vs. 5–10% increments) and perfect reproducibility.
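The measurement task described above can be sketched as follows. The region weights are the standard SALT weights (top/vertex 40%, back 24%, left and right 18% each); the pixel counts stand in for hypothetical segmentation-model outputs, and the function name is illustrative, not the production API.

```python
# Standard SALT region weights (Olsen et al. SALT definition).
SALT_WEIGHTS = {"top": 0.40, "back": 0.24, "left": 0.18, "right": 0.18}

def salt_from_masks(pixel_counts):
    """Compute a continuous SALT score (0-100) from segmentation counts.

    pixel_counts: view -> (hair_loss_pixels, total_scalp_pixels), one entry
    per scalp view. The percentage per region is exact rather than visually
    estimated, which is what gives automated SALT its granularity.
    """
    score = 0.0
    for view, weight in SALT_WEIGHTS.items():
        loss, scalp = pixel_counts[view]
        score += weight * 100.0 * loss / scalp
    return score
```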

Understanding the metrics

The following metrics are used to evaluate the agreement between ASALT scores and expert SALT assessments:

Intraclass correlation coefficient

The degree of agreement between two or more raters measuring the same quantity. In this context: how closely ASALT scores agree with expert dermatologist SALT assessments.

Perfect score: 1.0 would mean perfect agreement. For SALT scoring, individual dermatologists achieve ICC ~0.80–0.90 against each other, making this the realistic ceiling.

Why it matters: ICC is the standard measure of inter-rater reliability for continuous scales in clinical research. It tells you whether the AI's SALT estimates are as consistent as expert assessments.

  • 0.9–1: Excellent
  • 0.75–0.9: Good
  • 0.5–0.75: Moderate
  • 0–0.5: Poor
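The ICC has several forms; this page does not state which variant is reported, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), a common choice for inter-rater reliability. The score matrices used here are hypothetical.

```python
# Illustrative ICC(2,1) from the two-way ANOVA decomposition.
# Rows are subjects (patients), columns are raters (e.g. AI vs. expert).

def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # raters
    sse = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                               # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement yields 1.0; a constant offset between raters lowers the ICC slightly because ICC(2,1) penalises absolute disagreement, not just poor ranking.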

Mean absolute error

The average magnitude of scoring errors in SALT points. An MAE of 5.0 means that, on average, ASALT's score differs from the expert consensus by 5 percentage points on the 0–100 scale.

Perfect score: 0.0 would mean zero error. Individual dermatologists achieve MAE of 5–10 SALT points against each other.

Why it matters: For a 0–100 scale, an MAE under 10 means the AI is within the range of expert disagreement. This is critical for detecting treatment responses such as SALT 50 or SALT 75.

  • 0–5: Excellent
  • 5–10: Good
  • 10–20: Acceptable
  • 20–100: Poor
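A minimal sketch of the MAE computation on the 0–100 SALT scale; the score pairs below are hypothetical (automated score vs. expert consensus).

```python
def mean_absolute_error(predicted, consensus):
    """Average absolute difference in SALT points between paired scores."""
    return sum(abs(p - c) for p, c in zip(predicted, consensus)) / len(predicted)
```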

Pearson correlation

The strength and direction of the linear relationship between ASALT scores and expert SALT scores. Measures whether the AI tracks the same severity trend as dermatologists.

Perfect score: 1.0 would mean perfect linear correlation. Individual dermatologists achieve ~0.85–0.95 against each other for SALT.

Why it matters: High Pearson correlation means ASALT and dermatologists rank patients in the same severity order, which is essential for detecting treatment response.

  • 0.9–1: Very strong
  • 0.7–0.9: Strong
  • 0.5–0.7: Moderate
  • 0–0.5: Weak
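For completeness, a minimal sketch of the Pearson correlation on hypothetical paired scores:

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Note that a constant offset between the AI and the experts leaves the Pearson correlation at 1.0, which is why it is reported alongside ICC and MAE rather than instead of them.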

Clinical trial deployment

Phase 3 deployment for adverse event monitoring

ASALT has been deployed in a Phase 3 clinical trial (MASH indication) for monitoring drug-induced alopecia as an adverse event. In this context:

  • 4-perspective scalp imaging (left, right, top, back) is performed at baseline and subsequent visits
  • The system monitors ASALT score evolution and triggers automated email notifications when hair loss increases by ≥25% from baseline
  • Site investigators confirm the alert by clinical assessment and, if confirmed as an adverse event, initiate enhanced monitoring with scalp photographs every 2 months until resolution
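The alert rule above can be sketched as a simple baseline-relative check. This assumes the ≥25% threshold is relative to the baseline score; how a zero baseline is handled is also an assumption, and the function name is illustrative.

```python
ALERT_THRESHOLD = 0.25  # 25% increase from baseline (assumed relative)

def hair_loss_alert(baseline_salt, current_salt):
    """True when the ASALT score has risen at least 25% above baseline."""
    if baseline_salt <= 0:
        # Assumption: with no baseline hair loss, any new loss is flagged.
        return current_salt > 0
    return (current_salt - baseline_salt) / baseline_salt >= ALERT_THRESHOLD
```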

This deployment validates the technology's ability to:

  • Produce consistent, reproducible SALT scores across multiple investigator sites
  • Detect clinically meaningful changes in hair loss over time
  • Support automated safety monitoring workflows in large multi-centre trials

Positioning for severity measurement

While the initial deployment focused on adverse event monitoring, the technology is equally applicable — and more commonly needed — for alopecia areata treatment trials where SALT is the primary efficacy endpoint. The same scoring methodology provides:

  • Continuous SALT scores with sub-percentage resolution
  • Automated SALT response classification (SALT 50/75/90/100) at each visit
  • Longitudinal severity tracking with visual comparison across visits
  • Zero inter-rater variability, eliminating a significant source of noise in multi-site treatment trials
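The SALT response classification in the list above follows the conventional definition: a patient achieves SALT n when their score improves by at least n% relative to baseline. A sketch (function and label names are illustrative):

```python
MILESTONES = (100, 90, 75, 50)  # checked from strictest to weakest

def salt_response(baseline_salt, current_salt):
    """Return the highest 'SALT n' milestone achieved, or None."""
    if baseline_salt <= 0:
        return None  # no baseline involvement to improve from
    reduction = 100.0 * (baseline_salt - current_salt) / baseline_salt
    for milestone in MILESTONES:
        if reduction >= milestone:
            return f"SALT {milestone}"
    return None
```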

Regulatory-grade validation pathway

The clinical evidence follows a structured regulatory pathway:

| Standard | Scope | Application to alopecia scoring |
| --- | --- | --- |
| IEC 62304 | Software lifecycle processes | The AI scoring pipeline follows a documented development lifecycle with risk-based classification |
| ISO 14971 | Risk management | Systematic risk analysis including failure modes (misclassification of hair-bearing regions, shadow artefacts, non-alopecia hair loss) |
| IEC 62366-1 | Usability engineering | The mobile capture application has been validated for usability at investigator sites |
| MEDDEV 2.7/1 Rev 4 | Clinical evaluation | Clinical evidence compiled following the structured methodology for clinical evaluation reports |
| MDR Annex XIV | Clinical evaluation and PMCF | Post-market clinical follow-up ensures ongoing validation as the technology evolves |

Ongoing clinical validation program

| Study | Condition | Endpoints | Status |
| --- | --- | --- | --- |
| ASALT Phase 3 deployment | Alopecia (adverse event monitoring) | ASALT score, severity classification | Active (Phase 3 MASH trial) |
| DIQA validation | All conditions | Image quality assessment | Published (JAAD 2023) |
| AIHS4 validation | Hidradenitis suppurativa | IHS4 severity scoring | Published |
| APASI validation | Psoriasis | PASI severity scoring | Published |
| ASCORAD validation | Atopic dermatitis | SCORAD severity scoring | Published |

For the full list of clinical evidence across all indications, see the clinical validation section.

DIQA validation

Hernández Montilla I, Mac Carthy T, Aguilar A, Medela A. Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials. J Am Acad Dermatol. 2023;88(4):927-928. doi:10.1016/j.jaad.2022.11.002

  • Pearson correlation ≥0.70 with expert image quality assessment
  • Real-time evaluation of focus, lighting, framing, and resolution
  • Applicability to both clinical practice and clinical trial settings

DIQA is the image quality assessment algorithm that acts as a quality gate in the clinical trial workflow. For scalp imaging, DIQA ensures consistent focus, lighting, and framing across all four quadrant images and all investigator sites.
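The quality-gate role described above can be sketched as a pre-scoring check across the four scalp views. The 0–1 quality scale, the threshold value, and all names here are assumptions for illustration, not the published DIQA interface.

```python
REQUIRED_VIEWS = ("left", "right", "top", "back")
QUALITY_THRESHOLD = 0.7  # illustrative cut-off, not a published value

def passes_quality_gate(quality_scores):
    """quality_scores: view -> quality in [0, 1].

    True only when every required view is present and clears the threshold;
    otherwise the capture is rejected and the image must be retaken.
    """
    return all(quality_scores.get(view, 0.0) >= QUALITY_THRESHOLD
               for view in REQUIRED_VIEWS)
```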

The same AI architecture and methodology used for alopecia scoring has been validated across multiple dermatological conditions:

Legit.Health. Automatic International Hidradenitis Suppurativa Severity Score System (AIHS4): A Novel Tool to Assess the Severity of Hidradenitis Suppurativa Using Artificial Intelligence. 2025.

  • Inter-observer ICC ≥ 0.727 (95% CI: 0.66–0.79) for objective severity assessment
  • State-of-the-art comparison: ICC of 0.47 without the device vs. 0.727 with the device
  • Same deep learning architecture validated across multiple dermatological conditions

Validation has also been completed for APASI (psoriasis), ASCORAD (atopic dermatitis), and multiple MRMC studies.