Clinical Evidence and Validation

This page explains how the psoriasis severity scoring technology is validated, what the performance metrics mean, and why the results demonstrate that each AI component model performs at the level of expert dermatologists.

How the ground truth is established

For intensity models: mathematical consensus of 2–3 independent expert dermatologists scoring each component (erythema, desquamation, induration) on ordinal scales. For surface models: expert pixel-level segmentation masks.
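
The exact consensus rule is not spelled out here beyond "mathematical consensus". As an illustrative sketch only, assuming a simple mean of the independent expert scores (the function name and scores are hypothetical):

```python
import statistics

def consensus_score(expert_scores: list[float]) -> float:
    """Combine the scores of 2-3 independent experts into one
    ground-truth value. The mean is used here as an illustrative
    consensus rule; the production rule may differ."""
    if not 2 <= len(expert_scores) <= 3:
        raise ValueError("expected scores from 2-3 experts")
    return statistics.mean(expert_scores)

# Three dermatologists score erythema for one image on an ordinal scale
print(consensus_score([6, 7, 6]))  # -> 6.333...
```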

Why this approach: PASI component scoring is inherently subjective. The acceptance criterion is non-inferiority to expert inter-rater variability: the AI's error must not exceed the disagreement between dermatologists.

Matching expert inter-rater variability is effectively a perfect score. Dermatologists themselves disagree on erythema by RMAE ~0.14, on desquamation and induration by ~0.17. Any AI system achieving RMAE at or below these values is performing at expert level; there is no meaningful way to do better on a subjective clinical assessment.

Non-inferiority to expert variability

For PASI components, the acceptance criterion is based on non-inferiority to expert inter-rater variability:

  • Expert dermatologists disagree on erythema intensity with RMAE ~0.14
  • Expert dermatologists disagree on desquamation and induration with RMAE ~0.17
  • If the AI's error is at or below these values, it is performing at expert level

Matching inter-rater variability = 100% of achievable performance

For each PASI component, the expert inter-rater variability is the realistic ceiling. Dermatologists cannot agree with each other any better than this on a subjective clinical assessment. The AI matching this ceiling means it is as reliable as adding another dermatologist to the panel.
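
The acceptance criterion amounts to a per-component threshold comparison. A minimal sketch, using the expert thresholds stated on this page (the function name is illustrative, not part of the production system):

```python
# Expert inter-rater RMAE per component, as stated on this page;
# each value is the non-inferiority threshold for the matching AI model.
EXPERT_RMAE = {"erythema": 0.14, "desquamation": 0.17, "induration": 0.17}

def passes_non_inferiority(ai_rmae: dict[str, float]) -> dict[str, bool]:
    """A component passes when the AI error does not exceed the
    expert inter-rater variability."""
    return {c: ai_rmae[c] <= EXPERT_RMAE[c] for c in EXPERT_RMAE}

print(passes_non_inferiority(
    {"erythema": 0.13, "desquamation": 0.153, "induration": 0.151}))
# -> {'erythema': True, 'desquamation': True, 'induration': True}
```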

Why there is no "perfect" score

In PASI scoring, there is no objective measurement; severity is a clinical judgement. Unlike measuring blood pressure (where 120/80 is the same on every device), PASI component scores depend on a dermatologist's subjective assessment of erythema intensity, scaling severity, and plaque thickness.

This means:

  1. The ground truth is the consensus of multiple expert dermatologists, not an absolute truth
  2. Individual dermatologists themselves disagree with the consensus, and their RMAE against it quantifies that disagreement
  3. These dermatologist-level numbers are the realistic ceiling: no scoring system, human or AI, can reliably exceed the agreement that experts have with each other
  4. A system that matches dermatologist performance is clinically excellent: it means the AI is as reliable as adding another expert to the panel

The key insight for sponsors

Matching the consensus is effectively a perfect score. The consensus IS the best available approximation of truth. When the AI achieves an RMAE of 0.13 for erythema and the average dermatologist achieves 0.14 on the same task, the AI is not "merely close"; it is outperforming the typical inter-rater agreement of trained dermatologists. The correct frame of reference is not "how close to zero?" but "how close to what expert dermatologists achieve?" By that measure, the AI is at or above 100% of achievable performance.

Understanding the metrics

RMAE (Relative Mean Absolute Error)

The model’s average error as a proportion of the scale range, compared against expert inter-rater variability. For a 0–9 intensity scale, RMAE of 0.13 means the average error is 13% of the scale range (~1.2 points).

Perfect score: 0.0 would mean zero error. Expert dermatologists themselves have RMAE of 0.14–0.17 against their own consensus — this is the realistic ceiling.

Why it matters: PASI accuracy depends on accurate component scoring. If each component (erythema, desquamation, induration) is scored within the range of expert variability, the composite PASI score will be clinically reliable.
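
For illustration, RMAE as described above (mean absolute error divided by the scale range) can be computed as follows; the function and sample scores are assumptions for the sketch, not the production implementation:

```python
def rmae(predicted, reference, scale_range: float) -> float:
    """Relative Mean Absolute Error: the mean absolute difference
    between predicted and reference scores, divided by the width
    of the scoring scale."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors) / scale_range

# Hypothetical scores on a 0-9 intensity scale: an average error of
# 0.75 points corresponds to RMAE 0.75 / 9 ~ 0.083
print(round(rmae([5, 7, 2, 6], [6, 6, 3, 6], scale_range=9), 3))  # -> 0.083
```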

[Gauge: scale Insufficient / Acceptable / At expert level / Excellent; ALADIN: 0.13, dermatologists: 0.14]

Intersection over Union (IoU)

The overlap between the AI’s segmentation of psoriatic area and the expert’s segmentation. IoU of 0.61 means 61% of the AI’s identified area matches the expert’s annotation.

Perfect score: 1.0 would mean perfect pixel-level agreement. For skin lesion segmentation, IoU above 0.5 is considered good, and above 0.7 is excellent.

Why it matters: Affected area is a critical PASI component. Accurate segmentation ensures the BSA percentage is reliable, which directly impacts the total PASI score.
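
IoU itself is a simple set computation over segmented pixels. A minimal sketch (the pixel coordinates and function name are illustrative):

```python
def iou(mask_a: set, mask_b: set) -> float:
    """Intersection over Union between two segmentation masks,
    each given as a set of (row, col) pixel coordinates."""
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    return inter / union if union else 1.0  # two empty masks agree trivially

ai_mask     = {(0, 0), (0, 1), (1, 0), (1, 1)}  # AI-segmented pixels
expert_mask = {(0, 1), (1, 0), (1, 1), (2, 1)}  # expert annotation
print(iou(ai_mask, expert_mask))  # -> 0.6 (3 shared pixels / 5 in the union)
```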

[Gauge: scale Poor / Moderate / Good / Excellent; ALADIN: 0.61, dermatologists: 0.61]

Production model performance

The following results are from the production models validated per the Quality Management System (IEC 62304 / ISO 14971). Each model is tested against expert inter-rater variability as the acceptance criterion.

PASI component model performance

Production validation results for each component model:

Component                Metric   ALADIN   Dermatologists
Erythema intensity       RMAE     0.13     0.14
Desquamation intensity   RMAE     0.153    0.17
Induration intensity     RMAE     0.151    0.17
Erythema surface         IoU      0.61     0.61

At a glance: AI vs Expert across all components

[Chart: AI score vs. expert threshold per component; erythema 0.13, desquamation 0.153, induration 0.151, BSA (IoU) 0.61]

Summary

Component                AI RMAE/IoU   Expert RMAE/IoU   Verdict
Erythema intensity       0.13          0.14              AI is better than expert inter-rater agreement
Desquamation intensity   0.153         0.17              AI is within expert inter-rater range
Induration intensity     0.151         0.17              AI is within expert inter-rater range
Erythema surface         IoU 0.61      IoU 0.61          AI matches expert segmentation

All four components pass the acceptance criteria. The composite APASI score inherits this accuracy: when each component is scored at expert level, the aggregate PASI score is reliable.

How to read this table

  • Erythema RMAE 0.13 vs. 0.14: The AI's erythema scoring error is lower than the disagreement between expert dermatologists: on redness assessment, the AI agrees with the consensus more closely than dermatologists agree with each other.

  • Desquamation RMAE 0.153 vs. 0.17: The AI's scaling assessment error is well within the range of expert disagreement. The acceptance criterion is met with margin.

  • Induration RMAE 0.151 vs. 0.17: Same pattern as desquamation; the AI scores plaque thickness within expert-level variability.

  • Erythema surface IoU 0.61 vs. 0.61: The AI's pixel-level segmentation of affected areas agrees with expert masks to the same degree that experts agree with each other. This replaces the most subjective component of manual PASI (BSA estimation) with an objective, reproducible measurement.

What this means for your trial

  1. Every PASI component is scored at expert level: no single component is a weak link in the composite score
  2. BSA estimation is objectified: the largest source of manual PASI inter-rater variability (affected area estimation) is replaced by pixel-level segmentation
  3. The AI is perfectly reproducible: unlike a human rater, the same image always produces the same component scores, with zero intra-rater variability
  4. Across every site, every time: no calibration drift, no fatigue, no subjective inconsistency between investigators
  5. This reduces noise in PASI change-from-baseline, potentially enabling smaller sample sizes or greater statistical power to detect PASI 75/90/100 response rates
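
To illustrate point 5, a standard two-sample normal-approximation sample-size formula shows how lower endpoint noise translates into smaller arms. All numbers below are hypothetical for the sketch, not taken from any trial programme:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta: float, sd: float, alpha: float = 0.05,
              power: float = 0.8) -> int:
    """Per-arm sample size for comparing mean change-from-baseline
    between two groups (two-sided test, normal approximation):
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

# Hypothetical effect of removing rater noise: sd shrinks from 6.0 to 5.0
print(n_per_arm(delta=3.0, sd=6.0))  # -> 63 per arm with rater noise
print(n_per_arm(delta=3.0, sd=5.0))  # -> 44 per arm without
```

The point of the sketch: sample size scales with the square of the endpoint's standard deviation, so even a modest reduction in measurement noise compounds into a meaningfully smaller trial.
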

Regulatory positioning

AI-computed APASI endpoints from Legit.Health have been accepted by regulators as secondary endpoints and for adverse event detection in clinical submissions. For primary registration endpoints, the system provides standard PASI scores aligned with the established 0–72 scale. The APASI scoring is validated in a real-world Phase 3 programme for moderate-to-severe plaque psoriasis across 130+ sites in 12 countries.

APASI peer-reviewed publication

Mac Carthy T, Dagnino D, Medela A, Fernández G, Aguilar A, Martorell A, Gómez-Tejerina P, Roustán-Gullón G. Artificial Intelligence-Based Quantification to Assess the Automatic Psoriasis Area and Severity Index. JEADV Clinical Practice. 2025. doi:10.1002/jvc2.70143

JEADV Clin Pract. 2025

  • APASI provides a robust AI-driven framework for psoriasis severity assessment
  • Delivers rapid, objective, and precise evaluations of all PASI components
  • Integration into clinical and research workflows enhances disease monitoring
  • Reduces evaluation costs compared to manual multi-reader assessments

DIQA validation

Hernández Montilla I, Mac Carthy T, Aguilar A, Medela A. Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials. Journal of the American Academy of Dermatology. 2023. doi:10.1016/j.jaad.2022.11.002

J Am Acad Dermatol. 2023;88(4):927-928

  • Pearson correlation ≥0.70 with expert image quality assessment
  • Real-time evaluation of focus, lighting, framing, and resolution
  • Critical for standardising image quality in multi-centre psoriasis trials

Regulatory-grade validation pathway

The clinical evidence follows a structured regulatory pathway:

Standard             Scope                          Application to PASI scoring
IEC 62304            Software lifecycle processes   Each component model follows a documented development lifecycle
ISO 14971            Risk management                Systematic risk analysis for each scoring component
IEC 62366-1          Usability engineering          Validated for both in-clinic and decentralised patient capture
MEDDEV 2.7/1 Rev 4   Clinical evaluation            Clinical evidence compiled per structured methodology
MDR Annex XIV        Clinical evaluation and PMCF   Post-market clinical follow-up

Ongoing clinical validation program

Study                        Condition                             Endpoints                                                   Status
APASI validation study       Plaque psoriasis                      PASI components (erythema, desquamation, induration, BSA)   Published (JEADV Clin Pract 2025)
JNJ-77242113 Phase 3 trial   Moderate-to-severe plaque psoriasis   APASI scoring via decentralised imaging                     Ongoing (real-world deployment)
DIQA validation              All conditions                        Image quality assessment                                    Published (JAAD 2023)

For the full list of clinical evidence, see the clinical validation section.