Clinical Evidence and Validation

This page presents the clinical validation evidence for the AIHS4 scoring system, including the M-27134-01 clinical trial results that demonstrate that the AI's reliability substantially exceeds manual IHS4 inter-rater agreement.

M-27134-01 clinical trial validation

The AIHS4 system was validated in the M-27134-01 clinical trial for hidradenitis suppurativa:

"Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa" (2023)

Study design

| Parameter | Detail |
|---|---|
| Study | M-27134-01 |
| Indication | Hidradenitis suppurativa |
| Design | Observational non-interventional study based on remote evaluation of clinical trial images |
| Objective | Evaluate the performance and reliability of AIHS4 within a clinical trial context |
| User group | Dermatologists |

Performance results

| Metric | AIHS4 | Acceptance criterion | State-of-the-art (manual) |
|---|---|---|---|
| Inter-observer ICC | 0.727 (95% CI: 0.66–0.79) | ≥ 0.70 | ICC 0.47 (95% CI: 0.32–0.65) |
| ICC variability | 0.10 | < 0.15 | 0.10 |

Visual comparison

  • AIHS4 inter-observer ICC: 0.727 (acceptance criterion ≥ 0.70 — passed)
  • Manual IHS4 inter-rater ICC: 0.47 (state-of-the-art from the literature)

How to interpret the results

What ICC 0.727 means for your trial

An ICC of 0.727 means that 72.7% of the total score variance is attributable to true patient differences (the signal), rather than measurement noise. For a count-based severity measure like IHS4, this is a high level of reproducibility.

Compared to the manual IHS4 inter-rater ICC of 0.47 (where less than half the variance is signal), AIHS4 represents a 55% relative improvement in reliability. The AI does not merely match human performance; it substantially exceeds it.
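The arithmetic behind these two claims can be checked with a short sketch. The helper names are illustrative, not part of the AIHS4 software; the ICC values are the ones reported above.

```python
# Sketch: reading ICC as the fraction of score variance that is signal.
# ICC = between-patient variance / total variance, so it directly gives
# the share of variance attributable to true patient differences.

def signal_fraction(icc: float) -> float:
    """Fraction of total score variance that is true patient signal."""
    return icc

def relative_improvement(icc_new: float, icc_ref: float) -> float:
    """Relative gain of one reliability estimate over a reference."""
    return (icc_new - icc_ref) / icc_ref

aihs4_icc = 0.727   # AIHS4 inter-observer ICC (M-27134-01)
manual_icc = 0.47   # published manual IHS4 inter-rater ICC

print(f"AIHS4 signal fraction:  {signal_fraction(aihs4_icc):.1%}")   # 72.7%
print(f"Manual signal fraction: {signal_fraction(manual_icc):.1%}")  # 47.0%
print(f"Relative improvement:   {relative_improvement(aihs4_icc, manual_icc):.0%}")  # 55%
```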

  • ICC ≥ 0.75 is generally considered "good" reliability in clinical measurement. AIHS4 at 0.727 approaches this threshold.
  • Manual IHS4 ICC of 0.47 falls in the "fair" range, reflecting the inherent difficulty of visually distinguishing abscess vs. nodule and identifying fistulae consistently across raters.
  • ICC variability of 0.10 (equal to the state-of-the-art) confirms that the AI's reliability is stable, not inflated by a few easy cases.
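For context, IHS4 itself is a weighted lesion count: nodules count ×1, abscesses ×2, and draining tunnels (fistulae) ×4, which is why consistent lesion classification drives reliability. A minimal sketch of the published formula and severity bands:

```python
def ihs4(nodules: int, abscesses: int, draining_tunnels: int) -> int:
    """IHS4 = (nodules x 1) + (abscesses x 2) + (draining tunnels x 4)."""
    return nodules * 1 + abscesses * 2 + draining_tunnels * 4

def ihs4_severity(score: int) -> str:
    """Published IHS4 bands: mild <= 3, moderate 4-10, severe >= 11."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"

score = ihs4(nodules=2, abscesses=1, draining_tunnels=1)  # 2 + 2 + 4 = 8
print(score, ihs4_severity(score))  # 8 moderate
```

Note that misreading a single abscess as a nodule shifts the score by one point, and a missed fistula by four, which illustrates why manual raters disagree.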

Perfect reproducibility

Beyond ICC, AIHS4 offers a characteristic that manual scoring cannot: zero intra-rater variability. The identical image always produces the identical score. Every site produces comparable data without calibration exercises or inter-rater reliability training.

Acceptance criteria methodology

The acceptance criteria for AIHS4 are based on non-inferiority to published inter-rater variability:

| Criterion | Rationale |
|---|---|
| ICC ≥ 0.70 | Pre-specified threshold for "good" reliability, above the literature benchmark of 0.47 |
| ICC variability < 0.15 | Ensures reliability is consistent across the dataset, not driven by outliers |

Both criteria are met by the production AIHS4 model.
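The two criteria together form a simple pass/fail gate. The sketch below is illustrative (the function name is not from the actual validation code); the thresholds and results are those stated above.

```python
def meets_acceptance(icc: float, icc_variability: float) -> bool:
    """Both pre-specified criteria must hold for the model to pass:
    ICC >= 0.70 and ICC variability < 0.15."""
    return icc >= 0.70 and icc_variability < 0.15

# Production AIHS4 results from M-27134-01: both criteria are met.
print(meets_acceptance(icc=0.727, icc_variability=0.10))  # True

# A model at the manual IHS4 benchmark would fail the ICC criterion.
print(meets_acceptance(icc=0.47, icc_variability=0.10))   # False
```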

Regulatory-grade validation pathway

The clinical evidence follows a structured regulatory pathway:

| Standard | Scope | Application to HS scoring |
|---|---|---|
| IEC 62304 | Software lifecycle processes | The AI scoring pipeline follows a documented development lifecycle |
| ISO 14971 | Risk management | Systematic risk analysis including lesion classification failure modes |
| IEC 62366-1 | Usability engineering | Validated for investigator use at clinical trial sites |
| MEDDEV 2.7/1 Rev 4 | Clinical evaluation | Clinical evidence compiled per structured methodology |
| MDR Annex XIV | Clinical evaluation and PMCF | Post-market clinical follow-up |

The same AI architecture used for HS scoring has been validated across multiple conditions:

| Condition | Scoring system | Key metric | Status |
|---|---|---|---|
| Acne | ALADIN / IGA | Cohen's κ = 0.53 | Published |
| Psoriasis | APASI / PASI | Component RMAE ≤ 0.153 | Published |
| Alopecia | Automated SALT | RMAE = 7.08% | Deployed in Phase 3 |
| Atopic dermatitis | ASCORAD / SCORAD | Pilot validated | Published |

Cross-condition validation strengthens the evidence for the underlying technology platform.

For the full list of clinical evidence across all indications, see the clinical validation section.