Clinical Evidence and Validation

This page presents the clinical validation evidence for the AIHS4 scoring system, including the M-27134-01 clinical trial results that demonstrate that the AI's reliability substantially exceeds manual IHS4 inter-rater agreement.

M-27134-01 clinical trial validation

The AIHS4 system was validated in the M-27134-01 clinical trial for hidradenitis suppurativa:

"Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa" (2023)

Study design

| Parameter | Detail |
|---|---|
| Study | M-27134-01 |
| Indication | Hidradenitis suppurativa |
| Design | Observational non-interventional study based on remote evaluation of clinical trial images |
| Objective | Evaluate the performance and reliability of AIHS4 within a clinical trial context |
| User group | Dermatologists |

Performance results

| Metric | AIHS4 | Acceptance criterion | State-of-the-art (manual) |
|---|---|---|---|
| Inter-observer ICC | 0.727 (95% CI: 0.66–0.79) | ≥ 0.70 | ICC 0.47 (95% CI: 0.32–0.65) |
| ICC variability | 0.10 | < 0.15 | 0.10 |

Visual comparison

  • AIHS4 inter-observer ICC: 0.727 (acceptance criterion ≥ 0.70 — passed)
  • Manual IHS4 inter-rater ICC: 0.47 (state-of-the-art from the literature)

How to interpret the results

What ICC 0.727 means for your trial

An ICC of 0.727 means that 72.7% of the total score variance is attributable to true patient differences (the signal), rather than measurement noise. For a count-based severity measure like IHS4, this is a high level of reproducibility.

Compared to the manual IHS4 inter-rater ICC of 0.47 (where less than half the variance is signal), AIHS4 represents a 55% relative improvement in reliability. The AI does not merely match human performance; it substantially exceeds it.
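The arithmetic behind these two claims can be checked with a short sketch. The helper names are illustrative, not part of the AIHS4 software; the ICC values are the ones reported above.

```python
# Sketch: reading ICC as the fraction of score variance that is signal.
# ICC = between-patient variance / total variance, so it directly gives
# the share of variance attributable to true patient differences.

def signal_fraction(icc: float) -> float:
    """Fraction of total score variance that is true patient signal."""
    return icc

def relative_improvement(icc_new: float, icc_ref: float) -> float:
    """Relative gain of one reliability estimate over a reference."""
    return (icc_new - icc_ref) / icc_ref

aihs4_icc = 0.727   # AIHS4 inter-observer ICC (M-27134-01)
manual_icc = 0.47   # published manual IHS4 inter-rater ICC

print(f"AIHS4 signal fraction:  {signal_fraction(aihs4_icc):.1%}")   # 72.7%
print(f"Manual signal fraction: {signal_fraction(manual_icc):.1%}")  # 47.0%
print(f"Relative improvement:   {relative_improvement(aihs4_icc, manual_icc):.0%}")  # 55%
```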

  • ICC ≥ 0.75 is generally considered "good" reliability in clinical measurement. AIHS4 at 0.727 approaches this threshold.
  • Manual IHS4 ICC of 0.47 falls in the "fair" range, reflecting the inherent difficulty of visually distinguishing abscess vs. nodule and identifying fistulae consistently across raters.
  • ICC variability of 0.10 (equal to the state-of-the-art) confirms that the AI's reliability is stable, not inflated by a few easy cases.
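For context, IHS4 itself is a weighted lesion count: nodules count ×1, abscesses ×2, and draining tunnels (fistulae) ×4, which is why consistent lesion classification drives reliability. A minimal sketch of the published formula and severity bands:

```python
def ihs4(nodules: int, abscesses: int, draining_tunnels: int) -> int:
    """IHS4 = (nodules x 1) + (abscesses x 2) + (draining tunnels x 4)."""
    return nodules * 1 + abscesses * 2 + draining_tunnels * 4

def ihs4_severity(score: int) -> str:
    """Published IHS4 bands: mild <= 3, moderate 4-10, severe >= 11."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"

score = ihs4(nodules=2, abscesses=1, draining_tunnels=1)  # 2 + 2 + 4 = 8
print(score, ihs4_severity(score))  # 8 moderate
```

Note that misreading a single abscess as a nodule shifts the score by one point, and a missed fistula by four, which illustrates why manual raters disagree.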

Perfect reproducibility

Beyond ICC, AIHS4 offers a characteristic that manual scoring cannot: zero intra-rater variability. The identical image always produces the identical score. Every site produces comparable data without calibration exercises or inter-rater reliability training.

Acceptance criteria methodology

The acceptance criteria for AIHS4 are based on non-inferiority to published inter-rater variability:

| Criterion | Rationale |
|---|---|
| ICC ≥ 0.70 | Pre-specified threshold for "good" reliability, above the literature benchmark of 0.47 |
| ICC variability < 0.15 | Ensures reliability is consistent across the dataset, not driven by outliers |

Both criteria are met by the production AIHS4 model.
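The two criteria together form a simple pass/fail gate. The sketch below is illustrative (the function name is not from the actual validation code); the thresholds and results are those stated above.

```python
def meets_acceptance(icc: float, icc_variability: float) -> bool:
    """Both pre-specified criteria must hold for the model to pass:
    ICC >= 0.70 and ICC variability < 0.15."""
    return icc >= 0.70 and icc_variability < 0.15

# Production AIHS4 results from M-27134-01: both criteria are met.
print(meets_acceptance(icc=0.727, icc_variability=0.10))  # True

# A model at the manual IHS4 benchmark would fail the ICC criterion.
print(meets_acceptance(icc=0.47, icc_variability=0.10))   # False
```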

Regulatory-grade validation pathway

The clinical evidence follows a structured regulatory pathway:

| Standard | Scope | Application to HS scoring |
|---|---|---|
| IEC 62304 | Software lifecycle processes | The AI scoring pipeline follows a documented development lifecycle |
| ISO 14971 | Risk management | Systematic risk analysis including lesion classification failure modes |
| IEC 62366-1 | Usability engineering | Validated for investigator use at clinical trial sites |
| MEDDEV 2.7/1 Rev 4 | Clinical evaluation | Clinical evidence compiled per structured methodology |
| MDR Annex XIV | Clinical evaluation and PMCF | Post-market clinical follow-up |

The same AI architecture used for HS scoring has been validated across multiple conditions:

| Condition | Scoring system | Key metric | Status |
|---|---|---|---|
| Acne | ALADIN / IGA | Cohen's κ = 0.53 | Published |
| Psoriasis | APASI / PASI | Component RMAE ≤ 0.153 | Published |
| Alopecia | Automated SALT | RMAE = 7.08% | Deployed in Phase 3 |
| Atopic dermatitis | ASCORAD / SCORAD | Pilot validated | Published |

Cross-condition validation strengthens the evidence for the underlying technology platform.

For the full list of clinical evidence across all indications, see the clinical validation section.