# Clinical Evidence and Validation
This page presents the clinical validation evidence for the AIHS4 scoring system, including results from the M-27134-01 clinical trial, which demonstrate that the AI's reliability substantially exceeds the inter-rater agreement of manual IHS4 scoring.
## M-27134-01 clinical trial validation
The AIHS4 system was validated in the M-27134-01 clinical trial for hidradenitis suppurativa:
"Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa" (2023)
### Study design
| Parameter | Detail |
|---|---|
| Study | M-27134-01 |
| Indication | Hidradenitis suppurativa |
| Design | Observational non-interventional study based on remote evaluation of clinical trial images |
| Objective | Evaluate the performance and reliability of AIHS4 within a clinical trial context |
| User group | Dermatologists |
### Performance results
| Metric | AIHS4 | Acceptance criterion | State-of-the-art (manual) |
|---|---|---|---|
| Inter-observer ICC | 0.727 (95% CI: 0.66–0.79) | ≥ 0.70 | ICC 0.47 (95% CI: 0.32–0.65) |
| ICC variability | 0.10 | < 0.15 | 0.10 |
### Visual comparison

- **AIHS4 inter-observer ICC: 0.727** (acceptance criterion ≥ 0.70: passed)
- **Manual IHS4 inter-rater ICC: 0.47** (state of the art, literature)
### How to interpret the results
An ICC of 0.727 means that 72.7% of the total score variance is attributable to true patient differences (the signal), rather than measurement noise. For a count-based severity measure like IHS4, this is a high level of reproducibility.
Compared with the manual IHS4 inter-rater ICC of 0.47 (where less than half the variance is signal), AIHS4 delivers a 55% relative improvement in reliability (0.727 vs. 0.47). The AI does not merely match human performance; it substantially exceeds it.
- ICC ≥ 0.75 is generally considered "good" reliability in clinical measurement. AIHS4 at 0.727 approaches this threshold.
- Manual IHS4 ICC of 0.47 falls in the "fair" range, reflecting the inherent difficulty of visually distinguishing abscess vs. nodule and identifying fistulae consistently across raters.
- ICC variability of 0.10 (equal to the state-of-the-art) confirms that the AI's reliability is stable, not inflated by a few easy cases.
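As a concrete illustration of what the ICC measures, here is a minimal ICC(2,1) computation (two-way random effects, absolute agreement, single rater) in Python. The ratings matrix is synthetic and purely illustrative; it is not trial data.

```python
import numpy as np

def icc2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x: (n_subjects, k_raters) matrix of scores.
    """
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-subject means
    col_means = x.mean(axis=0)   # per-rater means
    # Two-way ANOVA mean squares
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Synthetic example: 6 patients scored by 3 raters
scores = np.array([
    [9, 10, 9],
    [4, 5, 4],
    [12, 13, 14],
    [2, 2, 3],
    [7, 8, 7],
    [15, 14, 15],
], dtype=float)
print(icc2_1(scores))  # high agreement for this matrix (≈ 0.98)
```

The closer the raters' columns agree with each other relative to the spread between patients (rows), the closer the ICC gets to 1.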
## Perfect reproducibility
Beyond ICC, AIHS4 offers a characteristic that manual scoring cannot: zero intra-rater variability. The same image always produces the same score. Every site produces comparable data without calibration exercises or inter-rater reliability training.
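Determinism follows from the scoring formula itself: IHS4 is a weighted lesion count (nodules ×1, abscesses ×2, draining tunnels/fistulae ×4), so identical lesion counts always yield an identical score. A minimal sketch:

```python
def ihs4(nodules, abscesses, draining_tunnels):
    """IHS4 = nodules*1 + abscesses*2 + draining tunnels*4.

    Published severity bands: mild <= 3, moderate 4-10, severe >= 11.
    """
    score = nodules * 1 + abscesses * 2 + draining_tunnels * 4
    if score <= 3:
        band = "mild"
    elif score <= 10:
        band = "moderate"
    else:
        band = "severe"
    return score, band

print(ihs4(2, 1, 0))  # (4, 'moderate')
```

In the AI pipeline the lesion counts come from the detection model, but once counts are fixed the score is a pure function of them.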
## Acceptance criteria methodology
The acceptance criteria for AIHS4 are based on non-inferiority to published inter-rater variability:
| Criterion | Rationale |
|---|---|
| ICC ≥ 0.70 | Pre-specified threshold for "good" reliability, above the literature benchmark of 0.47 |
| ICC variability < 0.15 | Ensures reliability is consistent across the dataset, not driven by outliers |
Both criteria are met by the production AIHS4 model.
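The two criteria reduce to a simple deterministic check. The function below is a hypothetical sketch of that logic, not part of the production pipeline:

```python
def meets_acceptance(icc, icc_variability,
                     icc_threshold=0.70, variability_threshold=0.15):
    """Pre-specified criteria: ICC >= 0.70 and ICC variability < 0.15."""
    return icc >= icc_threshold and icc_variability < variability_threshold

# Reported M-27134-01 values: ICC 0.727, variability 0.10
print(meets_acceptance(0.727, 0.10))  # True
```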
## Regulatory-grade validation pathway
The clinical evidence follows a structured regulatory pathway:
| Standard | Scope | Application to HS scoring |
|---|---|---|
| IEC 62304 | Software lifecycle processes | The AI scoring pipeline follows a documented development lifecycle |
| ISO 14971 | Risk management | Systematic risk analysis including lesion classification failure modes |
| IEC 62366-1 | Usability engineering | Validated for investigator use at clinical trial sites |
| MEDDEV 2.7/1 Rev 4 | Clinical evaluation | Clinical evidence compiled per structured methodology |
| MDR Annex XIV | Clinical evaluation and PMCF | Post-market clinical follow-up |
## Related validation studies
The same AI architecture used for HS scoring has been validated across multiple conditions:
| Condition | Scoring system | Key metric | Status |
|---|---|---|---|
| Acne | ALADIN / IGA | Cohen's κ = 0.53 | Published |
| Psoriasis | APASI / PASI | Component RMAE ≤ 0.153 | Published |
| Alopecia | Automated SALT | RMAE = 7.08% | Deployed in Phase 3 |
| Atopic dermatitis | ASCORAD / SCORAD | Pilot validated | Published |
Cross-condition validation strengthens the evidence for the underlying technology platform.
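For reference, the two agreement metrics cited in the table above (Cohen's κ and RMAE) can be computed as follows. The helper names and rating arrays are illustrative, and RMAE is assumed here to mean mean absolute error normalised by the scale range:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two categorical raters."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    po = np.mean(a == b)  # observed agreement
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in cats)  # chance agreement
    return (po - pe) / (1 - pe)

def rmae(pred, ref, scale_range):
    """Relative MAE: mean absolute error divided by the scale range (assumed definition)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(ref))) / scale_range

# Illustrative: IGA grades from AI vs. dermatologist, and a 0-100 SALT comparison
print(cohens_kappa([0, 2, 1, 3, 2], [0, 2, 2, 3, 2]))
print(rmae([10, 20], [12, 18], scale_range=100))  # 0.02
```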
For the full list of clinical evidence across all indications, see the clinical validation section.