Scoring Methodology
This page provides a detailed technical description of the AI scoring pipeline for psoriasis severity assessment via the Automatic Psoriasis Area and Severity Index (APASI).
The PASI formula
The Psoriasis Area and Severity Index (Fredriksson & Pettersson, 1978) is the most widely used measure of psoriasis severity in clinical trials. It assesses four body regions, each scored for three intensity dimensions and affected area:
| Region | Symbol | Weight ($w$) | % of BSA |
|---|---|---|---|
| Head | $h$ | 0.1 | 10% |
| Trunk | $t$ | 0.3 | 30% |
| Upper extremities | $u$ | 0.2 | 20% |
| Lower extremities | $l$ | 0.4 | 40% |
Each region is scored for:
- Erythema ($E$): Redness, 0–4
- Desquamation ($D$): Scaling, 0–4
- Induration ($I$): Plaque thickness, 0–4
- Affected area ($A$): BSA involvement, 0–6 (0%, <10%, 10–29%, 30–49%, 50–69%, 70–89%, 90–100%)
Maximum score: $(4 + 4 + 4) \times 6 \times (0.1 + 0.3 + 0.2 + 0.4) = 72$.
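The regional scores combine into a single total with the standard PASI formula, where $r$ ranges over head, trunk, upper extremities, and lower extremities, and $w_r$ is the region weight:

$$\mathrm{PASI} = \sum_{r} w_r \,(E_r + D_r + I_r)\, A_r$$

The sketch below shows that arithmetic in Python. The names `RegionScore` and `pasi_total` are illustrative and are not part of the APASI product; rounding the total to one decimal place is also an assumption.

```python
from dataclasses import dataclass

# Region weights from the PASI definition (fraction of body surface area).
REGION_WEIGHTS = {"head": 0.1, "trunk": 0.3, "upper": 0.2, "lower": 0.4}


@dataclass(frozen=True)
class RegionScore:
    """Ordinal PASI components for one body region (hypothetical container)."""
    erythema: int      # E, 0-4
    desquamation: int  # D, 0-4
    induration: int    # I, 0-4
    area: int          # A, 0-6

    def __post_init__(self):
        for name in ("erythema", "desquamation", "induration"):
            if not 0 <= getattr(self, name) <= 4:
                raise ValueError(f"{name} must be in 0..4")
        if not 0 <= self.area <= 6:
            raise ValueError("area must be in 0..6")


def pasi_total(scores: dict) -> float:
    """Sum w_r * (E_r + D_r + I_r) * A_r over the four regions."""
    if set(scores) != set(REGION_WEIGHTS):
        raise ValueError("expected exactly: head, trunk, upper, lower")
    total = sum(
        REGION_WEIGHTS[region] * (s.erythema + s.desquamation + s.induration) * s.area
        for region, s in scores.items()
    )
    return round(total, 1)  # assumption: report to one decimal place
```

Scoring every component at its maximum in all four regions reproduces the maximum score of 72.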
AI component models
APASI automates each PASI component with a dedicated AI model. This modular approach means each component can be independently validated.
Erythema intensity model
Measures the redness of psoriatic plaques from clinical images. The model outputs a continuous intensity value that maps to the 0–4 ordinal scale.
- Production RMAE: 0.13 (95% CI: 0.119–0.142)
- Acceptance criterion: ≤0.14 (expert inter-rater RMAE)
- Result: PASS, the AI's erythema scoring error is below the disagreement between dermatologists
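The exact rule that maps the model's continuous output to the 0–4 ordinal scale is not described on this page. As a minimal sketch, assuming the mapping is a clamp followed by round-half-up, it could look like this (`to_ordinal` is a hypothetical name):

```python
def to_ordinal(intensity: float, max_grade: int = 4) -> int:
    """Map a continuous intensity to an integer grade in 0..max_grade.

    Assumption: clamp to the valid range, then round half up.
    """
    clamped = min(max(intensity, 0.0), float(max_grade))
    # int(x + 0.5) rounds half up for non-negative x, unlike Python's
    # built-in round(), which rounds halves to the nearest even integer.
    return int(clamped + 0.5)
```

The same helper would apply to the desquamation and induration models if they use the same scale mapping.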
Desquamation intensity model
Measures the severity of silvery-white scaling on psoriatic plaques.
- Production RMAE: 0.153 (95% CI: 0.139–0.167)
- Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
- Result: PASS, within expert inter-rater variability
Induration intensity model
Measures the thickness and elevation of psoriatic plaques.
- Production RMAE: 0.151 (95% CI: 0.137–0.167)
- Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
- Result: PASS, within expert inter-rater variability
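This page does not define RMAE. The sketch below assumes it is the mean absolute error divided by the range of the scale (4 for the 0–4 intensity grades) and shows how a value could be compared with an acceptance criterion. Both function names are hypothetical.

```python
def rmae(predicted, reference, scale_range=4.0):
    """Relative mean absolute error.

    Assumption: MAE between predictions and reference grades,
    normalised by the range of the ordinal scale.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
    return mae / scale_range


def meets_criterion(value, criterion):
    """A model passes when its error does not exceed the criterion."""
    return value <= criterion
```

Under this definition, a reported RMAE of 0.153 against a criterion of 0.17 passes, which matches the results listed for the desquamation model.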
Erythema surface quantification model
Segments psoriatic areas at pixel level to compute the percentage of body surface area affected per region. Unlike the intensity models (which score severity), this model measures extent.
- Production IoU: ≥0.61 (validated against expert segmentation masks)
- Acceptance criterion: ≥0.61 (expert segmentation IoU)
- Result: PASS, matches expert segmentation quality
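Intersection over union (IoU) compares a predicted mask with a reference mask by dividing the number of pixels both mark as affected by the number of pixels either marks as affected. A minimal sketch in plain Python, with masks represented as nested lists of 0/1 (an assumption made to keep the example dependency-free):

```python
def iou(pred_mask, ref_mask):
    """Intersection over union of two same-shaped binary masks."""
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask rows must have the same length")
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly; treating that as 1.0 is a convention.
    return intersection / union if union else 1.0
```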
This model replaces the most subjective component of manual PASI (BSA estimation) with objective pixel-level measurement. In manual PASI, the affected area score ($A$) requires a dermatologist to visually estimate what percentage of each body region is covered by psoriatic plaques and then map that estimate to a 0–6 ordinal scale. Studies have consistently shown BSA estimation to be the largest single source of PASI inter-rater variability; dermatologists routinely disagree by 1–2 ordinal levels on affected area.
The AI segmentation model removes this subjective step. It identifies psoriatic pixels in each body region, computes the percentage of affected surface area, and maps the result to the 0–6 scale. Because the measurement is computed per pixel by a deterministic model, the same image always receives the same area score, a level of consistency that human estimation cannot match.
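The mapping from an affected-area percentage to the 0–6 area score follows the standard PASI bands (0%, <10%, 10–29%, 30–49%, 50–69%, 70–89%, 90–100%). A minimal sketch follows; `area_score` is a hypothetical name, and assigning fractional values such as 29.5% to the lower band is an assumption, since the bands are stated in whole percentages.

```python
def area_score(percent: float) -> int:
    """Map affected area (0-100 %) of one region to the PASI area score 0-6."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    # Lower bounds of bands 6 down to 2; anything above 0 and below 10 scores 1.
    for score, lower_bound in ((6, 90), (5, 70), (4, 50), (3, 30), (2, 10)):
        if percent >= lower_bound:
            return score
    return 1
```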
Body region segmentation
For full-body PASI assessment, the AI segments each full-body image to identify the four PASI regions. Body landmarks are detected to delineate head, trunk, upper extremities, and lower extremities, ensuring each region's score is computed from the correct anatomical area.
When images taken from different perspectives overlap, the AI uses body region segmentation to prevent double-counting of the same skin area:
- Body segmentation: The AI segments each full-body image to identify the four PASI regions (head, trunk, upper extremities, lower extremities).
- Region assignment: Each pixel in the image is assigned to a specific body region using anatomical landmarks.
- Erythema surface quantification: Within each region, the AI segments psoriatic areas to compute the affected BSA percentage.
- Intensity scoring: Close-up images of each region are analysed for erythema, desquamation, and induration intensity.
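The region assignment and surface quantification steps above can be sketched as follows. The example assumes each pixel carries exactly one region label (or `None` for background) and that the lesion mask has the same shape; the data layout and the name `affected_percent_by_region` are illustrative, not the product's internal representation.

```python
from collections import Counter

REGIONS = ("head", "trunk", "upper", "lower")


def affected_percent_by_region(region_map, lesion_mask):
    """Return the percentage of lesion pixels within each labelled region.

    region_map: nested lists of region names, or None for background.
    lesion_mask: nested lists of 0/1 with the same shape as region_map.
    """
    region_pixels = Counter()
    lesion_pixels = Counter()
    for region_row, lesion_row in zip(region_map, lesion_mask):
        for region, is_lesion in zip(region_row, lesion_row):
            if region is None:
                continue  # background pixels never contribute
            region_pixels[region] += 1
            if is_lesion:
                lesion_pixels[region] += 1
    # Regions not visible in this image are omitted from the result.
    return {
        r: 100.0 * lesion_pixels[r] / region_pixels[r]
        for r in REGIONS
        if region_pixels[r]
    }
```

Because every pixel belongs to at most one region, a lesion pixel can only contribute to one region's percentage within an image.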
Severity classification
| PASI score (upper bound) | Severity | Clinical description |
|---|---|---|
| 0 | Clear | No psoriasis (PASI 0) |
| 5 | Mild | Mild psoriasis (PASI >0–5) |
| 10 | Moderate | Moderate psoriasis (PASI >5–10) |
| 20 | Severe | Severe psoriasis (PASI >10–20) |
| 72 | Very severe | Very severe psoriasis (PASI >20, maximum 72) |
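A minimal sketch of the classification, treating the score in the first column of the table as an inclusive upper bound for its category (an interpretation of the table, not a documented rule). `classify_severity` is a hypothetical name.

```python
def classify_severity(pasi: float) -> str:
    """Map a total PASI score (0-72) to a severity category."""
    if not 0 <= pasi <= 72:
        raise ValueError("PASI must be between 0 and 72")
    if pasi == 0:
        return "Clear"
    if pasi <= 5:
        return "Mild"
    if pasi <= 10:
        return "Moderate"
    if pasi <= 20:
        return "Severe"
    return "Very severe"
```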
PASI response thresholds
In clinical trials, treatment efficacy is typically measured as the proportion of patients achieving:
| Response | Definition | Typical use |
|---|---|---|
| PASI 75 | ≥75% improvement from baseline PASI | Traditional primary endpoint |
| PASI 90 | ≥90% improvement from baseline PASI | Increasingly used as primary endpoint for biologics |
| PASI 100 | 100% improvement (complete clearance) | Secondary endpoint |
APASI enables automated PASI response calculation by computing baseline and follow-up scores from the same AI pipeline.
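The response calculation is a percentage improvement from baseline. A minimal sketch with hypothetical function names follows; treating PASI 100 as a follow-up score of exactly 0 is a choice made here to avoid floating-point comparisons against 100%.

```python
def pasi_improvement(baseline: float, follow_up: float) -> float:
    """Percentage improvement from baseline PASI (negative if worsened)."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def response_flags(baseline: float, follow_up: float) -> dict:
    """Report which PASI response thresholds a patient has reached."""
    improvement = pasi_improvement(baseline, follow_up)
    return {
        "PASI 75": improvement >= 75.0,
        "PASI 90": improvement >= 90.0,
        "PASI 100": follow_up == 0,  # complete clearance
    }
```

For example, a patient going from a baseline of 20 to a follow-up of 4 has improved by 80%, which meets PASI 75 but not PASI 90.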
Reproducibility
Inter-rater variability: eliminated
Different investigators scoring the same patient can produce different results. AI scoring removes this source of variability: the same image always produces the same score, regardless of which site submits it.
Intra-rater variability: eliminated
The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift — it is the same model, with the same weights, producing the same output deterministically.
Site-to-site consistency
In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this — scores are inherently consistent across all sites.
Impact on trial design
Reduced scoring variability means cleaner endpoint data, which translates to:
- Smaller required sample sizes — less noise means smaller samples can detect the same treatment effect
- Faster data lock — no queries related to scoring inconsistencies
- Stronger regulatory submissions — consistent, reproducible data with documented methodology
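The link between noise and sample size can be illustrated with the textbook formula for comparing two group means, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2 / \delta^2$ per arm, where $\sigma$ is the standard deviation of the endpoint and $\delta$ the treatment effect to detect. This is a simplified sketch for a continuous endpoint, not the method used to size any particular trial, and `n_per_arm` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per arm for a two-sided, two-sample comparison of means."""
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
```

Because the required sample size scales with $\sigma^2$, any reduction in measurement noise that lowers the endpoint's standard deviation reduces the required sample size by the square of that ratio.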
Model versioning
The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same component models throughout the trial:
- No mid-trial model updates: all four component model versions (erythema, desquamation, induration, BSA segmentation) are frozen when the study is configured
- Version tracking: model version numbers are recorded in every scored report and in the audit trail
- Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
- Reproducibility guarantee: any image set can be re-scored at any time and will produce the identical APASI score
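One way to picture the version lock is an immutable record of the four component model versions, stored alongside a fingerprint of the image set in each audit entry. The sketch below is illustrative only: the class, field names, and record layout are assumptions and do not describe the product's actual audit trail format.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelVersions:
    """Component model versions frozen at study configuration (hypothetical)."""
    erythema: str
    desquamation: str
    induration: str
    bsa_segmentation: str


def audit_record(versions: ModelVersions, images: list, apasi_score: float) -> dict:
    """Build an audit entry tying a score to model versions and image content."""
    digest = hashlib.sha256()
    for image_bytes in images:
        # Hash each image, then fold it into an order-sensitive set fingerprint.
        digest.update(hashlib.sha256(image_bytes).digest())
    return {
        "model_versions": asdict(versions),
        "image_set_sha256": digest.hexdigest(),
        "apasi": apasi_score,
    }
```

With `frozen=True`, any attempt to change a version after configuration raises an error, and re-scoring the same images under the same versions yields a record with the same fingerprint.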
Comparison with manual PASI
| Dimension | Manual PASI | APASI |
|---|---|---|
| Time per patient | 10–20 minutes (4 regions × 4 components) | ~5 seconds for 11 images |
| Inter-rater variability | Significant; RMAE 0.14–0.17 between dermatologists for intensity components | Zero — deterministic |
| Intra-rater variability | Documented; worsens with fatigue across long assessment sessions | Zero — no fatigue |
| BSA estimation | Highly subjective; often the largest source of PASI variability | Pixel-level segmentation (IoU ≥0.61) |
| Component granularity | Integer scores only (0–4 per component) | Continuous scores mapped to integers, enabling finer sensitivity |
| Decentralised capture | Requires in-clinic visit for manual scoring | Patient captures at home; AI scores remotely |
| Consistency across sites | Requires calibration exercises | Inherent — same models everywhere |
| Cost | Trained dermatologist time per patient per visit | Fixed per-study licensing |
Manual PASI scoring is particularly challenging because it requires 16 individual assessments (4 components × 4 regions) per patient. BSA estimation is the weakest link; studies have shown it to be the largest source of PASI inter-rater variability. APASI replaces subjective BSA estimation with pixel-level segmentation, and replaces subjective intensity grading with AI models validated against expert consensus.