Scoring Methodology

This page provides a detailed technical description of the AI scoring pipeline for psoriasis severity assessment via the Automatic Psoriasis Area and Severity Index (APASI).

The PASI formula

The Psoriasis Area and Severity Index (Fredriksson & Pettersson, 1978) is the most widely used measure of psoriasis severity in clinical trials. It assesses four body regions, each scored for three intensity dimensions and affected area:

$$\text{PASI} = \sum_{r \in \{h,t,u,l\}} w_r \cdot A_r \cdot (E_r + D_r + I_r)$$

| Region | Symbol | Weight ($w_r$) | % of BSA |
| --- | --- | --- | --- |
| Head | $h$ | 0.1 | 10% |
| Trunk | $t$ | 0.3 | 30% |
| Upper extremities | $u$ | 0.2 | 20% |
| Lower extremities | $l$ | 0.4 | 40% |

Each region is scored for:

  • Erythema ($E_r$): Redness, 0–4
  • Desquamation ($D_r$): Scaling, 0–4
  • Induration ($I_r$): Plaque thickness, 0–4
  • Affected area ($A_r$): BSA involvement, 0–6 (0%, <10%, 10–29%, 30–49%, 50–69%, 70–89%, 90–100%)

Maximum score: $0.1 \times 6 \times 12 + 0.3 \times 6 \times 12 + 0.2 \times 6 \times 12 + 0.4 \times 6 \times 12 = 72$.
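
The formula can be checked with a few lines of Python. This is a minimal sketch of the published PASI arithmetic, not the APASI implementation; the names `RegionScore`, `REGION_WEIGHTS`, and `pasi` are illustrative.

```python
from dataclasses import dataclass

# Region weights w_r from the table above.
REGION_WEIGHTS = {"head": 0.1, "trunk": 0.3, "upper": 0.2, "lower": 0.4}


@dataclass(frozen=True)
class RegionScore:
    erythema: int      # E_r, 0-4
    desquamation: int  # D_r, 0-4
    induration: int    # I_r, 0-4
    area: int          # A_r, 0-6


def pasi(scores: dict[str, RegionScore]) -> float:
    """Sum w_r * A_r * (E_r + D_r + I_r) over the four body regions."""
    total = 0.0
    for region, weight in REGION_WEIGHTS.items():
        s = scores[region]
        intensities = (s.erythema, s.desquamation, s.induration)
        if not all(0 <= v <= 4 for v in intensities):
            raise ValueError(f"{region}: intensity scores must be in 0-4")
        if not 0 <= s.area <= 6:
            raise ValueError(f"{region}: area score must be in 0-6")
        total += weight * s.area * sum(intensities)
    return total


# Every component at its maximum reproduces the maximum score of 72.
worst_case = {region: RegionScore(4, 4, 4, 6) for region in REGION_WEIGHTS}
print(round(pasi(worst_case), 6))
```

Rounding in the last line only hides floating-point noise from the decimal weights.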

AI component models

APASI automates each PASI component with a dedicated AI model. This modular approach means each component can be independently validated.

Erythema intensity model

Measures the redness of psoriatic plaques from clinical images. The model outputs a continuous intensity value that maps to the 0–4 ordinal scale.

  • Production RMAE: 0.13 (95% CI: 0.119–0.142)
  • Acceptance criterion: ≤0.14 (expert inter-rater RMAE)
  • Result: PASS, the AI's erythema scoring error is below the disagreement between dermatologists

Desquamation intensity model

Measures the severity of silvery-white scaling on psoriatic plaques.

  • Production RMAE: 0.153 (95% CI: 0.139–0.167)
  • Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
  • Result: PASS, within expert inter-rater variability

Induration intensity model

Measures the thickness and elevation of psoriatic plaques.

  • Production RMAE: 0.151 (95% CI: 0.137–0.167)
  • Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
  • Result: PASS, within expert inter-rater variability
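
This page does not define RMAE. The sketch below assumes it denotes the mean absolute error divided by the range of the 0–4 scale; that definition is an assumption and should be checked against the validation report. The acceptance check compares the point estimate with the criterion, which is consistent with the three results listed above. The function names are hypothetical.

```python
def rmae(predicted, reference, scale_range=4.0):
    """Mean absolute error normalised by the scale range.

    ASSUMPTION: this page does not define RMAE; normalising by the
    0-4 scale range is one plausible reading, not a confirmed one.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
    return mae / scale_range


def meets_criterion(point_estimate, criterion):
    """Pass when the model's RMAE does not exceed the expert inter-rater RMAE."""
    return point_estimate <= criterion


# Figures reported on this page: (production RMAE, acceptance criterion).
reported = {
    "erythema": (0.13, 0.14),
    "desquamation": (0.153, 0.17),
    "induration": (0.151, 0.17),
}
for name, (value, criterion) in reported.items():
    print(name, "PASS" if meets_criterion(value, criterion) else "FAIL")
```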

Erythema surface quantification model

Segments psoriatic areas at pixel level to compute the percentage of body surface area affected per region. Unlike the intensity models (which score severity), this model measures extent.

  • Production IoU: ≥0.61 (validated against expert segmentation masks)
  • Acceptance criterion: ≥0.61 (expert segmentation IoU)
  • Result: PASS, matches expert segmentation quality
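
Intersection over Union (IoU) is the number of pixels that both masks mark as psoriatic, divided by the number of pixels that either mask marks. The sketch below uses plain nested lists as binary masks for illustration; how the production pipeline represents masks is not described here.

```python
def iou(predicted_mask, expert_mask):
    """Intersection over Union of two binary masks given as rows of 0/1."""
    intersection = 0
    union = 0
    for pred_row, expert_row in zip(predicted_mask, expert_mask):
        for p, e in zip(pred_row, expert_row):
            if p and e:
                intersection += 1
            if p or e:
                union += 1
    # Convention chosen for this sketch: two empty masks agree perfectly.
    return intersection / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 1, 0]]
expert = [[1, 0, 0],
          [0, 1, 1]]
print(iou(predicted, expert))  # 2 shared pixels / 4 marked pixels = 0.5
```

In this toy case the score of 0.5 would fall below the 0.61 acceptance criterion.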

This model replaces the most subjective component of manual PASI (BSA estimation) with objective pixel-level measurement. In manual PASI, the affected area score ($A_r$) requires a dermatologist to visually estimate what percentage of each body region is covered by psoriatic plaques and then map that estimate to a 0–6 ordinal scale. Studies have consistently shown BSA estimation to be the largest single source of PASI inter-rater variability; dermatologists routinely disagree by 1–2 ordinal levels on affected area.

The AI segmentation model removes this subjectivity from the area score. It identifies psoriatic pixels in each body region, computes the percentage of affected surface area, and maps the result to the $A_r$ scale. Because the measurement is pixel-level and deterministic, the same image always receives the same score, a level of consistency that human estimation cannot achieve.
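
The mapping from a measured percentage to the 0–6 area score follows the bins listed under "The PASI formula". A minimal sketch follows. The bins are given there in whole percentages, so treating them as half-open intervals (for example, 29.9% still scores 2) is an assumption, and `area_score` is an illustrative name.

```python
def area_score(percent_affected: float) -> int:
    """Map affected BSA percentage to the 0-6 PASI area score.

    Bins: 0% -> 0, <10% -> 1, 10-29% -> 2, 30-49% -> 3,
    50-69% -> 4, 70-89% -> 5, 90-100% -> 6.
    ASSUMPTION: fractional percentages fall into half-open bins,
    e.g. [10, 30) -> 2.
    """
    if not 0 <= percent_affected <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_affected == 0:
        return 0
    for score, upper_bound in enumerate((10, 30, 50, 70, 90), start=1):
        if percent_affected < upper_bound:
            return score
    return 6


for pct in (0, 4.2, 10, 29.9, 55, 95):
    print(pct, area_score(pct))
```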

Body region segmentation

For full-body PASI assessment, the AI segments each full-body image to identify the four PASI regions. Body landmarks are detected to delineate head, trunk, upper extremities, and lower extremities, ensuring each region's score is computed from the correct anatomical area.

When images taken from different perspectives overlap, the AI uses body region segmentation to prevent double-counting. The pipeline runs in four steps:

  1. Body segmentation: The AI segments each full-body image to identify the four PASI regions (head, trunk, upper extremities, lower extremities).
  2. Region assignment: Each pixel in the image is assigned to a specific body region using anatomical landmarks.
  3. Erythema surface quantification: Within each region, the AI segments psoriatic areas to compute the affected BSA percentage.
  4. Intensity scoring: Close-up images of each region are analysed for erythema, desquamation, and induration intensity.
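
Steps 2 and 3 can be pictured as two aligned pixel grids: one labelling each pixel with its body region, and one marking psoriatic pixels. The sketch below is an illustration under that assumption, with hypothetical names and toy data. It treats the share of affected pixels in a region as the affected percentage and does not model how several views of the same region are combined.

```python
from collections import Counter


def affected_percent_by_region(region_map, lesion_mask):
    """Percentage of each region's pixels that are marked as psoriatic.

    region_map: rows of region labels, with None for background pixels.
    lesion_mask: rows of 0/1 flags aligned with region_map.
    Each pixel carries exactly one region label, so no pixel is counted
    towards two regions.
    """
    total = Counter()
    affected = Counter()
    for region_row, lesion_row in zip(region_map, lesion_mask):
        for region, is_lesion in zip(region_row, lesion_row):
            if region is None:
                continue  # skip background
            total[region] += 1
            if is_lesion:
                affected[region] += 1
    return {region: 100.0 * affected[region] / total[region] for region in total}


region_map = [["head", "head", None],
              ["trunk", "trunk", "trunk"]]
lesion_mask = [[1, 0, 0],
               [1, 1, 0]]
print(affected_percent_by_region(region_map, lesion_mask))
```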

Severity classification

| PASI score (upper bound) | Severity | Clinical description |
| --- | --- | --- |
| 0 | Clear | No psoriasis (PASI 0) |
| 5 | Mild | Mild psoriasis (PASI 1–5) |
| 10 | Moderate | Moderate psoriasis (PASI 5–10) |
| 20 | Severe | Severe psoriasis (PASI 10–20) |
| 72 | Very severe | Very severe psoriasis (PASI >20, maximum 72) |
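
A lookup along these lines turns a PASI score into a severity label. The ranges in the table share their endpoints (5 and 10 each appear in two rows), so this sketch assumes each band includes its upper bound and that any score above 0 and up to 5 counts as mild. That boundary handling is an assumption, not something the table states.

```python
def severity(pasi_score: float) -> str:
    """Classify a PASI score using the bands in the table above.

    ASSUMPTION: each band includes its upper bound, so 5 is "Mild"
    and 10 is "Moderate"; scores in (0, 1) are treated as "Mild".
    """
    if not 0 <= pasi_score <= 72:
        raise ValueError("PASI must be between 0 and 72")
    if pasi_score == 0:
        return "Clear"
    if pasi_score <= 5:
        return "Mild"
    if pasi_score <= 10:
        return "Moderate"
    if pasi_score <= 20:
        return "Severe"
    return "Very severe"


for score in (0, 3.2, 5, 12.4, 31):
    print(score, severity(score))
```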

PASI response thresholds

In clinical trials, treatment efficacy is typically measured as the proportion of patients achieving:

| Response | Definition | Typical use |
| --- | --- | --- |
| PASI 75 | ≥75% improvement from baseline PASI | Traditional primary endpoint |
| PASI 90 | ≥90% improvement from baseline PASI | Increasingly used as primary endpoint for biologics |
| PASI 100 | 100% improvement (complete clearance) | Secondary endpoint |

APASI enables automated PASI response calculation by computing baseline and follow-up scores from the same AI pipeline.
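
The response calculation itself is simple arithmetic on the two scores: percentage improvement is the drop from baseline divided by the baseline score. A minimal sketch with illustrative function names:

```python
def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage reduction in PASI relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be positive to compute a response")
    return 100.0 * (baseline - follow_up) / baseline


def pasi_responses(baseline: float, follow_up: float) -> dict[str, bool]:
    """Flag which of the standard response thresholds a patient has reached."""
    improvement = percent_improvement(baseline, follow_up)
    return {
        "PASI 75": improvement >= 75,
        "PASI 90": improvement >= 90,
        "PASI 100": improvement >= 100,
    }


# Baseline 20.0 and follow-up 4.0 is an 80% improvement:
# a PASI 75 responder, but not PASI 90 or PASI 100.
print(pasi_responses(20.0, 4.0))
```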

Reproducibility

Inter-rater variability: eliminated

Different investigators scoring the same patient produce different results. AI-powered scoring eliminates this source of variability: the same image always produces the same score, regardless of which site captures it.

Intra-rater variability: eliminated

The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift: it is the same model, with the same weights, producing the same output deterministically.

Site-to-site consistency

In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this; scores are inherently consistent across all sites.

Impact on trial design

Reduced scoring variability means cleaner endpoint data, which translates to:

  • Smaller required sample sizes: less noise means smaller samples can detect the same treatment effect
  • Faster data lock: no queries related to scoring inconsistencies
  • Stronger regulatory submissions: consistent, reproducible data with documented methodology
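
The link between noise and sample size can be made concrete with the standard two-arm formula for comparing means, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2 / \delta^2$ per arm. The sketch below is illustrative only. It assumes a continuous endpoint with a normal approximation, and the standard deviations and treatment effect are hypothetical numbers, not trial data.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Patients per arm needed to detect a mean difference `delta`
    when the endpoint has standard deviation `sigma` (two-sided test)."""
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)


# Hypothetical values: the same 4-point treatment effect, with the
# endpoint standard deviation reduced from 8 to 6 PASI points.
print(per_arm_sample_size(sigma=8, delta=4))
print(per_arm_sample_size(sigma=6, delta=4))
```

Because the required size scales with $\sigma^2$, a 25% reduction in standard deviation cuts the sample size by a little under half in this example.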

Model versioning

The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same component models throughout the trial:

  • No mid-trial model updates: all four component model versions (erythema, desquamation, induration, BSA segmentation) are frozen when the study is configured
  • Version tracking: model version numbers are recorded in every scored report and in the audit trail
  • Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
  • Reproducibility guarantee: any image set can be re-scored at any time and will produce the identical APASI score
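
One way to picture the version lock is an immutable record that is created when the study is configured and copied into every report. This is a hypothetical sketch: the class name, field names, and version strings are invented for illustration and do not describe the product's actual data model.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LockedModelVersions:
    erythema: str
    desquamation: str
    induration: str
    bsa_segmentation: str


def build_report(apasi_score: float, versions: LockedModelVersions) -> dict:
    """Attach the locked component versions to a scored report."""
    return {"apasi": apasi_score, "model_versions": asdict(versions)}


# Hypothetical version strings, fixed at study configuration.
study_versions = LockedModelVersions(
    erythema="1.0.0",
    desquamation="1.0.0",
    induration="1.0.0",
    bsa_segmentation="1.0.0",
)
print(build_report(12.4, study_versions))
```

Attempting to reassign a field on `study_versions` raises an error, which mirrors the "no mid-trial model updates" rule at the code level.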

Comparison with manual PASI

| Dimension | Manual scoring | AI scoring |
| --- | --- | --- |
| Time per patient | 10–20 minutes (4 regions × 4 components) | ~5 seconds for 11 images |
| Inter-rater variability | Significant; RMAE 0.14–0.17 between dermatologists for intensity components | Zero (deterministic) |
| Intra-rater variability | Documented; worsens with fatigue across long assessment sessions | Zero (no fatigue) |
| BSA estimation | Highly subjective; often the largest source of PASI variability | Pixel-level segmentation (IoU ≥0.61) |
| Component granularity | Integer scores only (0–4 per component) | Continuous scores mapped to integers, enabling finer sensitivity |
| Decentralised capture | Requires in-clinic visit for manual scoring | Patient captures at home; AI scores remotely |
| Consistency across sites | Requires calibration exercises | Inherent (same models everywhere) |
| Cost | Trained dermatologist time per patient per visit | Fixed per-study licensing |

Manual PASI scoring is particularly challenging because it requires 16 individual assessments (4 components × 4 regions) per patient. BSA estimation is the weakest link; studies have shown it to be the largest source of PASI inter-rater variability. APASI replaces subjective BSA estimation with pixel-level segmentation, and replaces subjective intensity grading with AI models validated against expert consensus.