Scoring Methodology

This page provides a detailed technical description of the AI scoring pipeline for psoriasis severity assessment via the Automatic Psoriasis Area and Severity Index (APASI).

The PASI formula

The Psoriasis Area and Severity Index (Fredriksson & Pettersson, 1978) is the most widely used measure of psoriasis severity in clinical trials. It assesses four body regions, each scored for three intensity dimensions and affected area:

$$\text{PASI} = \sum_{r \in \{h,t,u,l\}} w_r \cdot A_r \cdot (E_r + D_r + I_r)$$

Region	Symbol	Weight ( $w_r$ )	% of BSA
Head	$h$	0.1	10%
Trunk	$t$	0.3	30%
Upper extremities	$u$	0.2	20%
Lower extremities	$l$	0.4	40%

Each region is scored for:

Erythema ( $E_r$ ): Redness, 0–4
Desquamation ( $D_r$ ): Scaling, 0–4
Induration ( $I_r$ ): Plaque thickness, 0–4
Affected area ( $A_r$ ): BSA involvement, 0–6 (0%, <10%, 10–29%, 30–49%, 50–69%, 70–89%, 90–100%)

Maximum score: $0.1 \times 6 \times 12 + 0.3 \times 6 \times 12 + 0.2 \times 6 \times 12 + 0.4 \times 6 \times 12 = 72$ .
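The formula and the maximum score can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name `pasi` and the tuple layout `(A, E, D, I)` are illustrative choices, not part of the APASI pipeline.

```python
# Region weights from the table above: head, trunk, upper and lower extremities.
WEIGHTS = {"h": 0.1, "t": 0.3, "u": 0.2, "l": 0.4}


def pasi(scores):
    """Compute PASI from a dict mapping region symbol -> (A, E, D, I).

    A is the affected-area score (0-6); E, D and I are the erythema,
    desquamation and induration scores (0-4 each).
    """
    total = 0.0
    for region, weight in WEIGHTS.items():
        area, erythema, desquamation, induration = scores[region]
        if not 0 <= area <= 6:
            raise ValueError(f"area score out of range for region {region!r}")
        for value in (erythema, desquamation, induration):
            if not 0 <= value <= 4:
                raise ValueError(f"intensity score out of range for region {region!r}")
        total += weight * area * (erythema + desquamation + induration)
    return total


# Every component at its maximum reproduces the maximum score.
max_scores = {region: (6, 4, 4, 4) for region in WEIGHTS}
print(round(pasi(max_scores), 1))  # 72.0

# A made-up patient: 0.4 + 4.2 + 1.0 + 9.6
example = {"h": (1, 2, 1, 1), "t": (2, 3, 2, 2), "u": (1, 2, 2, 1), "l": (3, 3, 2, 3)}
print(round(pasi(example), 1))  # 15.2
```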

AI component models

APASI automates each PASI component with a dedicated AI model. This modular approach means each component can be independently validated.

Erythema intensity model

Measures the redness of psoriatic plaques from clinical images. The model outputs a continuous intensity value that maps to the 0–4 ordinal scale.
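How the continuous output maps to the ordinal scale is not specified on this page. The sketch below assumes the simplest option, clipping to the 0–4 range and rounding to the nearest integer; the function name `to_ordinal` is hypothetical and the production mapping may differ.

```python
def to_ordinal(intensity, lowest=0, highest=4):
    """Map a continuous intensity to the 0-4 ordinal scale.

    Assumption: clip to the valid range, then round half up. The real
    model may use different (for example learned) thresholds.
    """
    clipped = min(max(intensity, lowest), highest)
    # int(x + 0.5) rounds half up for non-negative x, which avoids
    # Python's round-half-to-even behaviour in round().
    return int(clipped + 0.5)


print([to_ordinal(x) for x in (-0.3, 0.49, 2.5, 3.2, 4.7)])  # [0, 0, 3, 3, 4]
```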

Production RMAE: 0.13 (95% CI: 0.119–0.142)
Acceptance criterion: ≤0.14 (expert inter-rater RMAE)
Result: PASS, the AI's erythema scoring error is below the disagreement between dermatologists
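This page does not define RMAE. The sketch below assumes it denotes the mean absolute error divided by the width of the scoring scale (4 for the 0–4 intensity scales), and shows how a result would be compared with the acceptance criterion. The function name and the sample values are made up for illustration.

```python
def rmae(predictions, references, scale_min=0.0, scale_max=4.0):
    """Relative mean absolute error.

    Assumption: MAE normalised by the scale range, so that 0 means
    perfect agreement and 1 means the largest possible error.
    """
    if len(predictions) != len(references) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(p - r) for p, r in zip(predictions, references)) / len(predictions)
    return mae / (scale_max - scale_min)


ACCEPTANCE_THRESHOLD = 0.14  # erythema criterion quoted above

# Illustrative values only, not validation data.
model_scores = [1.0, 2.5, 3.0]
expert_consensus = [1, 2, 4]

value = rmae(model_scores, expert_consensus)
print(value)  # 0.125
print("PASS" if value <= ACCEPTANCE_THRESHOLD else "FAIL")  # PASS
```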

Desquamation intensity model

Measures the severity of silvery-white scaling on psoriatic plaques.

Production RMAE: 0.153 (95% CI: 0.139–0.167)
Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
Result: PASS, within expert inter-rater variability

Induration intensity model

Measures the thickness and elevation of psoriatic plaques.

Production RMAE: 0.151 (95% CI: 0.137–0.167)
Acceptance criterion: ≤0.17 (expert inter-rater RMAE)
Result: PASS, within expert inter-rater variability

Erythema surface quantification model

Segments psoriatic areas at pixel level to compute the percentage of body surface area affected per region. Unlike the intensity models (which score severity), this model measures extent.

Production IoU: ≥0.61 (validated against expert segmentation masks)
Acceptance criterion: ≥0.61 (expert segmentation IoU)
Result: PASS, matches expert segmentation quality
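Intersection over Union (IoU) compares a predicted mask with a reference mask: the number of pixels both mark as psoriatic, divided by the number of pixels either marks as psoriatic. A minimal pure-Python sketch, with masks represented as nested lists of 0/1 (a representation chosen here for illustration):

```python
def iou(predicted, reference):
    """Intersection over Union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Convention assumed here: two empty masks agree perfectly.
    return intersection / union if union else 1.0


predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
expert_mask = [[1, 0, 0],
               [0, 1, 1]]

print(iou(predicted_mask, expert_mask))  # 0.5 (2 shared pixels out of 4 marked)
```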

This model replaces the most subjective component of manual PASI (BSA estimation) with objective pixel-level measurement. In manual PASI, the affected area score ( $A_r$ ) requires a dermatologist to visually estimate what percentage of each body region is covered by psoriatic plaques and then map that estimate to a 0–6 ordinal scale. Studies have consistently shown BSA estimation to be the largest single source of PASI inter-rater variability; dermatologists routinely disagree by 1–2 ordinal levels on affected area.

The AI segmentation model removes this source of subjectivity. It identifies psoriatic pixels in each body region, computes the percentage of affected surface area, and maps the result to the $A_r$ scale. Because the measurement is deterministic and pixel-level, the same image always receives the same score, a level of consistency that human estimation cannot achieve.
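The mapping from an affected-area percentage to the $A_r$ score follows directly from the bands listed under the PASI formula. The sketch below assumes that fractional percentages fall into the band whose upper limit they have not yet reached (for example, 29.9% scores 2 and 30% scores 3); the function name `area_score` is illustrative.

```python
def area_score(percent):
    """Map an affected-area percentage (0-100) to the 0-6 PASI area score."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    # Upper limits (exclusive) of scores 1..5; anything from 90% up scores 6.
    for score, upper_limit in enumerate((10, 30, 50, 70, 90), start=1):
        if percent < upper_limit:
            return score
    return 6


print([area_score(p) for p in (0, 5, 10, 29.9, 30, 69, 89.9, 90, 100)])
# [0, 1, 2, 2, 3, 4, 5, 6, 6]
```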

Body region segmentation

For full-body PASI assessment, the AI segments each full-body image to identify the four PASI regions. Body landmarks are detected to delineate head, trunk, upper extremities, and lower extremities, ensuring each region's score is computed from the correct anatomical area.

The full-body workflow proceeds in four steps. When perspectives overlap, body region segmentation is used to prevent double-counting:

Body segmentation: The AI segments each full-body image to identify the four PASI regions (head, trunk, upper extremities, lower extremities).
Region assignment: Each pixel in the image is assigned to a specific body region using anatomical landmarks.
Erythema surface quantification: Within each region, the AI segments psoriatic areas to compute the affected BSA percentage.
Intensity scoring: Close-up images of each region are analysed for erythema, desquamation, and induration intensity.
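The region assignment and surface quantification steps can be illustrated with a toy example. This sketch assumes two hypothetical inputs of equal shape: a `region_map` that labels every pixel with a region symbol (or `None` for background) and a `lesion_mask` of 0/1 values. Neither data structure is taken from the actual pipeline.

```python
from collections import Counter

REGIONS = ("h", "t", "u", "l")


def affected_percent_by_region(region_map, lesion_mask):
    """Return the percentage of lesion pixels within each body region.

    Each pixel belongs to at most one region, so a lesion pixel is
    counted exactly once. Background pixels (None) are ignored.
    """
    region_pixels = Counter()
    lesion_pixels = Counter()
    for region_row, lesion_row in zip(region_map, lesion_mask):
        for region, is_lesion in zip(region_row, lesion_row):
            if region is None:
                continue
            region_pixels[region] += 1
            if is_lesion:
                lesion_pixels[region] += 1
    return {
        region: 100.0 * lesion_pixels[region] / region_pixels[region]
        if region_pixels[region] else 0.0
        for region in REGIONS
    }


region_map = [["h", "h", None],
              ["t", "t", "t"],
              ["t", "l", "l"]]
lesion_mask = [[1, 0, 1],   # the lesion pixel on the background is ignored
               [1, 1, 0],
               [0, 0, 1]]

print(affected_percent_by_region(region_map, lesion_mask))
# {'h': 50.0, 't': 50.0, 'u': 0.0, 'l': 50.0}
```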

Severity classification

PASI score (upper bound)	Severity	Clinical description
0	Clear	No psoriasis (PASI 0)
5	Mild	Mild psoriasis (PASI 1–5)
10	Moderate	Moderate psoriasis (PASI 5–10)
20	Severe	Severe psoriasis (PASI 10–20)
72	Very severe	Very severe psoriasis (PASI >20, maximum 72)
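The table can be expressed as a small lookup. Because the listed ranges share their boundary values (5, 10 and 20), this sketch assumes each boundary belongs to the lower band and that any non-zero score up to 5 counts as mild; both choices are assumptions, and the function name is illustrative.

```python
def severity(pasi_score):
    """Classify a PASI score (0-72) into the severity bands of the table.

    Assumption: boundary values 5, 10 and 20 fall into the lower band.
    """
    if not 0 <= pasi_score <= 72:
        raise ValueError("PASI score must be between 0 and 72")
    if pasi_score == 0:
        return "Clear"
    if pasi_score <= 5:
        return "Mild"
    if pasi_score <= 10:
        return "Moderate"
    if pasi_score <= 20:
        return "Severe"
    return "Very severe"


print([severity(s) for s in (0, 3.2, 5, 7.5, 15.2, 20, 42)])
# ['Clear', 'Mild', 'Mild', 'Moderate', 'Severe', 'Severe', 'Very severe']
```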

PASI response thresholds

In clinical trials, treatment efficacy is typically measured as the proportion of patients achieving:

Response	Definition	Typical use
PASI 75	≥75% improvement from baseline PASI	Traditional primary endpoint
PASI 90	≥90% improvement from baseline PASI	Increasingly used as primary endpoint for biologics
PASI 100	100% improvement (complete clearance)	Secondary endpoint

APASI enables automated PASI response calculation by computing baseline and follow-up scores from the same AI pipeline.
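The response calculation itself is simple arithmetic: the percentage improvement is the reduction from baseline divided by the baseline score. A minimal sketch, with hypothetical function names and made-up scores:

```python
def pasi_improvement(baseline, follow_up):
    """Percentage improvement from baseline PASI."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be greater than 0")
    return 100.0 * (baseline - follow_up) / baseline


def pasi_responses(baseline, follow_up):
    """Return which response thresholds from the table are met."""
    improvement = pasi_improvement(baseline, follow_up)
    return {
        "PASI 75": improvement >= 75,
        "PASI 90": improvement >= 90,
        "PASI 100": follow_up == 0,  # complete clearance
    }


print(pasi_improvement(20.0, 4.0))  # 80.0
print(pasi_responses(20.0, 4.0))
# {'PASI 75': True, 'PASI 90': False, 'PASI 100': False}
```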

Reproducibility

Inter-rater variability: eliminated

Different investigators scoring the same patient produce different results. AI-powered scoring eliminates this entirely — the same image always produces the same score, regardless of which site captures it.

Intra-rater variability: eliminated

The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift — it is the same model, with the same weights, producing the same output deterministically.

Site-to-site consistency

In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this — scores are inherently consistent across all sites.

Impact on trial design

Reduced scoring variability means cleaner endpoint data, which translates to:

Smaller required sample sizes — less noise means smaller samples can detect the same treatment effect
Faster data lock — no queries related to scoring inconsistencies
Stronger regulatory submissions — consistent, reproducible data with documented methodology
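The link between noise and sample size can be made concrete with the standard approximation for a two-arm comparison of means. This is an illustration for a continuous endpoint such as mean change in PASI, not a sample-size method stated on this page; responder endpoints such as PASI 75 use a different formula. Here $\sigma$ is the standard deviation of the endpoint, $\delta$ the treatment difference to detect, and $z$ the standard normal quantiles for significance level $\alpha$ and power $1-\beta$:

$$n_{\text{per arm}} \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}$$

Since $n$ scales with $\sigma^2$, a hypothetical 20% reduction in endpoint standard deviation would reduce the required sample size by about 36%, because $0.8^2 = 0.64$.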

Model versioning

The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same component models throughout the trial:

No mid-trial model updates: all four component model versions (erythema, desquamation, induration, BSA segmentation) are frozen when the study is configured
Version tracking: model version numbers are recorded in every scored report and in the audit trail
Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
Reproducibility guarantee: any image set can be re-scored at any time and will produce the identical APASI score
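One way to picture the version lock is an immutable record of the four component versions that is checked before every scoring run and written into every report. The sketch below is purely illustrative: the class, function, field names, and version strings are hypothetical and do not describe the actual APASI implementation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: versions cannot be changed after creation
class ModelVersions:
    erythema: str
    desquamation: str
    induration: str
    bsa_segmentation: str


def build_report(apasi_score, study_lock, runtime_versions):
    """Refuse to score if the running models differ from the study lock."""
    if runtime_versions != study_lock:
        raise RuntimeError("model versions differ from the versions locked for this study")
    # The versions travel with every report so the score can be reproduced.
    return {"apasi": apasi_score, "model_versions": asdict(study_lock)}


STUDY_LOCK = ModelVersions("1.2.0", "1.1.3", "1.1.3", "2.0.1")  # made-up versions

report = build_report(15.2, STUDY_LOCK, ModelVersions("1.2.0", "1.1.3", "1.1.3", "2.0.1"))
print(report["model_versions"]["bsa_segmentation"])  # 2.0.1
```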

Comparison with manual PASI

Dimension	Manual PASI	APASI
Time per patient	10–20 minutes (4 regions × 4 components)	~5 seconds for 11 images
Inter-rater variability	Significant; RMAE 0.14–0.17 between dermatologists for intensity components	Zero — deterministic
Intra-rater variability	Documented; worsens with fatigue across long assessment sessions	Zero — no fatigue
BSA estimation	Highly subjective; often the largest source of PASI variability	Pixel-level segmentation (IoU ≥0.61)
Component granularity	Integer scores only (0–4 per component)	Continuous scores mapped to integers, enabling finer sensitivity
Decentralised capture	Requires in-clinic visit for manual scoring	Patient captures at home; AI scores remotely
Consistency across sites	Requires calibration exercises	Inherent — same models everywhere
Cost	Trained dermatologist time per patient per visit	Fixed per-study licensing

Manual PASI scoring is particularly challenging because it requires 16 individual assessments (4 components × 4 regions) per patient. BSA estimation is the weakest link; studies have shown it to be the largest source of PASI inter-rater variability. APASI replaces subjective BSA estimation with pixel-level segmentation, and replaces subjective intensity grading with AI models validated against expert consensus.
