Scoring Methodology

This page provides a detailed technical description of the AI scoring pipeline for acne severity assessment. It covers the detection model, density computation, IGA formula derivation, multi-perspective scoring, overlap handling, and reproducibility characteristics.

Inflammatory lesion detection

Object detection model

The lesion detection stage uses a convolutional neural network (CNN) trained for object detection on facial acne images. The model identifies individual inflammatory acne lesions (specifically papules, pustules, and nodules) and outputs bounding boxes with confidence scores for each detection.

Comedones (blackheads and whiteheads) are excluded from the count. This is consistent with:

IGA methodology, which assesses inflammatory severity
FDA guidance, which recommends that "inflammatory and noninflammatory lesions should be counted and reported separately"
The Hayashi Criterion (Hayashi et al., 2008), which grades acne severity based on the number of inflammatory eruptions per half-face

AI-detected inflammatory acne lesions with confidence scores — Lesion detection: bounding boxes identify each inflammatory acne lesion. The total count N is the number of detections.

Training data

The model was trained on a dataset of facial acne images annotated by board-certified dermatologists. Each annotation consists of a bounding box around an individual inflammatory lesion with its type classification.

Confidence thresholding

Each detection includes a confidence score. Detections below a calibrated confidence threshold are discarded. The threshold was tuned to balance sensitivity (not missing real lesions) against specificity (not counting artifacts or non-lesion structures). The threshold is fixed at inference time and applies identically across all images and sites.
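The thresholding step can be sketched as follows. This is a minimal illustration, not the production code: the `Detection` class, the `filter_detections` helper, and the threshold value of 0.5 are all hypothetical, since this page does not state the calibrated threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected lesion: bounding box in pixels plus model confidence."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float


# Hypothetical value for illustration only; the calibrated threshold is not
# stated in this document.
CONFIDENCE_THRESHOLD = 0.5


def filter_detections(detections, threshold=CONFIDENCE_THRESHOLD):
    """Keep only detections whose confidence meets the fixed threshold."""
    return [d for d in detections if d.confidence >= threshold]


raw = [
    Detection(10, 10, 30, 30, 0.92),
    Detection(50, 40, 66, 58, 0.31),  # below threshold, discarded
    Detection(80, 20, 98, 36, 0.74),
]
kept = filter_detections(raw)
lesion_count = len(kept)  # N, the count used by the later scoring stages
```

Because the threshold is a single constant applied in one place, every image from every site passes through the same filter.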

Spatial density computation

Definition and formula

The spatial density score $D$ quantifies how closely detected lesions are clustered together. It is computed as:

$$D = \frac{A_{\text{overlap}}}{A_{\text{total}}}$$

where:

$A_{\text{overlap}}$ = total area of pairwise overlaps between circular regions centered on each detected lesion
$A_{\text{total}}$ = total area covered by all circular detection regions (union)

The radius of each circular region is derived from the bounding box dimensions of the corresponding detection. The density score starts at 0 (no overlap; lesions are completely dispersed) and approaches 1 as overlap becomes near-total (lesions are tightly concentrated in a small area).
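One way to approximate this ratio is to sample the image plane on a regular grid and count how many circles cover each sample. The sketch below rests on two assumptions that this page does not confirm: the radius is taken as half the mean of the box width and height, and $A_{\text{overlap}}$ is interpreted as the area covered by two or more circles (which keeps $D$ between 0 and 1). The function names are hypothetical.

```python
def circle_from_box(x_min, y_min, x_max, y_max):
    """Return (cx, cy, r) for a bounding box.

    Assumption: radius = half the mean of box width and height.
    """
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    r = ((x_max - x_min) + (y_max - y_min)) / 4.0
    return cx, cy, r


def spatial_density(boxes, step=0.5):
    """Approximate D = A_overlap / A_total by sampling a regular grid."""
    circles = [circle_from_box(*b) for b in boxes]
    if not circles:
        return 0.0
    x_lo = min(cx - r for cx, cy, r in circles)
    x_hi = max(cx + r for cx, cy, r in circles)
    y_lo = min(cy - r for cx, cy, r in circles)
    y_hi = max(cy + r for cx, cy, r in circles)

    covered = 0     # samples inside at least one circle (the union)
    overlapped = 0  # samples inside two or more circles
    y = y_lo + step / 2
    while y < y_hi:
        x = x_lo + step / 2
        while x < x_hi:
            hits = sum(
                (x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in circles
            )
            if hits >= 1:
                covered += 1
            if hits >= 2:
                overlapped += 1
            x += step
        y += step
    return overlapped / covered if covered else 0.0


dispersed = [(0, 0, 10, 10), (100, 100, 110, 110)]  # far apart
clustered = [(0, 0, 10, 10), (4, 0, 14, 10)]        # centres 4 px apart
d_dispersed = spatial_density(dispersed)
d_clustered = spatial_density(clustered)
```

The dispersed pair yields a density of zero because no sample falls inside both circles, while the clustered pair yields a clearly positive value. A smaller `step` gives a finer approximation at higher cost.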

Overlapping circles showing lesion spatial density — Density calculation: overlapping detection circles indicate spatial concentration. Higher overlap = higher density.

Clinical significance of density

Density captures a dimension of acne severity that lesion count alone misses. Two patients can present with the same number of lesions but very different clinical severity. Concentrated lesions suggest more active, localised inflammation and are more likely to result in scarring.

Scenario	Lesion count	Density	Clinical perception
Scattered	25	0.15	Lesions dispersed across the face — appears milder
Clustered	25	0.65	Lesions concentrated in a small area — appears more severe

IGA calculation

The formula

The AI translates the lesion count $N$ and spatial density $D$ into a score aligned with the 5-point IGA (Investigator Global Assessment) scale:

$$\text{IGA} = N^a \cdot (D + b)$$

where:

$N$ = number of inflammatory acne lesions detected in the image
$D$ = spatial density of detected lesions (range: 0–1)
$a$, $b$ = empirically derived calibration constants
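The formula translates directly into code. In the sketch below, the function name `iga_score` and the constants `A_EXAMPLE` and `B_EXAMPLE` are hypothetical placeholders chosen only to make the example runnable; the calibrated values of $a$ and $b$ are not published on this page.

```python
def iga_score(n, d, a, b):
    """Compute N^a * (D + b) for lesion count n and spatial density d."""
    if n < 0:
        raise ValueError("lesion count must be non-negative")
    if not 0.0 <= d <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    return (n ** a) * (d + b)


# Placeholder constants for illustration only, NOT the calibrated values.
A_EXAMPLE = 0.5
B_EXAMPLE = 0.1

# Same lesion count, different density (the two scenarios in the table above).
scattered = iga_score(25, 0.15, A_EXAMPLE, B_EXAMPLE)
clustered = iga_score(25, 0.65, A_EXAMPLE, B_EXAMPLE)
no_lesions = iga_score(0, 0.0, A_EXAMPLE, B_EXAMPLE)
```

With any positive $a$, a count of zero gives a score of zero, which matches the "Clear" grade. With the same count, the clustered scenario scores higher than the scattered one because only the $(D + b)$ factor differs.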

Calibration methodology

The constants $a$ and $b$ were determined by maximising the correlation between the AI-computed score and expert IGA ratings. The expert ratings represent the consensus of three board-certified dermatologists who independently scored each case.
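A calibration of this kind could look like the sketch below. It is only one plausible approach: the use of Pearson correlation, the grid search, the grid ranges, and every function name are assumptions, and the data are synthetic (generated from known constants so that the search has a known answer), not study data.

```python
import math


def iga_score(n, d, a, b):
    """Score from the formula N^a * (D + b)."""
    return (n ** a) * (d + b)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def calibrate(cases, expert_ratings, a_grid, b_grid):
    """Return (r, a, b) for the grid point with the highest correlation."""
    best = None
    for a in a_grid:
        for b in b_grid:
            scores = [iga_score(n, d, a, b) for n, d in cases]
            r = pearson(scores, expert_ratings)
            if best is None or r > best[0]:
                best = (r, a, b)
    return best


# SYNTHETIC data for illustration only: (lesion count, density) pairs with
# made-up ratings generated from a = 0.5, b = 0.1. Not study data.
cases = [(3, 0.05), (8, 0.10), (15, 0.30), (25, 0.15),
         (25, 0.65), (40, 0.50), (60, 0.70)]
expert_ratings = [iga_score(n, d, 0.5, 0.1) for n, d in cases]

a_grid = [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95
b_grid = [j / 20 for j in range(0, 21)]  # 0.00 .. 1.00
best_r, best_a, best_b = calibrate(cases, expert_ratings, a_grid, b_grid)
```

On this synthetic input the search recovers the constants used to generate the ratings. With real consensus ratings the correlation would be below 1, and a continuous optimiser might replace the grid.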

The power term $N^a$

The exponent $a$ (where $0 < a < 1$) makes the score grow sublinearly with lesion count: each additional lesion adds less to the score than the one before it. This reflects clinical reality: the difference between 5 and 15 lesions is far more clinically significant than the difference between 85 and 95 lesions. At high counts, additional lesions have diminishing marginal impact on perceived severity; the patient is already clearly severe.
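A worked example makes the diminishing effect concrete. The value $a = 0.5$ below is purely illustrative and is not the calibrated constant; any exponent between 0 and 1 shows the same pattern.

```latex
\begin{aligned}
15^{0.5} - 5^{0.5} &\approx 3.873 - 2.236 = 1.637 \\
95^{0.5} - 85^{0.5} &\approx 9.747 - 9.220 = 0.527 \\
\frac{\partial}{\partial N} N^{a} &= a\,N^{a-1},
  \quad \text{which decreases in } N \text{ for } 0 < a < 1
\end{aligned}
```

Under this illustrative exponent, the same ten-lesion increase contributes roughly three times more to the power term at the low end of the range than at the high end.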

IGA severity alignment

The resulting IGA score maps to the standard 5-point scale used in regulatory submissions worldwide:

IGA score	Severity	Clinical description
0	Clear	No inflammatory lesions
1	Almost clear	Rare inflammatory lesions with very low density
2	Mild	Some inflammatory lesions, low to moderate density, no nodules
3	Moderate	Many inflammatory lesions, moderate to high density, occasional nodules
4	Severe	Numerous inflammatory lesions, high density, many nodules

This alignment means AI-computed IGA scores can be used as IGA-equivalent endpoints without score mapping or transformation.

ALADIN composite score

The ALADIN (Acne Lesion And Density INdex) composite score extends the IGA to a higher-resolution continuous scale:

$$\text{ALADIN} = \text{IGA} \times 2.5$$

This yields a 0–10 continuous scale that preserves the clinical meaning of IGA while providing finer granularity. A patient with IGA 2.8 (between Mild and Moderate) registers as ALADIN 7.0, capturing intermediate severity levels that integer IGA rounds away.
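The conversion is a single multiplication. The short sketch below (the function name `aladin_score` is hypothetical) shows how two continuous IGA values that round to the same integer grade remain distinguishable on the ALADIN scale.

```python
ALADIN_SCALE = 2.5  # factor from the formula above


def aladin_score(iga):
    """Map a continuous IGA score (0-4) onto the 0-10 ALADIN scale."""
    return iga * ALADIN_SCALE


patient_a = 2.8
patient_b = 3.2

# Both patients round to the same integer IGA grade...
same_integer_grade = round(patient_a) == round(patient_b)

# ...but their ALADIN scores differ by about one full point.
aladin_a = aladin_score(patient_a)
aladin_b = aladin_score(patient_b)
```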

The ALADIN composite is particularly useful as an exploratory endpoint for detecting subtle treatment effects that the integer IGA scale may not resolve, especially in early-phase studies where sensitivity to change is critical.

Per-perspective scoring and global aggregation

Local scores

Each perspective (e.g., left diagonal, right diagonal) produces its own set of local scores:

Lesion count $N_{\text{local}}$
Spatial density $D_{\text{local}}$
IGA score (local)
ALADIN score (local)

Global score aggregation

The global score is derived from local scores using one of three configurable aggregation methods:

Method	Formula	When to use
Maximum (default)	$\text{Global} = \max(\text{Local}_1, \text{Local}_2, \ldots)$	Captures the worst-affected area. Recommended for most acne protocols.
Sum	$\text{Global} = \sum \text{Local}_i$	Captures cumulative severity across the face. Useful when total burden matters.
Mean	$\text{Global} = \frac{1}{n}\sum \text{Local}_i$	Average severity. Useful when perspectives have significant overlap.

The aggregation method is configured per protocol during study setup, ensuring the global score aligns with the study's clinical and statistical design.
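The three aggregation methods can be expressed as a small lookup. The names `AGGREGATORS` and `global_score`, and the method keys `"max"`, `"sum"`, and `"mean"`, are hypothetical and are not the product's configuration keys.

```python
from statistics import mean

# Hypothetical method names mapped to the three documented formulas.
AGGREGATORS = {
    "max": max,    # worst-affected perspective (documented default)
    "sum": sum,    # cumulative severity across perspectives
    "mean": mean,  # average severity across perspectives
}


def global_score(local_scores, method="max"):
    """Combine per-perspective scores into one global score."""
    if not local_scores:
        raise ValueError("at least one local score is required")
    if method not in AGGREGATORS:
        raise ValueError(f"unknown aggregation method: {method!r}")
    return AGGREGATORS[method](local_scores)


local_iga = [2.1, 3.4]  # e.g. left diagonal, right diagonal
by_max = global_score(local_iga)
by_sum = global_score(local_iga, "sum")
by_mean = global_score(local_iga, "mean")
```

Note that the sum of local scores can exceed the upper bound of the per-perspective scale, so a protocol that selects it should interpret the global score accordingly.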

Why maximum is the default: In acne, the worst-affected perspective most closely represents the dermatologist's overall severity impression. When dermatologists perform a global IGA assessment, they are primarily influenced by the most severely affected region of the face, not the average.

Multi-perspective protocols and overlap handling

Counting methodology

The standard protocol follows the Hayashi Criterion (Hayashi et al., 2008), which grades acne severity by counting inflammatory eruptions per half-face. Images are captured at approximately 70-degree diagonal angles, one from the left and one from the right.

Hayashi Criterion grade	Range
Mild	0–5 lesions per half-face
Moderate	6–20 lesions per half-face
Severe	21–50 lesions per half-face
Very severe	>50 lesions per half-face

The AI lesion counting is consistent with this methodology: each perspective corresponds to approximately one half-face, and the per-perspective count maps naturally to the grading thresholds.
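The mapping from a per-perspective count to a Hayashi grade follows directly from the thresholds in the table above. The function name `hayashi_grade` is hypothetical.

```python
def hayashi_grade(half_face_count):
    """Return the Hayashi grade for an inflammatory lesion count per half-face."""
    if half_face_count < 0:
        raise ValueError("lesion count must be non-negative")
    if half_face_count <= 5:
        return "Mild"
    if half_face_count <= 20:
        return "Moderate"
    if half_face_count <= 50:
        return "Severe"
    return "Very severe"


# One count per diagonal perspective, each approximating one half-face.
grades = {side: hayashi_grade(n) for side, n in {"left": 4, "right": 23}.items()}
```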

Alternative protocols

Not all studies use the standard 2-perspective approach. Legit.Health supports configurable multi-perspective protocols:

Protocol	Perspectives	Use case
Standard (Hayashi Criterion)	2 views: Left diagonal (~70°), Right diagonal (~70°)	Most acne studies; captures the majority of the facial acne area, consistent with the Hayashi Criterion's per-half-face count of inflammatory lesions.
Three-perspective	3 views: Left perpendicular, Frontal, Right perpendicular	Studies requiring full-face frontal coverage. The frontal view overlaps with both lateral views; facial landmark detection deduplicates lesions.
Custom	Any combination of perspectives	Defined in collaboration with the sponsor during protocol design.

Overlap exclusion

When perspectives overlap, the AI uses facial landmark detection to prevent double-counting:

Landmark identification: The AI detects facial landmarks (eyes, nose, mouth, jawline, forehead boundaries) in each image.
Region mapping: Each perspective is mapped to the facial regions it covers, based on the detected landmarks and the known capture angle.
Overlap detection: Lesions that appear in overlapping regions between two perspectives are identified using their spatial position relative to the facial landmarks.
Deduplication: Overlapping lesions are counted only once and attributed to the perspective with the highest-confidence detection.
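The final deduplication step can be sketched under some simplifying assumptions. The code below assumes that the earlier steps have already mapped every detection into a shared, landmark-normalised face coordinate frame `(u, v)`, and that two detections from different perspectives lying within `match_radius` of each other are the same lesion. The coordinate frame, the radius value, and all names are hypothetical; this page does not describe the actual matching rule.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceDetection:
    """A detection expressed in a shared, landmark-normalised face frame."""
    perspective: str
    u: float
    v: float
    confidence: float


def deduplicate(detections, match_radius=0.03):
    """Keep one detection per lesion, preferring the highest confidence.

    Detections are visited from most to least confident; a detection is
    dropped if a kept detection from a different perspective lies within
    match_radius of it.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        is_duplicate = any(
            other.perspective != det.perspective
            and math.hypot(det.u - other.u, det.v - other.v) <= match_radius
            for other in kept
        )
        if not is_duplicate:
            kept.append(det)
    return kept


detections = [
    FaceDetection("frontal", 0.40, 0.55, 0.81),  # same lesion as the next one
    FaceDetection("left", 0.41, 0.56, 0.93),
    FaceDetection("left", 0.20, 0.30, 0.70),
    FaceDetection("right", 0.80, 0.30, 0.60),
]
unique = deduplicate(detections)
counts = {}
for det in unique:
    counts[det.perspective] = counts.get(det.perspective, 0) + 1
```

In this example the frontal detection is dropped in favour of the more confident left-perspective detection of the same lesion, so the lesion contributes to the left count only.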

Facial landmark mesh overlaid on lesion detections showing how the AI maps regions — The facial landmark mesh identifies anatomical reference points and defines region boundaries, enabling the system to attribute each detected lesion to a specific half-face.

Left diagonal perspective with the excluded right-side region highlighted in green — Left diagonal: the green overlay marks the region excluded from this perspective's lesion count. Only lesions on the visible (left) half-face are scored.

This architecture enables sponsors to design protocols with as many perspectives as they need; the AI handles the complexity of deduplication automatically.

Reproducibility

Inter-rater variability: eliminated

Different investigators scoring the same patient often produce different results. AI-powered scoring removes this source of variability: the same image always produces the same score, regardless of which site submits it.

Intra-rater variability: eliminated

The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift — it is the same model, with the same weights, producing the same output deterministically.

Site-to-site consistency

In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this — scores are inherently consistent across all sites.

Impact on trial design

Reduced scoring variability means cleaner endpoint data, which translates to:

Smaller required sample sizes — less noise means smaller samples can detect the same treatment effect
Faster data lock — no queries related to scoring inconsistencies
Stronger regulatory submissions — consistent, reproducible data with documented methodology
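The link between variability and sample size can be made explicit with the standard formula for a two-arm comparison of means. The decomposition of the endpoint variance into a patient component and a rater component is an illustrative assumption (independent, additive sources), not a result reported on this page.

```latex
\begin{aligned}
n_{\text{per arm}} &= \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{\delta^{2}} \\
\sigma^{2} &= \sigma_{\text{patient}}^{2} + \sigma_{\text{rater}}^{2}
\end{aligned}
```

Here $\delta$ is the treatment effect to be detected, $\alpha$ the two-sided significance level, and $1 - \beta$ the power. Since $n$ is proportional to $\sigma^{2}$, removing the rater component reduces the required sample size by the factor $\sigma_{\text{patient}}^{2} / (\sigma_{\text{patient}}^{2} + \sigma_{\text{rater}}^{2})$; the size of that reduction depends on how large rater variability is relative to between-patient variability in a given study.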

Model versioning

The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same model throughout the trial:

No mid-trial model updates: the model version is frozen when the study is configured
Version tracking: the model version number is recorded in every scored report and in the audit trail
Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
Reproducibility guarantee: any image can be re-scored at any time and will produce the identical result

Comparison with manual counting

Dimension	Manual counting	AI counting
Time per patient	5–15 minutes	<2 seconds
Inter-rater variability	Significant; requires training and calibration	Zero — deterministic
Intra-rater variability	Documented; worsens with fatigue	Zero — no fatigue
Scalability	Limited by rater availability	Unlimited — API-based
Consistency across sites	Requires calibration exercises	Inherent — same model everywhere
Spatial density	Not captured by manual counting	Automatically computed
Cost	Central reader fees; per-patient costs	Fixed per-study licensing
Regulatory alignment	Dependent on rater training	Trained against dermatologist consensus

The FDA acknowledges that "lesion counting is time-consuming" and that "reliability is higher with rater training and use of standard templates." AI scoring can be viewed as the ultimate standardised template: a trained model that applies the same methodology to every image with perfect consistency.
