
Scoring Methodology

This page provides a detailed technical description of the AI scoring pipeline for acne severity assessment. It covers the detection model, density computation, IGA formula derivation, multi-perspective scoring, overlap handling, and reproducibility characteristics.

Inflammatory lesion detection

Object detection model

The lesion detection stage uses a convolutional neural network (CNN) trained for object detection on facial acne images. The model identifies individual inflammatory acne lesions (specifically papules, pustules, and nodules) and outputs bounding boxes with confidence scores for each detection.

Comedones (blackheads and whiteheads) are excluded from the count. This is consistent with:

  • IGA methodology, which assesses inflammatory severity
  • FDA guidance, which recommends that "inflammatory and noninflammatory lesions should be counted and reported separately"
  • The Hayashi Criterion (Hayashi et al., 2008), which grades acne severity based on the number of inflammatory eruptions per half-face
Lesion detection: bounding boxes identify each inflammatory acne lesion. The total count N is the number of detections.

Training data

The model was trained on a dataset of facial acne images annotated by board-certified dermatologists. Each annotation consists of a bounding box around an individual inflammatory lesion with its type classification.

Confidence thresholding

Each detection includes a confidence score. Detections below a calibrated confidence threshold are discarded. The threshold was tuned to balance sensitivity (not missing real lesions) against specificity (not counting artifacts or non-lesion structures). The threshold is fixed at inference time and applies identically across all images and sites.
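The thresholding step described above can be sketched as a simple filter. This is a minimal illustration, not the production pipeline: the `Detection` type, the function names, and the threshold value of 0.5 are all assumptions made for the example, since the calibrated threshold is not stated on this page.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected lesion (hypothetical structure for illustration)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    lesion_type: str                        # "papule", "pustule" or "nodule"
    confidence: float                       # model confidence in [0, 1]


# Placeholder value only; the real calibrated threshold is not published here.
CONFIDENCE_THRESHOLD = 0.5


def filter_detections(
    detections: Sequence[Detection],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Detection]:
    """Discard detections whose confidence falls below the fixed threshold."""
    return [d for d in detections if d.confidence >= threshold]


def lesion_count(
    detections: Sequence[Detection],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> int:
    """Lesion count N: the number of detections that survive thresholding."""
    return len(filter_detections(detections, threshold))
```

Because the threshold is a constant rather than a per-image or per-site setting, the same raw detections always yield the same count.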

Spatial density computation

Definition and formula

The spatial density score $D$ quantifies how closely detected lesions are clustered together. It is computed as:

$$D = \frac{A_{\text{overlap}}}{A_{\text{total}}}$$

where:

  • $A_{\text{overlap}}$ = total area of pairwise overlaps between circular regions centered on each detected lesion
  • $A_{\text{total}}$ = total area covered by all circular detection regions (union)

The radius of each circular region is derived from the bounding box dimensions of the corresponding detection. The density score ranges from 0 (no overlap; lesions are completely dispersed) to approaching 1 (near-total overlap; lesions are tightly concentrated in a small area).
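One way to approximate this ratio is to rasterise the circles on a grid and count cells. The sketch below rests on two assumptions that this page does not confirm: the radius is taken as the mean of the bounding box half-width and half-height, and the overlap area is interpreted as the area covered by two or more circles. The names `Circle`, `circle_from_bbox`, and `spatial_density` are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    r: float


def circle_from_bbox(box: BBox) -> Circle:
    """Assumed mapping: box centre, radius = mean of half-width and half-height."""
    x_min, y_min, x_max, y_max = box
    radius = ((x_max - x_min) + (y_max - y_min)) / 4
    return Circle((x_min + x_max) / 2, (y_min + y_max) / 2, radius)


def spatial_density(circles: Sequence[Circle], step: float = 1.0) -> float:
    """Approximate D = A_overlap / A_total by sampling cell centres on a grid.

    A cell counts towards A_total if at least one circle covers it and
    towards A_overlap if at least two circles cover it (an assumption).
    """
    if not circles:
        return 0.0
    x0 = min(c.cx - c.r for c in circles)
    y0 = min(c.cy - c.r for c in circles)
    x1 = max(c.cx + c.r for c in circles)
    y1 = max(c.cy + c.r for c in circles)
    nx = int((x1 - x0) / step) + 1
    ny = int((y1 - y0) / step) + 1
    covered = 0
    overlapped = 0
    for i in range(nx):
        x = x0 + (i + 0.5) * step
        for j in range(ny):
            y = y0 + (j + 0.5) * step
            hits = sum(
                (x - c.cx) ** 2 + (y - c.cy) ** 2 <= c.r ** 2 for c in circles
            )
            covered += hits >= 1
            overlapped += hits >= 2
    return overlapped / covered if covered else 0.0
```

Under this interpretation the ratio stays within 0 and 1 by construction, which matches the stated range. A smaller `step` gives a finer approximation at higher cost.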

Density calculation: overlapping detection circles indicate spatial concentration. Higher overlap = higher density.

Clinical significance of density

Density captures a dimension of acne severity that lesion count alone misses. Two patients can present with the same number of lesions but very different clinical severity. Concentrated lesions suggest more active, localised inflammation and are more likely to result in scarring.

| Scenario | Lesion count | Density | Clinical perception |
| --- | --- | --- | --- |
| Scattered | 25 | 0.15 | Lesions dispersed across the face; appears milder |
| Clustered | 25 | 0.65 | Lesions concentrated in a small area; appears more severe |

IGA calculation

The formula

The AI translates the lesion count $N$ and spatial density $D$ into a score aligned with the 5-point IGA (Investigator Global Assessment) scale:

$$\text{IGA} = N^a \cdot (D + b)$$

where:

  • $N$ = number of inflammatory acne lesions detected in the image
  • $D$ = spatial density of detected lesions (range: 0–1)
  • $a$, $b$ = empirically derived calibration constants
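The formula translates directly into code. In the sketch below, the function name `iga_score` is illustrative and the calibration constants are passed in as arguments, because their actual values are not given on this page. The values in the usage lines are placeholders chosen only to make the arithmetic easy to follow.

```python
def iga_score(n_lesions: int, density: float, a: float, b: float) -> float:
    """Compute IGA = N^a * (D + b) for one image.

    a and b are the calibration constants; callers must supply them,
    since this sketch does not know the production values.
    """
    if n_lesions < 0:
        raise ValueError("lesion count must be non-negative")
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    return (n_lesions ** a) * (density + b)


# Placeholder constants for illustration only (NOT the calibrated values).
A_PLACEHOLDER = 0.5
B_PLACEHOLDER = 0.5

scattered = iga_score(25, 0.15, A_PLACEHOLDER, B_PLACEHOLDER)
clustered = iga_score(25, 0.65, A_PLACEHOLDER, B_PLACEHOLDER)
```

With any positive exponent, two images with the same lesion count but different densities receive different scores, and the clustered one scores higher.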

Calibration methodology

The constants a and b were determined by optimising the correlation between the AI-computed score and expert IGA ratings. The expert ratings represent the consensus of three board-certified dermatologists who independently scored each case.
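As a rough illustration of what optimising the correlation could look like, the sketch below runs a plain grid search over candidate values of a and b and keeps the pair with the highest Pearson correlation against the expert consensus scores. The choice of Pearson correlation, the grid search, and the names `pearson` and `calibrate` are assumptions; the page does not say which correlation measure or optimiser was used.

```python
from itertools import product
from math import sqrt
from typing import Optional, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def calibrate(
    cases: Sequence[Tuple[int, float]],   # (lesion count N, density D) per case
    expert_scores: Sequence[float],       # consensus IGA rating per case
    a_grid: Sequence[float],
    b_grid: Sequence[float],
) -> Tuple[Optional[float], Optional[float]]:
    """Return the (a, b) pair on the grid that maximises the correlation."""
    best_r = -2.0  # below any attainable correlation
    best_a: Optional[float] = None
    best_b: Optional[float] = None
    for a, b in product(a_grid, b_grid):
        predicted = [(n ** a) * (d + b) for n, d in cases]
        r = pearson(predicted, expert_scores)
        if r > best_r:
            best_r, best_a, best_b = r, a, b
    return best_a, best_b
```

A grid search is used here only because it is easy to read. Correlation alone does not fix the overall scale of the score, so a real calibration would presumably also need to anchor the output to the 0 to 4 range.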

The power term $N^a$

The exponent $a$ (where $0 < a < 1$) makes severity a sublinear (concave) function of lesion count, so each additional lesion adds less than the one before. This reflects clinical reality: the difference between 5 and 15 lesions is far more clinically significant than the difference between 85 and 95 lesions. At high counts, additional lesions have diminishing marginal impact on perceived severity; the patient is already clearly severe.
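The diminishing effect is easy to see numerically. The snippet below uses a = 0.5 purely as an illustrative exponent (the calibrated value is not stated here) and compares the increase in the count term for the two ten-lesion gaps mentioned above.

```python
def count_term(n_lesions: int, a: float) -> float:
    """The N^a factor of the IGA formula."""
    return n_lesions ** a


A_EXAMPLE = 0.5  # illustrative exponent only, not the calibrated constant

low_gain = count_term(15, A_EXAMPLE) - count_term(5, A_EXAMPLE)
high_gain = count_term(95, A_EXAMPLE) - count_term(85, A_EXAMPLE)

print(f"{low_gain:.3f} {high_gain:.3f}")  # prints: 1.637 0.527
```

With this exponent, going from 5 to 15 lesions moves the count term roughly three times as much as going from 85 to 95.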

IGA severity alignment

The resulting IGA score maps to the standard 5-point scale used in regulatory submissions worldwide:

| IGA score | Severity | Clinical description |
| --- | --- | --- |
| 0 | Clear | No inflammatory lesions |
| 1 | Almost clear | Rare inflammatory lesions with very low density |
| 2 | Mild | Some inflammatory lesions, low to moderate density, no nodules |
| 3 | Moderate | Many inflammatory lesions, moderate to high density, occasional nodules |
| 4 | Severe | Numerous inflammatory lesions, high density, many nodules |

This alignment means AI-computed IGA scores can be used as IGA-equivalent endpoints without score mapping or transformation.

ALADIN composite score

The ALADIN (Acne Lesion And Density INdex) composite score extends the IGA to a higher-resolution continuous scale:

$$\text{ALADIN} = \text{IGA} \times 2.5$$

This yields a 0–10 continuous scale that preserves the clinical meaning of IGA while providing finer granularity. A patient with IGA 2.8 (between Mild and Moderate) registers as ALADIN 7.0, capturing intermediate severity levels that integer IGA rounds away.

The ALADIN composite is particularly useful as an exploratory endpoint for detecting subtle treatment effects that the integer IGA scale may not resolve, especially in early-phase studies where sensitivity to change is critical.
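The rescaling is a single multiplication. The sketch below assumes the continuous IGA input lies in the range 0 to 4 and rejects anything else; the function name and the range check are illustrative choices rather than documented behaviour.

```python
ALADIN_SCALE = 2.5  # maps the 0-4 IGA range onto 0-10


def aladin_score(iga: float) -> float:
    """Rescale a continuous IGA value (assumed to lie in [0, 4]) to ALADIN."""
    if not 0.0 <= iga <= 4.0:
        raise ValueError("IGA is expected to lie in [0, 4]")
    return iga * ALADIN_SCALE


# Two continuous IGA values on either side of the Moderate grade stay
# clearly separated after rescaling.
lower = aladin_score(2.6)
upper = aladin_score(3.4)
```

The value of the continuous scale is that patients whose scores would fall into the same or adjacent integer grades remain distinguishable.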

Per-perspective scoring and global aggregation

Local scores

Each perspective (e.g., left diagonal, right diagonal) produces its own set of local scores:

  • Lesion count $N_{\text{local}}$
  • Spatial density $D_{\text{local}}$
  • IGA score (local)
  • ALADIN score (local)

Global score aggregation

The global score is derived from local scores using one of three configurable aggregation methods:

| Method | Formula | When to use |
| --- | --- | --- |
| Maximum (default) | $\text{Global} = \max(\text{Local}_1, \text{Local}_2, \ldots)$ | Captures the worst-affected area. Recommended for most acne protocols. When dermatologists perform a global IGA assessment, they are primarily influenced by the most severely affected region. |
| Sum | $\text{Global} = \sum \text{Local}_i$ | Captures cumulative severity across the face. Useful when total burden matters. |
| Mean | $\text{Global} = \frac{1}{n}\sum \text{Local}_i$ | Average severity. Useful when perspectives have significant overlap. |

The aggregation method is configured per protocol during study setup, ensuring the global score aligns with the study's clinical and statistical design.
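The three aggregation methods can be sketched as a small lookup keyed by a protocol setting. The method keys `"max"`, `"sum"`, and `"mean"` and the function name `global_score` are hypothetical; the actual configuration format is not described on this page.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

# Hypothetical configuration keys mapped to the aggregation functions.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,
    "sum": sum,
    "mean": fmean,
}


def global_score(local_scores: Sequence[float], method: str = "max") -> float:
    """Combine per-perspective scores into one global score."""
    if not local_scores:
        raise ValueError("at least one local score is required")
    try:
        aggregate = AGGREGATORS[method]
    except KeyError:
        raise ValueError(f"unknown aggregation method: {method!r}") from None
    return float(aggregate(local_scores))
```

For example, local scores of 2.1 and 3.4 give a global score of 3.4 under the default maximum method, since the worse perspective determines the result.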

Why maximum is the default: In acne, the worst-affected perspective most closely represents the dermatologist's overall severity impression. When dermatologists perform a global IGA assessment, they are primarily influenced by the most severely affected region of the face, not the average.

Multi-perspective protocols and overlap handling

Counting methodology

The standard protocol follows the Hayashi Criterion (Hayashi et al., 2008), which grades acne severity by counting inflammatory eruptions per half-face. Images are captured at approximately 70-degree diagonal angles, one from the left and one from the right.

| Hayashi Criterion grade | Range |
| --- | --- |
| Mild | 0–5 lesions per half-face |
| Moderate | 6–20 lesions per half-face |
| Severe | 21–50 lesions per half-face |
| Very severe | >50 lesions per half-face |

The AI lesion counting is consistent with this methodology: each perspective corresponds to approximately one half-face, and the per-perspective count maps naturally to the grading thresholds.
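The grading thresholds in the table map a per-perspective count to a grade with a few comparisons. The function name `hayashi_grade` is illustrative; the thresholds are those listed above.

```python
def hayashi_grade(lesions_per_half_face: int) -> str:
    """Map an inflammatory lesion count for one half-face to a Hayashi grade."""
    if lesions_per_half_face < 0:
        raise ValueError("lesion count must be non-negative")
    if lesions_per_half_face <= 5:
        return "Mild"
    if lesions_per_half_face <= 20:
        return "Moderate"
    if lesions_per_half_face <= 50:
        return "Severe"
    return "Very severe"
```

In the standard two-perspective protocol, this function would be applied once to the left-diagonal count and once to the right-diagonal count.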

Alternative protocols

Not all studies use the standard 2-perspective approach. Legit.Health supports configurable multi-perspective protocols:

| Protocol | Perspectives | Use case |
| --- | --- | --- |
| Standard (Hayashi Criterion) | 2 views: Left diagonal (~70°), Right diagonal (~70°) | Most acne studies; captures majority of facial acne area per the Hayashi Criterion for counting inflammatory lesions per half-face. |
| Three-perspective | 3 views: Left perpendicular, Frontal, Right perpendicular | Studies requiring full-face frontal coverage. The frontal view overlaps with both lateral views; facial landmark detection deduplicates lesions. |
| Custom | Any combination of perspectives | Defined in collaboration with the sponsor during protocol design. |

Overlap exclusion

When perspectives overlap, the AI uses facial landmark detection to prevent double-counting:

  1. Landmark identification: The AI detects facial landmarks (eyes, nose, mouth, jawline, forehead boundaries) in each image.
  2. Region mapping: Each perspective is mapped to the facial regions it covers, based on the detected landmarks and the known capture angle.
  3. Overlap detection: Lesions that appear in overlapping regions between two perspectives are identified using their spatial position relative to the facial landmarks.
  4. Deduplication: Overlapping lesions are counted only once, attributed to the perspective with the highest-confidence detection.

This architecture enables sponsors to design protocols with as many perspectives as they need; the AI handles the complexity of deduplication automatically.
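The deduplication step can be sketched as follows, assuming the landmark and region-mapping steps have already placed every lesion in a shared, face-normalised coordinate frame. That shared frame, the `MappedLesion` type, the greedy matching strategy, and the distance tolerance of 0.02 are all assumptions made for illustration; the page does not describe how matching is implemented.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence


@dataclass(frozen=True)
class MappedLesion:
    """A detection expressed in a shared face-normalised frame (assumed)."""
    perspective: str   # e.g. "left", "frontal", "right"
    u: float           # horizontal position in the normalised frame
    v: float           # vertical position in the normalised frame
    confidence: float


def deduplicate(
    lesions: Sequence[MappedLesion],
    tolerance: float = 0.02,  # hypothetical matching distance
) -> List[MappedLesion]:
    """Keep one detection per physical lesion, preferring higher confidence.

    Detections are visited from most to least confident. A detection is
    dropped if an already-kept detection from a different perspective
    lies within the tolerance distance.
    """
    kept: List[MappedLesion] = []
    for lesion in sorted(lesions, key=lambda l: l.confidence, reverse=True):
        is_duplicate = any(
            k.perspective != lesion.perspective
            and hypot(k.u - lesion.u, k.v - lesion.v) <= tolerance
            for k in kept
        )
        if not is_duplicate:
            kept.append(lesion)
    return kept


def counts_per_perspective(lesions: Sequence[MappedLesion]) -> Counter:
    """Lesion count attributed to each perspective after deduplication."""
    return Counter(l.perspective for l in lesions)
```

Only detections from different perspectives are compared, so two nearby lesions seen in the same image are never merged by this step.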

Reproducibility

Inter-rater variability: eliminated

Different investigators scoring the same patient can produce different results. AI scoring removes this source of variability for a given image: the same image always produces the same score, regardless of which site submitted it.

Intra-rater variability: eliminated

The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift — it is the same model, with the same weights, producing the same output deterministically.

Site-to-site consistency

In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this — scores are inherently consistent across all sites.

Impact on trial design

Reduced scoring variability means cleaner endpoint data, which translates to:

  • Smaller required sample sizes — less noise means smaller samples can detect the same treatment effect
  • Faster data lock — no queries related to scoring inconsistencies
  • Stronger regulatory submissions — consistent, reproducible data with documented methodology
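The link between noise and sample size can be made concrete with the standard normal-approximation formula for comparing two group means, where the per-arm sample size scales with the square of the endpoint's standard deviation. This is a generic textbook calculation, not a description of any specific trial; the function name and default values are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(
    sd: float,
    effect: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2,
    rounded up to the next whole patient.
    """
    if sd <= 0 or effect <= 0:
        raise ValueError("sd and effect must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)
```

Because the standard deviation enters squared, even a moderate reduction in endpoint variability lowers the required sample size noticeably for the same treatment effect.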

Model versioning

The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same model throughout the trial:

  • No mid-trial model updates: the model version is frozen when the study is configured
  • Version tracking: the model version number is recorded in every scored report and in the audit trail
  • Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
  • Reproducibility guarantee: any image can be re-scored at any time and will produce the identical result

Comparison with manual counting

| Dimension | Manual counting | AI counting |
| --- | --- | --- |
| Time per patient | 5–15 minutes | <2 seconds |
| Inter-rater variability | Significant; requires training and calibration | Zero (deterministic) |
| Intra-rater variability | Documented; worsens with fatigue | Zero (no fatigue) |
| Scalability | Limited by rater availability | Unlimited (API-based) |
| Consistency across sites | Requires calibration exercises | Inherent (same model everywhere) |
| Spatial density | Not captured by manual counting | Automatically computed |
| Cost | Central reader fees; per-patient costs | Fixed per-study licensing |
| Regulatory alignment | Dependent on rater training | Trained against dermatologist consensus |

The FDA acknowledges that "lesion counting is time-consuming" and that "reliability is higher with rater training and use of standard templates." AI scoring can be viewed as the ultimate standardised template: a trained model that applies the same methodology to every image with perfect consistency.