Scoring Methodology
This page provides a detailed technical description of the AI scoring pipeline for acne severity assessment. It covers the detection model, density computation, IGA formula derivation, multi-perspective scoring, overlap handling, and reproducibility characteristics.
Inflammatory lesion detection
Object detection model
The lesion detection stage uses a convolutional neural network (CNN) trained for object detection on facial acne images. The model identifies individual inflammatory acne lesions (specifically papules, pustules, and nodules) and outputs bounding boxes with confidence scores for each detection.
Comedones (blackheads and whiteheads) are excluded from the count. This is consistent with:
- IGA methodology, which assesses inflammatory severity
- FDA guidance, which recommends that "inflammatory and noninflammatory lesions should be counted and reported separately"
- The Hayashi Criterion (Hayashi et al., 2008), which grades acne severity based on the number of inflammatory eruptions per half-face

Training data
The model was trained on a dataset of facial acne images annotated by board-certified dermatologists. Each annotation consists of a bounding box around an individual inflammatory lesion with its type classification.
Confidence thresholding
Each detection includes a confidence score. Detections below a calibrated confidence threshold are discarded. The threshold was tuned to balance sensitivity (not missing real lesions) against specificity (not counting artifacts or non-lesion structures). The threshold is fixed at inference time and applies identically across all images and sites.
Spatial density computation
Definition and formula
The spatial density score quantifies how closely detected lesions are clustered together. It is computed as:
$$\text{density} = \frac{A_{\text{overlap}}}{A_{\text{total}}}$$
where:
- $A_{\text{overlap}}$ = total area of pairwise overlaps between circular regions centered on each detected lesion
- $A_{\text{total}}$ = total area covered by all circular detection regions (union)
The radius of each circular region is derived from the bounding box dimensions of the corresponding detection. The density score ranges from 0 (no overlap; lesions are completely dispersed) to approaching 1 (near-total overlap; lesions are tightly concentrated in a small area).
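The density computation can be sketched in a few lines of Python. This is an illustrative approximation, not the production implementation: the `Box` type, the radius rule (mean of the box half-width and half-height), the grid-sampling approach, and the reading of "overlap area" as the region covered by two or more circles are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one detection (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def circle_from_box(box: Box) -> Tuple[float, float, float]:
    """Return (cx, cy, radius) for a detection.

    Assumption: radius = mean of half-width and half-height.
    """
    cx = (box.x_min + box.x_max) / 2
    cy = (box.y_min + box.y_max) / 2
    radius = ((box.x_max - box.x_min) + (box.y_max - box.y_min)) / 4
    return cx, cy, radius


def spatial_density(boxes: List[Box], step: float = 1.0) -> float:
    """Approximate overlap area / union area by sampling a regular grid."""
    circles = [circle_from_box(b) for b in boxes]
    if not circles:
        return 0.0
    x0 = min(cx - r for cx, _, r in circles)
    x1 = max(cx + r for cx, _, r in circles)
    y0 = min(cy - r for _, cy, r in circles)
    y1 = max(cy + r for _, cy, r in circles)
    nx = ceil((x1 - x0) / step)
    ny = ceil((y1 - y0) / step)
    covered = 0      # grid cells inside at least one circle (union)
    overlapped = 0   # grid cells inside two or more circles (overlap)
    for j in range(ny):
        y = y0 + (j + 0.5) * step
        for i in range(nx):
            x = x0 + (i + 0.5) * step
            hits = sum(
                (x - cx) ** 2 + (y - cy) ** 2 <= r * r
                for cx, cy, r in circles
            )
            covered += hits >= 1
            overlapped += hits >= 2
    return overlapped / covered if covered else 0.0
```

With this sketch, two detections far apart give a density of 0, two coincident detections give 1, and partially overlapping detections fall in between. A smaller `step` trades speed for accuracy.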

Clinical significance of density
Density captures a dimension of acne severity that lesion count alone misses. Two patients can present with the same number of lesions but very different clinical severity. Concentrated lesions suggest more active, localised inflammation and are more likely to result in scarring.

| Scenario | Lesion count | Density | Clinical perception |
|---|---|---|---|
| Scattered | 25 | 0.15 | Lesions dispersed across the face — appears milder |
| Clustered | 25 | 0.65 | Lesions concentrated in a small area — appears more severe |
IGA calculation
The formula
The AI translates the lesion count and spatial density into a score aligned with the 5-point IGA (Investigator Global Assessment) scale:
where:
- = number of inflammatory acne lesions detected in the image
- = spatial density of detected lesions (range: 0–1)
- $a$, $b$ = empirically derived calibration constants
Calibration methodology
The constants a and b were determined by optimising the correlation between the AI-computed score and expert IGA ratings. The expert ratings represent the consensus of three board-certified dermatologists who independently scored each case.
The power term
The exponent, which is less than 1, makes the score grow sublinearly (roughly logarithmically) with lesion count. This reflects clinical reality: the difference between 5 and 15 lesions is far more clinically significant than the difference between 85 and 95 lesions. At high counts, additional lesions have diminishing marginal impact on perceived severity; the patient is already clearly severe.
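To make the diminishing-returns effect concrete, the sketch below compares the same 10-lesion gap at low and high counts. The exponent of 0.5 is a hypothetical value chosen only for illustration; the calibrated constants are not stated on this page.

```python
def power_term(lesion_count: int, exponent: float = 0.5) -> float:
    """Raise the lesion count to a power below 1 (exponent is hypothetical)."""
    return lesion_count ** exponent


# Same absolute gap of 10 lesions, at low and at high counts.
low_gap = power_term(15) - power_term(5)    # about 1.64
high_gap = power_term(95) - power_term(85)  # about 0.53

print(f"5 -> 15 lesions:  +{low_gap:.2f}")
print(f"85 -> 95 lesions: +{high_gap:.2f}")
```

Under this assumed exponent, ten extra lesions move the power term roughly three times as far at the low end of the range as at the high end.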
IGA severity alignment
The resulting IGA score maps to the standard 5-point scale used in regulatory submissions worldwide:
| IGA score | Severity | Clinical description |
|---|---|---|
| 0 | Clear | No inflammatory lesions |
| 1 | Almost clear | Rare inflammatory lesions with very low density |
| 2 | Mild | Some inflammatory lesions, low to moderate density, no nodules |
| 3 | Moderate | Many inflammatory lesions, moderate to high density, occasional nodules |
| 4 | Severe | Numerous inflammatory lesions, high density, many nodules |
This alignment means AI-computed IGA scores can be used as IGA-equivalent endpoints without score mapping or transformation.
ALADIN composite score
The ALADIN (Acne Lesion And Density INdex) composite score extends the IGA to a higher-resolution continuous scale:
$$\text{ALADIN} = 2.5 \times \text{IGA}$$
This yields a 0–10 continuous scale that preserves the clinical meaning of IGA while providing finer granularity. A patient with IGA 2.8 (between Mild and Moderate) registers as ALADIN 7.0, capturing intermediate severity levels that integer IGA rounds away.
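The rescaling can be sketched as follows, assuming the linear mapping implied by the two ranges (0–4 for IGA, 0–10 for ALADIN) and by the IGA 2.8 to ALADIN 7.0 example. The function and constant names are illustrative, not part of any published API.

```python
IGA_MAX = 4.0      # top of the 5-point IGA scale (0-4)
ALADIN_MAX = 10.0  # top of the ALADIN scale (0-10)


def aladin_from_iga(iga: float) -> float:
    """Linearly rescale a continuous IGA score to the 0-10 ALADIN range."""
    if not 0.0 <= iga <= IGA_MAX:
        raise ValueError(f"IGA score out of range: {iga}")
    return iga * ALADIN_MAX / IGA_MAX


print(round(aladin_from_iga(2.8), 2))
```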
The ALADIN composite is particularly useful as an exploratory endpoint for detecting subtle treatment effects that the integer IGA scale may not resolve, especially in early-phase studies where sensitivity to change is critical.
Per-perspective scoring and global aggregation
Local scores
Each perspective (e.g., left diagonal, right diagonal) produces its own set of local scores:
- Lesion count
- Spatial density
- IGA score (local)
- ALADIN score (local)
Global score aggregation
The global score is derived from the local scores $s_1, \dots, s_n$ (one per perspective) using one of three configurable aggregation methods:
| Method | Formula | When to use |
|---|---|---|
| Maximum (default) | $\max(s_1, \dots, s_n)$ | Captures the worst-affected area. Recommended for most acne protocols. When dermatologists perform a global IGA assessment, they are primarily influenced by the most severely affected region. |
| Sum | $s_1 + \dots + s_n$ | Captures cumulative severity across the face. Useful when total burden matters. |
| Mean | $\frac{1}{n}(s_1 + \dots + s_n)$ | Average severity. Useful when perspectives have significant overlap. |
The aggregation method is configured per protocol during study setup, ensuring the global score aligns with the study's clinical and statistical design.
Why maximum is the default: In acne, the worst-affected perspective most closely represents the dermatologist's overall severity impression. When dermatologists perform a global IGA assessment, they are primarily influenced by the most severely affected region of the face, not the average.
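The three aggregation methods reduce to a small dispatch over the per-perspective scores. The sketch below is illustrative only; the function name and the method strings are assumptions, not the product's configuration keys.

```python
from statistics import fmean
from typing import Sequence


def aggregate(local_scores: Sequence[float], method: str = "max") -> float:
    """Combine per-perspective scores into one global score."""
    if not local_scores:
        raise ValueError("at least one local score is required")
    if method == "max":
        return max(local_scores)    # worst-affected perspective
    if method == "sum":
        return sum(local_scores)    # cumulative burden
    if method == "mean":
        return fmean(local_scores)  # average severity
    raise ValueError(f"unknown aggregation method: {method}")


left, right = 2.0, 3.0
print(aggregate([left, right]))          # default: maximum
print(aggregate([left, right], "mean"))
```

Because the method is chosen once per protocol, a real configuration would likely fix the `method` argument at study setup rather than pass it per call.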
Multi-perspective protocols and overlap handling
Counting methodology
The standard protocol follows the Hayashi Criterion (Hayashi et al., 2008), which grades acne severity by counting inflammatory eruptions per half-face. Images are captured at approximately 70-degree diagonal angles, one from the left and one from the right.
| Hayashi Criterion grade | Range |
|---|---|
| Mild | 0–5 lesions per half-face |
| Moderate | 6–20 lesions per half-face |
| Severe | 21–50 lesions per half-face |
| Very severe | >50 lesions per half-face |
The AI lesion counting is consistent with this methodology: each perspective corresponds to approximately one half-face, and the per-perspective count maps naturally to the grading thresholds.
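As a minimal sketch, the grading thresholds in the table above can be applied to a per-perspective count like this. The function name is hypothetical; the cut-offs are taken directly from the table.

```python
def hayashi_grade(lesions_per_half_face: int) -> str:
    """Map an inflammatory lesion count for one half-face to a Hayashi grade."""
    if lesions_per_half_face < 0:
        raise ValueError("lesion count cannot be negative")
    if lesions_per_half_face <= 5:
        return "Mild"
    if lesions_per_half_face <= 20:
        return "Moderate"
    if lesions_per_half_face <= 50:
        return "Severe"
    return "Very severe"


for count in (3, 12, 35, 60):
    print(count, hayashi_grade(count))
```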
Alternative protocols
Not all studies use the standard 2-perspective approach. Legit.Health supports configurable multi-perspective protocols:
| Protocol | Perspectives | Use case |
|---|---|---|
| Standard (Hayashi Criterion) | 2 views: Left diagonal (~70°), Right diagonal (~70°) | Most acne studies; captures majority of facial acne area per the Hayashi Criterion for counting inflammatory lesions per half-face. |
| Three-perspective | 3 views: Left perpendicular, Frontal, Right perpendicular | Studies requiring full-face frontal coverage. The frontal view overlaps with both lateral views; facial landmark detection deduplicates lesions. |
| Custom | Any combination of perspectives | Defined in collaboration with the sponsor during protocol design. |
Overlap exclusion
When perspectives overlap, the AI uses facial landmark detection to prevent double-counting:
- Landmark identification: The AI detects facial landmarks (eyes, nose, mouth, jawline, forehead boundaries) in each image.
- Region mapping: Each perspective is mapped to the facial regions it covers, based on the detected landmarks and the known capture angle.
- Overlap detection: Lesions that appear in overlapping regions between two perspectives are identified using their spatial position relative to the facial landmarks.
- Deduplication: Overlapping lesions are counted only once, attributed to the perspective with the highest-confidence detection.
This architecture enables sponsors to design protocols with as many perspectives as they need; the AI handles the complexity of deduplication automatically.
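The deduplication step can be sketched as below. The example assumes the earlier steps have already mapped every lesion into a shared, landmark-normalised face coordinate frame; the `Lesion` type, the distance-based matching rule, and the `tolerance` value are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass(frozen=True)
class Lesion:
    perspective: str   # e.g. "left", "frontal", "right"
    x: float           # position in a shared face frame (assumed 0-1)
    y: float
    confidence: float  # detection confidence from the model


def deduplicate(lesions: List[Lesion], tolerance: float = 0.02) -> List[Lesion]:
    """Keep one detection per physical lesion, preferring higher confidence.

    Two detections from different perspectives are treated as the same
    lesion when they lie within `tolerance` of each other (assumed rule).
    """
    kept: List[Lesion] = []
    # Visit the most confident detections first so they win any conflict.
    for lesion in sorted(lesions, key=lambda l: l.confidence, reverse=True):
        is_duplicate = any(
            other.perspective != lesion.perspective
            and hypot(other.x - lesion.x, other.y - lesion.y) <= tolerance
            for other in kept
        )
        if not is_duplicate:
            kept.append(lesion)
    return kept


detections = [
    Lesion("frontal", 0.500, 0.500, 0.90),
    Lesion("left", 0.505, 0.500, 0.70),  # same lesion seen from the left
    Lesion("left", 0.200, 0.300, 0.80),
]
unique = deduplicate(detections)
print(len(unique))
```

Detections from the same perspective are never merged here, so two genuinely adjacent lesions in one image are still counted separately.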
Reproducibility
Inter-rater variability: eliminated
Different investigators scoring the same patient often produce different results. AI-powered scoring removes this source of variability: the same image always produces the same score, regardless of which site submits it.
Intra-rater variability: eliminated
The same investigator may score the same patient differently on different occasions due to fatigue, time pressure, learning effects, or subjective drift over a long study. The AI has no such drift — it is the same model, with the same weights, producing the same output deterministically.
Site-to-site consistency
In multi-center trials, scoring consistency across sites is critical for endpoint integrity. Manual scoring requires extensive calibration exercises, training sessions, and ongoing monitoring for rater drift. AI scoring requires none of this — scores are inherently consistent across all sites.
Impact on trial design
Reduced scoring variability means cleaner endpoint data, which translates to:
- Smaller required sample sizes — less noise means smaller samples can detect the same treatment effect
- Faster data lock — no queries related to scoring inconsistencies
- Stronger regulatory submissions — consistent, reproducible data with documented methodology
Model versioning
The AI model version is locked at study initiation. This ensures every patient in the study is scored by the same model throughout the trial:
- No mid-trial model updates: the model version is frozen when the study is configured
- Version tracking: the model version number is recorded in every scored report and in the audit trail
- Change control: any model changes follow the formal change control process under IEC 62304, including risk assessment per ISO 14971
- Reproducibility guarantee: any image can be re-scored at any time and will produce the identical result
Comparison with manual counting
| Dimension | Manual counting | AI counting |
|---|---|---|
| Time per patient | 5–15 minutes | <2 seconds |
| Inter-rater variability | Significant; requires training and calibration | Zero — deterministic |
| Intra-rater variability | Documented; worsens with fatigue | Zero — no fatigue |
| Scalability | Limited by rater availability | Unlimited — API-based |
| Consistency across sites | Requires calibration exercises | Inherent — same model everywhere |
| Spatial density | Not captured by manual counting | Automatically computed |
| Cost | Central reader fees; per-patient costs | Fixed per-study licensing |
| Regulatory alignment | Dependent on rater training | Trained against dermatologist consensus |
The FDA acknowledges that "lesion counting is time-consuming" and that "reliability is higher with rater training and use of standard templates." AI scoring can be viewed as the ultimate standardised template: a trained model that applies the same methodology to every image with perfect consistency.