Known Limitations

Clinical trial sponsors need to understand not just what an AI scoring system can do, but where its boundaries are. This page documents the known limitations of the ASCORAD atopic dermatitis severity scoring technology, explains why each limitation exists, and describes how it is managed.

Affected area extent vs. intensity

SCORAD combines two conceptually different measurements: the extent of affected skin (BSA, a spatial measurement) and the intensity of signs (a severity measurement). Photograph-based BSA estimation inherently covers only the areas captured; areas not photographed cannot contribute to the score.

Mitigation: The imaging protocol is designed to comprehensively cover all clinically relevant body areas. For decentralised trials, patient-guided capture provides full coverage of the relevant regions. The DIQA quality gate ensures all required perspectives are captured before submission.

Dryness scoring on dry-skinned patients

Dryness (xerosis) can be difficult to differentiate from very mild eczema, particularly on skin types that exhibit baseline dryness unrelated to atopic dermatitis. This affects both human assessors and AI.

Mitigation: Dryness is scored relative to clinical context. The AI is trained on demographically diverse datasets. Performance on Fitzpatrick V–VI skin types is monitored through the post-market surveillance programme. See Performance Across Skin Types for detailed stratified metrics.

Subjective components remain patient-reported

ASCORAD automates the objective SCORAD component (BSA + intensity signs). Pruritus (NRS 0–10) and sleep disturbance (NRS 0–10) are inherently subjective and are collected via patient-reported outcome instruments, not from images.

Mitigation: This is a fundamental characteristic of the SCORAD scale, not a limitation of the AI. The objective component (0–83) is fully automated; the subjective component (0–20) requires patient input regardless of scoring method. Studies preferring a fully objective endpoint may consider EASI (0–72), which contains no subjective components.
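To make the split concrete, the standard SCORAD formula is A/5 + 7B/2 + C, where A is the affected extent (0–100%), B is the sum of six intensity signs (0–18), and C is the total of the two patient-reported NRS items (0–20). A minimal illustrative sketch, not the production implementation:

```python
def scorad(extent_pct: float, intensity_sum: float,
           pruritus_nrs: float, sleep_nrs: float) -> float:
    """Standard SCORAD: A/5 + 7B/2 + C (illustrative sketch only)."""
    assert 0 <= extent_pct <= 100          # A: affected BSA, percent
    assert 0 <= intensity_sum <= 18        # B: six signs, each scored 0-3
    assert 0 <= pruritus_nrs <= 10 and 0 <= sleep_nrs <= 10  # C items
    objective = extent_pct / 5 + 7 * intensity_sum / 2  # 0-83, image-derivable
    subjective = pruritus_nrs + sleep_nrs               # 0-20, patient-reported
    return objective + subjective                       # total: 0-103
```

Only the `objective` term can be derived from images; the `subjective` term must come from patient-reported outcomes whatever the scoring method.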

Erythema on dark skin

Erythema (redness) is harder to detect visually on darker skin tones. On Fitzpatrick V–VI skin, erythema in AD may present as violaceous or hyperpigmented rather than classically red.

Mitigation: The AI models are trained on diverse skin types. Performance monitoring is stratified by Fitzpatrick type as part of the post-market surveillance programme. For studies with significant dark skin representation, this should be discussed during protocol design.

Pilot study sample size

The current published evidence (Medela et al., JID Innovations, 2022) is a pilot study. The prospective aEASI_HVN clinical investigation will provide larger-sample validation data.

Mitigation: The pilot study establishes proof of concept. The technology shares the same validated architecture as APASI (psoriasis), ALADIN (acne), and automated SALT scoring (alopecia), all of which have larger validation datasets. Cross-condition validation strengthens the overall evidence base.

Cross-cutting limitations

The following limitations apply to all indications scored by the platform, not just atopic dermatitis.

Photograph-based assessment

The AI analyses clinical photographs, not live patients. Certain clinical features that require palpation (e.g., induration or plaque thickness) or observation under specific conditions are estimated from visual cues alone. This is an inherent limitation of any remote or image-based assessment method.

Mitigation: The imaging protocol standardises capture conditions (lighting, distance, angle), and the DIQA quality gate rejects images that do not meet minimum quality standards for focus, lighting, framing, and resolution. The acceptance criterion for each AI model is non-inferiority to expert inter-rater variability on the same photographs, ensuring the AI is at least as consistent as dermatologists working from the same modality.
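As an illustration of how such a gate can work, the sketch below accepts an image only when every metric clears its threshold. The metric names and threshold values are hypothetical, not the actual DIQA criteria:

```python
from dataclasses import dataclass

@dataclass
class ImageMetrics:
    sharpness: float   # e.g. variance of the Laplacian (focus proxy)
    brightness: float  # mean pixel intensity, 0-255 (lighting proxy)
    width: int         # pixels
    height: int        # pixels

def passes_quality_gate(m: ImageMetrics) -> bool:
    """Accept only if every check passes; any failure rejects the image."""
    checks = [
        m.sharpness >= 100.0,                   # focus
        40.0 <= m.brightness <= 220.0,          # lighting (not too dark/blown out)
        m.width >= 1024 and m.height >= 1024,   # resolution
    ]
    return all(checks)
```

A rejected image is recaptured at the point of submission, so downstream scoring only ever sees images that met the gate.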

Fitzpatrick skin type V–VI performance

Performance is lower for darker skin types due to the global underrepresentation of Fitzpatrick V–VI skin in dermatology image datasets. This is an industry-wide challenge that affects both AI systems and human assessors.

Mitigation: Stratified performance metrics are published transparently (see Performance Across Skin Types). Active dataset diversification is ongoing through targeted data sourcing (DDI, SkinDeep, Full Spectrum Dermatology) and post-market clinical follow-up (PMCF) monitoring. All Fitzpatrick groups currently exceed minimum acceptance thresholds.
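Stratified monitoring of this kind can be sketched as follows: per-image absolute scoring errors are grouped by Fitzpatrick stratum, and each stratum's mean error is compared against an acceptance threshold. The group labels, data shape, and threshold are illustrative:

```python
from collections import defaultdict

def stratified_mae(records):
    """records: iterable of (fitzpatrick_group, ai_score, reference_score).
    Returns mean absolute error per group."""
    errors = defaultdict(list)
    for group, ai_score, ref_score in records:
        errors[group].append(abs(ai_score - ref_score))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}

def all_groups_within(records, max_mae: float) -> bool:
    """True only if every stratum meets the acceptance threshold."""
    return all(mae <= max_mae for mae in stratified_mae(records).values())
```

Reporting per-stratum rather than pooled metrics prevents strong performance on well-represented skin types from masking a shortfall on underrepresented ones.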

Subjective ground truth

The reference standard for severity scoring is the mathematical consensus of multiple expert dermatologists — not an objective measurement. The AI cannot be “more correct” than the experts it was trained against.

Mitigation: This is not a limitation of the AI specifically, but of the clinical assessment itself. The same constraint applies to any human rater. The consensus of 2–3 independent experts is the best available approximation of truth and the same reference-standard approach accepted by the FDA and EMA in dermatology clinical trials. An AI that matches this consensus has reached the realistic ceiling of performance.
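A simplified per-photograph version of this acceptance logic, assuming consensus is the mean of the expert scores (a sketch only; the actual statistical analysis is aggregated across a validation set and more involved):

```python
from statistics import mean

def consensus(expert_scores):
    """Mathematical consensus: mean of the independent expert scores."""
    return mean(expert_scores)

def ai_non_inferior(expert_scores, ai_score) -> bool:
    """The AI passes if its deviation from consensus does not exceed the
    average deviation of the experts themselves from that same consensus."""
    c = consensus(expert_scores)
    expert_deviation = mean(abs(s - c) for s in expert_scores)
    return abs(ai_score - c) <= expert_deviation
```

Because the experts themselves disagree around the consensus, an AI whose error stays within that disagreement band is, by construction, as consistent as the raters who defined the ground truth.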

Model version specificity

All performance metrics reported in this documentation apply to a specific validated model version. Model updates — including retraining, architecture changes, or threshold adjustments — require full re-validation per IEC 62304 before deployment.

Mitigation: The model version is locked at study initiation. No mid-study model updates occur. This ensures that every patient in a trial is scored by the same model, preserving endpoint integrity throughout the study.
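The locking behaviour can be illustrated with a small sketch; the class and exception names are hypothetical:

```python
class ModelVersionMismatch(Exception):
    """Raised when a scoring request uses a non-locked model version."""

class Study:
    def __init__(self, study_id: str, locked_model_version: str):
        self.study_id = study_id
        # Fixed at study initiation; never changed mid-study.
        self.locked_model_version = locked_model_version

    def score(self, image_id: str, model_version: str) -> None:
        if model_version != self.locked_model_version:
            raise ModelVersionMismatch(
                f"{model_version!r} != locked {self.locked_model_version!r}")
        # ... invoke the locked, validated model on image_id ...
```

Rejecting the call outright, rather than silently falling back to another version, guarantees every score in the trial traces to the same validated model.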

Decision support, not autonomous diagnosis

The system provides severity scoring to support clinical decisions. It does not replace clinical judgement, and it does not make autonomous diagnostic or treatment decisions. All AI-generated scores should be interpreted by qualified healthcare professionals within the context of the patient’s overall clinical presentation.

How limitations are managed

All limitations documented on this page are tracked within the formal risk management process (ISO 14971) and the software development lifecycle (IEC 62304). Each limitation has been assessed for clinical risk, and mitigations have been implemented wherever needed to reduce residual risk to an acceptable level.

The post-market clinical follow-up (PMCF) programme under MDR Annex XIV continuously monitors real-world performance. Any new limitation identified through post-market surveillance triggers a formal risk assessment and, if necessary, corrective action.