Blog | Legit.Health

Creating a brand-new scoring system: the 6 key steps every scientist should follow

July 23, 2023 · 9 min read

Taig Mac Carthy

Co-founder at Legit.Health

Alfonso Medela

CAIO at Legit.Health

Introduction

Scoring systems are the unsung heroes of day-to-day dermatological practice. Although many clinicians dislike the additional work they involve, they help train the clinical eye, bring a more evidence-based approach to clinical practice and provide valuable endpoints in research.

There's no denying the crucial role scoring systems play in the development of the clinical field, as they are essential tools that bring precision, objectivity and reliability to clinical trials.

To better use and understand the tools at our disposal, we have to start at the beginning:

What is a Dermatological scoring system?

In dermatology, a scoring system is a methodology that allows the doctor to assess the severity of a condition by observing and quantifying objective parameters such as redness, the affected area, and the number and density of lesions.

The main goal of a scoring system is to provide a tool that documents data precisely and consistently for routine evaluations and clinical studies.

Some of the most used scoring systems in dermatology include:

IHS4: International Hidradenitis Suppurativa Severity Scoring System
PASI: Psoriasis Area and Severity Index
SALT: Severity of Alopecia Tool
SCORAD: SCORing Atopic Dermatitis
EASI: Eczema Area and Severity Index
UAS: Urticaria Activity Score
DLQI: Dermatology Life Quality Index

Now that we know what scoring systems are, let's delve deeper:

What is the latest in dermatological scoring systems?

There have been very recent advancements in dermatological scoring systems, and in fact we are in the midst of a major paradigm shift.

Although many doctors still use the traditional versions of these scoring systems, many key opinion leaders in dermatology are pushing to digitalize and automate severity scoring, which brings greater precision, reliability and objectivity to the measurements.

Some notable examples include:

APASI: Automatic Psoriasis Area and Severity Index
AIHS4: Automatic International Hidradenitis Suppurativa Severity Scoring System
ASALT: Automatic Severity of Alopecia Tool
ASCORAD: Automatic SCORing Atopic Dermatitis

This paradigm shift is not free of controversy. Many HCPs prefer to stick to the traditional pen-and-paper scoring systems, but an increasing number of practitioners are embracing the power of artificial intelligence. Watch this video, in which Professor Jean-François Stalder, one of the creators of the SCORAD, talks with Mr. Taig Mac Carthy, one of the creators of its automatic version, the ASCORAD.

Clips extracted from the event "Artificial intelligence: what future for eczema patients?" organised by the Pierre Fabre Eczema Foundation on Sep 14, 2023

How do we know if a scoring system is good?

When it comes to dermatological assessments, the effectiveness of a scoring system is paramount. But what exactly makes a scoring system reliable and useful? Through scientific consensus, several key factors have been identified that contribute to the robustness of these systems. Let's delve into these crucial elements:

Ease of Use: This factor considers whether the system can be applied effortlessly within the constraints of time and financial resources. A user-friendly system is crucial for widespread adoption in clinical settings.
Sensitivity to Change: An effective scoring system must be capable of detecting clinically meaningful changes over time. This sensitivity ensures that any progress or deterioration in a patient's condition is accurately captured.
Interobserver Reliability: This refers to the consistency of the results when different observers use the scoring system. High interobserver reliability means different clinicians will arrive at similar conclusions, enhancing the system's credibility.
Intra-observer Variability: This looks at the consistency of results when the same observer uses the scoring system multiple times. Low intra-observer variability indicates that the system provides stable results, irrespective of repeated assessments by the same clinician.
Interpretability: A practical scoring system should provide meaningful qualitative interpretations of its scores, like categorizing the severity of a condition as mild, moderate, or severe.

These criteria not only ensure the scoring system's effectiveness but also its applicability and reliability in diverse clinical scenarios.
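To make the idea of interpretability concrete, here is a minimal sketch based on the published definition of the IHS4 (which is not spelled out in this article): the score adds up nodules, abscesses and draining tunnels with weights 1, 2 and 4, and is then read as mild (3 or less), moderate (4 to 10) or severe (11 or more). The function names are ours and purely illustrative.

```python
def ihs4_score(nodules: int, abscesses: int, draining_tunnels: int) -> int:
    """IHS4 = nodules x 1 + abscesses x 2 + draining tunnels x 4."""
    return nodules * 1 + abscesses * 2 + draining_tunnels * 4


def ihs4_category(score: int) -> str:
    """Map a numeric IHS4 score to its qualitative severity band."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"


# Example patient: 2 nodules, 1 abscess, 1 draining tunnel.
score = ihs4_score(nodules=2, abscesses=1, draining_tunnels=1)
print(score, ihs4_category(score))  # 8 moderate
```

The point of the second function is that a clinician reads "moderate" much faster than a bare number, which is what the interpretability criterion asks for.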

Adapted from "Methods and definitions to rate the quality of outcome measures". Schmitt, J., Langan, S., Deckert, S., Svensson, A., von Kobyletzki, L., Thomas, K., & Spuls, P. (2013). Assessment of clinical signs of atopic dermatitis: A systematic review and recommendation. Journal of Allergy and Clinical Immunology, 132(6), 1337-1347. doi:10.1016/j.jaci.2013.07.008.


How can you create a new scoring system?

Developing a new dermatological scoring system is a complex but rewarding endeavour. By following these key steps and incorporating cutting-edge technology, scientists can significantly contribute to the field of dermatology, enhancing diagnosis and treatment outcomes.

Here's a list of steps you can follow to create a scoring system:

1. Identify a need

The first step is to identify a need. You may look for instances where determining the severity of a condition has a high degree of subjectivity. You may also seek to automate scoring systems that are very tedious to fill in. Most importantly, it is useful to look for situations where patients are suffering due to a lack of access to a specialist who is capable of measuring the severity.

All these instances create an opportunity to innovate. This is perfectly conveyed by Professor Ramon Grimalt, who explains the motivation that led him and his collaborators to create a new scoring system for dermatitis:

In this video (in Spanish), Dr Ramon Grimalt and Alfonso Medela, both co-authors of the ASCORAD scoring system, explain the need that pushed them to create a new scoring system for atopic dermatitis.

Through this project, it has been possible to create, for the first time in history, an artificial intelligence system that is in charge of this work. We are very satisfied with the results that will allow clinical trials and more precise diagnoses.
—
Dr. Ramón Grimalt

Even before starting the development process of a new scoring system, the researchers at Legit.Health make sure they are addressing an actual need. Staying up-to-date with the state of the art in dermatology and extensively reviewing the pertinent literature is a key step before any research project.

2. Selecting the dataset

As is often the case in medical science, one of the biggest issues our researchers have to face is selecting an optimal dataset. To do this correctly, there are some crucial things to consider:

Collaborate with experts

Collaboration with dermatologists and other experts is vital. Their insights help tailor the system to real-world needs. Simultaneously, gather a comprehensive dataset encompassing a wide range of cases and variations within the disease spectrum. This dataset will form the foundation of your scoring system.

Establish annotation and assessment protocols

Develop clear guidelines for annotating and assessing lesions or other dermatological features. This could involve detailed criteria for lesion count, size, color, or other relevant factors. Ensuring consistency in these initial stages is crucial for the system's reliability.

Regarding the quantity of images, as a rule of thumb, the more parameters we want to include in the new scoring system, the more images we need to have in the dataset. But of course, the more images the better.

Explanation of Hives Identification — Explanation of how the artificial intelligence identifies hives in urticaria images.

One trick that may help you make the most of the dataset is to add images gradually and watch for the moment when the results stabilise. That way you decide the dataset size as you gather it, and you neither fall short nor use more images than you need.
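The "add images gradually until the results stabilise" idea can be sketched as a simple stopping rule. This is only an illustration under our own assumptions: the `stable_size` helper, the tolerance, the patience value and the numbers in `curve` are all made up, and the metric could be any quality measure you track as the dataset grows.

```python
def stable_size(curve, tolerance=0.01, patience=2):
    """Return the first dataset size at which the metric has changed by less
    than `tolerance` for `patience` consecutive increments, or None."""
    calm_steps = 0
    for (_, prev_metric), (size, metric) in zip(curve, curve[1:]):
        if abs(metric - prev_metric) < tolerance:
            calm_steps += 1
            if calm_steps >= patience:
                return size
        else:
            calm_steps = 0  # the metric is still moving, start counting again
    return None


# Hypothetical (dataset size, validation metric) pairs, invented for this example.
curve = [(100, 0.62), (200, 0.71), (300, 0.76), (400, 0.765), (500, 0.768), (600, 0.769)]
print(stable_size(curve))  # 500
```

If the function returns `None`, the results have not settled yet and more images are probably needed.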

3. Building a model

This step is somewhat difficult to cover in this article, but right below you will find a scientific publication where we detail how we trained a model for hidradenitis suppurativa.

Publication of the automatic version of the IHS4 in the journal Skin Research and Technology.

Select the most impactful clinical signs

All scoring systems look at clinical signs. However, not all signs contribute equally to how severely the patient is affected. That is why a key step in building a model is selecting the most relevant clinical signs and looking for their optimal combination.

Often, for each condition, you can find several pre-existing scoring systems, each one with some strengths and weaknesses. It is by analysing them that we can find out which parameters are the most important ones for the condition.
—
Alfonso Medela, CAIO at Legit.Health

For example, in the case of acne, the literature suggested that, aside from the lesion count (a parameter that appeared in every other scoring system), the density of lesions was also a very prominent sign of a severe condition.

Specialised doctors who assist in this process then identify and measure the selected parameters in a set of clinical pictures. These values are compared with those of the gold-standard scoring system for that same set of images.

This allows researchers to correlate the values of the previously existing method with the newly defined variables.

4. Optimising the model

Once the researchers have all the parameters defined, it is time to create an equation that combines them all to best represent the severity of the different clinical images used in the study.

This is known as an optimisation problem, a mathematical term that describes the process of finding the best possible solution to a specific problem: in this case, how to represent the severity of a condition.


Without delving too deep into the mathematical intricacies of the process, different combinations of operations are tested on the parameters: adding, subtracting or multiplying them to obtain a single combined value that represents the severity of the condition.

Then, the results of each of these proposed equations for every single image in the set are compared with the results given by the selected gold standard, looking for the best possible correlation with the previously existing method.
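As a toy illustration of this search, the sketch below tries three candidate equations on two parameters and ranks them by their Pearson correlation with the gold standard. Everything here is an assumption made for the example: the parameter names, the candidate equations and the five data points are invented, and a real study would use many more images and a richer family of equations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements for five images (made-up numbers).
lesion_count = [5, 10, 20, 40, 8]
lesion_density = [0.2, 0.5, 0.4, 0.9, 0.1]
gold_standard = [1, 5, 9, 35, 1]  # severity assigned by the reference method

# Candidate equations: each one combines the parameters into a single value.
candidates = {
    "count + density": lambda c, d: c + d,
    "count - density": lambda c, d: c - d,
    "count * density": lambda c, d: c * d,
}


def rank_candidates(candidates, counts, densities, gold):
    """Return (name, correlation) pairs sorted from best to worst."""
    results = []
    for name, equation in candidates.items():
        scores = [equation(c, d) for c, d in zip(counts, densities)]
        results.append((name, pearson(scores, gold)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(candidates, lesion_count, lesion_density, gold_standard)
best_name, best_r = ranking[0]
print(best_name, round(best_r, 3))
```

With these invented numbers the multiplicative equation comes out on top, but the winner depends entirely on the data you feed in.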

5. Prove it works

Validation is key. Test the system against both interobserver and intraobserver variability. Use statistical measures like the F1 score, mean absolute error (MAE), and Krippendorff alpha to assess the system's reliability and accuracy. This step might involve multiple iterations to fine-tune the system.
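Two of the measures mentioned above are simple enough to sketch in a few lines. The snippet below computes the mean absolute error between numeric scores and the F1 score for one severity category. The function names and example values are ours and purely illustrative; Krippendorff's alpha is more involved and is not shown here.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between two lists of numeric scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def f1_score(predicted, reference, label):
    """F1 for one category: the harmonic mean of precision and recall."""
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p == label and r == label)
    false_pos = sum(1 for p, r in pairs if p == label and r != label)
    false_neg = sum(1 for p, r in pairs if p != label and r == label)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical numeric scores from the model and from the reference observer.
model_scores = [3, 5, 8]
reference_scores = [2, 5, 10]
print(mean_absolute_error(model_scores, reference_scores))  # 1.0

# Hypothetical severity categories for four images.
model_labels = ["mild", "moderate", "severe", "moderate"]
reference_labels = ["mild", "severe", "severe", "moderate"]
print(round(f1_score(model_labels, reference_labels, "severe"), 3))  # 0.667
```

The MAE tells you how far off the numeric score is on average, while the F1 score tells you how well the severity categories match.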

This is done with a subset of clinical pictures that were not used to train or optimise the model in the previous steps. After all, your equation was tuned to perform as well as possible on those images, so to demonstrate that it works with any picture of the condition, you have to test it on new ones.
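Setting aside unseen pictures can be as simple as shuffling the image identifiers once and reserving a fraction before any fitting starts. The sketch below assumes a hypothetical `split_holdout` helper and a 20% hold-out, both chosen only for illustration.

```python
import random


def split_holdout(image_ids, holdout_fraction=0.2, seed=42):
    """Shuffle the image ids and reserve a fraction that is never used for fitting."""
    shuffled = list(image_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical image identifiers.
image_ids = [f"img_{i:03d}" for i in range(10)]
fit_ids, holdout_ids = split_holdout(image_ids)

# No image may appear on both sides of the split.
assert not set(fit_ids) & set(holdout_ids)
print(len(fit_ids), len(holdout_ids))  # 8 2
```

The equation is then tuned on `fit_ids` only, and the validation metrics are reported on `holdout_ids`.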

APASI, A dermatological scoring system — APASI, the most advanced dermatological scoring system for psoriasis

Simply put, our researchers work closely with specialised doctors to label the images using both the gold-standard method and the newly developed model, and then compare the results to see whether the new model performs better than the original one while still correlating with it.

6. Put it into practice in a clinical environment

Finally, the model is taken into clinical practice, where its results are compared with those of dermatologists assessing the same condition.

Continuous monitoring and feedback from users will provide valuable insights for ongoing improvement. The system should evolve with advancements in dermatological knowledge and technology.

CADx System Report — Caption of a full report from the CADx system. The chart at the top right shows the evolution of the urticaria, by plotting the AUAS scores across time.

This step is key to ensuring that the method can keep up with the reality of day-to-day medical practice: a good and reliable scoring system must not only be more objective and less error-prone than the clinical eye, but also be fast and not overload doctors with more work.

That's why at Legit.Health we link our scoring systems to computer vision algorithms, to put the speed and precision of artificial intelligence in the hands of doctors and help them help their patients.

Get access now

This free 23-day trial of Legit.Health gives clinics and hospitals a hands-on look at how to drive increased adherence and improve patient outcomes, as well as improving efficiency and overall quality of life.
