Knowledge base

The knowledge base takes the form of a set of “IF … THEN …” rules, which can be easily implemented in a decision support system.
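As a minimal sketch of how such a rule base might be represented in code, the example below stores each rule as a set of conditions plus a conclusion and returns the conclusions of all rules whose conditions are met. The `Rule` class, the `fire_rules` function and the three rules themselves are hypothetical and serve only as an illustration; they are not taken from the project's actual rule base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One IF ... THEN ... rule: if all conditions hold, suggest the conclusion."""
    conditions: frozenset  # signs that must all be present (the IF part)
    conclusion: str        # disease to consider (the THEN part)


# Hypothetical rules for illustration only; not the project's rule base.
RULES = [
    Rule(frozenset({"fear of water"}), "Rabies"),
    Rule(frozenset({"lockjaw"}), "Tetanus"),
    Rule(frozenset({"skin rash", "joint pain"}), "Chikungunya"),
]


def fire_rules(observed_signs, rules=RULES):
    """Return the conclusion of every rule whose IF part is fully satisfied."""
    observed = set(observed_signs)
    return [rule.conclusion for rule in rules if rule.conditions <= observed]


print(fire_rules(["lockjaw", "muscle pain"]))  # ['Tetanus']
```

Keeping the rules as plain data, separate from the code that evaluates them, means the rule base can be extended without touching the evaluation logic.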

Selection of diseases

Drawing on current statistics and data from leading health organisations, namely the World Health Organization (WHO, https://www.who.int/data/gho, accessed 23 Feb 2024) through its Global Health Observatory (GHO), the Centers for Disease Control and Prevention (CDC, https://www.cdc.gov/nndss/index.html, accessed 23 Feb 2024) via the National Notifiable Diseases Surveillance System (NNDSS), and the European Centre for Disease Prevention and Control (ECDC, https://atlas.ecdc.europa.eu/public/index.aspx, accessed 23 Feb 2023) through its Surveillance Atlas of Infectious Diseases, we identified 22 diseases for concentrated analysis and monitoring within the scope of this project. These diseases were selected for their significant impact on global health, their prevalence, and the trends observed in recent data, so that our system can handle the infectious diseases that pose threats to maritime operations.

List of selected diseases

  • Chickenpox
  • Chikungunya
  • Cholera
  • COVID-19
  • Dengue
  • Diphtheria
  • Ebola
  • Infectious mononucleosis
  • Influenza
  • Malaria
  • Meningococcal infection
  • Mumps
  • Norovirus
  • Pertussis
  • Rabies
  • Rubella
  • Tetanus
  • Tuberculosis
  • Typhoid and paratyphoid fever
  • Hepatitis A
  • Yellow fever
  • Zika

Acquiring medical knowledge

We first identified relevant literature on the selected diseases, diagnostic criteria for various infections, and guidelines for diagnosing infectious diseases. The literature includes international peer-reviewed articles, online reports, commentaries, editorials, electronic books, and press releases from universities and research institutions that include expert opinions. Grey literature published by the WHO, the US Centers for Disease Control and Prevention (CDC), and other local government publications and information outlets was also included. The research databases examined were PubMed, Google Scholar, Embase, Medline and Science Direct.

Chickenpox: presentation and complications in adults Abro AH, Ustadi AM, Das K, Abdou AM, Hussaini HS, Chandra FS.
Clinical manifestations, complications and management of chickenpox infection in pediatric Bereda G. 
Chickenpox Clinical Presentation Anthony J Papadopoulos, MD
Interna Szczeklika 2022 Group work
Chronic Joint Pain 3 Years after Chikungunya Virus Infection Largely Characterized by Relapsing-remitting Symptoms Sarah R. Tritsch, Liliana Encinales, Nelly Pacheco, et al. 
Manifestations of Atypical Symptoms of Chikungunya during the Dhaka Outbreak (2017) in Bangladesh Deeba IM, Hasan MM, Al Mosabbir A, Siam MHB, Islam MS, Raheem E, Hossain MS.
Overview on Chikungunya Virus Infection: From Epidemiology to State-of-the-Art Experimental Models Constant LEC, Rajsfus BF, Carneiro PH, Sisnande T, Mohana-Borges R and Allonso D
Chikungunya virus: A general overview K.A. Galán-Huerta, A.M. Rivas-Estilla, I. Fernández-Salas, J.A. Farfan-Ale, J. Ramos-Jiménez
Chikungunya virus disease European Centre for Disease Prevention and Control (ECDC) 2024
Parazytologia medyczna kompendium Morozinska-Gogol
A prolonged, community-wide cholera outbreak associated with drinking water contaminated by sewage in Kasese District, western Uganda Kwesiga, B., Pande, G., Ario, A.R. et al. 
Cholera — the new strike of an old foe Anna Kuna, Michał Gajewski
Cholera Matthew Fanous; Kevin C. King
Sensitivity, Specificity, and Public-Health Utility of Clinical Case Definitions Based on the Signs and Symptoms of Cholera in Africa Nadri J, Sauvageot D, Njanpop-Lafourcade BM, Baltazar CS et al. 
Factsheet on COVID-19 European Centre for Disease Prevention and Control (ECDC) 2024
COVID-19 diagnosis and management: a comprehensive review Pascarella G, Strumia A, Piliego C, Bruno F et al. 
COVID-19 patients’ clinical characteristics, discharge rate, and fatality rate of meta-analysis Li L-q, Huang T, Wang Y-q, et al. 
Clinical profile of Dengue fever in an urban tertiary care hospital in South India Dhivya P., Monica A., Jayaramachandran S.
Dengue—How Best to Classify It Anon Srikiatkhachorn, Alan L. Rothman, Robert V. Gibbons et al. 
Dengue hemorrhagic fever – A systemic literature review of current perspectives on pathogenesis, prevention and control Wen-Hung Wang, Aspiro Nayim Urbina, Max R. Chang et al. 
Clinical Characteristics and Management of 676 Hospitalized Diphtheria Cases, Kyrgyz Republic, 1995 R. Kadirova, H. Ü. Kartoglu, P. M. Strebel
Corynebacterium Diphtheriae Anmol Chaudhary; Shivlal Pandey
Association of clinical signs and symptoms of Ebola viral disease with case fatality: a systematic review and meta-analysis Harsha Moole , Swetha Chitta , Darlyn Victor et al. 
Clinical Manifestations and Laboratory Diagnosis of Ebola Virus Infection. Qureshi AI. 
The Reemergence of Ebola Hemorrhagic Fever, Democratic Republic of the Congo, 1995  Ali S. Khan, F. Kweteminga Tshioko, David L. Heymann et al. 
Acute and Chronic Symptoms of Mononucleosis Sanjay Lambore, MB, MSc, James McSherry, MB ChB, and Arthur S. Kraus, ScD
Infectious mononucleosis in children – one centre experience Joanna Maria Wrembel, Tomasz Jarmoliński
Clinical Signs and Symptoms Predicting Influenza Infection Monto AS, Gravenstein S, Elliott M, Colopy M, Schweinle J. 
An Office-Based Approach to Influenza: Clinical Diagnosis and Laboratory Testing NORMAN J. MONTALTO, D.O.
Travelers Malaria Patricia Schlagenhauf-Lawlor
Sensitivity of fever for diagnosis of clinical malaria in a Kenyan area of unstable, low malaria transmission. utanda, Albino & Cheruiyot, Priscah & Hodges, James & Ayodo, George & Odero, Wilson & John, Chandy
Plasmodium falciparum clinical malaria in Dielmo, a holoendemic area in Senegal: No influence of acquired immunity on initial symptomatology and severity of malaria attacks Rogier, Christophe & Ly, Alioune & Adama, Tall & Cissé, Badara & Trape, J.
Invasive Meningococcal Infection: Analysis of 110 cases from a Tertiary Care Centre in North East India Dass Hazarika, R., Deka, N.M., Khyriem, A.B. et al.
Which early ‘red flag’ symptoms identify children with meningococcal disease in primary care? Tanya Ali Haj-Hassan, Matthew J Thompson, Richard T Mayon-White et al. 
Mumps Virus: Modification of the Identify-Isolate-Inform Tool for Frontline Healthcare Providers Koenig KL, Shastry S, Mzahim B, Almadhyan A, Burns MJ. 
Characteristics of a large mumps outbreak: Clinical severity, complications and association with vaccination status of mumps outbreak cases Stein Zamir, H Schroeder, H Shoob, N Abramson & G Zentner 
Mumps outbreak and laboratory diagnosis Mylène Maillet, Eric Bouvat, Nicole Robert et al. 
An outbreak of norovirus-related acute gastroenteritis associated with delivery food in Guangzhou, southern China  Lu Y, Ma M, Wang H, Wang D, Chen C, Jing Q, Geng J, Li T, Zhang Z, Yang Z. 
Vomiting as a Symptom and Transmission Risk in Norovirus Illness: Evidence from Human Challenge Studies Amy E. Kirby, Ashleigh Streby, Christine L. Moe
Clinical Manifestation of Norovirus Gastroenteritis in Health Care Settings Ben A. Lopman, Mark H. Reacher, Ian B. Vipond, Joyshri Sarangi, David W. G. Brown
Clinical manifestation of norovirus infection in children aged less than five years old admitted with acute diarrhea in Surabaya, Indonesia: a cross-sectional study Fardah Athiyyah A, Shigemura K, Kitagawa K et al. 
Diagnostic value of symptoms and laboratory data for pertussis in adolescent and adult patients Miyashita N, Akaike H, Teranishi H, Kawai Y, Ouchi K, Kato T, Hayashi T, Okimoto N.
Pertussis (Whooping Cough) Centers for Disease Control and Prevention
Pertussis prevalence among adult patients with acute cough İlbay A, Tanrıöver MD, Zarakol P, Güzelce EÇ, Bölek H, Ünal S. 
Clinical aspects of human rabies in the state of Ceará, Brazil: an overview of 63 cases Duarte NFH, Pires Neto RDJ, Viana VF, Feijão LX, Alencar CH, Heukelbach J. 
Epidemiological and clinical features of human rabies cases in Bali 2008-2010 Susilawathi NM, Darwinata AE, Dwija IB, Budayanti NS et al. 
Rubella outbreak among workers in three small- and medium-size business establishments associated with imported genotype 1E rubella virus-Shizuoka, Japan, 2015 Kato H, Kamiya H, Mori Y, Yahata Y, Morino S, Griffith M et al. 
Rubella (German Measles, Three-Day Measles) Centers for Disease Control and Prevention
Five years review of cases of adult tetanus managed at Gondar University Hospital, North West Ethiopia (Gondar, Sep. 2003-Aug. 2008) Tadesse A, Gebre-Selassie S. 
Clinical features and outcomes of tetanus: a retrospective study Fan Z, Zhao Y, Wang S, Zhang F, Zhuang C. 
Tetanus Louise Thwaites, MD
Tetanus: Presentation and outcome in adults.  Younas NJ, Abro AH, Das K, Abdou AMS, Ustadi AM, Afzal S. 
Miliary tuberculosis: Clinical manifestations, diagnosis and outcome in 38 adults Mert, A., Bilir, M., Tabak, F., Ozaras, R., Ozturk, R. et al. 
A Population-Based Survey of Tuberculosis Symptoms: How Atypical Are Atypical Presentations? Loren G. Miller, Steven M. Asch, Emily I. Yu, Laura Knowles, Lillian Gelberg, Paul Davidson
Alert sign and symptoms for the early diagnosis of pulmonary tuberculosis: analysis of patients followed by a tertiary pediatric hospital Farina, E., D’Amore, C., Lancella, L. et al. 
Clinical and epidemiological characteristics of HIV/AIDS patients diagnosed with tuberculosis in the Integral Care Service of the Dr. Robert Reid Cabral Children’s Hospital during the period 2010-2016 Ricardo Elías-Melgen, Rosa Abreu, Milandres García
Typhoid Fever: An Epidemic With Remarkably Few Clinical Signs and Symptoms Klotz SA, Jorgensen JH, Buckwold FJ, Craven PC.
Current trends in typhoid fever Crum, N.F. 
Characteristic features of culture positive enteric fever in pediatric teaching hospital in Sulaimani governorate Tyib, Tara & Fakhir, Haydar & Mohammad, Hayder
Enteric fever Basnyat B, Qamar FN, Rupali P, Ahmed T, Parry CM.
Clinical Manifestations Of Hepatitis A: Recent Experience In A Community Teaching Hospital Myron J. Tong, Neveen S. El-Farra, Marianne I. Grew,
Natural History, Clinical Manifestations, and Pathogenesis of Hepatitis A Shin EC, Jeong SH
Clinical and Epidemiological Spectrum of Acute Viral Hepatitis Due to Hepatitis A and E in Children: A Descriptive, Cross-Sectional, Hospital-Based Study Javaria Rasheed, Muhammad Khalid, Sobia Rubab, Bushra Iqbal, Iram Nawaz, Asad Shahzad
Assessing yellow Fever risk in the ecuadorian Amazon.  Izurieta RO, Macaluso M, Watts DM, Tesh RB, Guerra B, Cruz LM, Galwankar S, Vermund SH
Clinical features of yellow fever cases at Vom Christian Hospital during the 1969 epidemic on the Jos Plateau, Nigeria Evan M Jones and D. C. Wilson
Clinical and epidemiological characteristics of yellow fever in Brazil: analysis of reported cases 1998-2002 Tuboi, Suely & Costa, Zouraide & Vasconcelos, Pedro & Hatch, Douglas.
An Overview of Yellow Fever Virus Disease McGuinness I, Beckham JD, Tyler KL, Pastula DM.
Yellow fever outbreak in Kenya: A review Olivier Uwishema, Stanley Chinedu Eneh, Anyike Goodness Chiburoma et al. 
Yellow Fever Leslie V. Simon; Muhammad F. Hashmi; Klaus D. Torp.
Zika Virus The Johns Hopkins University
Clinical, laboratory and virological data from suspected ZIKV patients in an endemic arbovirus area Tatiana Elias Colombo, Cássia Fernanda Estofolete, Andréia Francesli Negri Reis et al. 
Clinical relevance of Zika symptoms in the context of a Zika Dengue epidemic Humberto Guanche Garcell, Francisco Gutiérrez García, Manuel Ramirez Nodal et al. 
The Clinical Spectrum of Zika Virus in Returning Travelers Eyal Meltzer, Eyal Leshem, Yaniv Lustig, Giora Gottesman, Eli Schwartz

Each disease was then described by several signs grouped into eight specific categories. Targeted interviews were conducted with medical experts to determine the crucial elements in the diagnosis of infectious diseases and their relationship to clinical decision-making parameters. We then built a table of symptom frequencies: each row corresponds to a disease, each column to a symptom, and each cell holds the frequency with which that symptom occurs in that disease.
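The frequency table described above can be pictured as a simple lookup structure. The sketch below is only an illustration: the disease and symptom names are generic placeholders and the percentages are hypothetical, not values from the project's knowledge base. The helper names `validate` and `frequency` are likewise assumptions.

```python
# Hypothetical placeholder names and values, for illustration only;
# the project's real frequencies are not reproduced here.
SYMPTOMS = ["symptom_1", "symptom_2", "symptom_3"]

# One row per disease, one percentage (0-100) per symptom column.
FREQUENCY_TABLE = {
    "disease_a": [25, 50, 100],
    "disease_b": [80, 10, 40],
}


def validate(table, symptoms):
    """Check that every row has one percentage between 0 and 100 per symptom."""
    for disease, row in table.items():
        if len(row) != len(symptoms):
            raise ValueError(f"{disease}: expected {len(symptoms)} values, got {len(row)}")
        if not all(0 <= value <= 100 for value in row):
            raise ValueError(f"{disease}: percentages must lie between 0 and 100")


def frequency(disease, symptom):
    """Return the cell at the intersection of a disease row and a symptom column."""
    return FREQUENCY_TABLE[disease][SYMPTOMS.index(symptom)]


validate(FREQUENCY_TABLE, SYMPTOMS)
print(frequency("disease_a", "symptom_2"))  # 50
```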

Groups of signs of infectious diseases

1. General/systemic signs

  • continuous fever or fever with intervals less than 1 day
  • intermittent fever every 2-4 days
  • lethargy
  • sweating and/or chills
  • headache
  • lack of appetite and/or weight loss

2. Respiratory signs

  • chest pain
  • cough
  • phlegm
  • shortness of breath
  • sore throat
  • runny nose

3. Musculoskeletal signs

  • back pain
  • joint pain
  • muscle pain
  • lockjaw

4. Neurological signs

  • blurry vision
  • cognitive difficulties
  • difficulty swallowing
  • dizziness
  • emotional agitation
  • neurological problems with sensation and movement
  • seizures
  • stiff neck and sensitivity to light

5. Hematological symptoms

  • bleeding manifestations

6. Gastric symptoms

  • abdominal pain
  • diarrhea
  • nausea
  • vomiting

7. Dermatological or associated signs

  • neck swelling
  • skin rash
  • yellow skin and/or dark urine

8. Other signs

  • fear of water
  • testicular pain
  • eye redness

Prediction algorithm

The aim of the prediction algorithm is to suggest the infectious disease or diseases a patient may have, based on their symptoms. The algorithm has to be trained, and we trained it on the knowledge base of symptoms and infectious diseases. The knowledge base expresses each symptom as a percentage, i.e. how many patients out of 100 infected with a specific disease would show that symptom. From these data we randomly generated hundreds of artificial patients with specific symptoms, in such a way that the symptom frequencies across all generated patients matched the percentages in the knowledge base exactly. For example, if for a specific disease 25% of patients have symptom 1, 50% have symptom 2 and 100% have symptom 3, we could generate the following 8 patients, and the overall percentages would still fit the initial data. We generated the artificial patients randomly to reflect the fact that every person is unique, so the symptoms that appear after infection differ slightly from person to person.
Patient   Symptom 1   Symptom 2   Symptom 3
1         1           1           1
2         1           -           1
3         -           1           1
4         -           -           1
5         -           1           1
6         -           -           1
7         -           1           1
8         -           -           1
Total     25%         50%         100%
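One way to generate artificial patients whose symptom frequencies match the knowledge base exactly is sketched below. This is an assumption about how such a generator could work, not the project's actual code: for each symptom it marks exactly the required number of randomly chosen patients. The function name `generate_patients` and its parameters are hypothetical.

```python
import random


def generate_patients(frequencies, n_patients, seed=None):
    """Generate artificial patients for one disease.

    frequencies maps each symptom to its percentage (0-100) in the knowledge
    base. Each symptom is given to exactly percent/100 * n_patients randomly
    chosen patients, so the overall percentages match the input exactly.
    """
    rng = random.Random(seed)
    patients = [dict.fromkeys(frequencies, 0) for _ in range(n_patients)]
    for symptom, percent in frequencies.items():
        count, remainder = divmod(percent * n_patients, 100)
        if remainder:
            raise ValueError(
                f"{percent}% of {n_patients} patients is not a whole number"
            )
        # Pick which patients show this symptom, independently per symptom.
        for index in rng.sample(range(n_patients), int(count)):
            patients[index][symptom] = 1
    return patients


example = {"symptom_1": 25, "symptom_2": 50, "symptom_3": 100}
for number, patient in enumerate(generate_patients(example, 8, seed=1), start=1):
    print(number, patient)
```

Different seeds give different patients, but the column totals are always 2, 4 and 8 out of 8, which is the 25%, 50% and 100% of the example.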
We used these randomly generated data to train our prediction algorithm. We tested three AI models in order to compare them and select the most accurate one: random forest, decision tree and naive Bayes. All three predict a disease directly from the given symptoms.

1. The naive Bayes model is a classifier which assumes that the symptoms are conditionally independent, given the target disease. The strength (naivety) of this assumption is what gives the classifier its name.

2. The decision tree model is a tree-like model of symptoms and their possible diseases. Each internal node tests whether a symptom is present, each branch represents an outcome of that test, and each leaf node represents a disease. The following image shows a complete decision tree.

[Image: decision tree]

3. The random forest model builds on the decision tree model: a forest (a large number) of decision trees is generated, with each tree considering only some of the symptoms. The output of the random forest is the disease selected by most decision trees.
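To make two of these ideas concrete, the sketch below shows the naive Bayes independence assumption applied directly to a symptom frequency table, and the majority vote used by a random forest. It is a hand-written illustration, not the Orange implementation used in the project. The function names, the uniform prior, the `smoothing` parameter and all table values are assumptions.

```python
import math
from collections import Counter

# Hypothetical placeholder table: percentage of patients showing each symptom.
SYMPTOMS = ["symptom_1", "symptom_2", "symptom_3"]
FREQUENCY_TABLE = {
    "disease_a": [25, 50, 100],
    "disease_b": [80, 10, 40],
}


def naive_bayes_posteriors(table, symptoms, observed, smoothing=0.01):
    """Posterior probability of each disease given the observed symptoms.

    Symptoms are treated as conditionally independent given the disease, so
    the likelihood is a product of per-symptom probabilities. A uniform prior
    over diseases is assumed; smoothing keeps probabilities away from 0 and 1.
    """
    log_scores = {}
    for disease, row in table.items():
        log_p = math.log(1 / len(table))  # uniform prior (assumption)
        for symptom, percent in zip(symptoms, row):
            p = min(max(percent / 100, smoothing), 1 - smoothing)
            log_p += math.log(p if symptom in observed else 1 - p)
        log_scores[disease] = log_p
    # Normalise in a numerically stable way so the posteriors sum to 1.
    top = max(log_scores.values())
    weights = {d: math.exp(s - top) for d, s in log_scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def majority_vote(tree_predictions):
    """Random-forest style output: the disease predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


posteriors = naive_bayes_posteriors(FREQUENCY_TABLE, SYMPTOMS, {"symptom_3"})
print(max(posteriors, key=posteriors.get))                      # disease_a
print(majority_vote(["disease_a", "disease_b", "disease_a"]))   # disease_a
```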

Orange data mining software was used to build all three models. Orange is a software package for machine learning and data mining. We chose it because it is free and widely used, and because the models it generates can be exported and then used almost anywhere through the Orange Python library (Demsar J, Curk T, Erjavec A, Gorup C, Hocevar T, Milutinovic M, Mozina M, Polajnar M, Toplak M, Staric A, Stajdohar M, Umek L, Zagar L, Zbontar J, Zitnik M, Zupan B (2013) Orange: Data Mining Toolbox in Python, Journal of Machine Learning Research 14(Aug): 2349−2353. http://jmlr.org/papers/volume14/demsar13a/demsar13a.pdf). This allows us to easily integrate the prediction algorithm into the website and mobile app and make it publicly available.

The following image shows the project workflow in Orange data mining software.

Evaluation results

This section presents the evaluation of three different machine learning models: Random Forest, Naive Bayes, and Decision Tree. The models were assessed using various performance metrics to determine their suitability for deployment.


Dataset Used

We collected a small number of real patients’ symptom and disease combinations as test data. It should be stressed that these test data were not used in training the models.


Validation Techniques

The project applied two validation techniques, Cross Validation and Random Sampling, to test the accuracy of the models. Using both provides a check that the results do not depend on a single split of the data.


Cross Validation technique and Sampling Setup

Cross Validation

  • Number of folds: 5
  • Stratified: Enabled (ensures each fold has a similar distribution of classes)

Random Sampling

  • Repeat train/test: 10 times
  • Training set size: 66%
  • Stratified: Enabled
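The two setups above can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the code Orange runs internally. The function names are hypothetical, and the defaults mirror the settings listed above (5 folds, 10 repeats, 66% training size, stratified).

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by their class label."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return groups


def stratified_k_fold(labels, n_folds=5, seed=0):
    """Yield (train, test) index lists; each fold keeps the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in _indices_by_class(labels).values():
        rng.shuffle(indices)
        # Deal the members of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def stratified_random_splits(labels, repeats=10, train_fraction=0.66, seed=0):
    """Yield repeated (train, test) splits with a fixed training fraction per class."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    for _ in range(repeats):
        train, test = [], []
        for indices in groups.values():
            shuffled = rng.sample(indices, len(indices))
            cut = round(train_fraction * len(shuffled))
            train += shuffled[:cut]
            test += shuffled[cut:]
        yield sorted(train), sorted(test)


labels = ["disease_a"] * 10 + ["disease_b"] * 5
for train, test in stratified_k_fold(labels):
    print(len(train), len(test))  # 12 3 for each of the 5 folds
```

In cross validation every sample is used for testing exactly once, whereas repeated random sampling may test some samples several times and others not at all.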

Evaluation Metrics

  1. AUC (Area Under Curve): Measures the model’s ability to differentiate between classes.
  2. CA (Classification Accuracy): The ratio of correctly predicted instances to the total instances.
  3. F1 Score: The harmonic mean of precision and recall.
  4. Precision: The ratio of correctly predicted positive observations to the total predicted positives.
  5. Recall: The ratio of correctly predicted positive observations to all observations in the actual class.
  6. MCC (Matthews Correlation Coefficient): A measure of classification quality that takes the whole confusion matrix into account. It was originally defined for binary classification and has a multiclass generalisation.
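The sketch below computes most of these metrics from lists of actual and predicted diseases; AUC is left out because it needs predicted probabilities rather than labels. Two assumptions are made here and are not confirmed by the source: the per-class precision, recall and F1 are averaged with weights proportional to class frequency, and MCC uses the standard multiclass generalisation. The function name `classification_report` is hypothetical.

```python
import math
from collections import Counter


def classification_report(actual, predicted):
    """Compute CA, weighted precision/recall/F1 and multiclass MCC."""
    n = len(actual)
    classes = sorted(set(actual) | set(predicted))
    true_count = Counter(actual)       # samples per actual class
    pred_count = Counter(predicted)    # samples per predicted class
    hit_count = Counter(a for a, p in zip(actual, predicted) if a == p)
    correct = sum(hit_count.values())

    precision = recall = f1 = 0.0
    for c in classes:
        weight = true_count[c] / n  # class-frequency weighting (assumption)
        p = hit_count[c] / pred_count[c] if pred_count[c] else 0.0
        r = hit_count[c] / true_count[c] if true_count[c] else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * (2 * p * r / (p + r) if p + r else 0.0)

    # Multiclass MCC computed from the confusion-matrix marginals.
    cov_tp = correct * n - sum(true_count[c] * pred_count[c] for c in classes)
    cov_tt = n * n - sum(true_count[c] ** 2 for c in classes)
    cov_pp = n * n - sum(pred_count[c] ** 2 for c in classes)
    mcc = cov_tp / math.sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0

    return {"CA": correct / n, "F1": f1, "Precision": precision,
            "Recall": recall, "MCC": mcc}


report = classification_report(
    ["a", "a", "b", "b"],   # actual diseases
    ["a", "b", "b", "b"],   # predicted diseases
)
print(report["CA"], report["Recall"])  # 0.75 0.75
```

With class-frequency weighting, the averaged recall always equals the classification accuracy, which is why the two values printed above coincide.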

Model Performance

Below is a table summarizing the evaluation results for the three machine learning models: Random Forest, Naive Bayes, and Decision Tree.

Metric      Random Forest   Naive Bayes   Decision Tree
AUC         1.000           0.998         0.802
CA          0.952           0.857         0.571
F1          0.937           0.825         0.500
Precision   0.929           0.810         0.468
Recall      0.952           0.857         0.571
MCC         0.952           0.854         0.559

This table shows the performance of each model across the key evaluation metrics used in the DESSEV project. Random Forest scores highest on every metric, followed by Naive Bayes, with Decision Tree the least effective; the comparison by Area Under ROC Curve gives the same ranking.


As the Random Forest model performs the best, it is the model used for disease prediction on our website and mobile app.
