Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine

Fernanda C. Dórea, C. Anne Muckle, David Kelton, J. T. McClure, Beverly J. McEwen, W. Bruce McNab, Javier Sanchez, Crawford W. Revie

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real-time or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning methods (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.
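
The gap between the F1-micro and F1-macro scores reported in the abstract can be illustrated with a minimal sketch. The data below are invented for illustration and are not the paper's data; the class labels ("A", "B", "C") and the helper name `f1_scores` are hypothetical. The sketch assumes single-label multiclass predictions and shows how a rare class that is never predicted barely affects the pooled (micro) score but pulls the unweighted per-class average (macro) down sharply.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Invented toy data: class "C" is rare and the classifier never predicts it.
y_true = ["A"] * 90 + ["B"] * 8 + ["C"] * 2
y_pred = ["A"] * 90 + ["B"] * 8 + ["A"] * 2

micro, macro = f1_scores(y_true, y_pred)
print(f"F1-micro = {micro:.3f}, F1-macro = {macro:.3f}")
```

In this toy case only 2 of 100 records are misclassified, so the micro score stays close to 1, while the zero F1 of the unlearned class drags the macro score to roughly two thirds. The same mechanism, not the same numbers, is what the abstract describes for the unmodified rule induction algorithm.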

Language: English
Article number: e57334
Number of pages: 9
Journal: PLoS ONE
Volume: 8
Issue number: 3
DOI: 10.1371/journal.pone.0057334
Publication status: Published - 7 Mar 2013

Fingerprint

Veterinary medicine
taxonomy
Macros
Decision trees
Monitoring
Classifiers
microbial detection
artificial intelligence
Pathogens
methodology
Learning systems
Animals
Health
Weights and Measures
laboratory experimentation
Research

Keywords

  • health data
  • medical surveillance systems
  • medical knowledge
  • pathogen detection
  • animal health
  • public health

Cite this

Dórea, Fernanda C. ; Muckle, C. Anne ; Kelton, David ; McClure, J. T. ; McEwen, Beverly J. ; McNab, W. Bruce ; Sanchez, Javier ; Revie, Crawford W. / Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine. In: PLoS ONE. 2013 ; Vol. 8, No. 3.
@article{ae9fcc6f9ee340ae996a5234b05c1ca4,
title = "Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine",
abstract = "Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real-time or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning methods (Na{\"i}ve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Na{\"i}ve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.",
keywords = "health data, medical surveillance systems, medical knowledge, pathogen detection, animal health, public health",
author = "D{\'o}rea, {Fernanda C.} and Muckle, {C. Anne} and David Kelton and McClure, {J. T.} and McEwen, {Beverly J.} and McNab, {W. Bruce} and Javier Sanchez and Revie, {Crawford W.}",
year = "2013",
month = "3",
day = "7",
doi = "10.1371/journal.pone.0057334",
language = "English",
volume = "8",
journal = "PLOS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "3",
}

Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine. / Dórea, Fernanda C.; Muckle, C. Anne; Kelton, David; McClure, J. T.; McEwen, Beverly J.; McNab, W. Bruce; Sanchez, Javier; Revie, Crawford W.

In: PLoS ONE, Vol. 8, No. 3, e57334, 07.03.2013.

TY - JOUR

T1 - Exploratory analysis of methods for automated classification of laboratory test orders into syndromic groups in veterinary medicine

AU - Dórea, Fernanda C.

AU - Muckle, C. Anne

AU - Kelton, David

AU - McClure, J. T.

AU - McEwen, Beverly J.

AU - McNab, W. Bruce

AU - Sanchez, Javier

AU - Revie, Crawford W.

PY - 2013/3/7

Y1 - 2013/3/7

AB - Background: Recent focus on earlier detection of pathogen introduction in human and animal populations has led to the development of surveillance systems based on automated monitoring of health data. Real-time or near real-time monitoring of pre-diagnostic data requires automated classification of records into syndromes (syndromic surveillance) using algorithms that incorporate medical knowledge in a reliable and efficient way, while remaining comprehensible to end users. Methods: This paper describes the application of two machine learning methods (Naïve Bayes and Decision Trees) and rule-based methods to extract syndromic information from laboratory test requests submitted to a veterinary diagnostic laboratory. Results: High performance (F1-macro = 0.9995) was achieved through the use of a rule-based syndrome classifier, based on rule induction followed by manual modification during the construction phase, which also resulted in clear interpretability of the resulting classification process. An unmodified rule induction algorithm achieved an F1-micro score of 0.979, though this fell to 0.677 when performance for individual classes was averaged in an unweighted manner (F1-macro), because the algorithm failed to learn 3 of the 16 classes from the training set. Decision Trees showed equal interpretability to the rule-based approaches, but achieved an F1-micro score of 0.923 (falling to 0.311 when classes are given equal weight). A Naïve Bayes classifier learned all classes and achieved high performance (F1-micro = 0.994 and F1-macro = 0.955); however, the classification process is not transparent to the domain experts. Conclusion: The use of a manually customised rule set allowed for the development of a system for classification of laboratory tests into syndromic groups with very high performance, and high interpretability by the domain experts. Further research is required to develop internal validation rules in order to establish automated methods to update model rules without user input.

KW - health data

KW - medical surveillance systems

KW - medical knowledge

KW - pathogen detection

KW - animal health

KW - public health

U2 - 10.1371/journal.pone.0057334

DO - 10.1371/journal.pone.0057334

M3 - Article

VL - 8

JO - PLOS One

T2 - PLOS One

JF - PLOS One

SN - 1932-6203

IS - 3

M1 - e57334

ER -