TY - JOUR
T1 - CLEF 2017 technologically assisted reviews in empirical medicine overview
AU - Kanoulas, Evangelos
AU - Li, Dan
AU - Azzopardi, Leif
AU - Spijker, Rene
PY - 2017/9/11
Y1 - 2017/9/11
N2 - Systematic reviews are a widely used method to provide an overview of the current scientific consensus, by bringing together multiple studies in a reliable, transparent way. The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying all relevant studies in an unbiased way both complex and time consuming to an extent that jeopardizes the validity of their findings and the ability to inform policy and practice in a timely manner. The CLEF 2017 e-Health Lab Task 2 focuses on the efficient and effective ranking of studies during the abstract and title screening phase of conducting Diagnostic Test Accuracy systematic reviews. We constructed a benchmark collection of fifty such reviews and the corresponding relevant and irrelevant articles found by the original Boolean query. Fourteen teams participated in the task, submitting 68 automatic and semi-automatic runs, using information retrieval and machine learning algorithms over a variety of text representations, in a batch and iterative manner. This paper reports both the methodology used to construct the benchmark collection and the results of the evaluation.
AB - Systematic reviews are a widely used method to provide an overview of the current scientific consensus, by bringing together multiple studies in a reliable, transparent way. The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying all relevant studies in an unbiased way both complex and time consuming to an extent that jeopardizes the validity of their findings and the ability to inform policy and practice in a timely manner. The CLEF 2017 e-Health Lab Task 2 focuses on the efficient and effective ranking of studies during the abstract and title screening phase of conducting Diagnostic Test Accuracy systematic reviews. We constructed a benchmark collection of fifty such reviews and the corresponding relevant and irrelevant articles found by the original Boolean query. Fourteen teams participated in the task, submitting 68 automatic and semi-automatic runs, using information retrieval and machine learning algorithms over a variety of text representations, in a batch and iterative manner. This paper reports both the methodology used to construct the benchmark collection and the results of the evaluation.
KW - Active learning
KW - Evaluation
KW - Information retrieval
KW - Systematic reviews
KW - TAR
KW - Text classification
UR - http://ceur-ws.org/Vol-1866/
UR - http://www.scopus.com/inward/record.url?scp=85034732447&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85034732447
SN - 1613-0073
VL - 1866
SP - 1
EP - 29
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
ER -