Detection of news feeds items appropriate for children

Tamara Polajnar, Richard Glassey, Leif Azzopardi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

2 Citations (Scopus)

Abstract

Identifying child-appropriate web content is an important yet difficult classification task. This novel task is characterised by the goal of determining age/child appropriateness (which is not necessarily topic-based), despite unbalanced class sizes and a lack of quality training data with human judgements of appropriateness. Classification of feeds, a subset of web content, presents further challenges due to their temporal nature and short document format. In this paper, we discuss these challenges and present baseline results for this task through an empirical study that classifies incoming news stories as appropriate (or not) for children. We show that while the naïve Bayes approach produces a higher AUC, it is vulnerable to the imbalanced data problem, and that the support vector machine provides a more robust overall solution. Our research shows that classifying children's content is a non-trivial task with greater complexities than standard text-based classification. While the F-score values are consistent with other research examining age-appropriate text classification, we introduce a new problem with a new dataset.
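The abstract reports that naïve Bayes is vulnerable to imbalanced classes but does not describe the features or preprocessing used. As a hedged illustration of one way imbalance can affect naïve Bayes, the following minimal sketch implements multinomial naïve Bayes with add-one (Laplace) smoothing in plain Python. The class `MultinomialNB`, the `tokenize` helper and the toy headlines are all hypothetical; this is not presented as the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation (deliberately crude, for illustration)."""
    return text.lower().split()


class MultinomialNB:
    """Hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class (used for the prior)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)  # union of all training words
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Return log P(c) plus the sum of log P(w|c) over known words, per class."""
        n_docs = sum(self.doc_counts.values())
        smoothing = self.alpha * len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # skip words never seen in training
                score += math.log((self.word_counts[c][w] + self.alpha)
                                  / (self.totals[c] + smoothing))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy headlines: 1 = suitable for children, 0 = not suitable.
# The suitable class is deliberately the minority here (2 of 8 items).
train_docs = [
    "zoo welcomes baby panda",
    "school children plant trees",
    "police investigate fatal shooting",
    "war crimes trial begins",
    "bomb attack kills dozens",
    "shooting suspect arrested by police",
    "fatal crash on motorway",
    "murder trial hears evidence",
]
train_labels = [1, 1, 0, 0, 0, 0, 0, 0]

nb = MultinomialNB().fit(train_docs, train_labels)
print(nb.predict("baby panda born at zoo"))  # → 1
print(nb.predict("weather forecast"))        # → 0: no known words, so only the prior counts
```

When an item contains no words seen in training, only the log prior remains, so the majority class wins by default. Short feed items carry little lexical evidence, so the prior can dominate more often than with long documents; this is one plausible mechanism, not a claim about the paper's specific findings.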
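The abstract contrasts a higher AUC with vulnerability to imbalance, and separately reports F-scores. The two measures can diverge: AUC only measures how well a classifier ranks positives above negatives, while the F-score depends on the decision threshold. The minimal sketch below computes AUC as the fraction of correctly ordered (positive, negative) pairs and F1 at a chosen threshold. The function names and the scores are invented for illustration and are not results from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 for the positive class when predicting 1 iff score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced test set: 2 positives, 8 negatives.
# Every positive outranks every negative, but all scores sit below 0.5.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.02]

print(roc_auc(labels, scores))               # → 1.0
print(f1_at_threshold(labels, scores))       # → 0.0 at the default 0.5 threshold
print(f1_at_threshold(labels, scores, 0.4))  # → 1.0 once the threshold is lowered
```

A perfect ranking (AUC of 1.0) can still yield an F1 of zero if the scores for a rare class never cross the default threshold, which is why a single headline metric can hide sensitivity to class imbalance.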
Original language: English
Title of host publication: Proceedings of the 34th European Conference on Advances in Information Retrieval
Place of publication: Berlin, Heidelberg
Publisher: Springer-Verlag
Pages: 63-72
Number of pages: 10
ISBN (Print): 978-3-642-28996-5
Publication status: Published - 2012

Publication series

Name: ECIR'12
Publisher: Springer-Verlag

Keywords

  • children
  • news feeds
  • web content
  • classification
