Measuring sexually explicit content

Katherine Darroch, George Weir

Research output: Contribution to conference (paper)

Abstract

In this paper we describe an experiment to investigate methods of measuring sexually explicit content in text documents. As a starting point, sample data was collected from a variety of sources and manually sorted into three categories: (i) sexually explicit, (ii) non-sexually explicit, and (iii) content that contained sexually explicit terms but was not sexually explicit - e.g., information used for sex education. This selection of data was used as a training set in developing three software metrics of the type often used in content filtering. Thereafter, a test set of six files was used in the experiment. These test files were scored for sexually explicit content by participants in the study and by the three different metrics. The assigned scores were then compared to consider how far the metrics and the users agreed on their view of sexually explicit content. In addition to our contrast between software metrics and users, we also note interesting trends from the participant demographics.
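
The abstract does not say how the three metrics were constructed. Purely as an illustration of the kind of Bayes-theorem scoring named in the keywords, the following is a minimal sketch and not the authors' method. The category labels, the placeholder token `xterm`, the toy documents, and the function names `train` and `posterior` are all assumptions introduced here.

```python
import math
from collections import Counter

# Hypothetical toy training set. The three labels mirror the categories in
# the abstract; the documents and the placeholder token "xterm" are invented.
TRAINING = {
    "explicit": ["xterm xterm scene xterm", "xterm scene scene"],
    "non_explicit": ["weather report rain", "football match report"],
    "educational": ["xterm health clinic advice", "health advice xterm education"],
}


def train(training):
    """Count word frequencies per category; return counts, vocabulary, priors."""
    counts = {
        label: Counter(word for doc in docs for word in doc.split())
        for label, docs in training.items()
    }
    vocab = {word for word_counts in counts.values() for word in word_counts}
    total_docs = sum(len(docs) for docs in training.values())
    priors = {label: len(docs) / total_docs for label, docs in training.items()}
    return counts, vocab, priors


def posterior(text, counts, vocab, priors):
    """Estimate P(category | text) with Bayes' theorem and add-one smoothing."""
    log_scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        score = math.log(priors[label])
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((word_counts[word] + 1) / (total + len(vocab)))
        log_scores[label] = score
    # Normalise in log space so the three probabilities sum to 1.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


counts, vocab, priors = train(TRAINING)
for sample in ("xterm scene xterm", "xterm health advice"):
    scores = posterior(sample, counts, vocab, priors)
    print(sample, "->", max(scores, key=scores.get))
```

In this sketch the third category is what lets a document containing the placeholder term still score as educational when the surrounding vocabulary points that way; a two-class filter has no way to express that distinction.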

Conference

Conference: Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics
Country: United Kingdom
City: Glasgow
Period: 23/06/14 to 24/06/14

Keywords

  • sexually explicit content
  • similarity measures
  • text documents
  • offensive content filtering
  • software metrics
  • Bayes theorem

Cite this

Darroch, K., & Weir, G. (2014). Measuring sexually explicit content. 101-112. Paper presented at Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics, Glasgow, United Kingdom.
Darroch, Katherine; Weir, George. / Measuring sexually explicit content. Paper presented at Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics, Glasgow, United Kingdom. 12 p.
@conference{18e90acdfdcd49678334ab2172f3ea4f,
title = "Measuring sexually explicit content",
abstract = "In this paper we describe an experiment to investigate methods of measuring sexually explicit content in text documents. As a starting point, sample data was collected from a variety of sources and manually sorted into three categories: (i) sexually explicit, (ii) non-sexually explicit, and (iii) content that contained sexually explicit terms but was not sexually explicit - e.g., information used for sex education. This selection of data was used as a training set in developing three software metrics of the type often used in content filtering. Thereafter, a test set of six files was used in the experiment. These test files were scored for sexually explicit content by participants in the study and by the three different metrics. The assigned scores were then compared to consider how far the metrics and the users agreed on their view of sexually explicit content. In addition to our contrast between software metrics and users, we also note interesting trends from the participant demographics.",
keywords = "sexually explicit content, similarity measures, text documents, offensive content filtering, software metrics, Bayes theorem",
author = "Katherine Darroch and George Weir",
year = "2014",
month = "6",
day = "24",
language = "English",
pages = "101--112",
note = "Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics ; Conference date: 23-06-2014 Through 24-06-2014",

}

Darroch, K & Weir, G 2014, 'Measuring sexually explicit content' Paper presented at Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics, Glasgow, United Kingdom, 23/06/14 - 24/06/14, pp. 101-112.

Measuring sexually explicit content. / Darroch, Katherine; Weir, George.

2014. 101-112 Paper presented at Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics, Glasgow, United Kingdom.

TY - CONF

T1 - Measuring sexually explicit content

AU - Darroch, Katherine

AU - Weir, George

PY - 2014/6/24

Y1 - 2014/6/24

AB - In this paper we describe an experiment to investigate methods of measuring sexually explicit content in text documents. As a starting point, sample data was collected from a variety of sources and manually sorted into three categories: (i) sexually explicit, (ii) non-sexually explicit, and (iii) content that contained sexually explicit terms but was not sexually explicit - e.g., information used for sex education. This selection of data was used as a training set in developing three software metrics of the type often used in content filtering. Thereafter, a test set of six files was used in the experiment. These test files were scored for sexually explicit content by participants in the study and by the three different metrics. The assigned scores were then compared to consider how far the metrics and the users agreed on their view of sexually explicit content. In addition to our contrast between software metrics and users, we also note interesting trends from the participant demographics.

KW - sexually explicit content

KW - similarity measures

KW - text documents

KW - offensive content filtering

KW - software metrics

KW - Bayes theorem

UR - http://www.cyberforensics.org.uk/

M3 - Paper

SP - 101

EP - 112

ER -

Darroch K, Weir G. Measuring sexually explicit content. 2014. Paper presented at Cyberforensics 2014 - International Conference on Cybercrime, Security & Digital Forensics, Glasgow, United Kingdom.