Positing the problem: enhancing classification of extremist web content through textual analysis

George Weir, Richard Frank, Barry Cartwright, Emanuel dos Santos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

1 Citation (Scopus)

Abstract

Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) web-crawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content is subjected to enhanced classification through software analysis, using the Posit textual analysis toolset, generating a detailed frequency analysis of the syntax, including multi-word units and associated part-of-speech components. Results are then deployed in a knowledge extraction process using knowledge extraction algorithms, e.g., from the WEKA system. Indications are that the use of the data enrichment through application of Posit analysis affords a greater degree of match between automatic and manual classification than previously attained. Furthermore, the incorporation and deployment of these technologies promises to provide public safety officials with techniques that can help to detect terrorist webpages, gauge the intensity of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.
Language: English
Title of host publication: 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF)
Editors: Barry Cartwright, George Weir, Laurie Yiu-Chung Lau
Publisher: IEEE
ISBN (Print): 9781509060962
DOIs: https://doi.org/10.1109/ICCCF.2016.7740431
Publication status: Published - 17 Nov 2016
Event: International Conference on Cybercrime and Computer Forensics 2016 - Simon Fraser University, Vancouver, Canada
Duration: 12 Jun 2016 to 14 Jun 2016

Conference

Conference: International Conference on Cybercrime and Computer Forensics 2016
Abbreviated title: ICCCF 2016
Country: Canada
City: Vancouver
Period: 12/06/16 to 14/06/16

Keywords

  • sentiment analysis
  • web-crawling
  • classification
  • textual analysis

Cite this

Weir, G., Frank, R., Cartwright, B., & dos Santos, E. (2016). Positing the problem: enhancing classification of extremist web content through textual analysis. In B. Cartwright, G. Weir, & L. Y-C. Lau (Eds.), 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF) IEEE. https://doi.org/10.1109/ICCCF.2016.7740431
Weir, George; Frank, Richard; Cartwright, Barry; dos Santos, Emanuel. / Positing the problem: enhancing classification of extremist web content through textual analysis. 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF). editor / Barry Cartwright; George Weir; Laurie Yiu-Chung Lau. IEEE, 2016.
@inproceedings{9a8b001579444e2f8798542afd082451,
title = "Positing the problem: enhancing classification of extremist web content through textual analysis",
abstract = "Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) web-crawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content is subjected to enhanced classification through software analysis, using the Posit textual analysis toolset, generating a detailed frequency analysis of the syntax, including multi-word units and associated part-of-speech components. Results are then deployed in a knowledge extraction process using knowledge extraction algorithms, e.g., from the WEKA system. Indications are that the use of the data enrichment through application of Posit analysis affords a greater degree of match between automatic and manual classification than previously attained. Furthermore, the incorporation and deployment of these technologies promises to provide public safety officials with techniques that can help to detect terrorist webpages, gauge the intensity of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.",
keywords = "sentiment analysis, web-crawling, classification, textual analysis",
author = "George Weir and Richard Frank and Barry Cartwright and {dos Santos}, Emanuel",
note = "{\textcopyright} 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.",
year = "2016",
month = "11",
day = "17",
doi = "10.1109/ICCCF.2016.7740431",
language = "English",
isbn = "9781509060962",
editor = "Barry Cartwright and George Weir and {Lau}, {Laurie Yiu-Chung}",
booktitle = "2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF)",
publisher = "IEEE",

}

Weir, G, Frank, R, Cartwright, B & dos Santos, E 2016, Positing the problem: enhancing classification of extremist web content through textual analysis. in B Cartwright, G Weir & LY-C Lau (eds), 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF). IEEE, International Conference on Cybercrime and Computer Forensics 2016, Vancouver, Canada, 12/06/16. https://doi.org/10.1109/ICCCF.2016.7740431

Positing the problem: enhancing classification of extremist web content through textual analysis. / Weir, George; Frank, Richard; Cartwright, Barry; dos Santos, Emanuel.

2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF). ed. / Barry Cartwright; George Weir; Laurie Yiu-Chung Lau. IEEE, 2016.

TY - GEN

T1 - Positing the problem

T2 - enhancing classification of extremist web content through textual analysis

AU - Weir, George

AU - Frank, Richard

AU - Cartwright, Barry

AU - dos Santos, Emanuel

N1 - © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2016/11/17

Y1 - 2016/11/17

AB - Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) web-crawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content is subjected to enhanced classification through software analysis, using the Posit textual analysis toolset, generating a detailed frequency analysis of the syntax, including multi-word units and associated part-of-speech components. Results are then deployed in a knowledge extraction process using knowledge extraction algorithms, e.g., from the WEKA system. Indications are that the use of the data enrichment through application of Posit analysis affords a greater degree of match between automatic and manual classification than previously attained. Furthermore, the incorporation and deployment of these technologies promises to provide public safety officials with techniques that can help to detect terrorist webpages, gauge the intensity of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.

KW - sentiment analysis

KW - web-crawling

KW - classification

KW - textual analysis

U2 - 10.1109/ICCCF.2016.7740431

DO - 10.1109/ICCCF.2016.7740431

M3 - Conference contribution book

SN - 9781509060962

BT - 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF)

A2 - Cartwright, Barry

A2 - Weir, George

A2 - Lau, Laurie Yiu-Chung

PB - IEEE

ER -

Weir G, Frank R, Cartwright B, dos Santos E. Positing the problem: enhancing classification of extremist web content through textual analysis. In Cartwright B, Weir G, Lau LY-C, editors, 2016 IEEE International Conference on Cybercrime and Computer Forensic (ICCCF). IEEE. 2016 https://doi.org/10.1109/ICCCF.2016.7740431