Cloud-based textual analysis as a basis for document classification

George Weir, Kolade Owoeye, Alice Oberacker, Haya Alshahrani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution book



Growing trends in data mining and developments in machine learning have encouraged interest in analytical techniques that can contribute insights on data characteristics. The present paper describes an approach to textual analysis that generates extensive quantitative data on target documents, with output including frequency data on tokens, types, parts-of-speech and word n-grams. These analytical results enrich the available source data and have proven useful in several contexts as a basis for automating manual classification tasks. In the following, we introduce the Posit textual analysis toolset and detail its use in data enrichment as input to supervised learning tasks, including automating the identification of extremist Web content. Next, we describe the extension of this approach to the Arabic language. Thereafter, we recount the move of these analytical facilities from local operation to a Cloud-based service. This transition affords easy remote access for other researchers seeking to explore the application of such data enrichment to their own text-based data sets.
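To make the kind of data enrichment described in the abstract concrete, the following is a minimal sketch of how token, type and word n-gram frequencies might be derived from a document. It is not the Posit toolset's implementation: the function name `text_features`, the regular-expression tokenizer and the feature names are assumptions chosen for illustration, and part-of-speech frequencies are omitted because they would require a tagger.

```python
import re
from collections import Counter


def text_features(text, max_n=3):
    """Return simple frequency features for one document (illustrative only)."""
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    types = set(tokens)

    features = {
        "total_tokens": len(tokens),   # running words
        "total_types": len(types),     # distinct word forms
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }

    # Word n-gram frequencies for n = 1 .. max_n, keyed by tuples of words.
    for n in range(1, max_n + 1):
        ngrams = zip(*(tokens[i:] for i in range(n)))
        features[f"{n}gram_counts"] = Counter(ngrams)

    return features


doc = "the cat sat on the mat and the cat slept"
feats = text_features(doc)
print(feats["total_tokens"], feats["total_types"])
print(feats["2gram_counts"].most_common(1))
```

Counts of this kind could then be flattened into a numeric feature vector per document and passed to a supervised learner, which is the role the abstract assigns to the enriched data.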

Original language: English
Title of host publication: 2018 International Conference on High Performance Computing & Simulation (HPCS)
Editors: Khalid Zine-Dine, Waleed W. Smari
Place of Publication: Piscataway, New Jersey
Number of pages: 5
ISBN (Print): 9781538678787
Publication status: E-pub ahead of print - 1 Nov 2018
Event: 16th International Conference on High Performance Computing and Simulation, HPCS 2018 - Orleans, France
Duration: 16 Jul 2018 to 20 Jul 2018


Conference: 16th International Conference on High Performance Computing and Simulation, HPCS 2018


  • classification
  • cloud-service
  • data mining
  • featureset
  • posit
  • textual analysis

