Towards measuring content coordination in microblogs

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution book

Abstract

The value of microblogging services (such as Twitter) and social networks (such as Facebook) in disseminating and discussing important events is currently under serious threat from automated or human contributors employed to distort information. While detecting coordinated attacks by their behaviour (e.g. different accounts posting the same images or links, fake profiles, etc.) has already been explored, here we look at detecting coordination in the content itself (words, phrases, sentences). We propose a metric, based on the estimated probability of coincidentally repeating a word sequence, that captures the differences between organic and coordinated posts. Our simulation results support our conjecture that the metric can separate organic from coordinated content only when it takes the context and the properties of the repeated sequence into consideration. We also demonstrate how those context-specific adjustments can be obtained using existing resources.
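
As an illustration only, the idea of scoring a repeated word sequence by how unlikely it is to recur by coincidence might be sketched as follows. This is not the paper's metric, which the abstract does not specify: the add-one smoothed unigram model, the toy corpora, and the names `unigram_model` and `surprise` are all assumptions made for this sketch.

```python
import math
from collections import Counter

def unigram_model(tokens):
    """Return an add-one smoothed unigram probability function (assumed model)."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    return lambda word: (counts[word] + 1) / (total + vocab)

def surprise(sequence, prob):
    """Negative log-probability of the sequence under an independence assumption.

    A higher value means a coincidental repeat is less likely, so an observed
    repeat is more suggestive of coordination.
    """
    return -sum(math.log(prob(word)) for word in sequence)

# Toy corpora (hypothetical): a general one and one specific to the discussed event.
general = "the cat sat on the mat and the dog ran in the park".split()
topical = ("breaking news the election results are in "
           "the election results are final").split()

phrase = ["election", "results", "are"]
general_score = surprise(phrase, unigram_model(general))
topical_score = surprise(phrase, unigram_model(topical))

print(f"general context: {general_score:.2f}")
print(f"topical context: {topical_score:.2f}")
```

Under these assumptions the same phrase is far less surprising in the topical context than in the general one, which hints at why a context-unaware score could mistake organic repetition for coordination.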

Language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 40th European Conference on IR Research, ECIR 2018, Proceedings
Editors: Gabriella Pasi, Benjamin Piwowarski, Leif Azzopardi, Allan Hanbury
Place of Publication: Cham
Publisher: Springer-Verlag
Pages: 651-656
Number of pages: 6
ISBN (Print): 9783319769400
DOIs: https://doi.org/10.1007/978-3-319-76941-7_58
Publication status: E-pub ahead of print - 1 Mar 2018
Event: 40th European Conference on Information Retrieval, ECIR 2018 - Grenoble, France
Duration: 26 Mar 2018 to 29 Mar 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10772
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 40th European Conference on Information Retrieval, ECIR 2018
Country: France
City: Grenoble
Period: 26/03/18 to 29/03/18

Fingerprint

Metric
Social Networks
Adjustment
Attack
Resources
Demonstrate
Simulation
Context
Profile
Human

Keywords

  • language models
  • online bots and trolls
  • simulating text

Cite this

Roussinov, D. (2018). Towards measuring content coordination in microblogs. In G. Pasi, B. Piwowarski, L. Azzopardi, & A. Hanbury (Eds.), Advances in Information Retrieval: 40th European Conference on IR Research, ECIR 2018, Proceedings (pp. 651-656). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10772). Cham: Springer-Verlag. https://doi.org/10.1007/978-3-319-76941-7_58
Roussinov, Dmitri. / Towards measuring content coordination in microblogs. Advances in Information Retrieval: 40th European Conference on IR Research, ECIR 2018, Proceedings. editor / Gabriella Pasi ; Benjamin Piwowarski ; Leif Azzopardi ; Allan Hanbury. Cham : Springer-Verlag, 2018. pp. 651-656 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{38d2fd38112946529b82411ad31a23c3,
title = "Towards measuring content coordination in microblogs",
abstract = "The value of microblogging services (such as Twitter) and social networks (such as Facebook) in disseminating and discussing important events is currently under serious threat from automated or human contributors employed to distort information. While detecting coordinated attacks by their behaviour (e.g. different accounts posting the same images or links, fake profiles, etc.) has been already explored, here we look at detecting coordination in the content (words, phrases, sentences). We are proposing a metric capable of capturing the differences between organic and coordinated posts, which is based on the estimated probability of coincidentally repeating a word sequence. Our simulation results support our conjecture that only when the metric takes the context and the properties of the repeated sequence into consideration, it is capable of separating organic and coordinated content. We also demonstrate how those context-specific adjustments can be obtained using existing resources.",
keywords = "language models, online bots and trolls, simulating text",
author = "Dmitri Roussinov",
note = "This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science, vol 10772. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-76941-7_58.",
year = "2018",
month = "3",
day = "1",
doi = "10.1007/978-3-319-76941-7_58",
language = "English",
isbn = "9783319769400",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "651--656",
editor = "Gabriella Pasi and Benjamin Piwowarski and Leif Azzopardi and Allan Hanbury",
booktitle = "Advances in Information Retrieval",

}

Roussinov, D 2018, Towards measuring content coordination in microblogs. in G Pasi, B Piwowarski, L Azzopardi & A Hanbury (eds), Advances in Information Retrieval: 40th European Conference on IR Research, ECIR 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10772, Springer-Verlag, Cham, pp. 651-656, 40th European Conference on Information Retrieval, ECIR 2018, Grenoble, France, 26/03/18. https://doi.org/10.1007/978-3-319-76941-7_58

Towards measuring content coordination in microblogs. / Roussinov, Dmitri.

Advances in Information Retrieval: 40th European Conference on IR Research, ECIR 2018, Proceedings. ed. / Gabriella Pasi; Benjamin Piwowarski; Leif Azzopardi; Allan Hanbury. Cham : Springer-Verlag, 2018. p. 651-656 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10772).

TY - GEN

T1 - Towards measuring content coordination in microblogs

AU - Roussinov, Dmitri

N1 - This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science, vol 10772. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-76941-7_58.

PY - 2018/3/1

Y1 - 2018/3/1

N2 - The value of microblogging services (such as Twitter) and social networks (such as Facebook) in disseminating and discussing important events is currently under serious threat from automated or human contributors employed to distort information. While detecting coordinated attacks by their behaviour (e.g. different accounts posting the same images or links, fake profiles, etc.) has been already explored, here we look at detecting coordination in the content (words, phrases, sentences). We are proposing a metric capable of capturing the differences between organic and coordinated posts, which is based on the estimated probability of coincidentally repeating a word sequence. Our simulation results support our conjecture that only when the metric takes the context and the properties of the repeated sequence into consideration, it is capable of separating organic and coordinated content. We also demonstrate how those context-specific adjustments can be obtained using existing resources.

KW - language models

KW - online bots and trolls

KW - simulating text

UR - http://www.scopus.com/inward/record.url?scp=85044465801&partnerID=8YFLogxK

UR - https://doi.org/10.1007/978-3-319-76941-7

U2 - 10.1007/978-3-319-76941-7_58

DO - 10.1007/978-3-319-76941-7_58

M3 - Conference contribution book

SN - 9783319769400

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 651

EP - 656

BT - Advances in Information Retrieval

A2 - Pasi, Gabriella

A2 - Piwowarski, Benjamin

A2 - Azzopardi, Leif

A2 - Hanbury, Allan

PB - Springer-Verlag

CY - Cham

ER -

Roussinov D. Towards measuring content coordination in microblogs. In Pasi G, Piwowarski B, Azzopardi L, Hanbury A, editors, Advances in Information Retrieval: 40th European Conference on IR Research, ECIR 2018, Proceedings. Cham: Springer-Verlag. 2018. p. 651-656. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-76941-7_58