TY - GEN
T1 - Towards measuring content coordination in microblogs
AU - Roussinov, Dmitri
N1 - This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science, vol 10772. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-76941-7_58.
PY - 2018/3/1
Y1 - 2018/3/1
AB - The value of microblogging services (such as Twitter) and social networks (such as Facebook) in disseminating and discussing important events is currently under serious threat from automated or human contributors employed to distort information. While detecting coordinated attacks by their behaviour (e.g. different accounts posting the same images or links, fake profiles, etc.) has already been explored, here we look at detecting coordination in the content (words, phrases, sentences). We propose a metric capable of capturing the differences between organic and coordinated posts, based on the estimated probability of coincidentally repeating a word sequence. Our simulation results support our conjecture that the metric can separate organic and coordinated content only when it takes the context and the properties of the repeated sequence into consideration. We also demonstrate how those context-specific adjustments can be obtained using existing resources.
KW - language models
KW - online bots and trolls
KW - simulating text
UR - http://www.scopus.com/inward/record.url?scp=85044465801&partnerID=8YFLogxK
UR - https://doi.org/10.1007/978-3-319-76941-7
U2 - 10.1007/978-3-319-76941-7_58
DO - 10.1007/978-3-319-76941-7_58
M3 - Conference contribution book
AN - SCOPUS:85044465801
SN - 9783319769400
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 651
EP - 656
BT - Advances in Information Retrieval
A2 - Pasi, Gabriella
A2 - Piwowarski, Benjamin
A2 - Azzopardi, Leif
A2 - Hanbury, Allan
PB - Springer-Verlag
CY - Cham
T2 - 40th European Conference on Information Retrieval, ECIR 2018
Y2 - 26 March 2018 through 29 March 2018
ER -