From corpus-based collocation frequencies to readability measure

N.K. Anagnostou, G.R.S. Weir

Research output: Contribution to conference (Paper)

Abstract

This paper provides a broad overview of three separate but related areas of research. Firstly, corpus linguistics is a growing discipline that applies analytical results from large language corpora to a wide variety of problems in linguistics and related disciplines. Secondly, readability research, as the name suggests, seeks to understand what makes texts more or less comprehensible to readers, and aims to apply this understanding to issues such as text rating and the matching of texts to readers. Thirdly, collocation is a language feature that occurs when particular words are frequently used together for reasons other than purely grammatical ones. The intersection of these three areas provides the basis for ongoing research within the Department of Computer and Information Sciences at the University of Strathclyde and is the motivation for this overview. Specifically, we aim, through analysis of collocation frequencies in major corpora, to afford valuable insight into the content of texts, which we believe will, in turn, provide a novel basis for estimating text readability.

Conference

Conference: ICT in the Analysis, Teaching and Learning of Languages, Preprints of the ICTATLL Workshop 2006
City: Glasgow, UK
Period: 21/08/06 - 22/08/06

Fingerprint

linguistics
language
information science
computer science

Keywords

  • corpus linguistics
  • readability research
  • languages

Cite this

Anagnostou, N. K., & Weir, G. R. S. (2006). From corpus-based collocation frequencies to readability measure. 33-46. Paper presented at ICT in the Analysis, Teaching and Learning of Languages, Preprints of the ICTATLL Workshop 2006, Glasgow, UK.

Full text: http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_1539.pdf