Identifying author heritage using surname data

an application for Russian surnames

Maria Karaulova, Abdullah Gök, Philip Shapira

Research output: Contribution to journal (Article)

Abstract

This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary-based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname-based identification method and applied it to infer Russian heritage from suffix-based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first-name data to eliminate false-positive results. The method achieved 98% precision and 94% recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long-standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.
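The two-stage idea described in the abstract (a suffix-based surname match, then a first-name check to remove false positives) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the suffix list is a small set of commonly cited Russian surname endings, and `VETO_FIRST_NAMES`, `surname_matches`, `infer_heritage`, and `precision_recall` are hypothetical names introduced here. The rule set and name lists actually used in the article are not given in this record.

```python
# Illustrative suffixes only (assumption); the article's full rule set is not reproduced here.
SURNAME_SUFFIXES = ("ov", "ova", "ev", "eva", "in", "ina", "sky", "skaya")

# Hypothetical first names used to veto a surname match (assumption, for illustration).
VETO_FIRST_NAMES = {"john", "charles", "mary"}


def surname_matches(surname: str) -> bool:
    """Return True if the surname ends with one of the illustrative suffixes."""
    return surname.strip().lower().endswith(SURNAME_SUFFIXES)


def infer_heritage(first_name: str, surname: str) -> bool:
    """Stage 1: suffix match on the surname. Stage 2: first-name veto."""
    if not surname_matches(surname):
        return False
    return first_name.strip().lower() not in VETO_FIRST_NAMES


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall from parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In this sketch, a name such as "Charles Darwin" passes the surname stage because of the "-in" ending and is then rejected by the first-name veto, which is the kind of false positive the second stage is meant to remove. Precision is the share of flagged names that are correct, and recall is the share of true cases that were flagged.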

Original language: English
Pages (from-to): 488-498
Number of pages: 11
Journal: Journal of the Association for Information Science and Technology
Volume: 70
Issue number: 5
Early online date: 25 Jan 2019
DOI: 10.1002/asi.24104
Publication status: Published, 31 May 2019

Keywords

  • national heritage
  • morphology
  • surname-based identification method

Cite this

@article{c0fe889cbf4f407a9cdbed4abc159a8b,
title = "Identifying author heritage using surname data: an application for Russian surnames",
abstract = "This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary-based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname-based identification method and applied it to infer Russian heritage from suffix-based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first-name data to eliminate false-positive results. The method achieved 98{\%} precision and 94{\%} recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long-standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.",
keywords = "national heritage, morphology, surname-based identification method",
author = "Maria Karaulova and Abdullah G{\"o}k and Philip Shapira",
year = "2019",
month = "5",
day = "31",
doi = "10.1002/asi.24104",
language = "English",
volume = "70",
pages = "488--498",
journal = "Journal of the Association for Information Science and Technology",
issn = "2330-1643",
number = "5",

}

Identifying author heritage using surname data: an application for Russian surnames. / Karaulova, Maria; Gök, Abdullah; Shapira, Philip.

In: Journal of the Association for Information Science and Technology, Vol. 70, No. 5, 31.05.2019, p. 488-498.

TY - JOUR
T1 - Identifying author heritage using surname data
T2 - an application for Russian surnames
AU - Karaulova, Maria
AU - Gök, Abdullah
AU - Shapira, Philip
PY - 2019/5/31
Y1 - 2019/5/31
AB - This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary-based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname-based identification method and applied it to infer Russian heritage from suffix-based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first-name data to eliminate false-positive results. The method achieved 98% precision and 94% recall rates—superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long-standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.

KW - national heritage
KW - morphology
KW - surname-based identification method
UR - https://onlinelibrary.wiley.com/journal/23301643
U2 - 10.1002/asi.24104
DO - 10.1002/asi.24104
M3 - Article
VL - 70
SP - 488
EP - 498
JO - Journal of the Association for Information Science and Technology
JF - Journal of the Association for Information Science and Technology
SN - 2330-1643
IS - 5
ER -