Creating and exploiting the intrinsically disordered protein knowledge graph (IDP-KG)

Alasdair Gray, Petros Papadopoulos, Imran Asif, Ivan Micetic, Andras Hatos

Research output: Contribution to journal, Conference Contribution, peer-review


Abstract

There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDP). IDPcentral aims to be a registry to aid the discovery of data about proteins known to be intrinsically disordered by aggregating the content from these sources. Traditional ETL approaches for populating IDPcentral require the API and data model of each source to be wrapped and then transformed into a common model.

In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). Bioschemas markup is a lightweight, machine-readable representation of the content of each page in a site, embedded in the page's HTML and accessible for any site through an HTTP request. We harvest the Bioschemas markup from three IDP sources and show that the resulting IDP-KG has the same breadth of proteins as the original sources, and can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.
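
As an illustration of harvesting markup embedded in a page's HTML, the following is a minimal sketch rather than the authors' pipeline. It assumes the markup is serialised as JSON-LD inside `<script type="application/ld+json">` elements, which is a common way to embed schema.org-style markup but is not stated in the abstract. The sample page, its identifier, and the names `JsonLdExtractor` and `extract_markup` are hypothetical; in practice the HTML would come from an HTTP request to each source.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False   # True while inside a JSON-LD script element
        self._buffer = []         # text chunks of the current script element
        self.documents = []       # parsed JSON-LD documents found so far

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


def extract_markup(html):
    """Return every JSON-LD document embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page standing in for the body of an HTTP response.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Protein",
 "@id": "https://example.org/protein/P00001", "name": "Example protein"}
</script>
</head><body>...</body></html>
"""

if __name__ == "__main__":
    for doc in extract_markup(SAMPLE_PAGE):
        print(doc["@type"], doc["@id"])
```

Documents extracted this way from several sources could then be loaded into a single RDF store and queried together, which is the consolidation step the abstract describes.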

Original language: English
Pages (from-to): 11-18
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3127
Publication status: Published - 14 Jan 2022
Event: 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences - Online
Duration: 10 Jan 2022 to 14 Jan 2022

Keywords

  • knowledge graphs
  • bioschemas
  • findable
  • intrinsically disordered proteins
