Abstract
There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDPs). IDPcentral aims to be a registry that aids the discovery of data about proteins known to be intrinsically disordered by aggregating the content of these sources. Traditional ETL (extract, transform, load) approaches to populating IDPcentral require wrapping the API and data model of each source and then transforming the content into a common model.
In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (idp-kg). Bioschemas markup is a lightweight, machine-readable representation of the content of each page on a site. Because it is embedded in the page's HTML, it can be retrieved from any site through an HTTP request. We harvest the Bioschemas markup from three IDP sources and show that the resulting idp-kg covers the same breadth of proteins as the original sources and can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.
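The harvesting step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used to build idp-kg. It assumes each source embeds its Bioschemas markup as JSON-LD inside `<script type="application/ld+json">` elements (Bioschemas also permits other embeddings), and that every record carries an `@id` shared across sources. The helper names `extract_markup`, `harvest`, and `consolidate` are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies are delivered as raw text; buffer them until </script>.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            parsed = json.loads("".join(self._buffer))
            # A single script element may hold one object or a list of objects.
            self.blocks.extend(parsed if isinstance(parsed, list) else [parsed])
            self._in_jsonld = False


def extract_markup(html):
    """Return the JSON-LD records embedded in one HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def harvest(url):
    """Fetch one page over HTTP and extract its embedded markup (assumes UTF-8)."""
    with urlopen(url) as response:
        return extract_markup(response.read().decode("utf-8"))


def consolidate(pages):
    """Merge records from several sources into one graph keyed by @id.

    `pages` maps a source name to the list of records harvested from it.
    For a property present in several sources, the first value seen is kept.
    """
    graph = {}
    for source, records in pages.items():
        for record in records:
            node = graph.setdefault(record["@id"], {"sources": set()})
            node["sources"].add(source)
            for key, value in record.items():
                node.setdefault(key, value)
    return graph
```

Keying on `@id` is what lets records about the same protein from different sources collapse into one node; a fuller pipeline would more likely load the harvested JSON-LD into an RDF store and query it with SPARQL rather than merging dictionaries by hand.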
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 11-18 |
| Number of pages | 10 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3127 |
| Publication status | Published - 14 Jan 2022 |
| Event | 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, online, 10 Jan 2022 → 14 Jan 2022 |
Keywords
- knowledge graphs
- bioschemas
- findable
- intrinsically disordered proteins