Knowledge graphs (KGs) have recently gained attention due to their flexible data model, which reduces the effort needed for integration across different, possibly heterogeneous, data sources. In this tutorial, we learn how to access scientific data stored in a relational database through the virtual knowledge graph (VKG) approach. In such an approach, the data are exposed as a KG and enriched with semantic information coming from a domain ontology. The KG is “virtual” in the sense that the data are not replicated but stay within the data sources and are accessed at query time.
We demonstrate the approach on scientific data from the biomedical domain, using the open-source VKG system Ontop. Since legacy data are exposed as a KG, users can access the data by means of a more convenient vocabulary provided by the domain ontology, benefit from automated reasoning capabilities, and do not need to focus on how the data are actually stored. Furthermore, the virtual approach allows for the use of KGs even in those contexts where the user neither owns the data nor is granted the right to copy them.
By relying on existing federation tools, the approach described here for accessing scientific data can also be used to integrate multiple, heterogeneous, and possibly semi-structured and unstructured data sources.
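To make the notion of a "virtual" KG concrete, the following minimal sketch illustrates the idea that triples are produced from the relational source only at query time and are never stored. It is not Ontop's API or mapping language: the table `patient`, the namespace `http://example.org/`, the `MAPPINGS` dictionary, and the function `triples` are all hypothetical names introduced for illustration, and a real VKG system would rewrite whole SPARQL queries into SQL rather than answer one predicate at a time.

```python
import sqlite3

# Hypothetical relational source: a tiny in-memory table of patients.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, diagnosis TEXT)"
)
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "Ann", "asthma"), (2, "Bob", "diabetes")],
)

# Hypothetical mappings (an assumption, not a real mapping syntax):
# each ontology property is tied to an SQL query returning (id, value) rows.
EX = "http://example.org/"
MAPPINGS = {
    EX + "hasName": "SELECT id, name FROM patient ORDER BY id",
    EX + "hasDiagnosis": "SELECT id, diagnosis FROM patient ORDER BY id",
}


def triples(predicate):
    """Yield (subject, predicate, object) triples by querying the source on demand."""
    sql = MAPPINGS[predicate]
    for subj_id, value in conn.execute(sql):
        yield (f"{EX}patient/{subj_id}", predicate, value)


# No triple is materialized: each call reads the current state of the database.
before = list(triples(EX + "hasDiagnosis"))

conn.execute("INSERT INTO patient VALUES (3, 'Cleo', 'anemia')")
after = list(triples(EX + "hasDiagnosis"))

for triple in after:
    print(triple)
```

Because the triples are computed from the source on each call, the row inserted between the two calls is visible in `after` without any re-export step, which is the practical consequence of not replicating the data.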