Knowledge Graphs are an increasingly popular and powerful approach to organizing information. They make complex data accessible in an intuitive way, and they offer impressive performance on many tasks that are otherwise difficult to compute. Kairntech applies AI/NLP approaches to identify and structure knowledge that can be derived from text corpora. Feeding the extracted information into Knowledge Graphs is a natural continuation of the Kairntech approach. In that sense, Kairntech and Knowledge Graphs form a logical pair, complementing and supporting each other.
In order to further strengthen the link between Kairntech and Graph Databases, we have recently joined the Neo4J Startup Program.
In what follows we highlight a case we recently worked on for a client: the computation of relationships between life science entities (proteins, genes, drugs, diseases) on a large corpus of scientific publications, which we performed for Fraunhofer SCAI in Sankt Augustin.
A Knowledge Graph specialist: Linkurious
Since in exercises like this we focus on our core capabilities (the identification and extraction of entities and relations from text), we leave storing this data and making it accessible as Knowledge Graphs to dedicated players and components. One example in this direction is Linkurious, a specialist for data analysis and visualization. Linkurious comes with an easy-to-use, web-based Knowledge Graph frontend. The information resulting from the Kairntech analysis above can easily be imported into the Neo4J DB underlying a Linkurious installation.
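To make the import step more concrete, here is a minimal sketch of how extracted relations could be turned into parameterized Cypher statements for a Neo4j database. It is not the actual Kairntech or Linkurious import pipeline: the `Relation` fields, the `Entity` label, the `RELATED_TO` relationship type, the property names and the placeholder sentences and publication IDs are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Relation(NamedTuple):
    """One extracted relation plus its provenance (hypothetical schema)."""
    source: str
    source_type: str
    target: str
    target_type: str
    sentence: str
    publication_id: str


# MERGE creates a node or relationship only if it does not exist yet, so an
# entity that occurs in many relations still ends up as a single node.
# The label, relationship type and property names are assumptions.
CYPHER_TEMPLATE = (
    "MERGE (a:Entity {name: $source}) "
    "MERGE (b:Entity {name: $target}) "
    "MERGE (a)-[r:RELATED_TO {publication_id: $publication_id}]->(b) "
    "SET a.type = $source_type, b.type = $target_type, r.sentence = $sentence"
)


def build_statements(relations: List[Relation]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair the Cypher template with one parameter dict per relation."""
    return [(CYPHER_TEMPLATE, dict(rel._asdict())) for rel in relations]


# Placeholder data: the sentences and publication IDs are not real.
EXAMPLE_RELATIONS = [
    Relation("MTOR", "protein", "Dizocilpine", "drug",
             "<sentence mentioning both entities>", "PUB-0001"),
    Relation("Dizocilpine", "drug", "Schizophrenia", "disease",
             "<sentence mentioning both entities>", "PUB-0002"),
]

if __name__ == "__main__":
    for query, params in build_statements(EXAMPLE_RELATIONS):
        print(query)
        print(params)
```

Each resulting pair could then be handed to a database client, for instance through the `session.run(query, params)` call of the official Neo4j Python driver. Keeping the values in parameters rather than pasting them into the query string avoids quoting problems with entity names and sentences.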
Most scenarios where Graph Databases excel today rely on importing structured information and then using their analysis capabilities to reveal underlying insights and hidden patterns. A large body of financial transactions, for instance, typically already comes in some tabular form that makes the sender, the recipient, the account numbers, the amount, the date and other similar information explicit. Analysing this information in a Knowledge Graph may then make it possible to identify fraud or other relevant financial patterns.
In our case, however, we want to emphasize that the relevant information typically does not yet exist in structured (i.e. tabular) form. Instead it is implicitly hidden in a text corpus in unstructured form, in our case in thousands of scientific publications on a given topic. Only after automatic NLP analysis does this data lend itself to analysis in graph databases. Considering the significant effort, and hence cost, that creating data of this type manually would require, the case for the NLP/AI-powered automatic enrichment of knowledge graphs is evident.
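As a rough illustration of how unstructured text becomes structured rows, the sketch below looks up entity names in a tiny lexicon, normalizes synonyms to one preferred label, and records which entities appear in the same sentence. This is deliberately crude and is not the Kairntech method: sentence-level co-occurrence stands in for real relation extraction, and the lexicon, the example text and the publication ID are made up for the example.

```python
import re
from itertools import combinations
from typing import Dict, List

# Hypothetical mini-lexicon: lowercased surface form -> (preferred label, type).
LEXICON = {
    "mtor": ("MTOR", "protein"),
    "dizocilpine": ("Dizocilpine", "drug"),
    "mk-801": ("Dizocilpine", "drug"),  # synonym mapped to the same label
    "schizophrenia": ("Schizophrenia", "disease"),
}


def find_entities(sentence: str) -> List[str]:
    """Return the sorted preferred labels of all lexicon entries in a sentence."""
    found = set()
    for surface, (label, _entity_type) in LEXICON.items():
        # Require that the match is not embedded in a longer word or code.
        pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(label)
    return sorted(found)


def extract_cooccurrences(publication_id: str, text: str) -> List[Dict[str, str]]:
    """Emit one row per pair of entities that share a sentence."""
    rows = []
    # Naive sentence splitting on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for source, target in combinations(find_entities(sentence), 2):
            rows.append({
                "source": source,
                "target": target,
                "sentence": sentence,
                "publication_id": publication_id,
            })
    return rows


if __name__ == "__main__":
    # Invented text, used only to exercise the code.
    text = ("MK-801 was studied in the context of MTOR. "
            "Schizophrenia is mentioned alone here.")
    for row in extract_cooccurrences("PUB-0001", text):
        print(row)
```

The rows that come out have the same tabular shape that graph databases usually expect as input, with the source sentence and publication kept alongside each pair.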
A minimal example: Shortest path analysis between entities
A first simple example of the kinds of analyses that the approach above facilitates is the shortest path analysis: how, and through which other entities, are two given entities linked in our dataset?
A minimal example for a shortest path analysis between “MTOR” and “Schizophrenia” in the Knowledge Graph interface of Linkurious.
The Linkurious Graph DB interface makes it possible to perform such an analysis quickly, here for instance by checking the shortest path that links the protein MTOR to the disease Schizophrenia. We see that these two entities are linked via a substance named Dizocilpine. Moreover, we get access to the justification for these links: the provenance information (from which publication and from which precise sentence the relation comes) is stored alongside the extracted entities and their relations. Finally, our little example above shows that entities in the life sciences are often known by more than one name: a substance, for example, often comes with a trade name but also with other known synonyms (here, “Dizocilpine” is also known as “MK-801”). An entity recognition process that is able to recognize these variants and normalize them under one preferred label is a key requirement for making the resulting graphs clean and easy to digest.
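The shortest path idea itself can be sketched in a few lines with a breadth-first search over an undirected edge list that carries provenance on every edge. This is not how Linkurious or Neo4j implement it internally. The edge list is a toy: only the MTOR, Dizocilpine and Schizophrenia links follow the example above, while `EntityX`, `EntityY`, the publication IDs and the sentence placeholders are hypothetical.

```python
from collections import deque

# Toy edge list: (entity, entity, provenance). Placeholder provenance only.
EDGES = [
    ("MTOR", "EntityX", {"publication_id": "PUB-0003", "sentence": "<sentence>"}),
    ("EntityX", "EntityY", {"publication_id": "PUB-0004", "sentence": "<sentence>"}),
    ("EntityY", "Schizophrenia", {"publication_id": "PUB-0005", "sentence": "<sentence>"}),
    ("MTOR", "Dizocilpine", {"publication_id": "PUB-0001", "sentence": "<sentence>"}),
    ("Dizocilpine", "Schizophrenia", {"publication_id": "PUB-0002", "sentence": "<sentence>"}),
]


def build_adjacency(edges):
    """Index the edges by node, treating every relation as undirected."""
    adjacency = {}
    for a, b, provenance in edges:
        adjacency.setdefault(a, []).append((b, provenance))
        adjacency.setdefault(b, []).append((a, provenance))
    return adjacency


def shortest_path(edges, start, goal):
    """Return the hops of a shortest path as (from, to, provenance), or None."""
    adjacency = build_adjacency(edges)
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # node -> (parent, provenance of the edge used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, provenance in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = (node, provenance)
                queue.append(neighbor)
    if goal not in previous:
        return None
    # Walk back from the goal to the start, then reverse into forward order.
    steps = []
    node = goal
    while previous[node] is not None:
        parent, provenance = previous[node]
        steps.append((parent, node, provenance))
        node = parent
    steps.reverse()
    return steps


if __name__ == "__main__":
    for a, b, provenance in shortest_path(EDGES, "MTOR", "Schizophrenia"):
        print(a, "->", b, "from", provenance["publication_id"])
```

Because breadth-first search explores the graph level by level, the first time it reaches the target it has used the fewest possible hops, and keeping the provenance on each edge means every hop in the result can be traced back to its source sentence.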
Eureka: the Graph supports new insights
In the project with SCAI mentioned above, Kairntech's role was to apply automatic NLP/AI methods to new data in order to enrich existing Knowledge Graphs on the origins of, and potential therapies for, neurodegenerative diseases such as Schizophrenia and Bipolar Disorder. The data created by our processes was imported and consolidated with previous, manually created data. An important and encouraging piece of feedback from SCAI about the project results was that this step, enriching the existing Knowledge Graph with automatically created data, had in fact led to “new” insights: “new” in the sense that some specific entities had not previously been identified as potentially relevant candidates. In our case these were the three proteins “ALK”, “APOE” and “MAPK8IP1”, which were found to be upstream regulators of tau phosphorylation (which in turn plays a significant role in the studied indications).
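The consolidation step can be pictured as merging two edge sets and asking which entities only appear thanks to the automatic data. The following is a toy sketch under stated assumptions, not the procedure used in the SCAI project: `KnownProteinX` is a hypothetical stand-in for manually curated content, and the edge sets are reduced to the three proteins named above.

```python
# Edges are (regulator, target) pairs; the data is illustrative only.
MANUAL_EDGES = {
    ("KnownProteinX", "tau phosphorylation"),  # hypothetical curated entry
}
AUTOMATIC_EDGES = {
    ("KnownProteinX", "tau phosphorylation"),
    ("ALK", "tau phosphorylation"),
    ("APOE", "tau phosphorylation"),
    ("MAPK8IP1", "tau phosphorylation"),
}


def entities(edges):
    """Collect every entity that occurs at either end of an edge."""
    return {node for edge in edges for node in edge}


def merge_graphs(manual, automatic):
    """Union of both edge sets, remembering where each edge came from."""
    merged = {}
    for edge in manual:
        merged.setdefault(edge, set()).add("manual")
    for edge in automatic:
        merged.setdefault(edge, set()).add("automatic")
    return merged


def new_entities(manual, automatic):
    """Entities present in the automatic data but absent from the manual data."""
    return sorted(entities(automatic) - entities(manual))


if __name__ == "__main__":
    merged = merge_graphs(MANUAL_EDGES, AUTOMATIC_EDGES)
    print(len(merged), "edges after consolidation")
    print("Candidates introduced by the automatic data:",
          new_entities(MANUAL_EDGES, AUTOMATIC_EDGES))
```

Tracking the origin of each edge keeps curated and automatically extracted statements distinguishable after the merge, which matters when new candidates still need expert review.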
The graph above underlines that “APOE” plays a significant role not only in the context of Parkinson’s and Alzheimer’s, but also in the two main syndromes of the present study, Schizophrenia and Bipolar Disorder.
The reported case study underlines the value of combining the two approaches, automatic AI/NLP-powered processing of large document collections on the one hand and the analysis of the resulting information in Knowledge Graphs on the other, in order to support large, complex data analysis challenges.
Kairntech is currently exploring the potential benefits for third parties together with our partner Fraunhofer SCAI as well as with partners among Knowledge Database software vendors.