

Automatic knowledge graph creation from scientific literature

June 15, 2022


Written by

vincent.nibart

Kairntech and Fraunhofer SCAI presented their joint work on knowledge graph creation in a webinar on June 8, 2022.

You can find the recording of the webinar here.

Harvesting knowledge about the interactions between genes, proteins, drugs and diseases from the scientific literature is a labor-intensive task. Knowledge graph databases are a powerful tool for handling information of this type, but the manual analysis required to feed them is slow and costly.

The bioinformatics team at Fraunhofer SCAI in Sankt Augustin has long experience in this area.

“The data around a given indication is complex, scattered and heterogeneous – but it is not infinite. The respective knowledge, too, is complex – but it can be formalized.”
Prof. Martin Hofmann-Apitius from Fraunhofer SCAI

One key ingredient for making this knowledge accessible is the Biological Expression Language (BEL), which offers a framework for encoding insights from scientific content in a machine-readable way.
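To make "machine-readable" concrete: in BEL's textual syntax, a statement links a subject term and an object term through a relationship, and each term has the form function(namespace:name). The sketch below models that structure in Python and renders it as text. The class names and the `to_bel` helper are illustrative assumptions rather than part of any BEL library, the entities in the example are placeholders and not results from the project, and real BEL also supports nested terms, annotations and citations that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BelTerm:
    """A simple (non-nested) BEL term such as p(HGNC:APP)."""
    function: str   # e.g. "p" for protein abundance, "path" for pathology
    namespace: str  # e.g. "HGNC" for human gene symbols (placeholder prefixes)
    name: str

    def to_bel(self) -> str:
        # Simplified rule: quote names that contain spaces or punctuation
        needs_quotes = not self.name.replace("_", "").isalnum()
        value = f'"{self.name}"' if needs_quotes else self.name
        return f"{self.function}({self.namespace}:{value})"


@dataclass(frozen=True)
class BelStatement:
    """subject relation object, e.g. 'p(HGNC:A) increases p(HGNC:B)'."""
    subject: BelTerm
    relation: str   # e.g. "increases", "decreases", "association"
    obj: BelTerm

    def to_bel(self) -> str:
        return f"{self.subject.to_bel()} {self.relation} {self.obj.to_bel()}"


# Placeholder statement, for illustration only
stmt = BelStatement(
    BelTerm("p", "HGNC", "BDNF"),
    "positiveCorrelation",
    BelTerm("path", "MESH", "Bipolar Disorder"),
)
print(stmt.to_bel())  # p(HGNC:BDNF) positiveCorrelation path(MESH:"Bipolar Disorder")
```

Frozen dataclasses make terms and statements hashable, which is convenient when the same statement is extracted from many papers and has to be deduplicated.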

What is left, then, is the question of how to create large volumes of BEL statements from the ever-growing body of literature, even on narrow topics, under realistic time and budget constraints.

Large-scale automatic relation extraction

In spring 2021, SCAI and Kairntech therefore launched a pilot project to address this question: can the analysis of scientific publications, and the subsequent creation of the corresponding BEL statements, be automated with high enough quality to feed knowledge graphs for downstream investigations?

Figure 1: Encoding relations derived from literature analysis in BEL (chart from Martin Hofmann-Apitius)

Kairntech already had several ingredients for such an experiment in place, as Stefan Geißler from Kairntech explained in his presentation. The first was an off-the-shelf entity extraction component that covers up-to-date knowledge of almost any domain and could therefore also extract the entity types the project required.

Another important ingredient was Kairntech's concept of processing pipelines, which allows sophisticated processes involving several dedicated analysis steps to be assembled quickly. In this project, the pipeline combined entity extraction, the computation of secondary properties of the recognized entities, and the rendering of the results into the required BEL format.
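The pipeline idea, chaining independent steps so that each one enriches the output of the previous one, can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the step functions (`extract_entities`, `add_properties`, `render_bel`), the toy dictionary lookup used in place of real entity extraction, and the document layout. None of it reflects Kairntech's actual API or models, and the relation step is omitted.

```python
from functools import reduce
from typing import Callable

Doc = dict                     # a document is a plain dict that each step enriches
Step = Callable[[Doc], Doc]

# Toy lexicon standing in for the real entity extraction component (hypothetical)
LEXICON = {"BDNF": ("p", "HGNC"), "schizophrenia": ("path", "MESH")}


def extract_entities(doc: Doc) -> Doc:
    """Naive substring lookup, a placeholder for real named-entity recognition."""
    found = [name for name in LEXICON if name in doc["text"]]
    return {**doc, "entities": found}


def add_properties(doc: Doc) -> Doc:
    """Secondary properties: here just the BEL function and namespace per entity."""
    props = {name: LEXICON[name] for name in doc["entities"]}
    return {**doc, "properties": props}


def render_bel(doc: Doc) -> Doc:
    """Render each recognized entity as a BEL term string."""
    terms = [f"{fn}({ns}:{name})" for name, (fn, ns) in doc["properties"].items()]
    return {**doc, "bel_terms": terms}


def run_pipeline(steps: list[Step], doc: Doc) -> Doc:
    # Apply the steps in order, feeding each step's output into the next one
    return reduce(lambda d, step: step(d), steps, doc)


pipeline = [extract_entities, add_properties, render_bel]
result = run_pipeline(pipeline, {"text": "BDNF levels in schizophrenia patients"})
print(result["bel_terms"])  # ['p(HGNC:BDNF)', 'path(MESH:schizophrenia)']
```

Because every step has the same signature, steps can be added, removed or reordered without touching the others, which is what makes quick assembly of such processes possible.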

The new step in this pipeline was a deep-learning model that computes relations between the extracted entities: does A “increase” B, does it “decrease” it, or are the two merely “associated”? The component was trained to detect these and a handful of other important relations.
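The model itself cannot be reproduced here, but the final stage of such a relation classifier, turning per-label scores for one entity pair into a single decision, is easy to illustrate. The sketch assumes a model that emits one raw score (logit) per candidate label, including a "no relation" label. The label set, the 0.5 confidence threshold and the example scores are hypothetical.

```python
import math
from typing import Optional

# Hypothetical label set: three BEL relations plus a "no relation" fallback
LABELS = ["increases", "decreases", "association", "no_relation"]


def softmax(logits: list[float]) -> list[float]:
    # Subtract the largest logit first for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_relation(logits: list[float], threshold: float = 0.5) -> Optional[str]:
    """Return the most probable relation label, or None when the best label is
    'no_relation' or its probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    if LABELS[best] == "no_relation" or probs[best] < threshold:
        return None
    return LABELS[best]


print(classify_relation([3.2, 0.1, 1.0, -0.5]))  # increases
print(classify_relation([0.2, 0.1, 0.3, 0.0]))   # None (no confident label)
```

Raising the threshold trades recall for precision: fewer relations are emitted, but the ones that survive are more likely to be correct, which matters when the output feeds a knowledge graph that experts will review.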

Evaluating the results

In the 2021 pilot, Kairntech created a large set of BEL relations from the scientific literature on psychiatric disorders (schizophrenia and bipolar disorder) and submitted them to the experts at SCAI for assessment.

“We found between 70 and 80% of the stipulated relations to be correct.”
Prof. Martin Hofmann-Apitius from Fraunhofer SCAI

That was Martin Hofmann-Apitius's conclusion after his team had assessed the quality of a sample of the resulting data set.

“Not long ago one had to be happy to get results of this type for just entity extraction, but having this now even for relations is outstanding.”
Prof. Martin Hofmann-Apitius from Fraunhofer SCAI
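Because only a sample of the results was assessed, the share of correct relations is an estimate of the precision of the whole set. One standard way to express the uncertainty of such an estimate is a Wilson score interval for a binomial proportion, sketched below. The sample size and the number of correct relations in the example are invented for illustration and are not SCAI's evaluation data.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical sample: 150 of 200 assessed relations judged correct
low, high = wilson_interval(150, 200)
print(f"precision = {150 / 200:.2f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

Larger assessed samples narrow the interval, so the sample size directly determines how precisely a figure such as "70 to 80% correct" can be stated.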

In the webinar on June 8, 2022, SCAI and Kairntech explained the machinery that made this analysis possible, as well as the perspectives it now opens up: creating and updating large, indication-wide knowledge graphs has become feasible.
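Updating such a graph as new papers appear amounts to merging freshly extracted statements into the existing set without duplicating edges. A minimal sketch, assuming each extracted statement arrives as a (subject, relation, object) triple together with the identifier of the document it came from; the data layout, the `merge_statements` helper and the identifiers are hypothetical.

```python
# A graph maps each (subject, relation, object) triple to its set of source documents
Triple = tuple[str, str, str]
Graph = dict[Triple, set[str]]


def merge_statements(graph: Graph, batch: list[tuple[Triple, str]]) -> int:
    """Merge (triple, source_id) pairs into graph in place.

    Returns the number of triples that were not yet in the graph."""
    added = 0
    for triple, source in batch:
        if triple not in graph:
            graph[triple] = set()
            added += 1
        # A triple seen again only gains another source, never a duplicate edge
        graph[triple].add(source)
    return added


graph: Graph = {}
first = [
    (("p(HGNC:A)", "increases", "p(HGNC:B)"), "doc1"),
    (("p(HGNC:A)", "increases", "p(HGNC:B)"), "doc2"),
    (("p(HGNC:C)", "decreases", "p(HGNC:B)"), "doc2"),
]
update = [(("p(HGNC:C)", "decreases", "p(HGNC:B)"), "doc3")]
print(merge_statements(graph, first))   # 2
print(merge_statements(graph, update))  # 0
```

Keeping the set of sources per edge also gives each relation a simple evidence count, which can later be used to rank or filter edges.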

Figure 2: Proteins found to be positively correlated with both schizophrenia and bipolar disorder, visualized in Neo4j's Graph Browser.
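The selection shown in Figure 2 boils down to a set intersection: collect the proteins that have a positive-correlation edge to each disease and keep those that appear for all of them. In Neo4j this would be expressed as a graph query; for consistency, the sketch below does the same over a plain Python list of triples. The triples and protein names are placeholders, not results from the project.

```python
Triple = tuple[str, str, str]


def correlated_with_all(triples: list[Triple], diseases: list[str],
                        relation: str = "positiveCorrelation") -> set[str]:
    """Return the subjects linked by `relation` to every disease in `diseases`."""
    per_disease = [
        {s for s, r, o in triples if r == relation and o == d}
        for d in diseases
    ]
    return set.intersection(*per_disease) if per_disease else set()


# Placeholder triples for illustration only
triples = [
    ("p(HGNC:P1)", "positiveCorrelation", "path(MESH:Schizophrenia)"),
    ("p(HGNC:P1)", "positiveCorrelation", 'path(MESH:"Bipolar Disorder")'),
    ("p(HGNC:P2)", "positiveCorrelation", "path(MESH:Schizophrenia)"),
    ("p(HGNC:P3)", "negativeCorrelation", 'path(MESH:"Bipolar Disorder")'),
]
shared = correlated_with_all(
    triples, ["path(MESH:Schizophrenia)", 'path(MESH:"Bipolar Disorder")']
)
print(shared)  # {'p(HGNC:P1)'}
```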

Next steps in the SCAI-Kairntech cooperation

Encouraged by these results, SCAI and Kairntech are currently extending their joint work in several directions: further fine-tuning and optimizing the analysis and, in particular, investigating how to parallelize it in order to benefit from the massive computing power of SCAI's large high-performance computing cluster and the SCAI team's experience with it.
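Because each document can be analyzed independently, the workload parallelizes naturally: split the corpus into shards and let each cluster job process one shard. A minimal sketch, assuming a job-array scheduler such as Slurm, which exposes the task index and task count in the SLURM_ARRAY_TASK_ID and SLURM_ARRAY_TASK_COUNT environment variables. The round-robin `shard` helper, the document identifiers and the fallback to a single local run are illustrative choices, not SCAI's actual setup.

```python
import os


def shard(items: list[str], num_shards: int, index: int) -> list[str]:
    """Round-robin split: shard `index` receives items index, index + num_shards, ..."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return items[index::num_shards]


# Hypothetical document identifiers
corpus = [f"doc{i}" for i in range(10)]

# Assumes the job array was submitted with task IDs 0..N-1;
# without a scheduler, fall back to a single shard containing everything
task = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
my_docs = shard(corpus, count, task)
```

Round-robin assignment keeps shard sizes within one document of each other; the per-shard results can then be merged into a single graph afterwards.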

The partners' current activities therefore include outlining the options for NLP/AI-powered knowledge graph creation and updating for third parties in industry, as well as exploring the requirements for computing larger, indication-wide knowledge graphs.

SCAI and Kairntech plan to hold new webinars at regular intervals to inform the community about recent achievements and next steps.

If you want to learn more, contact us at: info@kairntech.com

SCAI's presentation can be downloaded below:

Kairntech-SCAI-webinar-2022-finalpptx-compressed

Kairntech's presentation can be downloaded below:

Kairntech_SCAI_Webinar_20220608