Fraunhofer SCAI compares Kairntech with LLMs on relationship extraction, with encouraging results for Kairntech
At the most recent meeting of the “Smart Innovation Community” on December 5, 2024, organized by Fraunhofer IAO, Kairntech presented an overview of our work in the EU-funded project COMMUTE (https://www.commute-project.eu/en/about.html) as a further case study of the potential of NLP and AI approaches in innovative projects.
COMMUTE is coordinated by Fraunhofer SCAI in Sankt Augustin, a sister institute of Fraunhofer IAO in Stuttgart. One activity in the project is the analysis of large numbers of scientific publications to create and enrich knowledge graphs with detailed facts about the relations between genes, proteins, drugs and diseases. The overall goal of COMMUTE is to explore the links between Covid-19 and neurodegenerative diseases such as Parkinson’s and Alzheimer’s. The project partners bring a broad set of competencies to the table, ranging from laboratory experiments with organoids to public health data, natural language processing and artificial intelligence. We at Kairntech have contributed our software to the project and used its capabilities to define sophisticated processing pipelines and apply them to document content.
Benchmarking
Specifically, COMMUTE relies on a combination of generic, off-the-shelf components and specific, custom-made processing steps: the general-purpose Kairntech entity recognizer (https://github.com/kermitt2/entity-fishing), which extracts, normalizes, disambiguates and links millions of entities across a broad range of topics; a document structure recognition process that extracts metadata from unstructured PDF documents; an output transformation process that renders the findings as well-formed BEL expressions (https://biological-expression-language.github.io/); and finally, a custom-made relationship extraction model that generates meaningful triples from the extracted entities.
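To illustrate what the output transformation step involves, here is a minimal sketch of rendering one subject–relation–object triple as a BEL statement. It is not Kairntech’s implementation: the names `Entity`, `Triple`, `bel_term`, `bel_statement` and the entity-type-to-BEL-function mapping are hypothetical. It only assumes the general BEL conventions of a function wrapping a `NAMESPACE:value` pair (for example `p()` for proteins, `path()` for pathologies) and a relationship keyword between the two terms.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from entity types to BEL functions (an assumption for illustration).
BEL_FUNCTIONS = {
    "gene": "g",         # gene abundance
    "protein": "p",      # protein abundance
    "chemical": "a",     # generic abundance, e.g. a drug
    "disease": "path",   # pathology
    "process": "bp",     # biological process
}

# A small subset of BEL relationship keywords accepted by this sketch.
BEL_RELATIONS = {
    "increases", "decreases", "directlyIncreases", "directlyDecreases",
    "association", "positiveCorrelation", "negativeCorrelation",
}

# Values consisting only of letters, digits and underscores can stay unquoted.
_PLAIN_VALUE = re.compile(r"^[A-Za-z0-9_]+$")


@dataclass(frozen=True)
class Entity:
    kind: str       # hypothetical type label, e.g. "protein"
    namespace: str  # e.g. "HGNC", "MESH", "CHEBI"
    name: str       # normalized name within that namespace


@dataclass(frozen=True)
class Triple:
    subject: Entity
    relation: str   # must be one of BEL_RELATIONS
    obj: Entity


def bel_term(entity: Entity) -> str:
    """Render an entity as a BEL term such as p(HGNC:APP)."""
    func = BEL_FUNCTIONS[entity.kind]
    value = entity.name
    if not _PLAIN_VALUE.match(value):
        # Quote values containing spaces or punctuation, escaping inner quotes.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{func}({entity.namespace}:{value})"


def bel_statement(triple: Triple) -> str:
    """Render a triple as 'subject relation object' in BEL syntax."""
    if triple.relation not in BEL_RELATIONS:
        raise ValueError(f"unsupported BEL relation: {triple.relation}")
    return f"{bel_term(triple.subject)} {triple.relation} {bel_term(triple.obj)}"


if __name__ == "__main__":
    # Purely illustrative input, not a curated finding from COMMUTE.
    t = Triple(Entity("protein", "HGNC", "APP"), "association",
               Entity("disease", "MESH", "Alzheimer Disease"))
    print(bel_statement(t))  # p(HGNC:APP) association path(MESH:"Alzheimer Disease")
```

Keeping the term rendering separate from the statement rendering means that the quoting rules and the relation check each live in one place. A real pipeline would additionally have to handle modifiers, nested terms and provenance annotations, none of which this sketch attempts.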
One aspect we emphasized in the presentation was the intensive publication activity in COMMUTE. For instance, the bioinformatics experts at Fraunhofer SCAI have assessed how the results of Kairntech (referred to as Sherpa in the paper) compare to those obtained with large language models such as GPT on the same task (https://doi.org/10.1016/j.ailsci.2024.100095). In that paper, the SCAI team concludes that
“Sherpa has extracted far more BEL triples that are labelled as fully correct and from this perspective it outperforms the two GPT models, as evidenced by the bar charts. […] Overall, Sherpa outperforms other methods in terms of high precision by having 43 % triples that have fully/partially correct label (This percentage is 34 % and 6 % for GPT-4 and GPT-3.5, respectively).”
Naturally, this is a result we are glad to take note of. In the presentation, we also highlighted another capability of Kairntech: answering natural language questions about user content with natural language answers. Below we see what this functionality (RAG, retrieval-augmented generation) has to say on the subject:
The slides presented at the Smart Innovation meeting are accessible here: https://kairntech.com/wp-content/uploads/2024/12/Kairntech-@SmartInnovation20241205.pdf. We would like to thank the team at Fraunhofer IAO for the invitation and the opportunity to present our work in COMMUTE to the audience.