How to set up question answering?

Question answering is a functional domain of Natural Language Processing, on the same level as document classification and the detection of entities within documents.

The objective is to create a “ChatGPT-like” user experience on documents that are often internal and confidential: ask questions about the documents and obtain answers that link back to the source document.

  • Create a project and select the Question-answering option.
  • Upload documents. A few documents are enough to test the solution.
  • Documents are then automatically segmented into so-called ‘chunks’ or ‘snippets’, typically a sentence or paragraph. Segmentation can be customized by using an off-the-shelf segmenter or by creating a custom segmentation pipeline.
  • While plain-text search relies on an index of the uploaded documents, semantic search (similarity search) requires the segments to be vectorized automatically: the words are converted into numeric vectors, so-called embeddings. It is possible to select other off-the-shelf embedding models, such as OpenAI embeddings, or to create custom embedding models.
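
For illustration, here is a minimal sketch of what this segmentation and vectorization step does under the hood. It assumes a naive paragraph-based splitter and the open-source sentence-transformers package; the file name and model choice are placeholders, not platform defaults (in the platform itself, the segmenter and vectorizer are configured through the interface):

    # Minimal sketch of chunking and embedding. Assumes the open-source
    # sentence-transformers package; "report.txt" and the model name are
    # placeholders, not platform defaults.
    from sentence_transformers import SentenceTransformer

    def split_into_chunks(text: str) -> list[str]:
        # Naive segmenter: one chunk per non-empty paragraph.
        return [p.strip() for p in text.split("\n\n") if p.strip()]

    with open("report.txt", encoding="utf-8") as f:
        chunks = split_into_chunks(f.read())

    # Vectorization: each chunk becomes a fixed-size numeric vector.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(chunks)
    print(len(chunks), embeddings.shape)  # e.g. (n_chunks, 384)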

Ask a question – Search

Three different types of search exist:

  • Full-text search matches the text elements of the question to relevant elements in the snippets.
  • Semantic search uses embedding-based vectors to find similarities.
  • Hybrid search combines both methods; a short sketch follows after this list.

The search results (the matching text snippets) are displayed and ordered by relevance. The technical component used to generate these search results is called a Retriever. For semantic and hybrid search, different embedding models (Vectorizers) can be selected if they are configured.
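
To make the three search types concrete, the sketch below scores a question against a few example snippets with BM25 (full text), cosine similarity over embeddings (semantic), and a simple linear mix of both (hybrid). It assumes the rank_bm25 and sentence-transformers packages; the example snippets and the 0.5 weighting are illustrative choices, and real systems may use other fusion strategies:

    # Sketch of full-text, semantic and hybrid scoring over a few example
    # snippets. Assumes the rank_bm25 and sentence-transformers packages;
    # the 0.5 weight is an arbitrary example choice.
    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    chunks = [
        "The report was written in 2023 by the research team.",
        "Main findings: revenue grew 12 percent year over year.",
        "Appendix B lists the data sources used in the study.",
    ]
    question = "What were the main findings of the report?"

    # Full-text search: BM25 over whitespace-tokenized snippets.
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    bm25_scores = np.array(bm25.get_scores(question.lower().split()))

    # Semantic search: cosine similarity between question and snippet vectors.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(chunks)
    q_vec = model.encode([question])[0]
    sim_scores = embeddings @ q_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q_vec)
    )

    # Hybrid search: linear mix of both scores after min-max normalization.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    hybrid = 0.5 * norm(bm25_scores) + 0.5 * norm(sim_scores)

    # The Retriever returns the snippets ordered by relevance.
    for idx in np.argsort(hybrid)[::-1]:
        print(f"{hybrid[idx]:.3f}  {chunks[idx]}")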

Ask a question – Answer generation

  • Once documents have been successfully imported, automatically indexed and vectorized, go to the Question-answering menu
  • Enter your question in the search box and press “Enter”
  • A default search strategy, vectorizer and LLM are used.
  • Large Language Models (LLMs) generate an answer from the search results. The answer is presented with links to the relevant snippets, which in turn link to the source documents.
  • Different LLMs can be configured and selected.
    • Commercial LLMs, such as those provided by OpenAI, are subject to usage costs and are integrated through an API. These costs are kept low by sending only the snippets returned by the search to the LLM; a sketch of this step follows after this list.
    • Open-source LLMs such as Llama 2, Mistral, … can be tested as well.
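
As an illustration of the generation step, here is a hedged sketch that sends the top-ranked snippets to a commercial LLM through an API. It assumes the official openai Python package and an API key in the environment; the model name, prompt wording and example snippets are assumptions, not the platform's actual defaults:

    # Sketch: generate an answer from the top search results via the
    # OpenAI API. Assumes the openai package and an OPENAI_API_KEY
    # environment variable; model name and prompt are illustrative.
    from openai import OpenAI

    question = "What were the main findings of the report?"
    top_snippets = [  # the highest-ranked snippets from the search step
        "Main findings: revenue grew 12 percent year over year.",
        "The report was written in 2023 by the research team.",
    ]

    # Sending only the retrieved snippets (not whole documents) keeps
    # the number of tokens, and therefore the API cost, low.
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(top_snippets))
    prompt = (
        "Answer the question using only the numbered snippets below, "
        "and cite the snippet numbers you used.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; any configured LLM works
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)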

Save and import questions

  • A question is saved in your project by clicking on the icon.
  • If you already have a list of questions, you can import them into the project.

Compare search results and answers

  • A dedicated view is provided to compare the results of different search types, vectorization models and Large Language Models. The goal is to find the combination of search type, vectorizer and LLM that yields the highest answer quality; the sketch below illustrates the idea.
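
The sketch below shows the idea behind such a comparison: run the same saved questions through every combination of search type and LLM and collect the answers side by side. The retrieve and generate functions are hypothetical stand-ins for the search and generation steps sketched earlier, and the model identifiers are examples only:

    # Sketch: compare answers across search types and LLMs for a set of
    # saved questions. `retrieve` and `generate` are hypothetical helpers
    # standing in for the search and generation steps sketched above.
    from itertools import product

    def retrieve(question, search_type):
        # Placeholder: would run the full-text / semantic / hybrid search.
        return ["...retrieved snippets..."]

    def generate(question, snippets, model):
        # Placeholder: would call the selected LLM with the snippets.
        return f"(answer from {model})"

    questions = ["What were the main findings?", "Who wrote the report?"]
    search_types = ["full-text", "semantic", "hybrid"]
    llms = ["gpt-4o-mini", "mistral"]  # example model identifiers

    results = {}
    for q, search_type, llm in product(questions, search_types, llms):
        snippets = retrieve(q, search_type=search_type)
        results[(q, search_type, llm)] = generate(q, snippets, model=llm)

    # Review the answers side by side to pick the best combination.
    for (q, search_type, llm), answer in results.items():
        print(f"{search_type:10s} {llm:12s} {q} -> {answer}")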