How to evaluate the quality of a retriever for question answering?

In your question-answering project, if you are not fully satisfied with the search retriever that uses the default vectorizer, you can experiment with different vectorizers to see which one performs best.

Prepare a set of question-answer pairs in a JSON file using the format below. This dataset will serve as your GOLD dataset.
Important notes:
- All answers should be present in your corpus of segments.
- If you have a FAQ, you can import it even if the segments don’t contain the exact answer as worded in the FAQ: the plain-text search of the “Reference” retriever (see below) should still bring it back.

[
  {
    "text": "My first question?",
    "identifier": "question1",
    "metadata": {
      "question": "true"
    },
    "altTexts": [
      {
        "name": "answer",
        "text": "My first answer"
      }
    ]
  },
  {
    "text": "My second question?",
    "identifier": "question2",
    "metadata": {
      "question": "true"
    },
    "altTexts": [
      {
        "name": "answer",
        "text": "My second answer"
      }
    ]
  },
  {
    "text": "My third question?",
    "identifier": "question3",
    "metadata": {
      "question": "true"
    },
    "altTexts": [
      {
        "name": "answer",
        "text": "My third answer"
      }
    ]
  }
]
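
If your questions and answers already live in a spreadsheet or a list, a small script can produce the file for you. The Python sketch below is only an illustration: the function names `build_gold_entries` and `check_gold_entries` are hypothetical helpers, not part of the product, and the "questionN" identifier scheme simply mirrors the sample above. It builds entries in the same shape and checks that each one has an identifier, a question text, the "question": "true" metadata flag, and a non-empty "answer" alternative text.

```python
import json


def build_gold_entries(qa_pairs):
    """Turn (question, answer) pairs into entries shaped like the sample above."""
    entries = []
    for index, (question, answer) in enumerate(qa_pairs, start=1):
        entries.append({
            "text": question,
            # Assumption: identifiers follow the "questionN" pattern of the sample.
            "identifier": f"question{index}",
            "metadata": {"question": "true"},
            "altTexts": [{"name": "answer", "text": answer}],
        })
    return entries


def check_gold_entries(entries):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    seen_identifiers = set()
    for position, entry in enumerate(entries):
        identifier = entry.get("identifier")
        if not identifier:
            problems.append(f"entry {position}: missing identifier")
        elif identifier in seen_identifiers:
            problems.append(f"entry {position}: duplicate identifier {identifier}")
        else:
            seen_identifiers.add(identifier)
        if not entry.get("text"):
            problems.append(f"entry {position}: missing question text")
        if entry.get("metadata", {}).get("question") != "true":
            problems.append(f"entry {position}: metadata.question is not \"true\"")
        answers = [
            alt for alt in entry.get("altTexts", [])
            if alt.get("name") == "answer" and alt.get("text")
        ]
        if not answers:
            problems.append(f"entry {position}: no non-empty \"answer\" altText")
    return problems


if __name__ == "__main__":
    pairs = [
        ("My first question?", "My first answer"),
        ("My second question?", "My second answer"),
    ]
    gold = build_gold_entries(pairs)
    for problem in check_gold_entries(gold):
        print(problem)
    # Print the JSON; redirect the output to a file to import it.
    print(json.dumps(gold, ensure_ascii=False, indent=2))
```

Running the check before importing catches missing answers or duplicate identifiers early, which is easier than spotting them after the import.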

Go to the Document view
Import the JSON file into your project as you would import documents

Go to the Processing menu
Create a first “Retriever” component

Use the default configuration

Configure the “Search parameter builder” component

Name it “Reference”
- Keep all default values except the Size (Maximum number of hits to be returned), which should be set to 1
- Set Size = 3 instead if the segments don’t contain the exact answer as worded in the FAQ.
Don’t forget to Save

Create a new “Retriever”
Select:
- the Size (Maximum number of hits to be returned), for instance 10
- the Search type you want to test (full-text, vector or hybrid search)
- the Vectorizer if you want to test vector or hybrid search…
Don’t forget to Save
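
Once both retrievers are configured, one common way to compare them is to check, for each GOLD question, whether a segment returned by the “Reference” retriever also appears among the hits of the retriever under test, and at which rank. The Python sketch below illustrates that idea only. It assumes you have collected, by whatever means your setup allows, two dictionaries keyed by question identifier: the segment identifiers returned by “Reference” and the ranked segment identifiers returned by the tested retriever. The function name `hit_rate_and_mrr`, the dictionary layout, and the "seg-…" identifiers are hypothetical and not part of the product.

```python
def hit_rate_and_mrr(reference_hits, candidate_hits):
    """Score a tested retriever against the "Reference" retriever.

    reference_hits: {question_id: [segment_id, ...]} from "Reference"
                    (1 segment with Size = 1, up to 3 with Size = 3).
    candidate_hits: {question_id: [segment_id, ...]} ranked hits from the
                    retriever under test (up to its configured Size).
    Returns (hit_rate, mean_reciprocal_rank) over all reference questions.
    """
    if not reference_hits:
        raise ValueError("reference_hits is empty")
    hits = 0
    reciprocal_sum = 0.0
    for question_id, expected in reference_hits.items():
        expected_set = set(expected)
        ranked = candidate_hits.get(question_id, [])
        # Rank (1-based) of the first hit that matches a reference segment.
        rank = next(
            (pos for pos, seg in enumerate(ranked, start=1) if seg in expected_set),
            None,
        )
        if rank is not None:
            hits += 1
            reciprocal_sum += 1.0 / rank
    total = len(reference_hits)
    return hits / total, reciprocal_sum / total


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    reference = {
        "question1": ["seg-12"],
        "question2": ["seg-40"],
        "question3": ["seg-7"],
    }
    candidate = {
        "question1": ["seg-12", "seg-3"],
        "question2": ["seg-9", "seg-40"],
        "question3": ["seg-1", "seg-2"],
    }
    hit_rate, mrr = hit_rate_and_mrr(reference, candidate)
    print(f"hit rate: {hit_rate:.2f}, MRR: {mrr:.2f}")
```

The hit rate tells you how often the expected segment is retrieved at all within the configured Size, while the mean reciprocal rank also rewards retrievers that place it near the top. Computing both for each Search type and Vectorizer gives a simple basis for comparison.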