How to configure and customize a vectorizer?

If you are not fully satisfied with the default vectorizer, you can experiment with other off-the-shelf vectorizers. You can also customize a vectorizer by fine-tuning it on a particular business domain or by adding more context to each segment.

Go to the Processing view
Create a new vectorizer

Give a name to your vectorizer
Select an off-the-shelf vectorizer
- all-MiniLM-L6-v2 (English)
- OpenAI embeddings (English)
- Paraphrase-multilingual-MiniLM (multilingual)
- …
By default, we use the following vectorizers:
- all-MiniLM-L6-v2 for English content
- CamemBERT for French content
- paraphrase-multilingual-MiniLM-L12-v2 for all other languages
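
The default selection listed above can be pictured as a simple lookup by content language. The following is a minimal sketch for illustration only: the function name `default_vectorizer`, the dictionary, and the use of two-letter language codes (`en`, `fr`) are assumptions made here, not part of the platform's API.

```python
# Illustrative only: this mapping mirrors the defaults listed above.
# The names below and the two-letter language codes are assumptions.
DEFAULT_VECTORIZERS = {
    "en": "all-MiniLM-L6-v2",
    "fr": "CamemBERT",
}
# Used for every language that has no dedicated entry.
FALLBACK_VECTORIZER = "paraphrase-multilingual-MiniLM-L12-v2"


def default_vectorizer(language_code: str) -> str:
    """Return the default vectorizer name for a two-letter language code."""
    return DEFAULT_VECTORIZERS.get(language_code.lower(), FALLBACK_VECTORIZER)


print(default_vectorizer("en"))
print(default_vectorizer("de"))
```

Any language other than English or French falls through to the multilingual model, which matches the third rule in the list.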

Save the vectorizer and, if you wish, activate it on your project

The active vectorizers are marked with a green tick
The default vectorizer used in the semantic search is marked with a yellow cross

Customization

You can add context to each segment to obtain a better vectorization and therefore a better Retriever.
- Go to the Processing menu
- Create a new Vectorizer
- Select “Advanced vectorizer”
- Select “Web template engine Jinja” in the off-the-shelf component list.
- The parameters allow you to create a Jinja script (see here for instance).
  - To add the document title to each text segment, write “{{ title }} > {{ text }}” in the Jinja template.
  - To add the title and document metadata to each text segment, write “{{ title }} > {{ metadata.name }} > {{ text }}” in the Jinja template.
  - Kairntech professional services can assist you.
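
To preview what such a template produces before saving it, you can render it locally. The sketch below uses the open-source `jinja2` Python package; the sample segment, its field values, and the helper name `add_context` are hypothetical, and the platform may expose additional variables or render the template differently.

```python
from jinja2 import Template

# Same template as in the second example above.
template = Template("{{ title }} > {{ metadata.name }} > {{ text }}")


def add_context(segment: dict) -> str:
    """Prefix a text segment with its document title and a metadata field."""
    return template.render(
        title=segment["title"],
        metadata=segment["metadata"],
        text=segment["text"],
    )


# Hypothetical segment, for illustration only.
segment = {
    "title": "User guide",
    "metadata": {"name": "handbook"},
    "text": "Restart the service.",
}
print(add_context(segment))  # User guide > handbook > Restart the service.
```

The enriched string, rather than the bare segment text, is what would then be passed to the vectorizer, so segments with identical wording in different documents end up with different vectors.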

A vectorizer can be fine-tuned for a particular business domain or language.
- Kairntech professional services can do it for you.