If you are not fully satisfied with the default vectorizer, you can experiment with other off-the-shelf vectorizers. Customization is also possible, either by fine-tuning on a particular business domain or by adding more context to each segment.
- Go to the Processing view
- Create a new vectorizer

- Give a name to your vectorizer
- Select an off-the-shelf vectorizer
- all-MiniLM-L6-v2 (English)
- OpenAI embeddings (English)
- Paraphrase-multilingual-MiniLM (multilingual)
- …
- By default, we use the following vectorizers:
- all-MiniLM-L6-v2 for English content
- CamemBERT for French content
- paraphrase-multilingual-MiniLM-L12-v2 for all other languages

- Save and, optionally, activate the vectorizer on your project

- The active vectorizers are marked with the green tick
- The default vectorizer used in the semantic search is marked with the yellow cross
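
As an illustration, the default choice listed above (one vectorizer per content language) can be sketched in Python. The language codes (`en`, `fr`) and the function name `default_vectorizer` are assumptions made for this sketch; they are not part of the Kairntech API.

```python
# Hypothetical helper that mirrors the defaults listed above.
# The language codes and names below are illustrative assumptions.
DEFAULT_VECTORIZERS = {
    "en": "all-MiniLM-L6-v2",  # English content
    "fr": "CamemBERT",         # French content
}

# Used for every language without a dedicated default.
FALLBACK_VECTORIZER = "paraphrase-multilingual-MiniLM-L12-v2"


def default_vectorizer(language_code: str) -> str:
    """Return the default vectorizer name for a language code."""
    normalized = language_code.strip().lower()
    return DEFAULT_VECTORIZERS.get(normalized, FALLBACK_VECTORIZER)


print(default_vectorizer("en"))
print(default_vectorizer("de"))
```

Any language other than English or French falls through to the multilingual model, which matches the rule stated in the list.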

Customization
- It is possible to add context to each segment to improve vectorization, which in turn improves the Retriever.
- Go to the Processing menu
- Create a new Vectorizer
- Select “Advanced vectorizer”
- Select “Web template engine Jinja” in the off-the-shelf component list.
- The parameters let you write a Jinja template (see here for instance).
- To add the document title to each text segment, write “{{ title }} > {{ text }}” in the Jinja template.
- To add the title and document metadata to each text segment, write “{{ title }} > {{ metadata.name }} > {{ text }}” in the Jinja template.
- Kairntech professional services can assist you.
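
To preview what the two templates above produce, they can be rendered locally with the `jinja2` Python package. This is a minimal sketch: the sample segment (title, metadata and text values) is invented for illustration, and the way the platform passes these variables to the template is assumed, not confirmed.

```python
from jinja2 import Template

# Invented sample segment; real values come from your documents.
segment = {
    "title": "Annual report",
    "metadata": {"name": "finance"},
    "text": "Revenue grew in the third quarter.",
}

# The two templates quoted in the steps above.
title_template = Template("{{ title }} > {{ text }}")
full_template = Template("{{ title }} > {{ metadata.name }} > {{ text }}")

# Each rendered string is the text that would be sent to the vectorizer.
with_title = title_template.render(**segment)
with_metadata = full_template.render(**segment)

print(with_title)     # Annual report > Revenue grew in the third quarter.
print(with_metadata)  # Annual report > finance > Revenue grew in the third quarter.
```

Prepending the title and metadata gives short or ambiguous segments more context, which is the purpose of this customization.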

- A vectorizer can be fine-tuned for a particular business domain or language.
- Kairntech professional services can do it for you.