Installation Guide

Supported Operating Systems

The Kairntech platform has been successfully tested on the following operating systems, in both CPU and GPU environments.

  • Ubuntu 20.04.6 LTS x64
  • RHEL / CentOS 7 x64

Note: Docker virtualization removes the operating-system constraint for the CPU version. For the GPU version, however, the NVIDIA drivers restrict compatibility to the operating systems listed above.

Supported GPU

For the Kairntech platform to work on GPU (Graphics Processing Unit), GPUs with CUDA compute capability >= 7.5 are supported (T4, A2, Titan RTX, RTX 3000 series, …).

The compute capability scores are available here.
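If the NVIDIA driver is already installed on the host, a quick way to check the compute capability is shown below. This is a sketch: the `compute_cap` query field requires a reasonably recent driver, and older drivers will simply report an error for that field.

```shell
# Print each GPU's model and CUDA compute capability, if an NVIDIA driver is present
if command -v nvidia-smi >/dev/null; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv
else
    echo "nvidia-smi not found: no NVIDIA driver installed"
fi
```

A T4, for example, reports a compute capability of 7.5.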

Hardware Recommendations

Standard requirements are as follows:

  • 8 cores
  • 64 GB RAM (128 GB in case entity-fishing or Transformers are deployed)
  • 400 GB SSD (high read IOPS – 10000 or above – is a must for entity-fishing)
  • OS: Ubuntu 20.04.6 LTS or RHEL / CentOS 7 x64

With the following CPU recommendations:

  1. Deep Learning components (Flair, Delft, Spacy, Transformers…):
    as many CPU cores as possible, even if single-thread performance is not excellent
  2. Wikidata component (entity-fishing):
    a CPU with a high sustained all-core turbo frequency (above 3 GHz)
  3. Other components:
    preferably as in 2, but 1 can be accommodated

As a minimum for both 1 and 2, we recommend any CPU with 8 or more cores, a base clock speed above 3 GHz, and a Single Thread Performance index higher than 2200.

You can read more about CPU benchmarks here.

For example:

  • CPU: AMD Ryzen 7 5700G (8 cores, 16 threads, single thread performance index: 3273)
  • RAM: 128 GB
  • SSD: Samsung 980 Pro M.2 PCIe 4.0 NVMe 1 TB
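To compare a candidate machine against these recommendations, the core count, CPU model, and clock speeds can be read with lscpu:

```shell
# Show core count, CPU model and clock speeds for comparison with the recommendations above
lscpu | grep -E '^CPU\(s\)|Model name|MHz'
```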

Installation steps

All commands listed below were run on Ubuntu 18.04 LTS x64.

  • Host configuration prerequisites
  • Kairntech platform Docker volumes prerequisites
  • Kairntech platform installation


Host configuration prerequisites:

ELASTICSEARCH recommendation

You may need to increase the vm.max_map_count kernel parameter to avoid running out of map areas.

To avoid the following message:

[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

It is recommended to edit file /etc/sysctl.conf and insert the following lines:

# ES - at least 262144 for production use
vm.max_map_count=262144

Apply the modification with the following command:

sudo sysctl -p

INOTIFY recommendation

You may need to increase the fs.inotify.max_user_instances parameter to avoid reaching user limits on the number of inotify resources.

To avoid the following message:

[Errno 24] inotify instance limit reached

It is recommended to edit file /etc/sysctl.conf and insert the following lines:

# Prevent [Errno 24] inotify instance limit reached
fs.inotify.max_user_instances = 65530

Apply the modification with the following command:

sudo sysctl -p

HAPROXY recommendation

You may need to set net.ipv4.ip_unprivileged_port_start so that the non-root haproxy user has permission to bind to the privileged port 443.

To avoid the following messages (in the haproxy container console output):

[ALERT]    (1) : Starting frontend http-in-sherpa: cannot bind socket (Permission denied) [0.0.0.0:443]
[ALERT]    (1) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.

It is recommended to edit file /etc/sysctl.conf and insert the following lines:

# Enable haproxy to listen to 443
net.ipv4.ip_unprivileged_port_start=0

Apply the modification with the following command:

sudo sysctl -p
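Once all three changes are applied, the current values can be read back in one go to confirm they match the recommendations:

```shell
# Read back the three kernel parameters set above
sysctl vm.max_map_count fs.inotify.max_user_instances net.ipv4.ip_unprivileged_port_start
```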

User/Folders creation

USER creation

It is highly advised to create a dedicated user for the deployment of the platform:

# FOR A STANDARD USER
sudo adduser kairntech

# OR FOR A HEADLESS USER
sudo adduser --disabled-password --gecos "" kairntech

FOLDERS creation

It is highly advised to create dedicated folders for the deployment of the platform:

sudo mkdir -p /opt/sherpa
sudo chown -R kairntech: /opt/sherpa

mkdir -p ~/embeddings
mkdir -p ~/vectorizers

The prepared folders will contain the following:

Directory /opt/sherpa/ will store all files and folders related to the platform (delivered by Kairntech)

  • File docker-compose.yml to be used to deploy/pull Docker images of the platform
  • Folder sherpa-core to be used to store authentication mechanism keys and deploy specific components
  • Folder sherpa-haproxy to be used in case redirections are set (optional)

Directory ~/embeddings will store all files required for Embeddings volumes (delivered by Kairntech)

  • File deploy-embeddings-delft.sh to be used to deploy Delft embeddings
  • File docker-compose.delft.volumes.yml also to be used to deploy Delft embeddings
  • File deploy-embeddings-flair.sh to be used to deploy Flair embeddings
  • File docker-compose.flair.volumes.yml also to be used to deploy Flair embeddings
  • File deploy-embeddings-fasttext.sh to be used to deploy fastText embeddings
  • File docker-compose.fasttext.volumes.yml also to be used to deploy fastText embeddings
  • File deploy-knowledge-entityfishing.sh to be used to deploy entity-fishing knowledge
  • File docker-compose.ef.volumes.yml also to be used to deploy entity-fishing knowledge

Directory ~/vectorizers will store all files required for Vectorizers volumes (delivered by Kairntech)

  • File docker-compose.vectorizer.allminilml6v2.yml to be used to deploy allMiniLML6V2 model
  • File docker-compose.vectorizer.multiminilml12v2.yml to be used to deploy multiMiniLML12V2 model
  • File docker-compose.vectorizer.spdilacamembertgpl.yml to be used to deploy spDilaCamembert model
  • File docker-compose.vectorizer.sentencecamembertbase.yml to be used to deploy sentenceCamembertBase model

Binaries installation

Docker / Docker Compose installation

The platform is Docker-based, so please install Docker and the Docker Compose plugin.
The official page with the installation commands is located here.

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg

sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Then add the kairntech user to the docker group:

sudo usermod -aG docker kairntech

As mentioned in the installation guide, log out and log back in so that your group membership is re-evaluated.

To test, open a new terminal session and run:

sudo su - kairntech

docker run hello-world

After installing the compose plugin, you can test via:

sudo su - kairntech

docker compose version

Docker volumes to mount

To feed the Docker volumes with embeddings files, Kairntech will provide scripts in a zip file. You will need the unzip binary to extract it.

FLAIR embeddings

In order to fully utilize the Flair engine, "embeddings" files must be downloaded.
These static files are stored as Docker volumes. To download them, run:

sudo su - kairntech

cd ~/embeddings

# INSTALL AR, DE, EN AND FR
export FLAIR_LANGS=ar,de,en,fr
docker compose -f docker-compose.flair.volumes.yml -p volumes-flair up

# OR INSTALL ALL LANGUAGES
export FLAIR_LANGS=all
docker compose -f docker-compose.flair.volumes.yml -p volumes-flair up

Once deployed, you should see the following sizes (all languages):

sudo du -hs /var/lib/docker/volumes/sherpashared_flair_suggester_datasets
12K     /var/lib/docker/volumes/sherpashared_flair_suggester_datasets

sudo du -hs /var/lib/docker/volumes/sherpashared_flair_suggester_embeddings
35G     /var/lib/docker/volumes/sherpashared_flair_suggester_embeddings

Once the Flair embeddings are deployed, the Docker container can be removed via:

docker rm flair-suggester-init-job
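You can also check which shared volumes exist at any time; the filter below assumes the `sherpashared` prefix used by the volume names shown above:

```shell
# List the Docker volumes holding the shared embeddings data
docker volume ls --filter name=sherpashared --format '{{.Name}}'
```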

The table below gives disk usage required to deploy available languages:

Language          Size
Arabic (AR)       2.9G
German (DE)       4.3G
English (EN)      3.8G
Spanish (ES)      4.2G
Farsi (FA)        768M
French (FR)       4.2G
Hindi (HI)        1.0G
Italian (IT)      3.9G
Dutch (NL)        3.9G
Portuguese (PT)   2.7G
Russian (RU)      4.1G
Chinese (ZH)      1.6G
All               35G

DELFT embeddings

In order to fully utilize the Delft engine, "embeddings" files must be downloaded.
These static files are stored as Docker volumes. To download them, run:

sudo su - kairntech

cd ~/embeddings

# INSTALL DE, EN AND FR
export DELFT_LANGS=de,en,fr
docker compose -f docker-compose.delft.volumes.yml -p volumes-delft up

# OR INSTALL ALL LANGUAGES
export DELFT_LANGS=all
docker compose -f docker-compose.delft.volumes.yml -p volumes-delft up

Once deployed, you should see the following sizes (all languages):

sudo du -hs /var/lib/docker/volumes/sherpashared_delft_suggester_database/
22G /var/lib/docker/volumes/sherpashared_delft_suggester_database/

sudo du -hs /var/lib/docker/volumes/sherpashared_delft_suggester_embeddings/
2.2G    /var/lib/docker/volumes/sherpashared_delft_suggester_embeddings/

Once the Delft embeddings are deployed, the Docker container can be removed via:

docker rm delft-suggester-init-job

The table below gives disk usage required to deploy available languages:

Language          Size
German (DE)       7.0G
English (EN)      7.2G
Spanish (ES)      3.3G
French (FR)       4.1G
Italian (IT)      3.0G
Dutch (NL)        3.0G
All               24.2G

FASTTEXT embeddings

In order to fully utilize the fastText engine, "embeddings" files must be downloaded.
These static files are stored as Docker volumes. To download them, run:

sudo su - kairntech

cd ~/embeddings

# INSTALL AR, DE, EN AND FR
export FASTTEXT_LANGS=ar,de,en,fr
docker compose -f docker-compose.fasttext.volumes.yml -p volumes-fasttext up

# OR INSTALL ALL LANGUAGES
export FASTTEXT_LANGS=all
docker compose -f docker-compose.fasttext.volumes.yml -p volumes-fasttext up

Once deployed, you should see the following sizes (all languages):

sudo du -hs /var/lib/docker/volumes/sherpashared_fasttext_suggester_embeddings/
29G    /var/lib/docker/volumes/sherpashared_fasttext_suggester_embeddings/

Once the fastText embeddings are deployed, the Docker container can be removed via:

docker rm fasttext-suggester-init-job

The table below gives disk usage required to deploy available languages:

Language          Size
Arabic (AR)       1.5G
German (DE)       5.6G
English (EN)      6.2G
Spanish (ES)      2.5G
French (FR)       2.9G
Italian (IT)      2.2G
Japanese (JA)     1.3G
Portuguese (PT)   1.5G
Russian (RU)      4.7G
Chinese (ZH)      822M
All               29G

ENTITY-FISHING knowledge

In order to fully utilize the entity-fishing engine, "knowledge" files must be downloaded.
These static files are regenerated every month and stored as Docker volumes. To download them, run:

sudo su - kairntech

cd ~/embeddings

# INSTALL AR, DE, EN AND FR
export EF_LANGS=ar,de,en,fr
export EF_DATE=02-03-2023
docker compose -f docker-compose.ef.volumes.yml -p volumes-ef up

# OR INSTALL ALL LANGUAGES
export EF_LANGS=all
export EF_DATE=02-03-2023
docker compose -f docker-compose.ef.volumes.yml -p volumes-ef up

Once deployed, you should see the following sizes (all languages):

sudo du -hs /var/lib/docker/volumes/sherpa_entityfishing_data
100G    /var/lib/docker/volumes/sherpa_entityfishing_data

Once the entity-fishing knowledge is deployed, the Docker container can be removed via:

docker rm entity-fishing-init-job

The table below gives disk usage required to deploy available languages:

Language          Size
Arabic (AR)       36.7G (3.7G + 33G)
German (DE)       40.0G (6.0G + 33G)
English (EN)      49G (16G + 33G)
Spanish (ES)      37.4G (4.4G + 33G)
Farsi (FA)        36.5G (3.5G + 33G)
French (FR)       38.6G (5.6G + 33G)
Italian (IT)      36.9G (3.9G + 33G)
Japanese (JA)     36.6G (3.6G + 33G)
Portuguese (PT)   35.8G (2.8G + 33G)
Russian (RU)      39.4G (6.4G + 33G)
Chinese (ZH)      36.1G (3.1G + 33G)
Ukrainian (UA)    36.6G (3.6G + 33G)
Hindi (HI)        33.5G (455M + 33G)
Swedish (SE)      37.2G (4.2G + 33G)
Bengali (BD)      33.7G (700M + 33G)
All               100G (67G + 33G)

In these figures, the common knowledge base accounts for 33G of disk usage and is mandatory.

VECTORIZERS

In order to fully utilize the vectorizers, language model files must be downloaded.
These static files are stored as Docker volumes. To download them, run:

sudo su - kairntech

cd ~/vectorizers

# INSTALL allMiniLML6V2
docker compose -f docker-compose.vectorizer.allminilml6v2.yml -p allminilml6v2 up

# INSTALL multiMiniLML12V2
docker compose -f docker-compose.vectorizer.multiminilml12v2.yml -p multiminilml12v2 up

# INSTALL spDilaCamembert
docker compose -f docker-compose.vectorizer.spdilacamembertgpl.yml -p spdilacamembertgpl up

# INSTALL sentenceCamembertBase
docker compose -f docker-compose.vectorizer.sentencecamembertbase.yml -p sentencecamembertbase up

Kairntech platform installation

As a first step, in order to configure JWT authentication, run the following commands:

sudo su - kairntech

cd /opt/sherpa/sherpa-core/jwt

##  In order to generate private.pem
openssl genrsa -out private.pem 2048

##  In order to generate private_key.pem
openssl pkcs8 -topk8 -inform PEM -in private.pem -out private_key.pem -nocrypt

##  In order to generate public.pem
openssl rsa -in private.pem -outform PEM -pubout -out public.pem

This will generate 3 files:

  • private.pem, to be kept in a safe place
  • private_key.pem, to be used in sherpa-core/jwt folder
  • public.pem, to be used in sherpa-core/jwt folder
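Before moving on, you can verify that the generated public.pem really matches the private key; if the two differ, diff prints the mismatch and the check fails:

```shell
# public.pem must be identical to the public key derived from private_key.pem
openssl pkey -in private_key.pem -pubout -outform PEM | diff - public.pem \
  && echo "public key matches private key"
```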

Then, in order to download the images needed to install the platform, you must first log in to Docker Hub.
(The password to be used will be delivered by Kairntech.)

sudo su - kairntech

cd /opt/sherpa

docker login

username: ktguestkt

password: 

Once logged in, you can start downloading the images:

docker compose -f docker-compose.yml pull

Finally, to start the platform, run:

docker compose -f docker-compose.yml up -d

Once the platform is started, you can check the status of the containers; the following console output is given as an example. Some containers may not be present, depending on the kind of deployment you performed.

docker ps -a --format "{{.ID}}\t\t{{.Names}}\t\t{{.Status}}"

79e235f82787        sherpa-core                           Up 20 sec

e69f95855809        sherpa-crfsuite-suggester             Up 20 sec
c9d95639c808        sherpa-entityfishing-suggester        Up 20 sec
94e4574b95de        sherpa-fasttext-suggester             Up 20 sec

8f13e72aeb0d        sherpa-phrasematcher-test-suggester   Up 20 sec
0f49dec91340        sherpa-phrasematcher-train-suggester  Up 20 sec
aa08f1008770        sherpa-sklearn-test-suggester         Up 20 sec
988976ef327d        sherpa-sklearn-train-suggester        Up 20 sec
bed6169d9185        sherpa-spacy-test-suggester           Up 20 sec
302bd98a44ab        sherpa-spacy-train-suggester          Up 20 sec
7754162ae44c        sherpa-flair-test-suggester           Up 20 sec
08d1ad415adb        sherpa-flair-train-suggester          Up 20 sec
8ded96094605        sherpa-delft-test-suggester           Up 20 sec
ebe47bd3ddf3        sherpa-delft-train-suggester          Up 20 sec
4835129a77c9        sherpa-bertopic-test-suggester        Up 20 sec
b999a848044c        sherpa-bertopic-train-suggester       Up 20 sec

0826e0dd9c85        sherpa-elasticsearch                  Up 20 sec
7f781bf11ddf        sherpa-mongodb                        Up 20 sec

d3b0e0557309        sherpa-builtins-importer              Up 20 sec

cf075d3b06f4        sherpa-multirole                      Up 20 sec
ae1b24e0ccdb        sherpa-pymultirole                    Up 20 sec
2a737b399388        sherpa-pymultirole-trf                Up 20 sec
f43121e96544        sherpa-pymultirole-ner                Up 20 sec
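If any container is not Up, its logs are usually the quickest diagnostic. For example, for the core container shown above:

```shell
# Tail the last 50 log lines of the sherpa-core container
docker logs --tail 50 sherpa-core
```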