Migration Guide (Kubernetes)

This guide walks you through the migration from version manta to version naja.

These notes are based on a Kubernetes deployment.

ℹ️ Note:

The new naja version comes with Elasticsearch authentication (in addition to MongoDB authentication).

1. Create a MongoDB dump (migration 4.0.6 → 7.0.14) – optional

To create a MongoDB dump, a Kubernetes job has to be created.

Enter the folder kubernetes/mongodb-backup-restore/ and review the manifest
mongodb-backup-job.yaml.

Specifications may have to be adapted to match the deployed release name:

    spec:
      containers:
        - env:
            - name: COMMAND
              value: dump
            - name: MAX_DUMPS
              value: "30"
            - name: MONGODB_HOST
              value: sherpa-mongodb:27017
            - name: MONGODB_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: MONGODB_PASSWORD
                  name: mongodb-credentials
            - name: MONGODB_USERNAME
              valueFrom:
                secretKeyRef:
                  key: MONGODB_USERNAME
                  name: mongodb-credentials
          image: kairntech/mongodb-backup:naja-preview-3
          imagePullPolicy: Always
          name: mongodb-backup-job
          volumeMounts:
            - mountPath: /data
              name: mongodb-dumps
      imagePullSecrets:
        - name: kairntech-docker-credentials
      restartPolicy: Never
      volumes:
        - name: mongodb-dumps
          persistentVolumeClaim:
            claimName: mongodb-dumps
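
If your release is not named sherpa, the MONGODB_HOST value above must point at your release's MongoDB service. A minimal sketch, assuming a hypothetical release name myrelease:

```shell
# Rewrite the MongoDB service host in the manifest; "myrelease" is a
# placeholder for your actual Helm release name.
sed -i.bak 's/sherpa-mongodb:27017/myrelease-mongodb:27017/' mongodb-backup-job.yaml

# Review the change before applying the manifest.
grep -n 'mongodb:27017' mongodb-backup-job.yaml
```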

Apply the backup job:

kubectl apply -f mongodb-backup-job.yaml

Follow the job logs and wait for the dump creation to finish (note the MongoDB dump name):

kubectl logs -f job.batch/mongodb-backup-job

Running: mongodump  --host=sherpa-mongodb:27017 --authenticationDatabase=admin --username=admin --password=<hidden> --gzip --archive=mongodump-sherpa-2025-08-29-08h56_06.gz
2025-08-29T08:56:07.050+0200	writing admin.system.users to archive 'mongodump-sherpa-2025-08-29-08h56_06.gz'
2025-08-29T08:56:07.055+0200	done dumping admin.system.users (1 document)
...
Dump is ready: /data/dumps/mongodump-sherpa-2025-08-29-08h56_06.gz
Running: mongosh  --host=sherpa-mongodb:27017 --authenticationDatabase=admin --username=admin --password=<hidden> --eval 'db.adminCommand("listDatabases").databases.sort(function(l, r) {return r.sizeOnDisk - l.sizeOnDisk}).forEach(function(d) {print(d.name + " - " + (d.sizeOnDisk/1024/1024) + "M")});' > mongodump-sherpa-2025-08-29-08h56_06.txt
Database list is ready: /data/dumps/mongodump-sherpa-2025-08-29-08h56_06.txt
Done

Copy the dump to your local machine:

kubectl cp $(kubectl get pods --field-selector=status.phase=Running|grep sherpa-mongodb|awk '{print $1}'):/data/dumps/mongodump-sherpa-2025-08-29-08h56_06.gz mongodump-sherpa-2025-08-29-08h56_06.gz
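
Before relying on the copy for a later restore, it is worth checking that the archive arrived intact:

```shell
# The file should be non-empty and a valid gzip stream.
ls -lh mongodump-sherpa-2025-08-29-08h56_06.gz
gzip -t mongodump-sherpa-2025-08-29-08h56_06.gz && echo "archive OK"
```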

2. Adapt custom-values.yaml file

Our Helm chart is customizable. The relevant parameters are described in a file called custom-values.yaml, delivered in the archive. Adapt this file if any of the settings below need to be changed.

Storage classes

The Helm chart defines three storage class aliases: local, remote, and shared.

  • local: Storage class intended for MongoDB and Elasticsearch, which require high-performance storage (ideally node-local).
  • remote: Storage class intended for workloads that do not require high performance (e.g., volumes for machine learning models used by suggesters, configuration file volumes, etc.).
  • shared: Storage class intended for resources shared across multiple pods. This storage class must support ReadWriteMany (RWX) access mode.
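
To see which storage classes your cluster actually offers before filling in the aliases, you can list them (the awk filter simply strips the header line so the names can be reused in scripts):

```shell
# List available storage classes with their provisioners.
kubectl get storageclass

# Just the names, e.g. for scripting:
kubectl get storageclass | awk 'NR > 1 {print $1}'
```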

After selecting your storage classes, you will need to update the following lines in the custom-values.yaml file (which currently contain our test values):

storageClass:
  local: local-path
  remote: longhorn

If you do not have an RWX-enabled storage class, the Helm chart provides an option to replace volumes using the shared storage class alias with emptyDir volumes:

storage:
  rwx:
    available: false

CPU versus GPU suggester engines

If some suggesters in your pipeline run on GPU in your environment, set the gpu parameter of the inference suggester to true in the custom-values.yaml file (our test value is false):

suggester:
  flair:
#    training:
#      image:
#        gpu: false
    inference:
      image:
        gpu: true
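
Before switching gpu to true, you can check whether any node actually advertises GPU resources. This sketch assumes the NVIDIA device plugin, which exposes the resource name nvidia.com/gpu; adjust the pattern for other vendors:

```shell
# Look for GPU resources in the node descriptions; the fallback message
# prints when no node advertises nvidia.com/gpu.
kubectl describe nodes | grep 'nvidia.com/gpu' || echo "no GPU resources found"
```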

Expose a NodePort

If you want to expose a NodePort for the Sherpa server, you can add the following option:

sherpa:
  svc:
    type: NodePort
    nodePort: 30010 # example
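
After redeploying with this option, you can confirm that the port was assigned; services of type NodePort show the assigned port after the colon in the PORT(S) column (e.g. 443:30010/TCP):

```shell
# Show only the services exposed as NodePort.
kubectl get svc | grep -w NodePort
```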

ℹ️ Note:

You can do the same for all other services (see the various svc keys in config.yaml).

Passwords

Random passwords will be generated for MongoDB and Elasticsearch.
The Sherpa admin password will default to secret, unless you add a key in a custom-secrets.yaml file:

SHERPA_ADMIN_PASSWORD: "your password"
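
A quick way to create that file with a strong random password (openssl is only one option; any password generator works):

```shell
# Generate a 24-character random password and write the secrets file the
# chart expects.
SHERPA_PASS=$(openssl rand -base64 18)
printf 'SHERPA_ADMIN_PASSWORD: "%s"\n' "$SHERPA_PASS" > custom-secrets.yaml
```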

3. Install naja version

Install via Helm chart

To deploy the naja version, first extract the provided .tgz package:

mkdir sherpa
cd sherpa/
tar xfz ../sherpa.naja.tgz

Then run the installation with:

helm install kairntech helm -f custom-values.yaml -f custom-secrets.yaml

Check that everything is running:

kubectl get all

ℹ️ Note:

The system is not yet operational: the pods sherpa-entityfishing, sherpa-fasttext-test-suggester, sherpa-fasttext-train-suggester, sherpa-flair-test-suggester, and sherpa-flair-train-suggester remain in Init status because their init containers are waiting for resources.

The rsync server that will manage resources is exposed on NodePort 31873.
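
From the machine that will later upload the resources, you can verify that this port is reachable. NODE_IP below is a placeholder for the address of any cluster node, and nc (netcat) is assumed to be installed:

```shell
# Probe the rsync NodePort; prints a confirmation when the TCP connection
# succeeds.
NODE_IP="203.0.113.10"   # placeholder: replace with a cluster node address
nc -z -w 5 "$NODE_IP" 31873 && echo "rsync port reachable"
```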

Init-containers to fetch resources

The init containers wait until the resource files become available; you can check their status with the following commands:

FastText:

kubectl logs $(kubectl get pods -o=name|grep sherpa-fasttext-train-suggester) -c fasttext-suggester-resources-init-job

Flair:

kubectl logs $(kubectl get pods -o=name|grep sherpa-flair-train-suggester) -c flair-suggester-resources-init-job

Entity-Fishing:

kubectl logs $(kubectl get pods -o=name|grep sherpa-entityfishing|grep -v suggester) -c entity-fishing-resources-init-job

Fetching online resources

This step must be performed on a machine with internet access and a Docker installation.

Extract the tgz archive

mkdir sherpa
cd sherpa/
tar xfz ../sherpa.naja.tgz

Download resources

Log in to Docker with your Kairntech credentials, then run:

cd kubernetes/init/
./download-resources.sh

Wait until the script finishes executing.
This script retrieves data stored in a Kairntech Amazon S3 bucket.

Import resources into Sherpa

Run the following commands:

cd resources-mirror
./upload-to-mirror.sh

Completion of the script and resource initialization

The script finishes by copying a _ready file, which signals to the init containers that the resources are ready.

The resource init containers can now start downloading data from the mirror and extracting it into the volumes. You can monitor the operations with:

FastText:

kubectl logs $(kubectl get pods -o=name | grep sherpa-fasttext-train-suggester) -c fasttext-suggester-resources-init-job

Flair:

kubectl logs $(kubectl get pods -o=name | grep sherpa-flair-train-suggester) -c flair-suggester-resources-init-job

Entity-Fishing:

kubectl logs $(kubectl get pods -o=name | grep sherpa-entityfishing | grep -v suggester) -c entity-fishing-resources-init-job

Finalizing the installation

Check that the init containers have completed successfully; all pods should now be in Running status:

kubectl get pods
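
A quick way to spot stragglers is to list only the pods that are not fully ready (empty output means everything is up). The awk filter compares the READY column against itself and checks the STATUS column:

```shell
# Print pods whose ready count does not match the desired count, or whose
# status is not Running; no output means all pods are healthy. Completed
# one-off jobs (e.g. the backup job) will also appear here, which is expected.
kubectl get pods --no-headers | awk '{split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print}'
```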

Sherpa is now ready to operate.

4. Restoring a MongoDB dump with the latest projects – optional

This step can be replaced by importing multiple SPA project archives.

Copy the MongoDB dump into the MongoDB pod

kubectl cp mongodump-sherpa-2025-08-29-08h56_06.gz \
$(kubectl get pods --field-selector=status.phase=Running | grep sherpa-mongodb | awk '{print $1}'):/data/dumps
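
You can confirm that the archive landed inside the pod before launching the restore job (the pod-name pipeline is the same one used by the copy command above):

```shell
# Resolve the MongoDB pod name, then list the dumps directory inside it.
POD=$(kubectl get pods --field-selector=status.phase=Running | grep sherpa-mongodb | awk '{print $1}')
kubectl exec "$POD" -- ls -lh /data/dumps
```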

Navigate to the MongoDB backup/restore directory

Return to the archive extraction directory and go to kubernetes/mongodb-backup-restore:

cd kubernetes/mongodb-backup-restore/

Apply the restore job

kubectl apply -f mongodb-restore-projects-only-job.yaml

Wait for the restore job to complete

kubectl logs -f job.batch/mongodb-restore-projects-only-job

Restart sherpa-core

kubectl rollout restart deployment.apps/$(kubectl get deployments | grep sherpa-core | awk '{print $1}')

Monitor sherpa-core logs

Watch the logs until all projects are deployed and indexed:

kubectl logs -f deployment.apps/$(kubectl get deployments | grep sherpa-core | awk '{print $1}') | grep "Projects deployment complete"