The following notes will guide you through the migration from version manta to version naja.
These notes are based on a Kubernetes deployment.
ℹ️ Note:
The new naja version comes with Elasticsearch authentication (in addition to MongoDB authentication).
1. Create MongoDB dump (migration 4.0.6 → 7.0.14) – optional
To create a MongoDB dump, a Kubernetes job has to be created.
Enter the folder kubernetes/mongodb-backup-restore/ and review the manifest file mongodb-backup-job.yaml.
Specifications may have to be adapted to match the deployed release name:
spec:
  containers:
  - env:
    - name: COMMAND
      value: dump
    - name: MAX_DUMPS
      value: "30"
    - name: MONGODB_HOST
      value: sherpa-mongodb:27017
    - name: MONGODB_PASSWORD
      valueFrom:
        secretKeyRef:
          key: MONGODB_PASSWORD
          name: mongodb-credentials
    - name: MONGODB_USERNAME
      valueFrom:
        secretKeyRef:
          key: MONGODB_USERNAME
          name: mongodb-credentials
    image: kairntech/mongodb-backup:naja-preview-3
    imagePullPolicy: Always
    name: mongodb-backup-job
    volumeMounts:
    - mountPath: /data
      name: mongodb-dumps
  imagePullSecrets:
  - name: kairntech-docker-credentials
  restartPolicy: Never
  volumes:
  - name: mongodb-dumps
    persistentVolumeClaim:
      claimName: mongodb-dumps
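If your Helm release is not named sherpa, the MongoDB host in the manifest has to follow suit. A small sed helper can patch it; the release name myrelease below is purely hypothetical:

```shell
# set_release: rewrite the MongoDB host in the manifest for a different Helm
# release name (the manifest above assumes a release named "sherpa").
# A .bak copy of the original file is kept next to it.
set_release() {  # usage: set_release <new-release-name> <manifest-file>
  sed -i.bak "s/sherpa-mongodb:27017/$1-mongodb:27017/" "$2"
}
# Example: set_release myrelease mongodb-backup-job.yaml
```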
Apply the backup job:
kubectl apply -f mongodb-backup-job.yaml
Check that the dump is being created and wait for it to complete (note the MongoDB dump name in the log output):
kubectl logs -f job.batch/mongodb-backup-job
Running: mongodump --host=sherpa-mongodb:27017 --authenticationDatabase=admin --username=admin --password=<hidden> --gzip --archive=mongodump-sherpa-2025-08-29-08h56_06.gz
2025-08-29T08:56:07.050+0200 writing admin.system.users to archive 'mongodump-sherpa-2025-08-29-08h56_06.gz'
2025-08-29T08:56:07.055+0200 done dumping admin.system.users (1 document)
...
Dump is ready: /data/dumps/mongodump-sherpa-2025-08-29-08h56_06.gz
Running: mongosh --host=sherpa-mongodb:27017 --authenticationDatabase=admin --username=admin --password=<hidden> --eval 'db.adminCommand("listDatabases").databases.sort(function(l, r) {return r.sizeOnDisk - l.sizeOnDisk}).forEach(function(d) {print(d.name + " - " + (d.sizeOnDisk/1024/1024) + "M")});' > mongodump-sherpa-2025-08-29-08h56_06.txt
Database list is ready: /data/dumps/mongodump-sherpa-2025-08-29-08h56_06.txt
Done
Copy the dump to your local machine:
kubectl cp $(kubectl get pods --field-selector=status.phase=Running|grep sherpa-mongodb|awk '{print $1}'):/data/dumps/mongodump-sherpa-2025-08-29-08h56_06.gz mongodump-sherpa-2025-08-29-08h56_06.gz
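Before moving on, it is worth checking that the copy is not truncated or corrupted. A hedged sketch (verify_dump is a hypothetical helper name; gzip -t validates the whole compressed stream without extracting it):

```shell
# verify_dump: sanity-check a copied dump archive.
# gzip -t reads and checks the entire compressed stream; it prints nothing
# and exits 0 on success, so we echo a confirmation ourselves.
verify_dump() {  # usage: verify_dump <dump-file.gz>
  gzip -t "$1" && echo "dump archive OK: $1"
}
# Example: verify_dump mongodump-sherpa-2025-08-29-08h56_06.gz
```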
2. Adapt custom-values.yaml file
Our Helm chart is customizable. The site-specific parameters are described in a file called custom-values.yaml, delivered in the archive. Adapt this file if your environment requires modifications.
Storage classes
The Helm chart defines three storage class aliases: local, remote, and shared.
local: Storage class intended for MongoDB and Elasticsearch, which require high-performance storage (ideally node-local).
remote: Storage class intended for workloads that do not require high performance (e.g., volumes for machine learning models used by suggesters, configuration file volumes, etc.).
shared: Storage class intended for resources shared across multiple pods. This storage class must support the ReadWriteMany (RWX) access mode.
After selecting your storage classes, you will need to update the following lines in the custom-values.yaml file (which currently contain our test values):
storageClass:
  local: local-path
  remote: longhorn
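If you do have an RWX-capable storage class, the shared alias presumably sits alongside the other two. A sketch only: the shared key name is assumed by analogy with local and remote, and longhorn is merely an example of an RWX-capable class, not a value taken from the delivered file:

```yaml
storageClass:
  local: local-path   # high-performance, node-local
  remote: longhorn    # general-purpose
  shared: longhorn    # must support ReadWriteMany (RWX); key name assumed
```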
In case you do not have an RWX-enabled storage class, the Helm chart provides an option to replace volumes using the storage class alias shared with emptyDir volumes:
storage:
  rwx:
    available: false
CPU versus GPU suggester engines
If your environment has GPUs available and your pipelines include suggesters that run on GPU, you will need to set the gpu parameter of the inference suggester to true in the custom-values.yaml file (our test value is set to false):
suggester:
  flair:
    # training:
    #   image:
    #     gpu: false
    inference:
      image:
        gpu: true
Expose a NodePort
If you want to expose a NodePort for the Sherpa server, you can add the following option:
sherpa:
  svc:
    type: NodePort
    nodePort: 30010 # example
ℹ️ Note:
You can do the same for all other services (see the various svc keys in config.yaml).
Passwords
Random passwords will be generated for MongoDB and Elasticsearch.
The Sherpa admin password will default to secret, unless you add a key in a custom-secrets.yaml file:
SHERPA_ADMIN_PASSWORD: "your password"
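One way to create that file with a strong random password is sketched below; openssl is only an assumption here, any random-string generator works:

```shell
# Generate a 24-character base64 password and write custom-secrets.yaml.
# base64 output contains no quotes or backslashes, so it is safe to place
# inside YAML double quotes.
cat > custom-secrets.yaml <<EOF
SHERPA_ADMIN_PASSWORD: "$(openssl rand -base64 18)"
EOF
```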
3. Install the naja version
Install via Helm chart
To deploy the naja version, first extract the provided .tgz package:
mkdir sherpa
cd sherpa/
tar xfz ../sherpa.naja.tgz
Then run the installation with:
helm install kairntech helm -f custom-values.yaml -f custom-secrets.yaml
Check that all is running
kubectl get all
ℹ️ Note:
Note that the system is not yet operational: the pods sherpa-entityfishing, sherpa-fasttext-test-suggester, sherpa-fasttext-train-suggester, sherpa-flair-test-suggester, and sherpa-flair-train-suggester remain in Init status because their init containers are waiting for resources. The rsync server that will manage resources is exposed on NodePort 31873.
Init-containers to fetch resources
The init containers should be waiting for the files to become available; you can check with the following commands:
FastText:
kubectl logs $(kubectl get pods -o=name|grep sherpa-fasttext-train-suggester) -c fasttext-suggester-resources-init-job
Flair:
kubectl logs $(kubectl get pods -o=name|grep sherpa-flair-train-suggester) -c flair-suggester-resources-init-job
Entity-Fishing:
kubectl logs $(kubectl get pods -o=name|grep sherpa-entityfishing|grep -v suggester) -c entity-fishing-resources-init-job
Fetching online resources
This step must be performed on a machine with internet access and a Docker installation.
Extract the tgz archive
mkdir sherpa
cd sherpa/
tar xfz ../sherpa.naja.tgz
Download resources
Log in to Docker with your Kairntech credentials, then run:
cd kubernetes/init/
./download-resources.sh
Wait until the script finishes executing.
This script retrieves data stored in a Kairntech Amazon S3 bucket.
Import resources into Sherpa
Run the following commands:
cd resources-mirror
./upload-to-mirror.sh
Completion of the script and resource initialization
The script ends by copying a _ready file, signaling to the init containers that the resources are ready.
The resource init containers can now start downloading data from the mirror and extracting it into the volumes. You can monitor the operations with:
FastText:
kubectl logs $(kubectl get pods -o=name | grep sherpa-fasttext-train-suggester) -c fasttext-suggester-resources-init-job
Flair:
kubectl logs $(kubectl get pods -o=name | grep sherpa-flair-train-suggester) -c flair-suggester-resources-init-job
Entity-Fishing:
kubectl logs $(kubectl get pods -o=name | grep sherpa-entityfishing | grep -v suggester) -c entity-fishing-resources-init-job
Finalizing the installation
Check that the init containers have completed successfully: all pods should now be in Running status:
kubectl get pods
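If you are scripting the rollout, the same check can be automated by filtering the pod list; a pure-shell sketch (not_running is a hypothetical helper name):

```shell
# not_running: list pods whose STATUS column (3rd field of
# "kubectl get pods --no-headers" output) is neither Running nor Completed.
# Empty output means all pods are healthy.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" {print $1}'
}
# Usage: kubectl get pods --no-headers | not_running
```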
Sherpa is now ready to operate.
4. Restoring a MongoDB dump with the latest projects – optional
This step can be replaced by importing multiple SPA project archives.
Copy the MongoDB dump into the MongoDB pod
kubectl cp mongodump-sherpa-2025-08-29-08h56_06.gz \
$(kubectl get pods --field-selector=status.phase=Running | grep sherpa-mongodb | awk '{print $1}'):/data/dumps
Navigate to the MongoDB backup/restore directory
Return to the archive extraction directory and go to kubernetes/mongodb-backup-restore:
cd kubernetes/mongodb-backup-restore/
Apply the restore job
kubectl apply -f mongodb-restore-projects-only-job.yaml
Wait for the restore job to complete
kubectl logs -f job.batch/mongodb-restore-projects-only-job
Restart sherpa-core
kubectl rollout restart deployment.apps/$(kubectl get deployments | grep sherpa-core | awk '{print $1}')
Monitor sherpa-core logs
Watch the logs until all projects are deployed and indexed:
kubectl logs -f deployment.apps/$(kubectl get deployments | grep sherpa-core | awk '{print $1}') | grep "Projects deployment complete"
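For unattended runs, the same wait can be expressed as a small function that blocks until the target line appears (wait_for_line is a hypothetical helper name; grep -m1 stops at the first match and -q suppresses output):

```shell
# wait_for_line: read stdin until the given pattern appears, then return 0.
# Returns non-zero if the stream ends without a match.
wait_for_line() {  # usage: <stream> | wait_for_line <pattern>
  grep -q -m1 "$1"
}
# Example:
#   kubectl logs -f deployment.apps/... | wait_for_line "Projects deployment complete"
```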