Kairntech Server

Kairntech Server is a powerful and scalable production server for industrializing GenAI language assistants.

See also: Product overview, Kairntech Studio, Kairntech Consulting, Kairntech Software Pricing.

The Kairntech Server is a back-end server that runs NLP-driven Machine Learning pipelines in production and reliably processes large volumes of documents.

The server exposes a rich REST API for integration into business applications. The REST API is also used to implement human-in-the-loop feedback loops for continuous improvement.
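As a purely illustrative sketch of such an integration, the Python snippet below sends a document to the server and posts human corrections back to it. The base URL, endpoint paths, project name and payload fields are hypothetical placeholders, not the actual Kairntech API; the real routes and authentication scheme are described in the Kairntech Documentation.

```python
import requests

# Hypothetical base URL, endpoints and payloads, used only to illustrate the
# integration pattern; the real routes are described in the Kairntech
# Documentation.
BASE_URL = "https://kairntech.example.com/api"
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}

# 1) Send a document to a server-side pipeline and retrieve its annotations.
document = {"text": "Acme Corp. signed the contract in Paris on 12 May 2024."}
response = requests.post(
    f"{BASE_URL}/projects/my-project/annotate",
    json=document,
    headers=HEADERS,
    timeout=30,
)
response.raise_for_status()
annotations = response.json()

# 2) After human review, push the corrected annotations back to the server
#    to feed the human-in-the-loop improvement cycle.
feedback = {"text": document["text"], "annotations": annotations}
requests.post(
    f"{BASE_URL}/projects/my-project/feedback",
    json=feedback,
    headers=HEADERS,
    timeout=30,
).raise_for_status()
```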

Kairntech Server can be deployed on a single machine or within a distributed environment, either as a hosted solution or on-premises.

To find out more, see the Kairntech Documentation, Technical requirements, and Installation guide.


Key Features & NLP Tasks

Supported languages vs NLP tasks

| NLP task | Western languages | Non-Western languages |
| --- | --- | --- |
| Core features (GUI, search, manual annotation…) | Yes | Yes |
| Language identification | Yes | Yes |
| Token classification (date, amount, address, phrase…) | Yes | Yes |
| Named Entity Recognition (person, location, organization, disease…) | Yes | Yes |
| Sentence classification | Yes | Yes |
| Text classification | Yes | Yes |
| Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish (+ language on demand) | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian (+ language on demand) |
| Entity Linking (lexicon, business vocabulary) | Yes | Yes |
| Semantic textual similarity | Yes | Partially |
| Question answering – RAG | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Text summarization (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Paraphrase generation | Yes | Yes |
| Data augmentation | Yes | Yes |
| Sentiment analysis (polarity, emotion) | Yes | Yes |
| Intent detection & slot filling | Coming soon | Coming soon |
| Relationship extraction | Coming soon | Coming soon |
| Co-reference resolution | Coming soon | Coming soon |
| Automatic Speech Recognition (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Machine translation (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
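As one concrete illustration of the tasks above, the snippet below runs sentiment analysis with the Hugging Face Transformers pipeline, one of the open-source engine families listed in the next section. It shows the kind of underlying tooling involved, not the Kairntech API itself.

```python
# Illustrative only: sentiment analysis with the Hugging Face Transformers
# pipeline; Kairntech Server exposes this capability through its own
# projects and REST API instead of direct library calls.
from transformers import pipeline

# Uses the pipeline's default sentiment model.
sentiment = pipeline("sentiment-analysis")
print(sentiment("The new release is remarkably fast and stable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```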

Core engines

Text classification engines:
- ScikitLearn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…
- Spacy with Transformer models
- Flair with static embeddings (fasttext…), Flair embeddings…
- Transformers: almost all model types & model names from the Hugging Face Hub
- FastText
- BERTopic

Text clustering engines:
- BERTopic

Token classification & NER engines:
- CRF-Suite
- Spacy with Transformer models
- Delft: BidLSTM-CRF, BidGRU-CRF with ELMo embeddings
- Flair: optimizers (SGD, Adam…), RNN types (LSTM, GRU) with static embeddings (fasttext…), Flair embeddings…
- Transformers: almost all model types & model names from the Hugging Face Hub

Lexicon-based engines:
- PhraseMatcher
- EntityRuler
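To make the engine list more concrete, here is a minimal sketch of a text classification engine built with ScikitLearn's TfidfVectorizer and LogisticRegression, two of the components named above. The labels and training texts are invented for illustration; this is not Kairntech's internal pipeline configuration.

```python
# Toy sketch of a ScikitLearn text classification engine of the kind listed
# above; labels and data are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = [
    "Invoice for consulting services, amount due in 30 days",
    "Patient presents with fever and persistent cough",
    "Quarterly revenue grew by 12 percent year over year",
    "Dosage adjusted after adverse reaction to treatment",
]
train_labels = ["finance", "medical", "finance", "medical"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(train_texts, train_labels)

print(clf.predict(["The audit report shows outstanding payments"]))
```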

Core components

Document Converters:
- Tika (PDF, Office, HTML…)
- LLMs, Whisper (speech to text)
- LLMs (image to text)
- OCRmyPDF (scanned PDF to text)
- Grobid (scholarly documents)
- Inscriptis (HTML to text)
- PubMed XML (biomedical abstracts)
- NewsML-G2 XML (news)
- Custom converter (on demand)

Document Segmenters (chunking):
- Microsoft BlingFire
- Regular expression segmenter
- PySBD segmenter
- Spacy Rules segmenter
- Segmentation pipelines
- Custom segmenter (on demand)

Output Formatters:
- JSON
- Tabular (CSV, Excel)
- Custom formatter (on demand)
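As an illustration of the segmentation (chunking) step, the sketch below splits text into sentences with the PySBD library listed above; inside Kairntech Server the segmenters are configured as part of a pipeline rather than called directly like this.

```python
# Illustration of sentence segmentation with PySBD, one of the segmenters
# listed above; Kairntech Server configures this step internally.
import pysbd

segmenter = pysbd.Segmenter(language="en", clean=False)
text = (
    "Kairntech Server runs NLP pipelines in production. "
    "Documents are first converted, then segmented into chunks."
)
for sentence in segmenter.segment(text):
    print(sentence)
```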

Core models & technical components

Off-the-shelf models & technical components:
- Acronyms detection
- Duckling (units & measures detection)
- SpacyNER (entity detection)
- Pattern (regex)
- Spacy Rules
- Annotations reconciliation
- Pseudonymization
- Text generation using LLMs
- Data augmentation using LLMs
- Wikidata semantic fingerprints
- DeepL (2) (machine translation)
- Custom model & component (on demand)

Language Models (embeddings):
- All suitable models from the Hugging Face Hub (all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2, mBERT, CamemBERT, XLM-RoBERTa…)
- OpenAI (2) embeddings
- Fine-tuned language models (on demand)

Large Language Models (LLMs):
- OpenAI (2): GPT-4o…
- Microsoft Azure (2): GPT-4o…
- DeepInfra (2): Meta Llama-3, Mistral Nemo, Qwen, DBRX, NVIDIA…
- On-premises LLMs: Llama-3, Mistral Nemo, Qwen…

Wikidata/Wikipedia:
- entity-fishing (15 languages)
- New language on demand

(2) API integration, API key required.
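For orientation, the following sketch shows how an embedding model such as all-MiniLM-L6-v2 (listed above) can score semantic textual similarity, here loaded through the sentence-transformers package as an assumed wrapper; Kairntech Server integrates such models behind its own configuration and REST API.

```python
# Sketch of semantic textual similarity with one of the embedding models
# listed above (all-MiniLM-L6-v2); illustrative only, not the Kairntech API.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The server exposes a REST API for integration.",
    "Business applications can call the server over HTTP.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))
```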