Kairntech Server
Kairntech Server is a powerful and scalable production server to industrialize GenAI language assistants.
See also: Product overview, Kairntech Studio, Kairntech Consulting, Kairntech Software Pricing.
The Kairntech Server is a back-end server to run NLP-driven Machine Learning pipelines in production that reliably process extensive volumes of documents.
The server exposes a rich REST API for integration into business applications. The REST API is also used to implement human-in-the-loop feedback loops for continuous improvement.
Kairntech Server can be deployed on a single machine or within a distributed environment, either as a hosted solution or on premises.
To find out more, see Kairntech Documentation, Technical requirements and Installation guide.
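As an illustration of how a business application might prepare a call to the REST API, here is a minimal Python sketch. The base URL, endpoint path, payload shape and bearer-token header are all hypothetical placeholders chosen for this example; the actual routes and authentication scheme are described in the Kairntech Documentation. The request is only built, not sent.

```python
import json
from urllib.request import Request

# Everything below is an assumption for illustration, not the documented API.
BASE_URL = "https://kairntech.example.com/api"   # hypothetical server address
ANNOTATE_PATH = "/projects/{project}/annotate"   # hypothetical endpoint path


def build_annotation_request(project: str, text: str, token: str) -> Request:
    """Prepare (but do not send) a JSON POST request carrying one document."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        url=BASE_URL + ANNOTATE_PATH.format(project=project),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


request = build_annotation_request(
    "demo", "Invoice dated 12 March 2024 for 1,500 EUR.", "TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out here because the real endpoint is not known.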
Key Features & NLP Tasks
Supported languages vs NLP tasks | Western languages | Non-Western languages |
Core features (GUI, search, manual annotation…) | Yes | Yes |
Language identification | Yes | Yes |
Token classification (date, amount, address, phrase…) | Yes | Yes |
Named Entity Recognition (person, location, organization, disease…) | Yes | Yes |
Sentence classification | Yes | Yes |
Text classification | Yes | Yes |
Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish (other languages on demand) | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian (other languages on demand) |
Entity Linking (lexicon, business vocabulary) | Yes | Yes |
Semantic textual similarity | Yes | Partially |
Question answering – RAG | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
Text summarization(2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
Paraphrase generation | Yes | Yes |
Data augmentation | Yes | Yes |
Sentiment analysis (polarity, emotion) | Yes | Yes |
Intent detection & slot filling | coming soon… | coming soon… |
Relationship extraction | coming soon… | coming soon… |
Co-reference resolution | coming soon… | coming soon… |
Automatic Speech Recognition(2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
Machine translation(2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
Core engines
Text classification engines | ScikitLearn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…; spaCy with Transformer models; Flair with static embeddings (fasttext…), Flair embeddings…; Transformers: almost all model types & model names from Hugging Face Hub; FastText; BERTopic |
Text clustering engines | BERTopic |
Token classification & NER engines | CRF-Suite; spaCy with Transformer models; Delft: BidLSTM-CRF, BidGRU-CRF with ELMO embeddings; Flair: Optimizers (SGD, Adam…), RNN-type (LSTM, GRU) with static embeddings (fasttext…), Flair embeddings…; Transformers: almost all model types & model names from Hugging Face Hub |
Lexicon-based engines | PhraseMatcher; EntityRuler |
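To give a feel for what a lexicon-based engine does, the sketch below matches a small business vocabulary against a text and returns labelled character spans. It is a simplified standard-library stand-in written for this page, not the PhraseMatcher or EntityRuler implementation; the vocabulary, labels and function names are hypothetical.

```python
import re

# Hypothetical business vocabulary: lowercase surface form -> label.
LEXICON = {
    "purchase order": "DOCUMENT_TYPE",
    "invoice": "DOCUMENT_TYPE",
    "net 30": "PAYMENT_TERM",
}


def compile_lexicon(lexicon):
    """Build one case-insensitive pattern covering every lexicon term."""
    # Longest terms first, so overlapping entries prefer the longer match.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def match_lexicon(text, lexicon=LEXICON):
    """Return (start, end, label) for each lexicon term found in the text."""
    matcher = compile_lexicon(lexicon)
    return [
        (m.start(), m.end(), lexicon[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]


SAMPLE = "The Invoice follows the purchase order, payable net 30."
for start, end, label in match_lexicon(SAMPLE):
    print(label, repr(SAMPLE[start:end]))
```

Matching on whole-word boundaries keeps short terms from firing inside longer words; a production engine would typically work on tokens rather than raw characters.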
Core components
Document Converters | Tika (PDF, Office, HTML…); LLMs, Whisper (Speech to text); LLMs (Image to text); OCRmypdf (scanned PDF to text); Grobid (Scholarly documents); Inscriptis (HTML to text); Pubmed XML (Biomedical abstract); NewsML-G2 XML (news); Custom converter (on demand) |
Document Segmenters (chunking) | Microsoft Blingfire; Regular expression segmenter; PySBD segmenter; spaCy Rules segmenter; Segmentation pipelines; Custom segmenter (on demand) |
Output Formatters | JSON; Tabular (CSV, Excel); Custom formatter (on demand) |
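The segmenter and formatter rows above can be illustrated with a small Python sketch: a regular-expression segmenter that splits a document into chunks with character offsets, followed by JSON and CSV output. The boundary rule, field names and function names are assumptions made for this example and do not describe the server's own components.

```python
import csv
import io
import json
import re

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list[dict]:
    """Split text into segments, keeping character offsets into the original."""
    segments = []
    start = 0
    for match in SENTENCE_BOUNDARY.finditer(text):
        end = match.start()
        segments.append({"start": start, "end": end, "text": text[start:end]})
        start = match.end()
    if start < len(text):  # trailing segment after the last boundary
        segments.append({"start": start, "end": len(text), "text": text[start:]})
    return segments


def to_json(segments: list[dict]) -> str:
    """Serialize segments as a JSON array."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_csv(segments: list[dict]) -> str:
    """Serialize segments as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start", "end", "text"])
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()


segments = segment("First sentence. Second one! Third?")
print(to_json(segments))
print(to_csv(segments))
```

Keeping offsets alongside the text lets downstream annotations be mapped back onto the original document, whichever output format is chosen.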
Core models & technical components
Off-the-shelf models & technical components | Acronyms detection; Duckling (Units & Measure detection); SpacyNER (Entity detection); Pattern (regex); spaCy Rules; Annotations reconciliation; Pseudonymization; Text generation using LLMs; Data augmentation using LLMs; Wikidata; Semantic fingerprints; DeepL(2) (Machine Translation); …; Custom model & component (on demand) |
Language Models (embeddings) | All suitable models from Hugging Face Hub (all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2, mBERT, CamemBERT, XLM-Roberta…); OpenAI(2) embeddings; Fine-tuned Language Models (on demand) |
Large Language Models (LLMs) | OpenAI(2): GPT-4o…; Microsoft Azure(2): GPT-4o…; DeepInfra(2): Meta Llama-3, Mistral Nemo, Qwen, DBRX, NVIDIA…; On-premises LLMs: Llama-3, Mistral Nemo, Qwen… |
Wikidata/Wikipedia | entity-fishing (15 languages); New language on demand |
(2) API integration, key required