Kairntech Studio

Kairntech Studio is a user-friendly development environment to experiment with and customize GenAI language assistants.

See also: Product overview, Kairntech Server, Kairntech Consulting, Kairntech Software Pricing.

Kairntech Studio is an application for building NLP-driven machine learning pipelines in a low-code, easy-to-use web environment.

Kairntech Studio incorporates hundreds of technical components (see details below) that are continuously enriched and kept up to date. This allows you to concentrate on creating business impact from documents. Particular attention has been given to ease of use, making the application accessible to domain experts.

Kairntech Studio lets you label data, create datasets, train AI models, embed knowledge, and finally design hybrid AI pipelines with maximum flexibility.

Scenarios include question-answering (RAG), Named Entity Recognition, Text classification, Event detection, Relation extraction…

For more details, see the Kairntech Documentation.


Key Features & NLP Tasks

Supported languages vs NLP tasks

| NLP task | Western languages | Non-Western languages |
| --- | --- | --- |
| Core features (GUI, search, manual annotation…) | Yes | Yes |
| Language identification | Yes | Yes |
| Token classification (date, amount, address, phrase…) | Yes | Yes |
| Named Entity Recognition (person, location, organization, disease…) | Yes | Yes |
| Sentence classification | Yes | Yes |
| Text classification | Yes | Yes |
| Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish + language on demand | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian + language on demand |
| Entity Linking (lexicon, business vocabulary) | Yes | Yes |
| Semantic textual similarity | Yes | Partially |
| Question answering – RAG | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Text summarization (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Paraphrase generation | Yes | Yes |
| Data augmentation | Yes | Yes |
| Sentiment analysis (polarity, emotion) | Yes | Yes |
| Intent detection & slot filling | coming soon… | coming soon… |
| Relationship extraction | coming soon… | coming soon… |
| Co-reference resolution | coming soon… | coming soon… |
| Automatic Speech Recognition (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Machine translation (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |

Core engines

Text classification engines:
  scikit-learn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…
  spaCy with Transformer models
  Flair with static embeddings (fastText…), Flair embeddings…
  Transformers: almost all model types & model names from the Hugging Face Hub
  FastText
  BERTopic

Text clustering engines:
  BERTopic

Token classification & NER engines:
  CRF-Suite
  spaCy with Transformer models
  DeLFT: BidLSTM-CRF, BidGRU-CRF with ELMo embeddings
  Flair: optimizers (SGD, Adam…), RNN type (LSTM, GRU) with static embeddings (fastText…), Flair embeddings…
  Transformers: almost all model types & model names from the Hugging Face Hub

Lexicon-based engines:
  PhraseMatcher
  EntityRuler
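
To make the idea behind the lexicon-based engines more concrete, here is a minimal sketch of dictionary-driven entity annotation. It uses only Python's standard `re` module; it is not spaCy's PhraseMatcher or EntityRuler, and it is not how Kairntech Studio implements these engines. The `LEXICON` entries, the labels, and the function names `build_pattern` and `annotate` are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical business vocabulary: surface form -> entity label.
LEXICON = {
    "Kairntech Studio": "PRODUCT",
    "Hugging Face": "ORGANIZATION",
    "entity-fishing": "TOOL",
}


def build_pattern(lexicon):
    """Compile one case-insensitive alternation, longest terms first."""
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # Lookarounds keep a match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def annotate(text, lexicon):
    """Return (start, end, matched_text, label) for every lexicon hit."""
    labels = {term.lower(): label for term, label in lexicon.items()}
    pattern = build_pattern(lexicon)
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


text = "Kairntech Studio can load models from the Hugging Face Hub."
annotations = annotate(text, LEXICON)
for start, end, surface, label in annotations:
    print(f"{start:>3}-{end:<3} {label:<12} {surface}")
```

Sorting terms by length before building the alternation means a longer vocabulary entry wins over a shorter one that shares the same prefix, which is usually the desired behaviour for business vocabularies.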

Core components

Document Converters:
  Tika (PDF, Office, HTML…)
  Whisper (speech to text)
  Deeptranscript (2) (speech to text)
  OCRmyPDF (scanned PDF to text)
  Grobid (scholarly documents)
  Inscriptis (HTML to text)
  PubMed XML (biomedical abstracts)
  NewsML-G2 XML (news)
  Transformer models (speech to text)
  Custom converter (on demand)

Document Segmenters (chunking):
  Microsoft Blingfire
  Regular expression segmenter
  PySBD segmenter
  spaCy Rules segmenter
  Segmentation pipelines
  Custom segmenter (on demand)

Output Formatters:
  JSON
  Tabular (CSV, Excel)
  Custom formatter (on demand)
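
As a rough illustration of how a segmenter and an output formatter fit together, the sketch below splits a text with a regular expression and serializes the resulting segments as JSON and CSV. It relies only on the Python standard library and is not Kairntech Studio's implementation. The boundary rule (sentence-final punctuation followed by whitespace), the segment fields `start`, `end`, and `text`, and all function names are assumptions made for this example.

```python
import csv
import io
import json
import re

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split text into segments while keeping character offsets."""
    segments, start = [], 0
    for m in BOUNDARY.finditer(text):
        segments.append({"start": start, "end": m.start(), "text": text[start:m.start()]})
        start = m.end()
    if start < len(text):  # trailing segment without a following boundary
        segments.append({"start": start, "end": len(text), "text": text[start:]})
    return segments


def to_json(segments):
    """Serialize segments as a JSON array."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_csv(segments):
    """Serialize segments as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start", "end", "text"])
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()


doc = "Label your data. Train a model! Is the pipeline ready?"
segments = segment(doc)
print(to_json(segments))
print(to_csv(segments))
```

Keeping character offsets on each segment lets downstream annotations be mapped back to positions in the original document, whichever output format is chosen.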

Core models & technical components

Off-the-shelf models & technical components:
  Acronym detection
  Duckling (units & measures detection)
  SpacyNER (entity detection)
  Pattern (regex)
  spaCy Rules
  Annotations consolidation
  Pseudonymization
  Restore punctuation and true casing
  Annotation-based segmentation
  Group sentences by chunks
  DeepL (2) (machine translation)
  Transformer models (Q&A, SA, zero-shot classifier…)
  Custom model & component (on demand)

Language Models (embeddings):
  All suitable models from the Hugging Face Hub (all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2, mBERT, CamemBERT, XLM-RoBERTa…)
  OpenAI (2) embeddings
  Fine-tuned Language Models (on demand)

Large Language Models (LLMs):
  OpenAI (2): GPT-3.5, GPT-4…
  Microsoft Azure (2): GPT-3.5, GPT-4…
  DeepInfra (2): Llama3, Mixtral-8x22B, DBRX, Dolphin-2.6, Zephyr…

Wikidata/Wikipedia:
  entity-fishing (15 languages)
  New language on demand

(2) API integration, key required