Kairntech Studio
Kairntech Studio is a user-friendly development environment to experiment with and customize GenAI language assistants.
Kairntech Studio is an application for building NLP-driven machine learning pipelines in a low-code, easy-to-use web environment.
Kairntech Studio incorporates hundreds of technical components (see details below) that are continuously enriched and kept up to date. This allows you to concentrate on creating business impact from documents. Particular attention has been paid to ease of use, making the application accessible to domain experts.
Kairntech Studio lets you label data, create datasets, train AI models, embed knowledge, and finally design hybrid AI pipelines with maximum flexibility.
Scenarios include question answering (RAG), named entity recognition, text classification, event detection, relation extraction…
For more details, see the Kairntech Documentation.
Key Features & NLP Tasks
Supported languages vs NLP tasks | Western languages | Non-Western languages |
Core features (GUI, search, manual annotation…) | Yes | Yes |
Language identification | Yes | Yes |
Token classification (date, amount, address, phrase…) | Yes | Yes |
Named Entity Recognition (person, location, organization, disease…) | Yes | Yes |
Sentence classification | Yes | Yes |
Text classification | Yes | Yes |
Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish. + language on demand | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian. + language on demand |
Entity Linking (lexicon, business vocabulary) | Yes | Yes |
Semantic textual similarity | Yes | Partially |
Question answering – RAG | Yes but may depend on the third-party solutions used | Yes but may depend on the third-party solutions used |
Text summarization(2) | Yes but may depend on the third-party solutions used | Yes but may depend on the third-party solutions used |
Paraphrase generation | Yes | Yes |
Data augmentation | Yes | Yes |
Sentiment analysis (polarity, emotion) | Yes | Yes |
Intent detection & slot filling | coming soon… | coming soon… |
Relationship extraction | coming soon… | coming soon… |
Co-reference resolution | coming soon… | coming soon… |
Automatic Speech Recognition(2) | Yes but may depend on the third-party solutions used | Yes but may depend on the third-party solutions used |
Machine translation(2) | Yes but may depend on the third-party solutions used | Yes but may depend on the third-party solutions used |
Core engines
Text classification engines | ScikitLearn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…; Spacy with Transformer models; Flair with static embeddings (fasttext…), Flair embeddings…; Transformers: almost all model types & model names from Hugging Face Hub; FastText; BERTopic |
Text clustering engines | BERTopic |
Token classification & NER engines | CRF-Suite; Spacy with Transformer models; Delft: BidLSTM-CRF, BidGRU-CRF with ELMO embeddings; Flair: Optimizers (SGD, Adam…), RNN-type (LSTM, GRU) with static embeddings (fasttext…), Flair embeddings…; Transformers: almost all model types & model names from Hugging Face Hub |
Lexicon-based engines | PhraseMatcher; EntityRuler |
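To illustrate what a lexicon-based engine does in principle, here is a minimal sketch using only the Python standard library. It is not the Kairntech implementation and does not use the PhraseMatcher or EntityRuler components listed above; the lexicon contents, the labels, and the function names `build_lexicon_matcher` and `annotate` are all hypothetical, chosen for illustration only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical business vocabulary: label -> list of terms (illustrative only).
LEXICON: Dict[str, List[str]] = {
    "PRODUCT": ["Kairntech Studio", "Kairntech Server"],
    "TASK": ["entity linking", "text classification"],
}

SAMPLE = "Kairntech Studio supports Entity Linking and text classification."


def build_lexicon_matcher(lexicon: Dict[str, List[str]]) -> List[Tuple[str, "re.Pattern[str]"]]:
    """Compile one case-insensitive pattern per label from its list of terms."""
    matchers = []
    for label, terms in lexicon.items():
        # Try longer terms first so a long term is preferred over its own prefix.
        ordered = sorted(terms, key=len, reverse=True)
        # \b assumes every term starts and ends with a word character.
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
        matchers.append((label, re.compile(pattern, re.IGNORECASE)))
    return matchers


def annotate(text: str, matchers: List[Tuple[str, "re.Pattern[str]"]]) -> List[dict]:
    """Return one annotation (offsets, surface text, label) per lexicon hit."""
    annotations = []
    for label, pattern in matchers:
        for match in pattern.finditer(text):
            annotations.append(
                {
                    "start": match.start(),
                    "end": match.end(),
                    "text": match.group(0),
                    "label": label,
                }
            )
    # Sort by position so annotations follow reading order.
    return sorted(annotations, key=lambda a: a["start"])


ANNOTATIONS = annotate(SAMPLE, build_lexicon_matcher(LEXICON))
```

Matching is case-insensitive, so "Entity Linking" in the sample text is found from the lowercase lexicon entry. A production engine would typically match on tokens rather than raw characters and handle overlapping terms across labels, which this sketch does not attempt.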
Core components
Document Converters | Tika (PDF, Office, HTML…); Whisper (speech to text); Deeptranscript(2) (speech to text); OCRmypdf (scanned PDF to text); Grobid (scholarly documents); Inscriptis (HTML to text); Pubmed XML (biomedical abstracts); NewsML-G2 XML (news); Transformer models (speech to text); Custom converter (on demand) |
Document Segmenters (chunking) | Microsoft Blingfire; Regular expression segmenter; PySBD segmenter; Spacy Rules segmenter; Segmentation pipelines; Custom segmenter (on demand) |
Output Formatters | JSON; Tabular (CSV, Excel); Custom formatter (on demand) |
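The way a segmenter and an output formatter chain together can be sketched in a few lines of standard-library Python. This is an illustration of the general idea only, not Kairntech Studio code: the boundary rule, the segment fields (`start`, `end`, `text`), and the function names `segment`, `to_json`, and `to_csv` are assumptions made for the example.

```python
import csv
import io
import json
import re
from typing import Dict, List

# Assumed boundary rule: ., ! or ? followed by whitespace and an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

SAMPLE = "Studio converts documents. It then segments them! Is the output tabular?"


def segment(text: str) -> List[Dict]:
    """Split text into sentence-like segments, keeping character offsets."""
    segments = []
    start = 0
    for boundary in SENTENCE_BOUNDARY.finditer(text):
        segments.append(
            {"start": start, "end": boundary.start(), "text": text[start:boundary.start()]}
        )
        start = boundary.end()
    if start < len(text):
        # Whatever follows the last boundary is the final segment.
        segments.append({"start": start, "end": len(text), "text": text[start:]})
    return segments


def to_json(segments: List[Dict]) -> str:
    """Format segments as a JSON array."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_csv(segments: List[Dict]) -> str:
    """Format segments as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start", "end", "text"])
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()


SEGMENTS = segment(SAMPLE)
JSON_OUTPUT = to_json(SEGMENTS)
CSV_OUTPUT = to_csv(SEGMENTS)
```

Keeping offsets on each segment means a downstream annotator or formatter can always map results back to the original document. A simple regular expression like this one will mis-split on abbreviations, which is one reason dedicated segmenters exist.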
Core models & technical components
Off-the-shelf models & technical components | Acronyms detection; Duckling (Units & Measure detection); SpacyNER (Entity detection); Pattern (regex); Spacy Rules; Annotations consolidation; Pseudonymization; Restore punctuation and true casing; Annotation-based segmentation; Group sentences by chunks; DeepL(2) (Machine Translation); Transformer models (Q&A, SA, Zero shot classifier…); …; Custom model & component (on demand) |
Language Models (embeddings) | All suitable models from Hugging Face hub (all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2, mBERT, CamemBERT, XLM-Roberta…); OpenAI(2) embeddings; Fine-tuned Language Models (on demand) |
Large Language Models (LLMs) | OpenAI(2): GPT-3.5, GPT-4…; Microsoft Azure(2): GPT-3.5, GPT-4…; DeepInfra(2): Llama3, Mixtral-8x22B, DBRX, Dolphin-2.6, Zephyr… |
Wikidata/Wikipedia | entity-fishing (15 languages); New language on demand |
(2) API integration, key required.