Kairntech Studio

Kairntech Studio is a low-code, easy-to-use web environment that helps domain experts, data analysts and data scientists leverage off-the-shelf AI models & LLMs, create high-quality datasets, train custom AI models and build NLP pipelines.

Deployment options: SaaS / Hosted(1) / On-premise

Supported languages: English, French, German, Spanish, Italian, Dutch, Portuguese, Russian, Arabic, Chinese, Hindi, Urdu, Japanese, Persian

1) Hosted deployment: hosting & monitoring fees are added (GPU available as an option)
2) Off-the-shelf AI models & LLMs: Hugging Face models, Wikidata-based entity disambiguation model, BERTopic, DeepL, GPT-3.5, GPT-4, Llama 2 70B, Mistral 7B, Mixtral 8x7B and many more

Key Features & NLP Tasks

NLP task | Western languages | Non-Western languages
Core features (GUI, search, manual annotation…) | Yes | Yes
Language identification | Yes | Yes
Token classification (date, amount, address, phrase…) | Yes | Yes
Named Entity Recognition (person, location, organization, disease…) | Yes | Yes
Sentence classification | Yes | Yes
Text classification | Yes | Yes
Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish + language on demand | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian + language on demand
Entity Linking (lexicon, business vocabulary) | Yes | Yes
Semantic textual similarity | Yes | Partially
Question answering – RAG | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used
Text summarization (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used
Paraphrase generation | Yes | Yes
Data augmentation | Yes | Yes
Sentiment analysis (polarity, emotion) | Yes | Yes
Intent detection & slot filling | coming soon… | coming soon…
Relationship extraction | coming soon… | coming soon…
Co-reference resolution | coming soon… | coming soon…
Automatic Speech Recognition (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used
Machine translation (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used

Core engines

Text classification engines:
  scikit-learn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…
  spaCy with Transformer models
  Flair with static embeddings (fastText…), Flair embeddings…
  Transformers: almost all model types & model names from Hugging Face Hub
  FastText
  BERTopic

Text clustering engines:
  BERTopic

Token classification & NER engines:
  CRF-Suite
  spaCy with Transformer models
  DeLFT: BidLSTM-CRF, BidGRU-CRF with ELMo embeddings
  Flair: optimizers (SGD, Adam…), RNN type (LSTM, GRU) with static embeddings (fastText…), Flair embeddings…
  Transformers: almost all model types & model names from Hugging Face Hub

Lexicon-based engines:
  PhraseMatcher
  EntityRuler
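
To illustrate what a lexicon-based engine does, here is a minimal sketch in plain Python. It is not the spaCy PhraseMatcher or EntityRuler API, and it is not Kairntech's implementation: the `LEXICON` contents, the labels and the function names are hypothetical, and a regular expression stands in for the real matching engine.

```python
import re

# Hypothetical business vocabulary: surface form -> label (illustrative only).
LEXICON = {
    "natural language processing": "TOPIC",
    "language model": "TOPIC",
    "Kairntech": "ORG",
}


def build_matcher(lexicon):
    """Compile one case-insensitive regex; longer terms are tried first."""
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def match_lexicon(text, lexicon):
    """Return (start, end, matched text, label) for every lexicon hit."""
    labels = {term.lower(): label for term, label in lexicon.items()}
    matcher = build_matcher(lexicon)
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Kairntech builds natural language processing pipelines."
    for hit in match_lexicon(sample, LEXICON):
        print(hit)
```

Sorting terms by length keeps a short entry from shadowing a longer one that starts with the same words; a production matcher would typically work on tokens rather than raw characters.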

Core components

Document Converters:
  Tika (PDF, Office, HTML…)
  Whisper (speech to text)
  Deeptranscript(1) (speech to text)
  OCRmyPDF (scanned PDF to text)
  GROBID (scholarly documents)
  Inscriptis (HTML to text)
  PubMed XML (biomedical abstracts)
  NewsML-G2 XML (news)
  Transformer models (speech to text)
  Custom converter (on demand)

Document Segmenters (chunking):
  Microsoft BlingFire
  Regular expression segmenter
  PySBD segmenter
  spaCy Rules segmenter
  Segmentation pipelines
  Custom segmenter (on demand)

Output Formatters:
  JSON
  Tabular (CSV, Excel)
  Custom formatter (on demand)
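
As a rough illustration of how a regular expression segmenter and the JSON and tabular output formatters fit together, the sketch below splits text into segments and serializes them with the Python standard library. The boundary rule, the field names (`start`, `end`, `text`) and the function names are assumptions made for this example, not the formats Kairntech Studio actually produces.

```python
import csv
import io
import json
import re

# Assumed boundary rule: whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split text into sentence-like segments with character offsets."""
    segments = []
    start = 0
    for boundary in SENTENCE_BOUNDARY.finditer(text):
        end = boundary.start()
        segments.append({"start": start, "end": end, "text": text[start:end]})
        start = boundary.end()
    if start < len(text):  # trailing segment without a following boundary
        segments.append({"start": start, "end": len(text), "text": text[start:]})
    return segments


def to_json(segments):
    """Serialize segments as a JSON array."""
    return json.dumps(segments, ensure_ascii=False, indent=2)


def to_csv(segments):
    """Serialize segments as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["start", "end", "text"])
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()


if __name__ == "__main__":
    segs = segment("First sentence. Second one? Third!")
    print(to_json(segs))
    print(to_csv(segs))
```

A rule this simple will split on abbreviations such as "e.g.", which is one reason dedicated segmenters like those listed above exist.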

Core models & technical components

Off-the-shelf models & technical components:
  Acronyms detection
  Duckling (units & measures detection)
  SpacyNER (entity detection)
  Pattern (regex)
  spaCy Rules
  Annotations consolidation
  Pseudonymization
  Restore punctuation and true casing
  Annotation-based segmentation
  Group sentences by chunks
  DeepL(1) (machine translation)
  Transformer models (Q&A, SA, zero-shot classifier…)
  Custom model & component (on demand)

Language Models (embeddings):
  All suitable models from Hugging Face Hub (mBERT, CamemBERT, XLM-RoBERTa…)
  Fine-tuned Language Models (on demand)

Large Language Models (LLMs):
  OpenAI(1): GPT-3.5, GPT-4
  Microsoft Azure(1): GPT-3.5, GPT-4
  DeepInfra(1): Llama 2, Mistral 7B, Mixtral 8x7B…

Wikidata/Wikipedia:
  entity-fishing (15 languages)
  New language on demand

1) API integration, key required
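
The Pattern (regex) and Pseudonymization components listed above can be pictured with a small standard-library sketch. Everything here is an assumption for illustration: the two patterns, the `<LABEL_n>` placeholder format and the function name are hypothetical and do not describe how Kairntech Studio implements these components.

```python
import re

# Illustrative patterns only; real use would need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d .-]{7,}\d"),
}


def pseudonymize(text):
    """Replace each match with a stable placeholder such as <EMAIL_1>.

    Returns the rewritten text and the mapping original value -> placeholder,
    so the same value always receives the same placeholder.
    """
    mapping = {}
    counters = {}

    def make_replacer(label):
        def replace(match):
            value = match.group(0)
            if value not in mapping:
                counters[label] = counters.get(label, 0) + 1
                mapping[value] = f"<{label}_{counters[label]}>"
            return mapping[value]
        return replace

    # Patterns are applied one after another, in dictionary order.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, mapping


if __name__ == "__main__":
    masked, table = pseudonymize(
        "Contact jane.doe@example.com or +33 1 23 45 67 89."
    )
    print(masked)
    print(table)
```

Keeping the mapping makes the substitution consistent across a document and, if the mapping is stored securely, reversible.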