Kairntech API Server

The API Server runs AI models and pipelines that are integrated within applications via a REST API. Feedback loops can be implemented to update AI models and enrich knowledge bases.
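As an illustration of what integrating a pipeline over REST can look like from the application side, here is a minimal sketch in Python. The server address, route, payload field and bearer-token header are all hypothetical placeholders, not documented Kairntech endpoints; the actual API reference should be used for real names. The request is only built, not sent.

```python
import json
import urllib.request

# All values below are placeholders, NOT documented Kairntech endpoints.
BASE_URL = "https://api.example.com"          # hypothetical server address
PIPELINE_PATH = "/pipelines/my-pipeline/run"  # hypothetical route
API_TOKEN = "YOUR_TOKEN"                      # hypothetical credential


def build_request(text: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST carrying one document."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + PIPELINE_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


request = build_request("Paris is the capital of France.")
# To actually call a server: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the integration easy to test without a running server.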

See also: Kairntech Studio, Kairntech Professional, Kairntech Consulting

Deployment options: Hosted(1) / On-premise

Supported languages: English, French, German, Spanish, Italian, Dutch, Portuguese, Russian, Arabic, Chinese, Hindi, Urdu, Japanese, Persian

For additional pipelines, non-production instances (development & integration, pre-production…) or other specific requests, please contact us.

1) Hosted deployment: Hosting & monitoring fees will be added (GPU as an option)
2) Core engines, core components & models: see below
3) An NLP pipeline: In general, a pipeline corresponds to one use case in one language. A pipeline may include document conversion, document segmentation (chunking), AI models (custom, ready-to-use, LLM…), technical components and an output formatter.
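The pipeline structure described in note 3 can be pictured as a chain of stages applied to one document. The sketch below is only an illustration under stated assumptions: the stage functions (convert, segment, annotate, format_output) are toy stand-ins written for this example, not Kairntech components, and the "model" is a simple regular expression.

```python
import json
import re
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def convert(doc: Document) -> Document:
    """Stand-in converter: collapse whitespace in the raw input."""
    doc["text"] = " ".join(str(doc["raw"]).split())
    return doc


def segment(doc: Document) -> Document:
    """Stand-in segmenter: split after '.', '!' or '?' followed by whitespace."""
    doc["segments"] = re.split(r"(?<=[.!?])\s+", str(doc["text"]))
    return doc


def annotate(doc: Document) -> Document:
    """Stand-in model: tag four-digit numbers as 'year' annotations."""
    doc["annotations"] = [
        {"label": "year", "text": m.group(), "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\b\d{4}\b", str(doc["text"]))
    ]
    return doc


def format_output(doc: Document) -> Document:
    """Stand-in formatter: serialize segments and annotations as JSON."""
    doc["output"] = json.dumps(
        {"segments": doc["segments"], "annotations": doc["annotations"]}
    )
    return doc


def run_pipeline(raw: str, stages: List[Stage]) -> Document:
    """Apply each stage in order to the same document."""
    doc: Document = {"raw": raw}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "The  contract was signed in 2021.\nIt expires in 2026.",
    [convert, segment, annotate, format_output],
)
```

Because every stage shares the same signature, stages can be swapped or reordered per use case, which is the point of treating a pipeline as one configuration per use case and language.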

Key Features & NLP Tasks

Supported languages vs NLP tasks

| NLP tasks | Western languages | Non-Western languages |
| --- | --- | --- |
| Core features (full text search, vector search, hybrid search) | Yes | Yes |
| Language identification | Yes | Yes |
| Token classification (date, amount, address, phrase…) | Yes | Yes |
| Named Entity Recognition (person, location, organization, disease…) | Yes | Yes |
| Sentence classification | Yes | Yes |
| Text classification | Yes | Yes |
| Entity Linking (Wikidata/Wikipedia) | English, French, German, Spanish, Italian, Portuguese, Swedish (+ language on demand) | Arabic, Japanese, Russian, Ukrainian, Chinese, Bengali, Hindi, Persian (+ language on demand) |
| Entity Linking (lexicon, business vocabulary) | Yes | Yes |
| Semantic textual similarity | Yes | Partially |
| Question answering – RAG | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Text summarization (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Paraphrase generation | Yes | Yes |
| Data augmentation | Yes | Yes |
| Sentiment analysis (polarity, emotion) | Yes | Yes |
| Intent detection & slot filling | Coming soon | Coming soon |
| Relationship extraction | Coming soon | Coming soon |
| Co-reference resolution | Coming soon | Coming soon |
| Automatic Speech Recognition (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
| Machine translation (2) | Yes, but may depend on the third-party solutions used | Yes, but may depend on the third-party solutions used |
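To make the "hybrid search" core feature more concrete: one common way to combine a full-text ranking with a vector-search ranking is reciprocal rank fusion (RRF). The sketch below shows that general technique only; it is an assumption for illustration and says nothing about how the API Server actually merges results. The document identifiers and the constant k = 60 are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by identifier for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


full_text_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. keyword ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]     # e.g. embedding-similarity ranking
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
```

Documents that appear high in both lists (here doc_a and doc_c) rise to the top, while documents found by only one retriever are kept but ranked lower.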

Core engines

Text classification engines:
- scikit-learn: MultinomialNB, ComplementNB, SVC, LinearSVC, LogisticRegression, MLPClassifier, RandomForestClassifier, DecisionTreeClassifier, GradientBoostingClassifier, XGBClassifier, KerasMLPClassifier…
- spaCy with Transformer models
- Flair with static embeddings (fastText…), Flair embeddings…
- Transformers: almost all model types & model names from the Hugging Face Hub
- fastText
- BERTopic

Text clustering engines:
- BERTopic

Token classification & NER engines:
- CRF-Suite
- spaCy with Transformer models
- DeLFT: BidLSTM-CRF, BidGRU-CRF with ELMo embeddings
- Flair: optimizers (SGD, Adam…), RNN types (LSTM, GRU) with static embeddings (fastText…), Flair embeddings…
- Transformers: almost all model types & model names from the Hugging Face Hub

Lexicon-based engines:
- PhraseMatcher
- EntityRuler

Core components

Document converters:
- Tika (PDF, Office, HTML…)
- Whisper (speech to text)
- DeepTranscript(1) (speech to text)
- OCRmyPDF (scanned PDF to text)
- GROBID (scholarly documents)
- Inscriptis (HTML to text)
- PubMed XML (biomedical abstracts)
- NewsML-G2 XML (news)
- Transformer models (speech to text)
- Custom converter (on demand)

Document segmenters (chunking):
- Microsoft BlingFire
- Regular expression segmenter
- PySBD segmenter
- spaCy Rules segmenter
- Segmentation pipelines
- Custom segmenter (on demand)

Output formatters:
- JSON
- Tabular (CSV, Excel)
- Custom formatter (on demand)
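As a rough picture of how a regular expression segmenter and the JSON and tabular output formatters fit together, here is a minimal sketch using only the Python standard library. The function names, the blank-line separator pattern and the start/end/text field names are assumptions made for this example, not the API Server's actual configuration or output schema.

```python
import csv
import io
import json
import re
from typing import Dict, List


def regex_segment(text: str, pattern: str = r"\n\s*\n") -> List[Dict[str, object]]:
    """Split text on a separator pattern, keeping character offsets."""
    segments: List[Dict[str, object]] = []
    start = 0
    for match in re.finditer(pattern, text):
        if match.start() > start:
            segments.append(
                {"start": start, "end": match.start(), "text": text[start:match.start()]}
            )
        start = match.end()
    if start < len(text):
        segments.append({"start": start, "end": len(text), "text": text[start:]})
    return segments


def to_json(segments: List[Dict[str, object]]) -> str:
    """JSON formatter: one array of segment objects."""
    return json.dumps(segments, ensure_ascii=False)


def to_csv(segments: List[Dict[str, object]]) -> str:
    """Tabular formatter: one CSV row per segment."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["start", "end", "text"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(segments)
    return buffer.getvalue()


sample = "First paragraph.\n\nSecond paragraph."
segments = regex_segment(sample)
json_output = to_json(segments)
csv_output = to_csv(segments)
```

Keeping offsets alongside each chunk lets downstream annotations be mapped back to positions in the converted text, whichever output format is chosen.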

Core models & technical components

Off-the-shelf models & technical components:
- Acronym detection
- Duckling (units & measures detection)
- SpacyNER (entity detection)
- Pattern (regex)
- spaCy Rules
- Annotation consolidation
- Pseudonymization
- Restore punctuation and true casing
- Annotation-based segmentation
- Group sentences by chunks
- DeepL(1) (machine translation)
- Transformer models (Q&A, sentiment analysis, zero-shot classifier…)
- Custom model & component (on demand)

Language models (embeddings):
- All suitable models from the Hugging Face Hub (mBERT, CamemBERT, XLM-RoBERTa…)
- Fine-tuned language models (on demand)

Large Language Models (LLMs):
- OpenAI(1): GPT-3.5, GPT-4
- Microsoft Azure(1): GPT-3.5, GPT-4
- DeepInfra(1): Llama 2, Mistral 7B, Mixtral 8x7B…
- Mixtral 8x7B

Wikidata/Wikipedia:
- entity-fishing (15 languages)
- New language on demand

1) API integration, key required