Automated Text Labeling

Labeling and organising content to enhance knowledge can be time-consuming, tedious and error-prone. Automated text labeling makes this job easier by quickly sorting and labeling content. This saves time and avoids errors, enabling information providers to get more value from their content with less effort.

Can Kairntech’s AI make life easier for information providers?

  • Create your dataset, train your models
  • Disambiguate, normalize, link entities to a Knowledge Base
  • Contextualize with Wikidata and business vocabularies
  • Supports any classification plan including IPTC Media Topic
  • Unique semantic fingerprint technology
  • A single model for multiple languages
  • Deploy on-premise and embed with a rich REST API
  • Seamlessly scales to millions of documents
  • Human-in-the-loop continuous improvement
1. For entity recognition, upload a document corpus, pre-annotate with zero- or few-shot learning, validate manually, then enrich using active learning.

2. For document classification, upload a document corpus, annotate manually, then enrich using active learning.

3. The AI learns from the annotations. Train models and combine them with technical components into AI pipelines. Deploy, embed and industrialize at scale.
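
To make the active learning step more concrete, here is a minimal Python sketch of uncertainty sampling, one common active learning strategy. The source does not say which strategy Kairntech uses, so this is an assumption for illustration only; the function name `least_confident`, the document ids and the probabilities are all hypothetical.

```python
def least_confident(probabilities, k):
    """Return the ids of the k documents whose top predicted probability is lowest.

    `probabilities` maps a document id to a dict of label -> probability.
    Illustrative uncertainty sampling only, not Kairntech's implementation.
    """
    # Score each document by the model's confidence in its best label.
    scored = [(max(probs.values()), doc_id) for doc_id, probs in probabilities.items()]
    # Lowest confidence first: these are the most informative to annotate next.
    scored.sort()
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical model predictions on documents that are not yet annotated.
predictions = {
    "doc-1": {"sport": 0.95, "politics": 0.05},
    "doc-2": {"sport": 0.55, "politics": 0.45},
    "doc-3": {"sport": 0.30, "politics": 0.70},
    "doc-4": {"sport": 0.51, "politics": 0.49},
}

to_annotate = least_confident(predictions, 2)
print(to_annotate)
```

The documents the model is least sure about are sent to a human annotator first, so each manual validation adds as much information as possible to the training dataset.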

All our data storage systems take into account the constraints of the GDPR.

Manage fine-grained access rights to facilitate access for multiple stakeholders.

In the cloud or on-premise, choose the mode that best suits your organization.

Want to learn more?

Yes, we can integrate business vocabularies into the solution, tightly integrated with world knowledge such as Wikidata, which we keep up to date. We can also suggest new candidates (which you will have to validate) to continuously enrich your business vocabularies.

We implement a pipeline that combines AI models with Entity Linking & Disambiguation, leveraging advanced algorithms that allow us to reach the state of the art in this domain.

We track various metrics, such as F-measure (precision, recall) and ROC-AUC, through interactive dashboards. Confusion matrices are still missing, but we’re on it!
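
For readers unfamiliar with these metrics, the following Python sketch shows how precision, recall, F-measure and a confusion matrix can be computed for a classification task. It is a generic illustration under stated assumptions, not the code behind the dashboards; the function names and sample labels are hypothetical.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall and F1 for one label treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))


# Hypothetical gold annotations and model predictions for six documents.
gold = ["sport", "politics", "sport", "economy", "sport", "politics"]
predicted = ["sport", "sport", "sport", "economy", "politics", "politics"]

p, r, f = precision_recall_f1(gold, predicted, "sport")
print(f"sport: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

for (g, pr), count in sorted(confusion_matrix(gold, predicted).items()):
    print(f"gold={g} predicted={pr}: {count}")
```

Precision answers "how many of the predicted labels were right", recall answers "how many of the expected labels were found", and the confusion matrix shows which labels get mistaken for which.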

Once the system is in production, we collect feedback from users to (i) enrich the business vocabulary and synchronise the automated text labeling system with new entities in real time and (ii) add new texts to the training dataset to re-train the model and update the pipeline. These operations are carried out in Kairntech Studio.

Yes, we can run the automated text labeling solution on-premise. We support Single Sign-On. Documents are not stored in the database; only the training dataset and business vocabulary are.