How to automatically annotate a set of documents?

Users can annotate the documents in their project with off-the-shelf models or with custom-built models and pipelines. This built-in option saves a lot of time when annotating a set of documents.

This allows you to add filters in Question-answering projects, pre-annotate raw text when building a dataset, compare manual and automated annotations to optimize a dataset, and more.

If you want to use an existing model from a project you have access to:
  • Go to the main project menu
  • Click on Automatically annotate “with a project”
  • Select a project
  • Select the model or pipeline (Annotator) you want to use
  • You can annotate “All documents”, “Search result list” or “Dataset” (the dataset consists of all documents or segments that have annotations)
  • Check the box to receive an email when the job is completed, as annotation may take some time
  • Once the job is completed, the Documents view is automatically refreshed with the new annotations.

If you want to annotate your set of documents with a predefined annotator:

  • Go to the main project menu
  • Click on Automatically annotate “with a predefined annotator”
  • There are a number of pre-configured annotators:
    • All Wikidata concepts
    • Media-related Wikidata concepts, to extract Person, Location and Organization
    • Health-related Wikidata concepts, to extract Disease, Symptom and Drug
    • Trankit NER, to extract Person, Location and Organization
    • spaCy NER, to extract Person, Location and Organization
    • A pipeline combining spaCy NER and Wikidata concepts
  • Annotate either “All documents”, “Dataset” or “Search result list”
  • Check the box to receive an email when the job is completed
  • If you want to remove the automatically generated information: