How to configure an acronym extraction annotator?

A prepackaged acronym extractor component is available to extract acronyms from a piece of text.

Acronyms carry a lot of meaning in technical, scientific, and other specialized documents, where newly coined acronyms are often used to express specific concepts. An acronym extraction annotator detects acronyms and extracts both their short and long forms. The results can be important input for creating and updating domain-specific vocabularies.
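To make the idea of pairing a short form with its long form concrete, here is a minimal sketch in Python. It is not the algorithm used by the prepackaged component, which is not documented here. It is a simple heuristic that assumes the acronym appears in parentheses directly after its long form and that each letter of the acronym is the initial of one preceding word. The function name find_acronyms is hypothetical.

```python
import re


def find_acronyms(text):
    """Return (short_form, long_form) pairs found by an initial-letter heuristic.

    Illustrative assumption only: the short form is written in parentheses
    right after the long form, one word per acronym letter.
    """
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        short_form = match.group(1)
        # Alphabetic words that precede the opening parenthesis.
        # Hyphenated words are split into separate words by this pattern.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        # Take as many words as the acronym has letters.
        candidate = words[-len(short_form):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short_form:
            pairs.append((short_form, " ".join(candidate)))
    return pairs


sample = "new technologies such as airborne laser swath mapping (ALSM) can be used"
print(find_acronyms(sample))
```

A heuristic like this misses acronyms that skip function words or use more than one letter per word, which is one reason to rely on a dedicated annotator.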

For a piece of text that mentions both the long and the short form of a technical term, the annotator returns a result that retains this relation. For example, the text “… often new technologies such as airborne laser swath mapping (ALSM) can be used to …” returns the following JSON data structure:

{
  "identifier": "ALSM#airborne laser swath mapping",
  "lexicon": "acronyms",
  "preferredForm": "airborne laser swath mapping"
}
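When such results are used to build a vocabulary, the short and long forms usually need to be separated again. The following Python sketch shows one way to do that. It assumes the result is available as valid JSON with the three fields shown above, and that the identifier always has the form short form, then "#", then long form, as in the example. The helper name split_identifier is hypothetical and not part of the product.

```python
import json

# Example result, written as valid JSON (assumed serialization).
raw = (
    '{"identifier": "ALSM#airborne laser swath mapping", '
    '"lexicon": "acronyms", '
    '"preferredForm": "airborne laser swath mapping"}'
)


def split_identifier(annotation):
    """Split the identifier at the first '#' into (short_form, long_form).

    Assumes the 'SHORT#long form' pattern seen in the example above.
    """
    short_form, _, long_form = annotation["identifier"].partition("#")
    return short_form, long_form


annotation = json.loads(raw)
short_form, long_form = split_identifier(annotation)
print(short_form, "=", long_form)
```

In the example, the long form obtained from the identifier is the same string as preferredForm, so either field could be used as the vocabulary entry.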

To add an acronym extraction annotator to your project, proceed as follows:

  • Go to the Processing view.
  • Create a new pipeline.
  • Create a processing component and select “Off-the-shelf annotator” in the drop-down list.
  • Select the “Acronym detection” component.