A prepackaged Entity Normalizer component is available to normalize entities from a piece of text.
Normalization is the process of mapping different variants of an entity to a single canonical form. For instance, a number can be written in digits (“65”) or in words (“sixty five”). A telephone number can be written with various blanks, dashes, and parentheses (“(+49) 89 123 456 789”, “0049 89 123456789”, etc.), yet it is always the same phone number.
Normalization removes much of this variation when extracting entities such as numbers, durations, dates, phone numbers, and many more, which gives cleaner extraction results.
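To make the idea concrete, here is a minimal sketch of what normalizing the two phone number variants above could look like. It is a toy illustration only: the function name `normalize_phone` and its two rules are assumptions made for this example, and it does not reflect how the duckling component works internally.

```python
import re


def normalize_phone(raw: str) -> str:
    """Map a phone number variant to a canonical '+<digits>' form (toy rules)."""
    # Drop blanks, dashes, parentheses and anything else that is not a digit or '+'.
    compact = re.sub(r"[^\d+]", "", raw)
    # Treat the international '00' prefix as equivalent to a leading '+'.
    if compact.startswith("00"):
        compact = "+" + compact[2:]
    return compact


variants = ["(+49) 89 123 456 789", "0049 89 123456789"]
# Both surface forms collapse to the same canonical value.
canonical = {normalize_phone(v) for v in variants}
print(canonical)  # {'+4989123456789'}
```

Once both variants map to the same value, downstream steps such as deduplication or lookup can treat them as one entity.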
Kairntech embeds the “duckling” entity normalization component for this task. To use it:
- Go to the Processing view.
- Create a new pipeline.
- Create a processing component and select External annotator from the drop-down list.
- Select the “duckling” component.
- Adjust the available parameters using the settings icon.
- Save the pipeline, then test it in the Test Page or automatically annotate your documents by selecting your pipeline.