How to create a custom segmentation pipeline?

If you are not fully satisfied with the default segmentation or with the off-the-shelf segmenters, you can create your own segmentation pipeline, for instance by combining an AI model with off-the-shelf components.

  • Go to the Processing view
  • Create a new segmenter
  • Give a name to your segmenter
  • Select “Advanced segmenter”
  • Add a first component to the pipeline
  • Select an existing model that you have already built in your project, for instance a CRF model that detects document boundaries (such as “Article 3.1”, “Article 3.2”, …)
  • Then add a new component and choose “off-the-shelf component”
  • Select the Annotation-based segmentation, which segments the document at each annotation extracted by the CRF model
  • Save and activate the segmentation pipeline on your project. All existing annotations will be kept.
  • In the Processing view, the yellow star marks the default segmenter used in the project. This segmenter is applied as soon as you upload new documents to the project.
  • Check the new segmentation in the Segments view
  • If you want to use or experiment with an off-the-shelf segmenter
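
To make the idea behind this pipeline more concrete, here is a minimal Python sketch of what an annotation-based segmentation does conceptually: a first step produces boundary annotations, and a second step starts a new segment at each of them. This is not the product's implementation. The function names are hypothetical, and a simple regular expression stands in for the CRF model purely for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for the boundary-detection model: a regex that finds
# headings such as "Article 3.1". A real project would use its trained model.
BOUNDARY_PATTERN = re.compile(r"Article \d+(?:\.\d+)*")


def detect_boundaries(text: str) -> List[Tuple[int, int]]:
    """Return the (start, end) character offsets of each boundary annotation."""
    return [match.span() for match in BOUNDARY_PATTERN.finditer(text)]


def segment_at_annotations(text: str, annotations: List[Tuple[int, int]]) -> List[str]:
    """Split the text so that a new segment starts at each annotation."""
    starts = sorted({start for start, _ in annotations})
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any text that precedes the first annotation
    ends = starts[1:] + [len(text)]
    segments = [text[s:e].strip() for s, e in zip(starts, ends)]
    return [segment for segment in segments if segment]  # drop empty segments


document = (
    "Preamble. Article 3.1 Scope of the agreement. "
    "Article 3.2 Duration and renewal."
)
segments = segment_at_annotations(document, detect_boundaries(document))
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

In this sketch the text before the first annotation is kept as its own segment. Whether the actual component behaves the same way is best verified in the Segments view.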