How to upload your documents? - Kairntech Documentation

You can import your documents at project creation in the following formats: TXT, HTML, Office, PDF, JSON, ZIP. If you have metadata associated with your documents, or if you want to upload a dataset, use the Kairntech JSON format.

For any type of project, you can also import documents after project creation from the main menu.

How to upload audio files?

How to upload scanned PDF?

How to upload scientific articles in PDF?

How to upload XML files?

Question answering project

Corpus size does not really matter. A few documents (say 5) are enough for a first try (bear in mind that you will have to come up with questions to ask!)
Open “Show advanced settings” if you want to ignore existing annotations, use them to populate the filters, or use an existing segmentation (otherwise, a default segmentation is applied).

Text classification project

Corpus size matters:
- At least 100 documents to give it a try!
- It does not make much sense to go beyond 10,000 documents if you want to create a dataset.

Note:
- If you have already classified your documents into folders, zip all the documents and import the archive! The categories will be created automatically from your folder names.
- Your corpus can be monolingual or multilingual.
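If your documents are already sorted into one folder per category, the archive can be built with Python's standard `zipfile` module. The sketch below assumes a layout like `corpus/<category>/<document>` (for example `corpus/sports/doc1.txt`) and stores paths relative to `corpus`, so the top-level folders in the archive are the category names. Whether Kairntech expects exactly this layout, or an extra wrapping folder, is an assumption to check against your first import; the function name `zip_classified_corpus` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_classified_corpus(root: Path, archive: Path) -> list[str]:
    """Zip every file under `root`, keeping one folder per category.

    Assumed layout (hypothetical): root/<category>/<document>.
    Entries are stored relative to `root`, so top-level folder names
    in the archive are the category names.
    """
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip folders and the archive itself, in case it is written inside `root`.
            if path.is_file() and path.resolve() != archive.resolve():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For example, `zip_classified_corpus(Path("corpus"), Path("corpus.zip"))` would produce entries such as `finance/b.txt` and `sports/a.txt`, which you can then upload as a ZIP file.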

Entity detection project

Corpus size matters:
- At least 50 documents to give it a try!
- It does not make much sense to go beyond 1,000 documents if you want to create a dataset.
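Before importing, a quick file count can tell you whether a local corpus falls within the ranges given on this page for text classification (100 to 10,000 documents) and entity detection (50 to 1,000 documents). This is a minimal sketch, assuming one file equals one document; the project-type keys and function names are hypothetical and are not part of Kairntech.

```python
from pathlib import Path

# (minimum to try, useful upper bound when building a dataset), taken from this page.
# The dictionary keys are hypothetical labels, not Kairntech identifiers.
RECOMMENDED_SIZES = {
    "text_classification": (100, 10_000),
    "entity_detection": (50, 1_000),
}


def count_documents(root: Path) -> int:
    """Count regular files under `root` (assumes one file = one document)."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def check_corpus_size(project_type: str, n_docs: int) -> str:
    """Compare a document count with the recommended range for a project type."""
    low, high = RECOMMENDED_SIZES[project_type]
    if n_docs < low:
        return f"too small: {n_docs} < {low} documents"
    if n_docs > high:
        return f"larger than needed for a dataset: {n_docs} > {high} documents"
    return "ok"
```

The upper bound is only a guideline for dataset creation, so a "larger than needed" result is a hint rather than an error.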

Note:
- If your documents are longer than 5-10 sentences, document segmentation is mandatory.
- A default, sentence-based segmentation will be applied.
- However, you may need to build a custom segmenter.
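To illustrate what sentence-based segmentation does, and why a custom segmenter is sometimes needed, here is a deliberately naive splitter. It is not Kairntech's default segmenter: it is a hypothetical sketch that assumes a sentence ends at `.`, `!` or `?` followed by whitespace, and the 5-sentence threshold in `needs_segmentation` is taken from the low end of the range above.

```python
import re

# Naive rule: split on whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentence segments using a simple punctuation rule."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def needs_segmentation(text: str, max_sentences: int = 5) -> bool:
    """Return True if the text has more sentences than `max_sentences` (hypothetical threshold)."""
    return len(split_sentences(text)) > max_sentences
```

On `"The cat sat. It purred! Was it happy? Yes."` this rule yields four segments. On `"Dr. Smith arrived. He sat down."`, however, it wrongly splits after `Dr.`, which is the kind of domain-specific case (abbreviations, references, list items) where a custom segmenter can help.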