You can import your documents at project creation in the following formats: TXT, HTML, Office, PDF, JSON, ZIP. If you have metadata associated with your documents, or if you want to upload a dataset, use the Kairntech JSON format.
- For any type of project, you can also import documents after project creation from the main menu.
Question answering project
- Corpus size doesn’t really matter. A few documents (say 5) are enough for a first try (bear in mind you’ll have to come up with questions to ask!)
- Open “Show advanced settings” if you want to ignore any existing annotations when populating the filters, or to use the existing segmentation (otherwise a default segmentation will be applied).
Text classification project
- Corpus size matters:
  - 100 documents minimum to give it a try!
  - It does not make much sense to go beyond 10,000 documents if you want to create a dataset.
- Note:
  - If you have already classified your documents into folders, zip all the documents and import the archive! The categories will be created automatically from your folder names.
  - Your corpus can be monolingual or multilingual.
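The folder-to-ZIP step can be scripted. The following is a minimal sketch, assuming one sub-folder per category under a single root folder, and assuming the import reads category names from the folder paths stored in the archive. The function name `zip_corpus` and the paths in the usage comment are hypothetical, not part of the product.

```python
import zipfile
from pathlib import Path


def zip_corpus(root: Path, archive: Path) -> list[str]:
    """Zip every file under `root`, keeping the category folder in each path.

    Returns the list of paths stored in the archive, e.g. "sports/a.txt".
    """
    added = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself if it lives under root.
            if not path.is_file() or path.resolve() == archive.resolve():
                continue
            # Store paths relative to root so the category folder is the first component.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added


# Hypothetical usage: corpus/sports/*.txt, corpus/politics/*.txt -> corpus.zip
# zip_corpus(Path("corpus"), Path("corpus.zip"))
```

Storing relative paths keeps each document under its category folder inside the archive, which is what lets folder names be used as categories.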
Entity detection project
- Corpus size matters:
  - 50 documents minimum to give it a try!
  - It does not make much sense to go beyond 1,000 documents if you want to create a dataset.
- Note:
  - For documents longer than a few sentences, document segmentation is mandatory.
  - A default sentence-based segmentation will be applied.
  - You may, however, need to build a custom segmenter.
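To illustrate what sentence-based segmentation means and why a custom segmenter is sometimes needed, here is a deliberately naive sketch. It is not the platform's default segmenter; the function name `split_sentences` and the splitting rule are assumptions made for illustration only.

```python
import re

# Assumed rule: a sentence ends at ".", "!" or "?" when followed by whitespace
# and then an uppercase letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentence-like segments using the naive rule above."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


segments = split_sentences("The contract was signed in Paris. It starts on 1 May. Who signed it?")
# segments == ["The contract was signed in Paris.", "It starts on 1 May.", "Who signed it?"]
```

A rule this simple breaks on abbreviations: `split_sentences("Dr. Smith arrived.")` yields `["Dr.", "Smith arrived."]`. Documents with many abbreviations, lists, or tables are typical cases where a custom segmenter gives better results than a generic sentence split.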