Introduction

The Kairntech platform goes to great lengths to give users access to powerful machine learning capabilities wrapped in an intuitive, easy-to-use GUI. While this is key to allowing non-technical users and domain experts without a data science background to use the platform, there is a second way of working with it: the REST API. In this short tutorial we’ll explain how to implement a sample client in Python that gives you access to the functionality of the Kairntech API.

The complete code of the sample client can be found here, and we’ll explain the setup step by step.

Prerequisites

In what follows we assume that you have Python 3 installed on your machine and are comfortable writing code in a text editor and running a Python script from the command line. We also assume that you have access to the platform, which is proprietary software from Kairntech. We maintain online instances of the platform for testing and demos. If you need access, please contact us at info@kairntech.com and we’ll get back to you.

Finally, the client that we are about to implement requires a handful of Python modules to be installed. If an error message on launching the client tells you that a module is missing, you can add it with pip, the standard Python package installer. For instance, if your Python installation does not yet contain the pandas library, the command below adds it:

# pip install pandas 
Collecting pandas 
  Downloading https://files.pythonhosted.org/packages/*******/pandas-0.24.2-cp35-cp35m-manylinux1_x86_64.whl (10.0MB) 
    100% |████████████████████████████████| 10.0MB 928kB/s 
Installing collected packages: pandas 
Successfully installed pandas-0.24.2 
#

A full introduction to the pip module installer is beyond the scope of this text. See here for details on pip.
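To find out up front which modules are missing, rather than waiting for an import error, a small check along the following lines can help. This is only a sketch: the helper name missing_modules is our own, and the module list assumes the client needs requests and pandas, the two third-party libraries used in the snippets of this tutorial.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list: the third-party modules used in this tutorial
    missing = missing_modules(["requests", "pandas"])
    if missing:
        print("Missing modules, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```

The check only looks the modules up and does not import them, so it is quick to run before the first launch of the client.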

The Kairntech API

The platform can be accessed through REST API calls from any client that “speaks” REST. Full documentation of the API can be accessed here. That page lists the available calls, their required parameters and the format of the results each call returns. All user interactions that can be performed in the GUI web interface can also be executed via the API, so you have access to the full range of methods: logging in, listing the existing projects, creating a new one, uploading documents, launching a training job, collecting the results of an annotation and many more.

Interacting productively with the API is in principle possible just by checking the documentation above, but having a sample client that spells out how things are supposed to work often helps to save some time. So here we go:

In what follows we’ll walk through one specific scenario: logging on to the platform, checking the list of installed projects, then checking the list of trained models available in one of these, and finally sending a text (or a directory of texts) to the platform to be annotated with that model and returning the results.

Authentication

In order to interact with the platform you first need to log in. The login call returns a “Bearer token” that must be submitted with all subsequent calls in the same session. The corresponding code in Python is pretty straightforward:

import json
import requests

server = 'https://sherpa.kairntech.com/api'
login_info = json.dumps({"email": 'YOUR-LOGIN',
                         "password": 'YOUR-PASSWORD'})
headers = {"Accept": "application/json",
           "Content-Type": "application/json"}

def get_token(server, login_info):
    """Log in and return the bearer token, or None if the login failed."""
    url = server + "/auth/login"
    #print("calling sherpa server '%s' …" % url)
    try:
        response = requests.post(url, data=login_info, headers=headers)
        json_response = json.loads(response.text)
    except Exception as ex:
        print("Error connecting to Sherpa server %s: %s" % (server, ex))
        return None
    #print("response = %s" % response.text)
    # A successful login response carries the token under 'access_token'
    if 'access_token' in json_response:
        token = json_response['access_token']
        return token
    else:
        return None

token = get_token(server, login_info)
print("token = %s" % token)

When executing the code above, a “bearer token” will be returned and printed. This shows that accessing the server was successful and we are now ready to use the obtained token to access the API with specific tasks. If the login fails, get_token returns None and the script prints “token = None” instead.
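Since every later call has to carry the token, it can be convenient to build the request headers in one place and to fail early when no token was obtained. The helper below is a sketch of that idea; the name auth_headers is our own and is not part of the Kairntech API. The header fields are the ones used in the snippets of this tutorial.

```python
def auth_headers(token, content_type="application/json"):
    """Build request headers that carry the bearer token."""
    if not token:
        # get_token returns None when the login did not succeed
        raise ValueError("no token available: log in successfully first")
    return {
        "Accept": "application/json",
        "Content-Type": content_type,
        "Authorization": "Bearer " + token,
    }


# Example with a placeholder token, for illustration only
print(auth_headers("PLACEHOLDER-TOKEN", content_type="text/plain"))
```

Raising an explicit error here avoids the less readable TypeError that Python produces when a missing token is concatenated with the string 'Bearer '.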

Listing projects and models

When working with the platform GUI, the application immediately presents the list of installed projects and invites you to select one (or create a new one). When working with the API, however, you may not know what projects are available, so we first request a list of the installed projects. Fortunately there is an API call for that, and the same is true for the list of models inside one project. The code below shows this for the projects:

def get_projects(server, token):
    """Print the names of the projects available on the server."""
    url = server + "/projects"
    # Every call after the login must carry the bearer token
    headers2 = {'Authorization': 'Bearer ' + token}
    #print("calling sherpa server '%s' …" % url)
    response = requests.get(url, headers=headers2)
    json_response = json.loads(response.text)
    projects = ", ".join([project['name'] for project in json_response])
    print("Available projects on %s: %s" % (server, projects))

Using the bearer token that we received as a result of the previous call, we can ask for a list of the installed projects.
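The function above prints the project names but does not return them, and it assumes that the response is a JSON list of objects with a 'name' field. If you want to reuse the names in later calls, a small pure function such as the following sketch can do the extraction and check that assumption. The name project_names and the sample payload are our own illustration; the real response may contain further fields.

```python
def project_names(json_response):
    """Extract the project names from the parsed JSON answer of the /projects call."""
    if not isinstance(json_response, list):
        # Anything other than a list is unexpected here, e.g. an error object
        raise ValueError("expected a JSON list of projects, got %s"
                         % type(json_response).__name__)
    return [project["name"] for project in json_response
            if isinstance(project, dict) and "name" in project]


# Hypothetical payload, for illustration only
sample = [{"name": "tenders"}, {"name": "invoices"}]
print(project_names(sample))
```

Inside get_projects this would be called as project_names(json_response) right after the response has been parsed.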

Sending content

After having selected the project we want to use (and the specific model – there can be several models inside one project corresponding to training runs with the different available training algorithms) we are now set to send content to the platform and have it annotated.

def call_sherpa(text, model, annotator, server, token):
    """Annotate text and return a dict mapping each label to its distinct terms."""
    # Note: 'model' fills the project slot of the URL, 'annotator' names the trained model
    url = server + "/projects/" + model + "/annotators/" + annotator + "/_annotate"
    results = {}
    text = text.encode(encoding='utf-8')
    headers = {"Accept": "application/json",
               "Content-Type": "text/plain",
               "Authorization": "Bearer " + token}
    response = requests.post(url, data=text, headers=headers)
    if response.status_code != 200:
        print("error from server: %s" % response.status_code)
        return None
    json_response = json.loads(response.text)
    documenttext = json_response['text']
    annotations = json_response['annotations']
    for annotation in annotations:
        # Each annotation carries character offsets into the returned document text
        start = annotation['start']
        end = annotation['end']
        term = documenttext[start:end]
        label = annotation['labelName']
        if label in results:
            results[label].append(term)
        else:
            results[label] = [term]
    # Remove duplicate terms per label (a set does not preserve their order)
    for key in results.keys():
        results[key] = list(set(results[key]))
    return results

The “call_sherpa” function above calls the platform with the token, the project and the model on a piece of text and then returns a dictionary of the resulting annotations, where the dictionary key is the entity type (say, PERSON or LOCATION or whatever the selected model has been trained to recognize). Each key maps to a list of the distinct terms found for that type. If the server does not answer with status 200, the function prints the status code and returns nothing.
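To make the shape of that dictionary concrete without needing a server, the grouping step can be tried on a hand-written response. The sketch below is our own illustration: the function name group_annotations and the sample payload are invented, and only the fields 'text', 'annotations', 'start', 'end' and 'labelName' are taken from the code above. Unlike the set-based version, this variant keeps the terms in the order in which they first appear.

```python
def group_annotations(json_response):
    """Group annotated terms by label, without duplicates, in first-seen order."""
    text = json_response["text"]
    results = {}
    for annotation in json_response["annotations"]:
        # Offsets are character positions in the document text
        term = text[annotation["start"]:annotation["end"]]
        terms = results.setdefault(annotation["labelName"], [])
        if term not in terms:
            terms.append(term)
    return results


# Hypothetical response, for illustration only
sample = {
    "text": "Ada Lovelace met Charles Babbage in London. Ada Lovelace wrote notes.",
    "annotations": [
        {"start": 0, "end": 12, "labelName": "PERSON"},
        {"start": 17, "end": 32, "labelName": "PERSON"},
        {"start": 36, "end": 42, "labelName": "LOCATION"},
        {"start": 44, "end": 56, "labelName": "PERSON"},
    ],
}
print(group_annotations(sample))
# {'PERSON': ['Ada Lovelace', 'Charles Babbage'], 'LOCATION': ['London']}
```

The second mention of the same person is dropped, so each cell of the final table lists every entity only once.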

Printing the results

The only thing left to do now is to decide how our client should return the results of the call. Here we write the results to a CSV file (tab-separated, as the code below shows) with one line per processed file. The columns are the entity types of the selected model and each cell contains the list of entities of that type found in that file:

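import pandas as pd

# txtfile, project, model, server, mytoken and MYOUTPUTFILE are set elsewhere in the complete client
with open(txtfile, "r", encoding='UTF-8', errors='ignore') as infile:
    text = infile.read()
result = call_sherpa(text, project, model, server, mytoken)
if result is not None:  # call_sherpa returns None on a server error
    result['document'] = txtfile
    # One row per processed file (DataFrame.append was removed in pandas 2.0)
    output = pd.DataFrame([result])
    output = output.set_index('document')
    output.to_csv(MYOUTPUTFILE, sep='\t')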
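The snippet above handles a single file. For a whole directory, the same steps can be wrapped in a loop that collects one row per file. The following is a sketch under a few assumptions of our own: the helper name collect_rows is invented, the input files are assumed to end in .txt, and the annotation call is passed in as a function so that the loop can be tried without a server.

```python
from pathlib import Path


def collect_rows(directory, annotate):
    """Annotate every .txt file in directory and return one result dict per file."""
    rows = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        result = annotate(text)
        if result is None:
            # Skip files for which the annotation call failed
            continue
        row = dict(result)
        row["document"] = path.name
        rows.append(row)
    return rows


# Usage with the functions of this tutorial (needs a server and a valid token):
# rows = collect_rows("tenders", lambda t: call_sherpa(t, project, model, server, mytoken))
# pd.DataFrame(rows).set_index("document").to_csv(MYOUTPUTFILE, sep="\t")
```

Building the table once from the list of rows avoids growing a DataFrame inside the loop, and entity types that are missing in some files simply end up as empty cells.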

We launched our sample client on a directory of English public tenders collected from the EU Tender website https://ted.europa.eu/TED/main/HomePage.do and sent them to a Sherpa model trained to find certain metadata such as the tender’s due date, the estimated volume and the tender’s subject. The script returns a CSV file with the corresponding information, which makes it easy to check for, say, the next tender that is due, the one with the largest volume or the one that best fits my company’s profile. Answering any of these questions would have kept me busy for hours if I had to compile the information manually.

Conclusion

We have introduced the use of the Kairntech REST API with a simple example: sending a text file to a given entity extraction model and returning the results of the annotation as a CSV table that can be opened in Excel. Please check out the complete code here in order to have a working example (the snippets above only outline the high-level approach).

Our suggested client already covers a wide range of use cases. Given the appropriate model, one could send documents on clinical trials to the server and automatically generate an Excel table listing which drugs are investigated for which disease; working on public tenders, one could generate a table listing what kind of project with what contract volume is due when; and working with invoices, one could list which amount was paid when to which recipient.

As mentioned at the beginning, the scenario selected here only serves as an example of the potential use of the REST API. Please let us know about additional scenarios that you have in mind. Also note that the REST API is under constant development. At the time of writing we are finalizing the next version of the API, which will also offer calls for document categorization, and more functionality will be added in the near future.