Introduction

We focus here on the investigation and verification phase of the audit, which is undoubtedly one of the most important and also one of the most time-consuming for auditors. It is nonetheless essential to guarantee that a company’s accounts give a true and fair view.

This phase already benefits from powerful analysis tools for structured data (mainly numerical data). They make it possible to detect anomalies, sudden variations and even errors in the data produced by the audited company’s information system. These tools flag the transactions to be checked first and so guide the investigations. Their continuous improvement in recent years has enabled auditors to cope with the dizzying increase in the volume of data to be processed.

That leaves the even larger mass of unstructured data, which makes up the bulk of the data available in a company. It includes, among other things:

  • Employment contracts and their amendments,
  • Sales contracts from which turnover is derived, e.g. loan agreements for banks, insurance policies for insurers and commercial leases for property companies,
  • Contracts with suppliers for the purchase of raw materials or services,
  • etc.

In addition, the widespread use of e-mail has made the situation even more complex. E-mails have become a new way of modifying agreements, permanently or temporarily, and sometimes significantly. This is extremely common, especially in the retail sector, for discounts, one-off promotions or rate changes. It is also frequent with employment contracts, where changes are rarely all incorporated into the contract itself. The phenomenon concerns all companies.

The auditor’s objective is to verify that the information contained in these contracts and e-mails is correctly reflected in the company’s accounts. This information may be amounts, rates, numbers of months or dates. To this must be added any clauses likely to create an on- or off-balance-sheet commitment.
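To make this comparison concrete, here is a minimal Python sketch of the check itself, assuming the contract terms have already been extracted into a dictionary keyed by contract ID and the accounting figures exported in the same shape. The contract IDs, field names and tolerance are hypothetical; a real engagement would use the firm’s own data model and materiality thresholds.

```python
from decimal import Decimal

# Hypothetical data: terms extracted from contracts vs. figures booked in the accounts.
contract_terms = {
    "C-001": {"annual_rent": Decimal("120000.00"), "months": 12},
    "C-002": {"annual_rent": Decimal("84000.00"), "months": 9},
}
ledger = {
    "C-001": {"annual_rent": Decimal("120000.00"), "months": 12},
    "C-002": {"annual_rent": Decimal("80000.00"), "months": 9},
    "C-003": {"annual_rent": Decimal("50000.00"), "months": 12},
}

def reconcile(contracts, books, tolerance=Decimal("0.01")):
    """Return (contract_id, field, contract_value, booked_value) for every discrepancy."""
    issues = []
    for cid in sorted(set(contracts) | set(books)):
        c, b = contracts.get(cid), books.get(cid)
        if c is None or b is None:
            # Present on one side only: a contract not booked, or a booking with no contract.
            issues.append((cid, "missing", c, b))
            continue
        for field, c_val in c.items():
            b_val = b.get(field)
            if b_val is None or abs(c_val - b_val) > tolerance:
                issues.append((cid, field, c_val, b_val))
    return issues

for issue in reconcile(contract_terms, ledger):
    print(issue)
```

Here the sketch reports the rent mismatch on C-002 and the booking C-003 that has no matching contract.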

Unfortunately, until now there was only one way to do this: read all these documents, extract the necessary information by hand, copy it into an Excel file and then compare it with the data in the information system. In practice, because this operation is long and tedious, auditors worked by sampling, randomly choosing a few contracts in each category they checked.
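The sampling approach amounts to a stratified random draw, which can be sketched in a few lines of Python. The inventory, category names and sample size below are hypothetical; the point is simply how small a fraction of the population a manual review ends up covering.

```python
import random

# Hypothetical inventory: contract IDs grouped by category.
contracts_by_category = {
    "employment": [f"EMP-{i:03d}" for i in range(1, 201)],
    "sales": [f"SAL-{i:03d}" for i in range(1, 501)],
    "suppliers": [f"SUP-{i:03d}" for i in range(1, 81)],
}

def stratified_sample(groups, per_category, seed=None):
    """Draw up to `per_category` contracts at random from each category."""
    rng = random.Random(seed)  # seeded so the selection can be documented and reproduced
    return {
        cat: rng.sample(ids, min(per_category, len(ids)))
        for cat, ids in groups.items()
    }

sample = stratified_sample(contracts_by_category, per_category=10, seed=42)
total = sum(len(ids) for ids in contracts_by_category.values())
checked = sum(len(ids) for ids in sample.values())
print(f"{checked} of {total} contracts reviewed ({checked / total:.1%})")
# prints: 30 of 780 contracts reviewed (3.8%)
```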

Contract Analysis and Artificial Intelligence

Today, there are tools that auditors can use easily, without any programming skills. The auditors define the information to be extracted from each type of contract and provide examples; the tools then learn, using neural networks, how to extract that information. It is therefore the auditors themselves who build their own information extraction solution, which they can then apply to all the contracts they have to audit.

In Practice: Commercial Leases

Let’s take the concrete example of auditing a property company, where the objective is to analyse all the contracts of the “commercial lease” type. For each lease, the auditor needs to find the duration, the annual rent, the address, the amount of the security deposit… and enter this data in an Excel file, then consolidate all the numerical data and perform audit calculations.
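The target of the exercise is one structured record per lease. A minimal Python sketch of such a record, with a couple of example calculations on the consolidated data, might look like this; the class name, field types and lease values are hypothetical, and the deposit-to-rent ratio is only one illustrative check an auditor might run.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class LeaseRecord:
    """One row of the auditor's spreadsheet (hypothetical field names and types)."""
    lease_id: str
    address: str
    duration_years: int
    annual_rent: Decimal
    security_deposit: Decimal

# Made-up leases for illustration only.
leases = [
    LeaseRecord("L-01", "12 rue Exemple, Paris", 9, Decimal("48000"), Decimal("12000")),
    LeaseRecord("L-02", "3 avenue Test, Lyon", 6, Decimal("30000"), Decimal("7500")),
]

# Consolidation: totals across the portfolio.
total_rent = sum((l.annual_rent for l in leases), Decimal("0"))
total_deposits = sum((l.security_deposit for l in leases), Decimal("0"))

# Deposit expressed in months of rent, e.g. to spot unusually high or low deposits.
deposit_months = {l.lease_id: l.security_deposit / l.annual_rent * 12 for l in leases}
print(total_rent, total_deposits, deposit_months)
```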

Each audit involves hundreds of leases, far too many to analyse by hand. How can AI help the auditor? By automatically extracting this data from the documents with a quality above 90% and generating an Excel report.

How to proceed?

The auditor must first define all these concepts, or “labels”: lease duration, rent amount, lease address…

Label definition in the project
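Conceptually, the label set is a small piece of configuration. The tool described here has its own interface for this; the Python sketch below is only an illustration, with hypothetical label names, each carrying guidance for annotators and the type the extracted value should eventually be converted to.

```python
from decimal import Decimal

# Hypothetical label definitions: name -> (annotator guidance, expected value type).
LABELS = {
    "LEASE_DURATION": ("Term of the lease, e.g. 'nine years'", int),
    "ANNUAL_RENT": ("Annual rent excluding charges", Decimal),
    "PROPERTY_ADDRESS": ("Full postal address of the leased premises", str),
    "SECURITY_DEPOSIT": ("Amount of the security deposit", Decimal),
}

def check_label(name):
    """Return the definition of a label, rejecting labels not defined in the project."""
    if name not in LABELS:
        raise KeyError(f"Unknown label {name!r}; defined labels: {sorted(LABELS)}")
    return LABELS[name]
```

Keeping the list explicit means every annotation can be validated against it, so a typo in a label name is caught immediately rather than silently creating a new category.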

Next, the auditor must highlight examples of each of these labels in the contracts:

Contract labeling
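A common way to store a highlighted example, used for instance in spaCy-style NER training data, is a character span plus a label. The article does not state the storage format of its tool, so the Python sketch below is a hypothetical representation; the `annotate` helper and the sample sentence are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    """A highlighted span: character offsets into the contract text plus its label."""
    start: int
    end: int  # exclusive, like a Python slice
    label: str

text = "The lease is granted for nine years at an annual rent of EUR 48,000."

def annotate(text, fragment, label):
    """Hypothetical helper: annotate the first occurrence of `fragment` in `text`."""
    start = text.find(fragment)
    if start == -1:
        raise ValueError(f"{fragment!r} not found in text")
    return Annotation(start, start + len(fragment), label)

annotations = [
    annotate(text, "nine years", "LEASE_DURATION"),
    annotate(text, "EUR 48,000", "ANNUAL_RENT"),
]
for a in annotations:
    print(a.label, "->", text[a.start:a.end])
```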

Done manually, this work can be long and tedious.

What can AI do to help the auditor create examples more quickly?

Once a few examples have been created, the AI can start learning and suggest new examples, which the auditor validates, corrects or rejects. This saves the auditor time, and the task can easily be subcontracted if necessary.

Validation UI for suggested labelling to speed up the annotation task
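The tool described here uses a trained model to propose annotations. As a much simpler stand-in that still shows the validate/correct/reject loop, the Python sketch below proposes spans wherever a value the auditor has already annotated reappears, then applies the auditor’s decisions. The exact-match strategy, function names and the `RENEWAL_TERM` correction are assumptions for illustration, not the tool’s method.

```python
import re

# Values the auditor has already annotated, by label (hypothetical).
known = {"LEASE_DURATION": {"nine years", "six years"}}

def suggest(text, known_values):
    """Propose (start, end, label, matched_text) spans wherever a known value reappears."""
    suggestions = []
    for label, values in known_values.items():
        for value in values:
            for m in re.finditer(re.escape(value), text):
                suggestions.append((m.start(), m.end(), label, m.group()))
    return sorted(suggestions)  # document order

def review(suggestions, decisions):
    """Apply one decision per suggestion: 'accept', 'reject', or a corrected label."""
    accepted = []
    for (start, end, label, value), decision in zip(suggestions, decisions):
        if decision == "reject":
            continue
        new_label = label if decision == "accept" else decision
        accepted.append((start, end, new_label, value))
    return accepted

doc = "Initial term: six years. Renewal for nine years at the tenant's option."
props = suggest(doc, known)
print(props)
print(review(props, ["accept", "RENEWAL_TERM"]))
```

In this toy run the second match is a renewal period rather than the lease duration, which is exactly the kind of suggestion a human reviewer needs to correct.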

Once the auditor has built up a representative dataset, he or she can check its quality, for example that the examples (annotations) are well distributed across the labels.

Label distribution in the dataset
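The distribution check boils down to counting annotations per label. A minimal Python sketch, assuming the annotations can be exported as (document, label) pairs; the data, label names and the minimum of three examples are hypothetical.

```python
from collections import Counter

# Hypothetical export: one (document_id, label) pair per annotation.
annotations = [
    ("lease-01", "ANNUAL_RENT"), ("lease-01", "LEASE_DURATION"), ("lease-01", "PROPERTY_ADDRESS"),
    ("lease-02", "ANNUAL_RENT"), ("lease-02", "LEASE_DURATION"),
    ("lease-03", "ANNUAL_RENT"),
]
PROJECT_LABELS = ["ANNUAL_RENT", "LEASE_DURATION", "PROPERTY_ADDRESS", "SECURITY_DEPOSIT"]

def label_report(pairs, expected_labels, min_examples=3):
    """Count annotations per label and flag labels with fewer than `min_examples`."""
    counts = Counter({label: 0 for label in expected_labels})  # unused labels show up as 0
    counts.update(label for _, label in pairs)
    under = sorted(label for label, n in counts.items() if n < min_examples)
    return counts, under

counts, under = label_report(annotations, PROJECT_LABELS)
for label, n in counts.most_common():
    print(f"{label:<18}{n}")
print("Needs more examples:", under)
```

Seeding the counter with the project’s label list matters: a label that was never annotated would otherwise be absent from the report rather than flagged.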

Once the data has been tagged, the auditor will want to find the algorithm that extracts it from new documents with the best possible quality. He will therefore want to test several of the most powerful algorithms available that suit his problem, such as spaCy, CRF-Suite, DeLFT (BiLSTM) or Flair for named entity extraction.

Experimentation with different machine learning algorithms and neural networks on the dataset

He can then train these algorithms, test them on his dataset and compare their respective quality, label by label:

Comparison of the quality of different algorithms
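Label-by-label comparisons of extraction models are usually expressed as precision, recall and F1. The article does not say which metric its “quality” figures refer to, so the Python sketch below simply shows one common choice, exact span matching, computed per label from gold and predicted annotations; the data is made up, and the function would be run once per algorithm.

```python
def per_label_scores(gold, predicted):
    """Exact-match precision, recall and F1 per label.

    `gold` and `predicted` are sets of (doc_id, start, end, label) tuples.
    """
    labels = {g[3] for g in gold} | {p[3] for p in predicted}
    scores = {}
    for label in sorted(labels):
        g = {x for x in gold if x[3] == label}
        p = {x for x in predicted if x[3] == label}
        tp = len(g & p)  # spans with identical offsets and label
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[label] = (precision, recall, f1)
    return scores

gold = {("d1", 10, 20, "ANNUAL_RENT"), ("d1", 30, 40, "LEASE_DURATION"),
        ("d2", 5, 15, "ANNUAL_RENT")}
pred = {("d1", 10, 20, "ANNUAL_RENT"), ("d1", 30, 38, "LEASE_DURATION"),
        ("d2", 5, 15, "ANNUAL_RENT"), ("d2", 50, 60, "ANNUAL_RENT")}

scores = per_label_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label:<16} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Exact matching is strict: the truncated `LEASE_DURATION` span (ending at 38 instead of 40) counts as a miss. Some evaluations use partial-overlap matching instead.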

The results often need to be refined by adjusting the algorithms’ parameters; this is the domain of Data Scientists and AI specialists.

Tuning options for each algorithm, intended for the Data Scientist

The auditor can also enrich the dataset for a particular label and run a new experimentation phase.
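Deciding where to invest the next round of annotation can be reduced to a small rule: keep the best algorithm for each label, and list the labels where even the best falls short of a target. The Python sketch below assumes per-label F1 scores are already available for each algorithm; the scores and the 0.85 target are invented for illustration and are not results from the article.

```python
# Invented per-label F1 scores for each algorithm (illustration only, not real results).
f1_by_model = {
    "spaCy":     {"ANNUAL_RENT": 0.93, "LEASE_DURATION": 0.88, "SECURITY_DEPOSIT": 0.71},
    "CRF-Suite": {"ANNUAL_RENT": 0.90, "LEASE_DURATION": 0.91, "SECURITY_DEPOSIT": 0.65},
    "Flair":     {"ANNUAL_RENT": 0.95, "LEASE_DURATION": 0.86, "SECURITY_DEPOSIT": 0.74},
}

def plan_next_iteration(scores, target=0.85):
    """Return the best model per label and the labels still below `target`."""
    labels = sorted({label for per_label in scores.values() for label in per_label})
    best = {}
    for label in labels:
        model = max(scores, key=lambda m: scores[m].get(label, 0.0))
        best[label] = (model, scores[model].get(label, 0.0))
    weak = [label for label, (_, f1) in best.items() if f1 < target]
    return best, weak

best, weak = plan_next_iteration(f1_by_model)
print(best)
print("Annotate more examples for:", weak)
```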

The last step is to put the resulting model into production. It can then extract data from thousands of commercial leases without human intervention, with a quality of up to 95% in the best cases, which is better than what humans typically achieve.

Automatic extraction of key information from commercial leases in production

Finally, the automatically extracted data are consolidated into an Excel file for the auditor.
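The consolidation step can be sketched with Python’s standard `csv` module, writing one row per lease plus a totals row. The field names and values are hypothetical stand-ins for the model’s output. A CSV opens directly in Excel; note that French-locale Excel often expects `;` as the separator (pass `delimiter=";"` to `DictWriter`), and a library such as openpyxl can produce a native .xlsx file instead.

```python
import csv
import io
from decimal import Decimal

# Hypothetical model output: one dict of extracted fields per lease.
extracted = [
    {"lease_id": "L-01", "address": "12 rue Exemple, Paris", "duration_years": 9,
     "annual_rent": Decimal("48000"), "security_deposit": Decimal("12000")},
    {"lease_id": "L-02", "address": "3 avenue Test, Lyon", "duration_years": 6,
     "annual_rent": Decimal("30000"), "security_deposit": Decimal("7500")},
]

FIELDS = ["lease_id", "address", "duration_years", "annual_rent", "security_deposit"]

def to_csv(rows, out):
    """Write one row per lease plus a totals row; omitted columns are left empty."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    writer.writerow({
        "lease_id": "TOTAL",
        "annual_rent": sum(r["annual_rent"] for r in rows),
        "security_deposit": sum(r["security_deposit"] for r in rows),
    })

buffer = io.StringIO()  # swap for open("leases.csv", "w", newline="") to write a file
to_csv(extracted, buffer)
print(buffer.getvalue())
```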

It is easy to imagine the time saved, time that can be used to bring more value to the audited clients.

Commercial leases are just one example. As noted at the outset, it may also be necessary today to analyse e-mails to validate agreements between a buyer and its suppliers. This is typically the case in mass retail, where changes are requested very frequently in the context of commercial promotions. The sheer number of documents involved, and their dialogue-like structure, make processing them “manually” extremely difficult. Moreover, sampling is by definition not applicable here, since certain information must be found in every one of these e-mails. Under French law, which allows payments to be reviewed over the past five years, it is easy to imagine how many e-mails would have to be processed to check that all supplier discounts and promotions had been taken into account and deducted.
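To show the shape of that check, here is a deliberately crude Python sketch that pulls discount rates out of e-mail text with a regular expression and lists those not found among the rates actually invoiced per supplier. The regex is a swapped-in technique, not the trained extraction models described above: it only recognises two English phrasings (“10% discount”, “discount of 7,5 %”), whereas real supplier e-mails, often in French, would need a trained model. Supplier names, e-mails and invoiced rates are hypothetical.

```python
import re
from decimal import Decimal

# Matches "10% discount" or "discount of 7,5 %"; accepts a decimal point or comma.
DISCOUNT = re.compile(
    r"(?:(\d+(?:[.,]\d+)?)\s?%\s+discount|discount\s+of\s+(\d+(?:[.,]\d+)?)\s?%)",
    re.IGNORECASE,
)

def discounts_in(email_body):
    """Return every discount rate mentioned in an e-mail, as Decimal percentages."""
    rates = []
    for m in DISCOUNT.finditer(email_body):
        raw = m.group(1) or m.group(2)
        rates.append(Decimal(raw.replace(",", ".")))
    return rates

def unapplied(emails, invoiced_rates):
    """List (supplier, rate) pairs promised by e-mail but absent from the invoices."""
    missing = []
    for supplier, body in emails:
        for rate in discounts_in(body):
            if rate not in invoiced_rates.get(supplier, set()):
                missing.append((supplier, rate))
    return missing

emails = [
    ("SupplierA", "As agreed, we grant a 10% discount on the spring promotion."),
    ("SupplierB", "Confirming a discount of 7,5 % on all orders placed in May."),
]
invoiced = {"SupplierA": {Decimal("10")}}
print(unapplied(emails, invoiced))
```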

Conclusion

AI now offers auditors new tools for their engagements. These tools can be used by people with no IT or linguistic skills. Above all, those same users can easily configure them, so the tools can evolve with the engagements they are used for. Moreover, they retain the experience acquired during each engagement, which improves their overall quality over time.

We believe that, like the digital data analysis tools that are commonly used by auditors today, these new AI-based solutions will be quickly adopted by these professionals and will transform the way they work.