Agiloft offers hosted AI Capabilities you can use in your system for clause extraction and named entity recognition (NER) across several standard contract types. Beyond the hosted AI Capabilities, additional options are available through Amazon SageMaker that can be easily configured and trained. Some of these are already set up for seamless integration with your Agiloft system. These options are all described in detail below.

AI features require an Enterprise license. Training AI models requires an Extended Enterprise license.

Hosted AI Capabilities

Agiloft maintains two custom-built AI Capabilities that can identify, delineate, and extract data from contracts. One specializes in the extraction of metadata such as contracting parties, effective dates, and contract titles. The other AI Capability can extract and name clauses such as the termination clause or limitation of liability. There are three additional models that can, respectively, be used to ask questions about the content of a contract, classify the contract as a specific Contract Type, and compare the meaning of two clauses. These AI Capabilities integrate seamlessly with Agiloft using the steps in Setting Up AI.

The metadata extraction and clause extraction AI Capabilities are already trained to extract items common to most contracts, such as termination clauses and job titles, and can output data directly into records in Agiloft using machine learning actions. Agiloft can customize each capability to work with your specific contract types and extract a customized set of items. In Agiloft's AI Core, the five capabilities are:

Question Answering

The Question Answering model can pull data from a contract document based on an input posed as a question. It can process both binary and extractive questions. Binary questions require a yes or no answer, whereas extractive questions are answered with contextual text drawn from the document. Included below are examples of how the Question Answering model processes each type of question. When asked either kind of question, the model outputs an answer, a confidence score, an answer type, a long answer, and passages.

Here is an example of a binary question.

"question": "Does the agreement renew automatically?"

"answer": "No",
"answer score": 0.9982195496559143,
"answer type": "Yes/No",
"long answer": "This Agreement shall be terminated as of the end of the defined term, unless the parties renew the same in writing.",
"passages": "This Agreement shall be terminated as of the end of the defined term, unless the parties renew the same in writing.\nThis Agreement shall be effective as of December 1, 2018 and shall continue until July 31, 2019 (\u201cTermination Date\u201d), subject to the termination provisions contained in paragraph 6.\nIf any provisions of this Agreement are declared invalid and unenforceable, the remainder of this Agreement shall continue in full force and effect.\nD. This eight (8) month Employment Agreement replaces the previous eleven (11) month Employment Agreement, dated December 29, 2017 (the \u201c11-month Employment Agreement\u201d), which terminates on November 30, 2018, between the Company and Mr.\nIt may not be changed orally but only by an agreement in writing signed by any party against whom enforcement of any waiver, change, modification, extension, or discharge is sought.\n",

Here is an example of an extractive question.

"question": "How can the agreement be renewed?"


"answer": "by an agreement in writing",
"answer score": 0.6870729327201843,
"answer type": "Text",
"long answer": "It may not be changed orally but only by an agreement in writing signed by any party against whom enforcement of any waiver, change, modification, extension, or discharge is sought.",
"passages": "It may not be changed orally but only by an agreement in writing signed by any party against whom enforcement of any waiver, change, modification, extension, or discharge is sought.\nThis Agreement shall be terminated as of the end of the defined term, unless the parties renew the same in writing.\nThis Agreement shall be effective as of December 1, 2018 and shall continue until July 31, 2019 (\u201cTermination Date\u201d), subject to the termination provisions contained in paragraph 6.\n",

The Long Answer output generally contains full sentences or longer phrases that show where the model got the answer. Passages can return an even larger set of related sentences.

ATHENA

The ATHENA models are proprietary models used in a shared AI Capability referred to as Contracts AI. They are multi-purpose models that can find both NER and clause extraction (CE) data for numerous different Contract Types. The Contracts AI capability currently consists of two ATHENA models: ATHENA-NER-AS and ATHENA-CE-AS. ATHENA-NER-AS is used to extract metadata, such as Contract Amount, from many different types of contracts. ATHENA-CE-AS is used to extract clauses, such as Termination or Payment Description. The ATHENA models are available in KBs that have an Enterprise license and have AI enabled.

The ATHENA models can extract important data from virtually any kind of contract at lightning speed. Starting with Agiloft Release 21, the ATHENA models can be configured to extract only the metadata or clauses of your choosing, using the Labels tab of the machine learning action wizard. The Labels tab creates a framework where the AI is capable of extracting nearly everything but can be tailored to extract only the data relevant to your goals. You can configure multiple machine learning actions that use different ATHENA configurations in order to extract different datasets, as well as map extracted data to different fields. This gives users more flexibility over which data their machine learning actions extract from contracts, and how that data is used.

Although ATHENA is fully functional in its current state, it is continuously improved and updated as the Agiloft Data Science team develops and adds more functionality. To get the latest version of the ATHENA models, go to Setup > Integrations and click Configure under AI.

Amazon SageMaker AI Capabilities

Customized AI Capabilities for CLM are hosted in your own AWS environment and are only available to you once you connect your AWS account with Agiloft. Although it's preferred that you use your own AWS account, Agiloft can create and maintain an AWS account for you if necessary, and pass the cost on as a component of your regular invoice. This ensures the privacy of your data and provides you with the freedom to scale processing capacity as needed.

If you want to go beyond the hosted or customized AI Capabilities, additional algorithms on Amazon SageMaker are also available for use with your Agiloft system. SageMaker is only available with an AWS account. Each algorithm is described in more detail below.

BlazingText

The BlazingText algorithm implements text classification. It is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. Agiloft has trained BlazingText on Amazon SageMaker using over 6,500 documents, and it can identify the type of a contract from among six different contract types.

Examples:

Purpose: Text Classification

Training Notes: BlazingText expects training material formatted as a text file. Each line should represent a separate training document, with the first element in the row being the classification label in the format __label__YOURLABEL, followed by the actual text of the document on the same line, as in the sketch below.
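To make the layout concrete, here is a minimal Python sketch that writes a file in this format; the contract-type labels and document text are hypothetical placeholders.

# Minimal sketch of writing a BlazingText-style training file.
# Labels and document text are hypothetical placeholders.
training_examples = [
    ("NDA", "This Non-Disclosure Agreement is entered into by and between ..."),
    ("MSA", "This Master Services Agreement governs the provision of services ..."),
    ("SOW", "This Statement of Work describes the deliverables and schedule ..."),
]

with open("blazingtext_train.txt", "w", encoding="utf-8") as f:
    for label, text in training_examples:
        # One document per line: "__label__YOURLABEL <document text>"
        f.write(f"__label__{label} {text}\n")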

For additional information, see the Amazon AWS BlazingText documentation.

DeepAR

DeepAR is a neural network-based forecasting algorithm. Forecasting is a central problem in many businesses, and forecasting algorithms are crucial for most aspects of supply chain optimization. For example, these algorithms are used for inventory management, staff scheduling, and topology planning.

Examples:
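As a rough illustration of the kind of time-series input DeepAR-style forecasting works with, the Python sketch below writes a JSON Lines file where each line holds one series with a start timestamp and a list of observed values; the series values and file name are hypothetical.

import json

# Hypothetical monthly demand figures for two products.
series = [
    {"start": "2023-01-01 00:00:00", "target": [112, 98, 130, 125, 140, 152]},
    {"start": "2023-01-01 00:00:00", "target": [40, 35, 38, 42, 44, 51]},
]

# One JSON object per line (JSON Lines), the layout commonly used for DeepAR input.
with open("deepar_train.jsonl", "w", encoding="utf-8") as f:
    for s in series:
        f.write(json.dumps(s) + "\n")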

For additional information, see the Amazon AWS DeepAR documentation.

Factorization Machines

Factorization machines are used for making predictions on large, sparse data sets, where only a few values are non-zero. Some examples include click predictions or movie recommendations.

Examples:
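To show why this data is considered sparse, the Python sketch below one-hot encodes hypothetical (user, ad, clicked) click-prediction rows into a matrix where almost every value is zero; the IDs and dimensions are made up for illustration.

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical click-prediction rows: (user_id, ad_id, clicked).
n_users, n_ads = 1000, 500
rows = [(12, 87, 1), (412, 3, 0), (954, 499, 1)]

data, row_idx, col_idx, labels = [], [], [], []
for i, (user, ad, clicked) in enumerate(rows):
    # Each row turns on exactly two columns: one in the user block, one in the ad block.
    row_idx += [i, i]
    col_idx += [user, n_users + ad]
    data += [1.0, 1.0]
    labels.append(clicked)

X = csr_matrix((data, (row_idx, col_idx)), shape=(len(rows), n_users + n_ads))
y = np.array(labels)
print(X.shape, X.nnz)  # 3 rows x 1500 columns, but only 6 non-zero values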

For additional information, see the Amazon AWS Factorization Machines documentation.

K-Nearest Neighbor (K-NN)

K-NN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure, such as a distance function. The algorithm is inherently non-linear: it can handle both linearly and non-linearly distributed data, and it tends to perform very well when there are many data points.

Examples:

For each class definition, the model expects a field named label_X, where X is a class ID number, and several numerical_feature_Y fields, where Y is a numerical feature ID number incremented for each field. The numerical features can also hold non-numerical values, which are stored as integers and mapped back to their original values after inference.

Example fields: label_1, numerical_feature_1, numerical_feature_2, numerical_feature_3
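A rough Python sketch of the same idea, using scikit-learn's KNeighborsClassifier rather than the SageMaker built-in, is shown below; the feature values and class IDs mirror the label_1 / numerical_feature_Y layout above and are hypothetical.

from sklearn.neighbors import KNeighborsClassifier

# Rows correspond to numerical_feature_1..3; classes come from label_1.
X_train = [
    [12000.0, 30.0, 2.0],
    [450000.0, 365.0, 12.0],
    [9800.0, 45.0, 1.0],
    [520000.0, 400.0, 18.0],
]
y_train = [1, 2, 1, 2]  # hypothetical class IDs

# Classify a new record by distance to its nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[15000.0, 60.0, 3.0]]))  # -> [1]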

Training Notes: This model includes three parameters:

For additional information, see the Amazon AWS K-NN documentation.

Latent Dirichlet Allocation (LDA)

LDA can be used to sort data into a number of topics. Each topic represents a set of words and the distribution of those words in the text. The goal of LDA is to map all input documents to the topics so that the words in each document are mostly captured by those topics. The results of an LDA analysis are often plotted on a scatter chart and colored in order to visualize the distribution of documents among the topics. This model is similar to the NTM model, but it allows a range for how many topics you expect, whereas NTM requires a specific number of topics.

Examples:

Purpose: Topic Modelling

Training Notes: There is no way to specify the expected topics, so this model requires multiple rounds of training, alternated with manual review of the results and adjustment of the vocabulary, number of topics, or document selection for the next training round. The model doesn't provide a name for the derived topics, so the trainer or another admin must name the topics manually.
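A minimal sketch of this review loop, using scikit-learn's LatentDirichletAllocation in place of the SageMaker LDA algorithm, is shown below; the document snippets and topic count are hypothetical, and naming the resulting topics remains a manual step.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical document snippets; in practice these would be contract texts.
docs = [
    "termination notice period thirty days written notice",
    "payment invoice net thirty days late fee interest",
    "confidential information disclosure obligations survive termination",
    "fees payment schedule invoice due upon receipt",
]

# Bag-of-words counts, then fit LDA with a guessed number of topics.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Review the top words per topic and adjust vocabulary or topic count for the next round.
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(topic_id, top_words)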

For additional information, see the Amazon AWS LDA documentation.

Neural Topic Model (NTM)

Amazon SageMaker's Neural Topic Model (NTM) caters to use cases where finer control of the training, optimization, or hosting of a topic model is required. For example, if you need to train models on texts of a particular writing style or domain, such as legal documents, NTM is well suited to those needs. This AI Capability is similar to LDA, but it requires a specific value for how many topics you expect, whereas LDA allows a range.

Examples:

Purpose: Topic Modelling

Training Notes: There is no way to specify the expected topics, so this model requires multiple rounds of training, alternated with manual review of the results and adjustment of the vocabulary, number of topics, or document selection for the next training round. As part of this process, a specialist can download the model and perform additional handling to see the word clouds generated by the model, and then use the word clouds to determine how best to correct the model's training. If NTM is used to prepare for further classification, a specialist can run inference on all the records to update them with their top-scoring topic, review the results, update records with low-scoring or undetected topics, and then run the data set through BlazingText training to create a well-trained BlazingText model. This process is useful for data sets that are completely unsorted and uncategorized; a sketch of the hand-off appears below.
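Here is a minimal Python sketch of that hand-off under stated assumptions: the per-document topic scores stand in for real inference output, and the topic names, documents, and file name are hypothetical.

# Hypothetical hand-off from topic-model inference to BlazingText training data.
topic_names = {0: "TERMINATION", 1: "PAYMENT"}  # assigned manually after review

def top_topic(scores):
    # Return the index of the highest-scoring topic for one document.
    return max(range(len(scores)), key=lambda i: scores[i])

docs = ["termination notice period thirty days", "invoice due net thirty days"]
scores_per_doc = [[0.91, 0.09], [0.12, 0.88]]  # stand-in for real inference output

with open("blazingtext_from_topics.txt", "w", encoding="utf-8") as f:
    for text, scores in zip(docs, scores_per_doc):
        label = topic_names[top_topic(scores)]
        f.write(f"__label__{label} {text}\n")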

For additional information, see the Amazon AWS NTM documentation.

XGBoost

This AI Capability attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance, and it has recently been dominating applied machine learning and Kaggle competitions for structured or tabular data. The algorithm can be used to solve a wide variety of regression and classification problems.

Examples:
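As a generic illustration of gradient boosted trees on tabular data, the Python sketch below uses the open-source xgboost package's scikit-learn-style API rather than the SageMaker built-in; the data is synthetic.

import numpy as np
from xgboost import XGBClassifier

# Synthetic tabular data: 200 records, three numeric features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# An ensemble of shallow gradient-boosted trees combines many weak estimates
# into one stronger prediction.
model = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1)
model.fit(X, y)
print(model.predict(X[:5]))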

For additional information, see the Amazon AWS XGBoost documentation.
