Available AI Capabilities
Agiloft offers hosted AI Capabilities that you can use in your system, including clause extraction and named entity recognition (NER) models for several standard contract types. Beyond the hosted AI Capabilities, additional options are available through Amazon SageMaker that can be easily configured and trained. Some of these are already set up for seamless integration with your Agiloft system. These options are all described in detail below.
AI features require an Enterprise license. Training AI models requires an Extended Enterprise license.
Hosted AI Capabilities
Agiloft maintains two custom-built AI Capabilities that can identify, delineate, and extract data from contracts. One specializes in the extraction of metadata such as contracting parties, effective dates, and contract titles. The other AI Capability can extract and name clauses such as the termination clause or limitation of liability. Both AI Capabilities integrate seamlessly with Agiloft using the steps in Setting Up AI.
The metadata extraction and clause extraction AI Capabilities are already trained to extract items common to most contracts, such as termination clauses and job titles, and can output data directly into records in Agiloft using machine learning actions. Agiloft can customize each capability to work with your specific contract types to extract a customized set of items. In Agiloft's AI Core, there are four main models:
- CLASSIFICATION-FT-AS: This model is used to classify the Contract Type of a contract.
- SEMANTIC-TEXTUAL-SIMILARITY-AS: This model is used to evaluate how similar two clauses are. This model can recognize similarity based on both overall meaning and text character analysis.
- ATHENA-NER-AS: This model is used to extract metadata from Contract records. NER stands for Named Entity Recognition.
- ATHENA-CE-AS: This model is used to extract clauses from Contract records. CE stands for Clause Extraction.
The ATHENA models are proprietary Agiloft models used in a shared Agiloft AI Capability referred to as Contracts AI. They are multimodels that can find both NER and CE data for numerous Contract Types. The Contracts AI capability currently consists of two ATHENA models: ATHENA-NER-AS, which extracts metadata such as Contract Amount from multiple types of contracts, and ATHENA-CE-AS, which extracts clauses such as Termination or Payment Description. The ATHENA models are available in KBs that have an Enterprise license and have AI enabled.
The ATHENA models can extract important data from virtually any kind of contract at lightning speed. Starting with Agiloft Release 21, the ATHENA models can be configured to only extract metadata or clauses of your choosing using the Labels tab of the machine learning action wizard. This new Labels tab creates a framework where the AI is capable of extracting nearly everything, but can then be tailored down to only extract data relevant to the user's goals. You can configure multiple machine learning actions that use different ATHENA configurations in order to extract different datasets, as well as map extracted data to different fields. This gives users more flexibility around the data that their machine learning actions should extract from contracts, as well as how it is used.
Although ATHENA is available and functional in its current state, it continues to be improved and updated as the Agiloft Data Science team develops and adds more functionality. To get the latest version of Agiloft models, go to Setup > Integrations and click Configure under AI.
Amazon SageMaker AI Capabilities
Customized AI Capabilities for CLM are hosted in your own AWS environment and are available only after you connect your AWS account with Agiloft. This setup ensures the privacy of your data and gives you the freedom to scale processing capacity as needed. Although using your own AWS account is preferred, Agiloft can, if necessary, create and maintain an AWS account for you and pass the cost on as a component of your regular invoice.
If you want to go beyond the hosted or customized AI Capabilities, additional algorithms on Amazon SageMaker are also available for use with your Agiloft system. SageMaker is only available with an AWS account. Each algorithm is listed below in more detail.
BlazingText
The BlazingText algorithm implements text classification. It is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, and more. Agiloft has trained BlazingText on Amazon SageMaker on over 6,500 documents. It can be used to identify the type of contract from six different contract types.
- BlazingText is used in Agiloft's contract type classification AI Capability.
Purpose: Text Classification
Training Notes: BlazingText expects training material formatted as a text file. Each line should represent a separate training document, with the first element in the row being the classification label in the format __label__YOURLABEL and then the actual text of the document inserted on the same line.
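As a minimal sketch of the line format described above, the following Python helper turns a label and a document into one training line. The function name and the sample documents are hypothetical. Collapsing whitespace and replacing spaces in labels with underscores are assumptions made here so that each document and label stays on a single line as one token.

```python
def to_blazingtext_line(label: str, text: str) -> str:
    """Format one document as a single line: __label__LABEL followed by the text."""
    # Collapse newlines and repeated whitespace so the document stays on one line.
    flat_text = " ".join(text.split())
    # Assumption: a label should be a single token, so spaces become underscores.
    safe_label = "_".join(label.split())
    return f"__label__{safe_label} {flat_text}"


# Hypothetical sample documents, for illustration only.
documents = [
    ("NDA", "This Non-Disclosure Agreement is entered into\nby and between the parties."),
    ("Master Services Agreement", "This Master Services Agreement governs   all services."),
]

training_lines = [to_blazingtext_line(label, text) for label, text in documents]
training_file_content = "\n".join(training_lines) + "\n"
print(training_file_content)
```

Each element of training_lines corresponds to one training document, so the resulting text can be written to a file as-is.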
For additional information, see the Amazon AWS BlazingText documentation.
DeepAR
DeepAR is a neural network-based forecasting algorithm. Forecasting is a central problem in many businesses, and forecasting algorithms are crucial for most aspects of supply chain optimization. For example, these algorithms are used for inventory management, staff scheduling, and topology planning.
- Predict closing time of support tickets
- Early detection of projects running out of budget or time
- Cost estimation in service agreements or SOW
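To illustrate what forecasting input for a use case like ticket closing time might look like, here is a hedged sketch. It assumes the JSON Lines layout described in the Amazon DeepAR documentation, where each line is one time series with a "start" timestamp and a "target" array of values. The helper name and the sample values are hypothetical, and the way Agiloft prepares this data is not described here.

```python
import json
from datetime import date


def to_deepar_record(start_day: date, values: list[float]) -> str:
    """Serialize one time series as a single JSON line with 'start' and 'target'."""
    record = {
        # Timestamp of the first observation in the series.
        "start": start_day.strftime("%Y-%m-%d %H:%M:%S"),
        # One value per time step, for example average hours to close per day.
        "target": values,
    }
    return json.dumps(record)


# Hypothetical daily averages of support ticket closing time, in hours.
closing_hours = [4.0, 6.5, 5.25, 7.0]
line = to_deepar_record(date(2024, 1, 1), closing_hours)
print(line)
```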
For additional information, see the Amazon AWS DeepAR documentation.
Factorization Machines
Factorization machines are used for making predictions on large, sparse data sets, where only a few values are non-zero. Some examples include click predictions or movie recommendations.
- Contract risk evaluation based on risk assignments to another contract
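For readers who want to see why sparse data suits this model, the sketch below computes a prediction with the standard second-order factorization machine formula: a bias, a linear term, and pairwise interactions given by dot products of per-feature latent vectors. The weights and the feature vector are made-up values for illustration, not trained parameters, and the function is not the SageMaker implementation.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine prediction for one feature vector.

    x  : list of n feature values (mostly zeros for sparse data)
    w0 : global bias
    w  : list of n linear weights
    v  : n lists of k latent factors, one list per feature
    """
    # Only non-zero features contribute, which keeps sparse inputs cheap.
    active = [i for i, value in enumerate(x) if value != 0]

    linear = sum(w[i] * x[i] for i in active)

    # Pairwise term, using the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    num_factors = len(v[0])
    pairwise = 0.0
    for f in range(num_factors):
        total = sum(v[i][f] * x[i] for i in active)
        total_of_squares = sum((v[i][f] * x[i]) ** 2 for i in active)
        pairwise += 0.5 * (total * total - total_of_squares)

    return w0 + linear + pairwise


# Hypothetical, untrained parameters: 4 features, 2 latent factors each.
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.1
w = [0.2, -0.3, 0.4, 0.0]
v = [[0.1, 0.2], [0.3, 0.1], [0.5, -0.2], [0.0, 0.4]]

prediction = fm_predict(x, w0, w, v)
print(round(prediction, 4))
```

Here only features 0 and 2 are active, so the prediction is the bias plus their two linear weights plus the single interaction between them.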
For additional information, see the Amazon AWS Factorization Machines documentation.
K-Nearest Neighbor (K-NN)
K-NN is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). The algorithm is inherently non-linear: it can detect both linearly and non-linearly distributed data, and it tends to perform very well with a large number of data points.
- Predict the values for missing data points in a KB table
- Find similar clauses in the clause library
For each class definition, the model expects a field named label_X where X is a class ID number, and several numerical_feature_Y fields, where Y is a numerical feature ID number incremented for each field. The numerical features can also hold non-numerical values, which are stored as integers and resolved back after inference.
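The integer encoding of non-numerical values mentioned above can be sketched as follows. The record layout, field values, and helper name are hypothetical, and using label_0 as the single class field is an assumption. The actual mapping Agiloft applies is not described here.

```python
def build_encoder(values):
    """Assign each distinct value an integer ID, in order of first appearance."""
    value_to_id = {}
    for value in values:
        if value not in value_to_id:
            value_to_id[value] = len(value_to_id)
    # Reverse mapping, used to resolve integers back to values after inference.
    id_to_value = {i: value for value, i in value_to_id.items()}
    return value_to_id, id_to_value


# Hypothetical records with one class field and two features.
records = [
    {"risk": "Low", "region": "EMEA", "amount": 1200.0},
    {"risk": "High", "region": "APAC", "amount": 98000.0},
    {"risk": "Low", "region": "APAC", "amount": 450.0},
]

risk_to_id, id_to_risk = build_encoder(r["risk"] for r in records)
region_to_id, _ = build_encoder(r["region"] for r in records)

rows = [
    {
        "label_0": risk_to_id[r["risk"]],
        "numerical_feature_0": region_to_id[r["region"]],
        "numerical_feature_1": r["amount"],
    }
    for r in records
]

# After inference, an integer class ID is resolved back to its original value.
predicted_id = 1
predicted_risk = id_to_risk[predicted_id]
print(rows, predicted_risk)
```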
Training Notes: This model includes three parameters:
- feature_dim: number of numerical features in the training/inference set; this parameter can be tuned
- k: number of nearest neighbors to use for inference; this parameter can be tuned
- sample_size: the maximum number of data points to sample from large training sets, so that all data used for prediction fits in memory
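To show how these three parameters interact, here is a minimal brute-force sketch of k-NN classification in plain Python. It uses Euclidean distance and a majority vote, which are assumptions chosen for illustration, and it is not the SageMaker implementation. The function name and the sample data are hypothetical.

```python
import math
import random
from collections import Counter


def knn_classify(train, query, k, sample_size=None, seed=0):
    """Return the majority label among the k training rows nearest to query.

    train is a list of (features, label) pairs.
    """
    # feature_dim: every row must have the same number of numerical features.
    feature_dim = len(query)
    if any(len(features) != feature_dim for features, _ in train):
        raise ValueError("every training row must have feature_dim values")

    # sample_size: keep only a random subset of a large training set.
    if sample_size is not None and sample_size < len(train):
        train = random.Random(seed).sample(train, sample_size)

    # k: number of nearest neighbors that vote on the label.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data with feature_dim = 2.
train = [
    ([1.0, 1.0], "Low"),
    ([1.2, 0.8], "Low"),
    ([0.9, 1.1], "Low"),
    ([5.0, 5.0], "High"),
    ([5.2, 4.9], "High"),
]

print(knn_classify(train, [1.1, 1.0], k=3))
print(knn_classify(train, [5.1, 5.0], k=3))
```

A larger k smooths out noisy neighbors at the cost of blurring class boundaries, which is why it is worth tuning.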
For additional information, see the Amazon AWS K-NN documentation.
Latent Dirichlet Allocation (LDA)
LDA can be used to sort data into a number of topics. Each topic represents a set of words and the distribution of these words in the text. The goal of LDA is to map all input documents to the topics so that the words in each document are mostly captured by those topics. The results from an LDA analysis are often plotted on a scatter chart and colored in order to visualize the distribution of documents between the topics. This model is similar to the NTM model, but it allows a range for how many topics you expect, whereas NTM requires a specific number of topics.
- Explore the number and type of contracts that are contained in a large batch of documents without having prior knowledge of the document batch’s content.
Purpose: Topic Modelling
Training Notes: There is no way to specify the expected topics, so this model requires multiple rounds of training, alternated with manual review of the results and adjustment of vocabulary, number of topics, or document selection for the next training round. The model doesn't provide a name for the derived topics, so the trainer or another admin must name the topics manually.
For additional information, see the Amazon AWS LDA documentation.
Neural Topic Model (NTM)
Amazon SageMaker’s Neural Topic Model (NTM) caters to use cases where a finer control of the training, optimization, or hosting of a topic model is required. For example, if you need to train models on texts of a particular writing style or domain, such as legal documents, NTM is well-suited to those needs. This AI Capability is similar to LDA, but it requires a specific value for how many topics you expect, whereas LDA allows a range.
- Automated content tagging
- Document summarization
- Content recommendations
- Support ticket categorization
Purpose: Topic Modelling
Training Notes: There is no way to specify the expected topics, so this model requires multiple rounds of training, alternated with manual review of the results and adjustment of vocabulary, number of topics, or document selection for the next training round. As part of this process, a specialist can download the model and perform additional handling to see the word clouds generated by the model, and then use the word clouds to determine how best to correct the model's training. If NTM is used to prepare for further classification, a specialist can run inference on all the records to update them with their top-scoring topic, review the results, update records with low-scoring or undetected topics, and then run the data set through BlazingText training to create a best-trained BlazingText model. This process is useful for data sets that are completely unsorted and uncategorized.
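The hand-off from NTM to BlazingText described above can be sketched roughly as follows: assign each record its top-scoring topic, set aside low-scoring records for manual review, and write the rest as BlazingText training lines. The score threshold, topic names, record structure, and function name are all hypothetical.

```python
REVIEW_THRESHOLD = 0.6  # hypothetical cutoff below which a specialist reviews the record


def assign_topics(records, topic_names, threshold=REVIEW_THRESHOLD):
    """Split records into (topic, text) pairs and records needing manual review."""
    assigned, needs_review = [], []
    for record in records:
        scores = record["topic_scores"]
        # Index of the top-scoring topic for this record.
        best_index = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_index] >= threshold:
            assigned.append((topic_names[best_index], record["text"]))
        else:
            needs_review.append(record)
    return assigned, needs_review


# Topic names are supplied manually, since the model does not name topics.
topic_names = ["Lease", "Employment"]

# Hypothetical inference results, one score per topic.
records = [
    {"text": "Tenant shall pay rent monthly.", "topic_scores": [0.9, 0.1]},
    {"text": "Employee salary and benefits.", "topic_scores": [0.2, 0.8]},
    {"text": "Miscellaneous provisions.", "topic_scores": [0.48, 0.52]},
]

assigned, needs_review = assign_topics(records, topic_names)

# Confidently labeled records become BlazingText training lines.
training_lines = [f"__label__{topic} {text}" for topic, text in assigned]
print(training_lines)
print(len(needs_review), "record(s) need manual review")
```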
For additional information, see the Amazon AWS NTM documentation.
XGBoost
This AI Capability attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models. XGBoost is widely used in applied machine learning and Kaggle competitions for structured or tabular data. It is an implementation of gradient boosted decision trees designed for speed and performance. The algorithm can be used to solve a wide variety of regression or classification problems.
- Vendor/customer profiles and behavior predictions
- Customer churn predictions
- Risk analysis with complex datasets
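The idea of combining simpler, weaker models can be illustrated with a small gradient boosting sketch for regression on a single feature. Each round fits a one-split decision stump to the current residuals and adds a scaled copy of it to the ensemble. This is a simplified illustration of boosting in general, not XGBoost itself, which adds regularization and many performance optimizations. All names and data are hypothetical, and the sketch assumes at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Find the one-threshold split that minimizes squared error on the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit stumps to the remaining residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump_value(stump, x)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: low values for small x, high values for large x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

baseline = fit_boosted(xs, ys, rounds=0)
model = fit_boosted(xs, ys, rounds=20)
print(mean_squared_error(baseline, xs, ys), mean_squared_error(model, xs, ys))
```

Comparing the two printed errors shows how the ensemble improves on the constant baseline even though each individual stump is a very weak predictor.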
For additional information, see the Amazon AWS XGBoost documentation.