Ephesoft provides intelligent, AI-based Optical Character Recognition (OCR) for PDF files within . Users can customize batch classes of field definitions for various PDF document types that they need to extract data from, and train the system to improve accuracy and the confidence level of the extracted field data. The data from OCR documents can be mapped to fields in a knowledgebase, and converted into records.

Ephesoft allows you to upload sets of documents, and create detailed mappings of the key fields in that document type. For example, a W-2 payment form uses a pro forma layout with fields such as Employee Identification Number, Wages, First and Last Name, Employer, and Social Security Number, which can all be trained for extracting as metadata. Ephesoft OCR uses the following process:

For more information about how it works, see Ephesoft. For information about how users interact with OCR, see Using Ephesoft Intelligent OCR.

A user wishes to capture the key data from the company's PDF contracts into their records. They deploy Ephesoft and work with implementers to train the system to recognize the important fields in their pro forma contracts. Whenever a contract is mailed to the system, they are able to perform Intelligent OCR on it and automatically create a record in the Contracts table that contains the crucial information, including the contract dates, the signing parties, the companies, and the amounts.

The user sends a batch of 20 records for processing non-interactively through Web Services. When the records are returned, five of them have a Document Classification (confidence) score below 5, so the user decides to validate those files manually. To do so, they click Validate Files, which sends them to the Ephesoft interface where they can see the files that require validation. They work through the fields that require validation, and once this process is complete, the status of the records changes to "Completed".
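The triage step in this scenario can be sketched in a few lines of Python. This is only an illustration: the OcrRecord class, the field names, and the threshold of 5 (taken from the example above) are assumptions, not part of the Ephesoft integration itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed cutoff, taken from the "score below 5" scenario above.
VALIDATION_THRESHOLD = 5.0


@dataclass
class OcrRecord:
    """Hypothetical stand-in for a record in the Intelligent OCR Files table."""
    record_id: int
    classification_score: float  # Document Classification Score


def needs_manual_validation(record: OcrRecord,
                            threshold: float = VALIDATION_THRESHOLD) -> bool:
    """True when the classification score falls below the cutoff."""
    return record.classification_score < threshold


def split_batch(records: List[OcrRecord],
                threshold: float = VALIDATION_THRESHOLD
                ) -> Tuple[List[OcrRecord], List[OcrRecord]]:
    """Split a returned batch into (accepted, to_validate)."""
    accepted = [r for r in records if not needs_manual_validation(r, threshold)]
    to_validate = [r for r in records if needs_manual_validation(r, threshold)]
    return accepted, to_validate
```

In practice the cutoff would be chosen to suit the document types being processed.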

For more information on implementing Intelligent OCR in your system once the integration has been completed, see Using Ephesoft Intelligent OCR.

The information below will help you to set up an integration and basic configurations for Ephesoft Intelligent OCR.

Prerequisites

Before you begin setting up this feature, you must:

Deploy the Intelligent OCR Entities

  1. In Setup > Integration > Intelligent OCR extension, click Deploy. 
  2. A confirmation dialog will open informing you that new tables, actions, and fields will be added to the knowledgebase. Click OK. 
  3. After running, the following new entities will have been added to the KB:
Entity Name (Entity Type): Description and Purpose

Intelligent OCR Files (Table)

The main processing table for Ephesoft files. This table holds the attachment file field and other fields that will be populated when the OCR is performed. These primarily include:

  • Intelligent OCR Method. Select between Interactively and Web Services.
  • Document Classification Score. This floating point field represents a confidence score for the accuracy of the OCR. See below for more information on confidence scores.
  • Document Classification. Displays the document's identified classification.
  • Status/OCR Status. Shows the status of the OCR returned by Ephesoft. This is updated in real time. For more information, see OCR Statuses.
  • Batch Class ID. Shows the ID for the batch class that is being used to extract data from the file. For more information, see Batch Classes.

This is also where fields will be added for each batch class metadata item created. Each piece of metadata extracted by Ephesoft needs two corresponding fields: one storing the data itself and one storing the system confidence score representing the accuracy of that data. You won't need to maintain these fields, but you might want to reference them.

It can be helpful to add this table to the left pane, particularly for users who are responsible for manually validating documents with a low confidence score.

Configure Intelligent OCR Action (Table)

Acts as an action wizard for the new Ephesoft actions. When the Ephesoft action is set up in any table, it opens a new record in this table where you can add a custom name and description for one of the four actions.

You can create new Ephesoft actions in any other table without opening this table directly, so this table should generally not be added to the left pane.

Check WS Status (Action)

If the OCR file is sent through Web Services, this action checks the current status of the processing and updates the OCR Status field. When you deploy Ephesoft, the Intelligent OCR Files table includes a Status Check action button that runs this action. Generally, this action should be run periodically by a rule so that results are pulled in as soon as they are available.

Intelligent OCR with Web Services (Action)

Sends the attached PDF for OCR via the Web Services method, which uses the existing batch class definitions to perform a best-guess OCR on the fields. See OCR Methods for more information. When you deploy Ephesoft, the Intelligent OCR Files table includes an Extract Data action button that runs this action.
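The periodic status check described for Check WS Status amounts to a polling loop. The sketch below is illustrative only: check_status is a hypothetical callable standing in for the action, and the interval and attempt limit are assumed values. "Completed" is the only status name taken from this article.

```python
import time
from typing import Callable, Iterable, Optional


def poll_status(check_status: Callable[[], str],
                terminal: Iterable[str] = ("Completed",),
                interval_seconds: float = 60.0,
                max_attempts: int = 10,
                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call check_status until a terminal status is seen or attempts run out.

    check_status is a hypothetical stand-in for running the Check WS Status
    action and reading back the OCR Status field. Returns the last status seen,
    or None if max_attempts is zero.
    """
    terminal_set = set(terminal)
    status = None
    for attempt in range(max_attempts):
        status = check_status()
        if status in terminal_set:
            return status
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return status
```

Passing sleep as a parameter keeps the loop easy to test without real delays. In a deployed system, a scheduled rule would take the place of this loop.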


Configure Server Connection

Now, you need to connect your system to the Ephesoft KB server. Work with your  implementer to configure the global variables and Ephesoft server settings.

When you have confirmation that your system has been connected, continue to the next section.

Create and Connect an Ephesoft Account

After the deployment of the Ephesoft entities, you will be able to create your knowledgebase account with Ephesoft, which will be managed by 

  1. In Setup > Integration > Enable/configure Intelligent OCR, click Configure, then Create and Connect Intelligent OCR Account. 
  2. Choose a password for your account and enter it twice.
  3. Click Create. A message appears at the top of the page if your account was created successfully.

Configure Tables and Fields

The tables that were created when you deployed Ephesoft are not added to the left pane by default. You might want to add them to the left pane in the Preferences menu while you configure your system to make them easier to access.

Before you configure field mappings in the next section, go to the new Intelligent OCR Files table and create all the fields you need to store the extracted data. For each metadata item, you need one field representing the data and one field representing the OCR confidence score. The data field should match the type of data being extracted, but the confidence score field should always be a floating point field.

Note that different batch classes can share the same fields, but you can't reuse the same fields within the same batch class.
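The pairing rule and the reuse restriction can be checked with a short script before you create the fields. This is a minimal sketch: the naming scheme (a "_confidence" suffix) and the helper names are assumptions made for illustration, not conventions required by the integration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def field_pair(metadata_name: str) -> Tuple[str, str]:
    """Return hypothetical (data_field, confidence_field) names for one item.

    The data field should match the type of the extracted data. The confidence
    field is always a floating point field.
    """
    base = metadata_name.strip().lower().replace(" ", "_")
    return base, f"{base}_confidence"


def reused_fields(mapping: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return field names used more than once within ONE batch class.

    mapping maps each metadata item to its (data_field, confidence_field) pair.
    An empty result means the batch class respects the no-reuse rule.
    """
    counts = Counter(name for pair in mapping.values() for name in pair)
    return sorted(name for name, n in counts.items() if n > 1)
```

Running reused_fields once per batch class reflects the rule that fields can be shared across batch classes but not within one.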

Configure Field Mappings

After setting up the connection to the Ephesoft server and creating your own account, you can define the field mappings between the batch classes and the fields in the Intelligent OCR Files table. The field mapping screen contains an aggregate of the fields that have been defined for all document types in the batch class.

When you're ready to map the fields: 

  1. In Setup > Integration > Enable/configure Intelligent OCR, click Configure, then Configure Field Mappings. 
  2. Select a batch class from the list and click Next. If you don't see the batch class you need in the list, contact your  implementer to either add the batch class to the list or to request a new batch class for your use.
  3. The Field Mapping tab displays the ID for the batch class, the Ephesoft metadata field list, and drop-down fields for each  field in the Intelligent OCR Files table. For each Ephesoft metadata field, map the  fields for the data and confidence scores.
  4. At the bottom of the list, you can choose whether to create missing choice values if an Ephesoft choice field contains values that do not exist in the  mapped choice field. With this option enabled, you can save time by maintaining the list of choice values only in Ephesoft and letting  copy them over as needed.
  5. Click Finish to complete the mapping, and continue to map as many of the remaining batch classes as you need. 
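The effect of the option to create missing choice values in step 4 can be pictured with the following sketch. It only illustrates the behavior described above, under the assumption that choice values are compared as exact strings. The function name and return shape are hypothetical.

```python
from typing import Iterable, List, Tuple


def sync_choice_values(target_choices: Iterable[str],
                       extracted_values: Iterable[str],
                       create_missing: bool) -> Tuple[List[str], List[str]]:
    """Return (updated_choices, unmatched_values).

    With create_missing enabled, values that are absent from the mapped choice
    field are appended to it. With it disabled, they are reported as unmatched
    and the choice list is left unchanged.
    """
    updated = list(target_choices)
    unmatched: List[str] = []
    for value in extracted_values:
        if value in updated:
            continue
        if create_missing:
            updated.append(value)
        elif value not in unmatched:
            unmatched.append(value)
    return updated, unmatched
```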

If you need to extract additional data from a document type, work with your  implementer to configure Ephesoft to allow that data to be extracted. Be prepared with several example documents that include the data you want to extract.


Set Up Batch Classes

In , each batch class has an ID and represents a class of document types. When the file is sent to Ephesoft, it determines what type of document it is for the specified batch class and attempts to capture the fields that have been defined for the type. The default configuration includes some preconfigured batch classes for document classifications. You can also create new classifications in Ephesoft with the help of an   implementer.

To create a new batch class, you need at least ten examples of the document type you want to configure.

Set Up Data Conversions to Other Tables

Once the field mappings have been set up for each batch class, you can define the data conversions. The data conversion action creates records in other tables from the mapped fields, based on the batch class ID. You might also consider adding a choice field that lists the different types of documents you'll analyze, then configuring a rule to set the batch class based on that choice field.
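The rule that sets the batch class from a choice field is essentially a lookup. In the sketch below, the document types and batch class IDs are invented placeholders. Substitute the IDs shown on your own Field Mapping tab.

```python
from typing import Dict

# Hypothetical document types and batch class IDs, for illustration only.
BATCH_CLASS_BY_DOCUMENT_TYPE: Dict[str, str] = {
    "W-2": "BC1",
    "Contract": "BC2",
}


def batch_class_for(document_type: str,
                    mapping: Dict[str, str] = BATCH_CLASS_BY_DOCUMENT_TYPE) -> str:
    """Return the batch class ID for the selected document type.

    Raises ValueError for an unmapped type so that the file is not sent for
    OCR with the wrong batch class.
    """
    try:
        return mapping[document_type]
    except KeyError:
        raise ValueError(f"No batch class configured for {document_type!r}") from None
```

Failing loudly on an unmapped type is a design choice here. A rule could instead fall back to a default batch class.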

Disable Ephesoft Integration

If you need to deactivate your Ephesoft integration, you can do so from the configuration page.

  1. In Setup > Integration > Enable/configure Intelligent OCR, click Configure.
  2. Click Remove Intelligent OCR Connection.

