Ephesoft provides intelligent, AI-based Optical Character Recognition (OCR) for PDF files within your system. Users can customize batch classes of field definitions for each PDF document type they need to extract data from, and train the system to improve the accuracy and confidence level of the extracted field data. The data from OCR documents can be mapped to fields in a knowledgebase and converted into records.
Ephesoft allows you to upload sets of documents and create detailed mappings of the key fields in each document type. For example, a W-2 payment form uses a pro forma layout with fields such as Employer Identification Number, Wages, First and Last Name, Employer, and Social Security Number, all of which can be trained for extraction as metadata. Ephesoft OCR uses the following process:
For more information about how it works, see Ephesoft. For information about how users interact with OCR, see Using Ephesoft Intelligent OCR.
A user wants to capture the key data from the company's PDF contracts in their records. They deploy Ephesoft and work with implementers to train the system to recognize the important fields in their pro forma contracts. Whenever a contract is mailed to the system, they can perform Intelligent OCR on it and automatically create a record in the Contracts table that contains the crucial information, including the contract dates, the signing parties, the companies, and the amounts. The user sends a batch of 20 records for processing non-interactively through Web Services. When the records are returned, five of them have a Document Classification (confidence) score below 5, so the user decides to validate those files manually. To do so, they click Validate Files, which takes them to the Ephesoft interface, where they can see the files that require validation. They work through the fields that require validation, and once this process is complete, the status of the records changes to "Completed".
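
The triage step in this scenario can be sketched in a few lines of Python. This is an illustration only: the record structure, the `classification_score` key, and the `needs_manual_validation` helper are hypothetical and are not part of Ephesoft or its Web Services interface. Only the threshold of 5 and the batch of 20 records come from the scenario.

```python
from typing import Dict, List

# Threshold taken from the scenario: scores below 5 are reviewed manually.
VALIDATION_THRESHOLD = 5.0


def needs_manual_validation(
    records: List[Dict], threshold: float = VALIDATION_THRESHOLD
) -> List[Dict]:
    """Return the records whose classification score falls below the threshold."""
    return [r for r in records if r["classification_score"] < threshold]


# Hypothetical batch of 20 returned records; the first five have low scores.
batch = [
    {"id": i, "classification_score": 3.5 if i < 5 else 8.0} for i in range(20)
]
to_review = needs_manual_validation(batch)
print([r["id"] for r in to_review])
```

In practice the same filter could drive a saved search or view so that reviewers see only the files that still need validation.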
For more information on implementing Intelligent OCR in your system once the integration has been completed, see Using Ephesoft Intelligent OCR.
The information below will help you to set up an integration and basic configurations for Ephesoft Intelligent OCR.
Before you begin setting up this feature, you must:
Entity Name | Entity Type | Description and Purpose |
---|---|---|
Intelligent OCR Files | Table | The main processing table for Ephesoft files. This table holds the attachment file field and other fields that will be populated when the OCR is performed. These primarily include:
This is also where fields will be added for each batch class metadata item created. Each piece of metadata extracted by Ephesoft needs two corresponding fields: one storing the data itself and one storing the system confidence score representing the accuracy of that data. You won't need to maintain these fields, but you might want to reference them. It can be helpful to add this table to the left pane, particularly for users who are responsible for manually validating documents with a low confidence score. |
Configure Intelligent OCR Action | Table | Acts as an action wizard for the new Ephesoft actions. When the Ephesoft action is set up in any table, it opens a new record in this table where you can add a custom name and description for one of the four actions. You can create new Ephesoft actions in any other table without opening this table directly, so this table should generally not be added to the left pane. |
Check WS Status | Action | If the OCR file is sent through Web Services, this checks the current status of the processing and updates the OCR Status field. When you deploy Ephesoft, the Intelligent OCR Files table includes a Status Check action button that runs this action. Generally, this action should be run periodically by a rule to make sure results are pulled in as soon as they are available. |
Intelligent OCR with Web Services | Action | Sends the attached PDF for OCR via the Web Services method, which uses the existing batch class definitions to perform a best-guess OCR on the fields. See OCR Methods for more information. When you deploy Ephesoft, the Intelligent OCR Files table includes an Extract Data action button that runs this action. |
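
The periodic status check described for Check WS Status follows a standard polling pattern. The Python sketch below shows that pattern under stated assumptions: `poll_until_done` and the `check_status` callable are hypothetical stand-ins for whatever rule or script runs the action, and the "Processing" status value is invented for illustration (the only status named in this document is "Completed").

```python
import time
from typing import Callable

# Assumption: "Completed" marks a finished file; other values mean "keep waiting".
DONE_STATUSES = {"Completed"}


def poll_until_done(
    check_status: Callable[[], str],
    interval_seconds: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check_status until a finished status is seen or attempts run out."""
    status = ""
    for attempt in range(max_attempts):
        status = check_status()
        if status in DONE_STATUSES:
            return status
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return status


# Stand-in for the status check: two pending responses, then a finished one.
_responses = iter(["Processing", "Processing", "Completed"])
final_status = poll_until_done(lambda: next(_responses), interval_seconds=0.0)
print(final_status)
```

Capping the number of attempts keeps a stalled file from being polled indefinitely; the last status seen is returned so the caller can decide what to do next.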
Now, you need to connect your system to the Ephesoft KB server. Work with your implementer to configure the global variables and Ephesoft server settings.
When you have confirmation that your system has been connected, continue to the next section.
After the deployment of the Ephesoft entities, you will be able to create your knowledgebase account with Ephesoft, which will be managed by .
The tables that were created when you deployed Ephesoft are not added to the left pane by default. You might want to add them to the left pane in the Preferences menu while you configure your system to make them easier to access.
Before you configure field mappings in the next section, go to the new Intelligent OCR Files table and create all the fields you need to store the extracted data. For each metadata item, you need one field representing the data and one field representing the OCR confidence score. The data field should match the type of data being extracted, but the confidence score field should always be a floating point field.
Note that different batch classes can share the same fields, but you can't reuse the same fields within the same batch class.
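
The two rules above (a data field plus a confidence field for each metadata item, and no field reuse within a single batch class) can be checked mechanically before you start mapping. The following Python sketch is illustrative only: the `_confidence` naming convention, the batch class IDs, and both helper functions are assumptions and not part of the product.

```python
from collections import Counter
from typing import Dict, List, Tuple


def field_pair(metadata_item: str) -> Tuple[str, str]:
    """Build a (data field, confidence field) name pair for one metadata item."""
    base = metadata_item.strip().lower().replace(" ", "_")
    return base, f"{base}_confidence"


def reused_fields(mappings: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """For each batch class, list table fields mapped to more than one item."""
    problems: Dict[str, List[str]] = {}
    for batch_class, mapping in mappings.items():
        counts = Counter(mapping.values())
        duplicates = sorted(name for name, n in counts.items() if n > 1)
        if duplicates:
            problems[batch_class] = duplicates
    return problems


# Hypothetical mappings: metadata item -> field in the Intelligent OCR Files table.
mappings = {
    # Reusing a field within one batch class is not allowed.
    "BC1": {"Start Date": "contract_date", "End Date": "contract_date"},
    # Sharing the same field across different batch classes is fine.
    "BC2": {"Invoice Date": "contract_date"},
}
print(field_pair("Contract Amount"))
print(reused_fields(mappings))
```

Here only the first batch class is flagged, because `contract_date` is mapped twice inside it; the second batch class shares the field without conflict.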
After setting up the connection to the Ephesoft server and creating your own account, you can define the field mappings between the batch classes and the fields in the Intelligent OCR Files table. The field mapping screen contains an aggregate of the fields that have been defined for all document types in the batch class.
When you're ready to map the fields:
If you need to extract additional data from a document type, work with your implementer to configure Ephesoft to allow that data to be extracted. Be prepared with several example documents that include the data you want to extract.
Each batch class has an ID and represents a class of document types. When a file is sent to Ephesoft, Ephesoft determines which document type it is within the specified batch class and attempts to capture the fields that have been defined for that type. The default configuration includes some preconfigured batch classes for document classifications. You can also create new classifications in Ephesoft with the help of an implementer.
To create a new batch class, you need at least ten examples of the document type you want to configure.
Once the field mappings have been set up for each batch class, you can define the conversions that create records in other tables from the mapped fields. The data conversion action selects the target table based on the batch class ID. You might also consider adding a choice field that lists the types of documents you'll analyze, and then configuring a rule to set the batch class based on that choice field.
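
The choice-field rule amounts to a simple lookup from document type to batch class ID. The Python sketch below shows the idea; the document types, the batch class IDs, and the `batch_class_for` helper are all hypothetical, and in a real system this logic would live in a rule rather than in code.

```python
from typing import Dict, Optional

# Hypothetical choice values and batch class IDs, for illustration only.
BATCH_CLASS_BY_DOCUMENT_TYPE: Dict[str, str] = {
    "Contract": "BC1",
    "Invoice": "BC2",
    "W-2": "BC3",
}


def batch_class_for(document_type: str) -> Optional[str]:
    """Return the batch class ID for a choice value, or None if it is unmapped."""
    return BATCH_CLASS_BY_DOCUMENT_TYPE.get(document_type)


print(batch_class_for("Invoice"))
print(batch_class_for("Purchase Order"))
```

Returning `None` for an unmapped type makes it easy to route such files to manual handling instead of sending them with the wrong batch class.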
If you need to deactivate your Ephesoft integration, you can do so from the configuration page.