TeleForm Process

TeleForm Process Summary

To capture information from a form with TeleForm, you start by creating a form definition: you use a blank form to specify where the software should look for each piece of information. Once a form definition is set up, the software reads completed forms through the following process:

  1. Scan: The input can come from several sources, such as paper, fax, email, or the web. The system gathers all information and processes it regardless of source. If the information is on paper, a scanner is needed to create an image file, which is then interpreted by the software.
  2. Analyze: Powerful recognition engines can interpret handwritten text, machine print, bar codes, etc. The techniques used for interpreting are called:
  • Optical Character Recognition (OCR), for reading typed and printed information
  • Intelligent Character Recognition (ICR), for handwritten text
  • Optical Mark Recognition (OMR), for checkboxes
  3. Verify: Not all documents can be fully read. In cases where the software is not 100% certain, the system is programmed to ask for verification. The verification is easily done with a click of the mouse. This “better-safe-than-sorry” philosophy keeps errors to an absolute minimum.
  4. Export: Perhaps the greatest advantage of the forms processing technology is its integration possibilities. The software is built to easily integrate with any computer system and transfer the interpreted information into any database, spreadsheet, statistical or content management system.

TeleForm Process Details

Form Design

Although TeleForm scan software is designed to handle almost any kind of document, the best return comes from documents that have been made friendlier to both the customer and the recognition technology. Such documents include features like well-spaced OMR bubbles, boxes to write in, page anchors, and page identifiers.

Formtran encourages every customer to realize the true potential of OCR software by optimizing their forms for scanning and imaging, recognizing that there is always a productivity tradeoff. The system is designed to determine the optimal balance between recognition performance and customer acceptance; you may find that an image-friendly form is also very customer-friendly. Formtran can conduct a hands-on class with the people in charge of form design to lay the foundation for this special type of form design.

Forms Recognition

The forms recognition process consists of two parts: Forms Definition and Forms Processing.

Forms Definition is performed by the system administrator, typically only once per form. When a form is created or modified, the new blank form is scanned into the system and “trained”, or defined, in preparation for use by the Forms Recognition process. New or changed forms may be added as required. During the forms definition process, the operator defines the form to the system and identifies the location of the textual data fields contained in the form using simple click-and-drag operations, without programming. The system records the topology of the form and uses this topology map to recognize completed forms as they are scanned and passed to the recognition process.
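To make the idea of a form definition more concrete, the following Python sketch models a field location as a named rectangle and cuts that region out of a page image. It is only an illustration: the Zone class, the crop function, and the pixel-grid representation are assumptions made for this example and do not reflect TeleForm's internal format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """One field location recorded during form definition (hypothetical model)."""
    name: str
    kind: str   # "OCR", "ICR", "OMR", or "BARCODE"
    x: int      # left edge, in pixels
    y: int      # top edge, in pixels
    width: int
    height: int


def crop(page, zone):
    """Return the sub-image covered by a zone.

    `page` is a list of pixel rows; a real system would use an image library.
    """
    rows = page[zone.y:zone.y + zone.height]
    return [row[zone.x:zone.x + zone.width] for row in rows]


# A tiny 4x6 "page" whose pixel values encode their own row and column.
page = [[r * 10 + c for c in range(6)] for r in range(4)]
zip_zone = Zone(name="zip", kind="ICR", x=2, y=1, width=3, height=2)
region = crop(page, zip_zone)
print(region)
```

Each cropped region would then be handed to the recognition engine that matches the zone's kind.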

Forms Processing is performed by scanning forms containing data and passing them to the forms recognition process. Each form is then compared against the templates defined in the system. When a template matching the current form is found, the template information is passed to the OCR software to extract text data from the image bitmap. The OCR software is responsible for performing image pre-processing and cleanup, print recognition, data validation, and data formatting.

The OCR engine uses information entered during the forms definition process to extract specified fields from the form. Machine print (OCR), handprint (ICR), bar code, and OMR (mark sense) information is automatically captured. Data successfully extracted will be stored in a file along with its associated image. The OCR engine will attempt to extract data from all fields defined on the form. Any characters that the OCR engine could not recognize with confidence, as well as any validation errors detected, will be sent to an edit station for correction.

Document Preparation and Scanning

The process starts with the forms received each day. Forms can be separated and batched and ultimately prepared for the scanning function. These batches can then be scanned for processing. Forms must be unfolded, unstapled, etc. prior to scanning. Each stack of documents will be individually placed in the scanner’s feeder. At the command of the scanner operator, the scanner will then automatically feed each stack, which then becomes a batch.

Form Identification

When the software detects that a batch has been scanned successfully, the first operation performed is document identification. Every image in every batch is identified as one of the document types that the system has been trained to process. Because the software detects which form each image represents, the system knows how to read the required information from each document.

Automatic Data Capture

Any questionable characters, fields, groups or validation checks that fail are flagged. Typically, the system’s initial accuracy varies from about 80% to 99.5%. Those fields or documents that are difficult for the OCR software to read will generate more questions. There are several factors that can influence the initial accuracy, such as the following:

  • number of handprinted fields on the form relative to OMR, OCR, or barcode fields
  • quality of the actual data being filled in (e.g. yes and no are not both marked)
  • number of pre-assigned validations
  • quality of the form design and printing

Data Editing

In an emergency situation, no human intervention takes place: errors are flagged, but data is sent automatically to the database without human verification. For daily data collection, or where human verification is desired, edit operators handle only those items that were poorly written or faulty in some way. The system flags each field that has been questioned for some reason and automatically brings it to an operator’s attention. Those fields and documents that have no questions pass through the system untouched.
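The routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TeleForm's actual logic: the FieldResult class, the route_fields function, the confidence scores, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float          # 0.0 to 1.0, from a hypothetical recognition engine
    validation_ok: bool = True


def route_fields(fields, threshold=0.9, emergency=False):
    """Split fields into (accepted, flagged).

    A field is flagged when its confidence is below the threshold or a
    validation check failed. In emergency mode flagged fields are still
    accepted, so data reaches the database without operator review.
    """
    accepted, flagged = [], []
    for f in fields:
        questioned = f.confidence < threshold or not f.validation_ok
        if questioned:
            flagged.append(f)
        if emergency or not questioned:
            accepted.append(f)
    return accepted, flagged


fields = [
    FieldResult("last_name", "SMITH", 0.98),
    FieldResult("zip", "9O210", 0.62),                        # low confidence
    FieldResult("total", "41.00", 0.97, validation_ok=False),  # failed a check
]
accepted, flagged = route_fields(fields)
print([f.name for f in accepted])   # fields that pass through untouched
print([f.name for f in flagged])    # fields shown to an edit operator
```

In normal operation only the flagged fields would be queued for an edit operator; calling route_fields with emergency=True keeps the flags but accepts every field.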

Data Validation and Formatting

One of the most powerful features is the ability to validate data as it is being captured, as well as reformat data to a required style. Data validation checks can include table lookups, math checks, validity checks, etc. Reformatting can include case changing, justification, trimming and padding, and the like. More complex checks or formats can easily be specified.
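As a rough illustration of these checks, the Python sketch below applies a table lookup, a validity check, and a math check to one captured record, and then reformats it. The field names, the lookup table, and the validate and reformat functions are assumptions chosen for the example; TeleForm's own validation rules are configured within the product and are not shown here.

```python
VALID_STATES = {"CA", "NY", "TX"}   # stand-in for a real lookup table


def validate(record):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    # Table lookup: the state code must appear in the reference table.
    if record["state"].strip().upper() not in VALID_STATES:
        errors.append("state: not in lookup table")
    # Validity check: a ZIP code must be exactly five digits.
    if not (record["zip"].isdigit() and len(record["zip"]) == 5):
        errors.append("zip: must be 5 digits")
    # Math check: line items (in cents) must add up to the stated total.
    if sum(record["line_items"]) != record["total"]:
        errors.append("total: does not equal sum of line items")
    return errors


def reformat(record):
    """Return a copy with case changing, trimming, and padding applied."""
    out = dict(record)
    out["name"] = record["name"].strip().upper()    # trim and change case
    out["state"] = record["state"].strip().upper()
    out["account"] = record["account"].strip().zfill(8)  # left-pad with zeros
    return out


record = {
    "name": "  jane doe ",
    "state": "ca",
    "zip": "90210",
    "account": "1234",
    "line_items": [1500, 2600],
    "total": 4100,
}
print(validate(record))
print(reformat(record)["account"])
```

Amounts are kept as integer cents so that the math check is exact rather than subject to floating-point rounding.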

Data Transfer

When the data passing through the system has been processed, cleaned up, and validated, it is ready to be transferred to the target database along with the original image. The field order and file format to be used can be specified at the time of design and can be easily changed at any time. The transfer process can be set up to be automatic or manual. Data can be transferred either as ASCII files (CSV, XML, etc.) or via ODBC to a database such as Oracle or MS SQL Server.
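For the file-based route, a CSV export with a configurable field order might look like the following Python sketch. It uses only the standard csv module; the export_csv function, the record layout, and the image path are assumptions made for illustration and do not describe TeleForm's export configuration.

```python
import csv
import io


def export_csv(records, field_order, out):
    """Write records to `out` as CSV, using the given column order.

    Keys not listed in field_order are silently left out of the file.
    """
    writer = csv.DictWriter(out, fieldnames=field_order, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


records = [
    {
        "name": "JANE DOE",
        "zip": "90210",
        "image": "batch01/0001.tif",   # hypothetical path to the scanned image
        "internal_score": 0.98,        # not part of the export
    },
]

buf = io.StringIO()
export_csv(records, ["zip", "name", "image"], buf)
print(buf.getvalue())
```

Changing the field order only requires passing a different list; writing to an open file instead of a StringIO buffer produces the file handed to the target system.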