PaperVision® Capture
Administration Guide
PaperVision Capture Release 79
November 2014
Information in this document is subject to change without notice and does not represent a commitment on
the part of Digitech Systems, Inc. The software described in this document is furnished under a license
agreement or nondisclosure agreement. The software may be used or copied only in accordance with the
terms of the agreement. It is against the law to copy the software on any medium except as specifically
allowed in the license or nondisclosure agreement. No part of this manual may be reproduced or
transmitted in any form or by any means, electronic or mechanical, including photocopying and recording,
for any purpose without the express written permission of Digitech Systems, Inc.
Copyright © 2014 Digitech Systems, Inc. All rights reserved.
Printed in the United States of America.
PaperVision Capture and the Digitech Systems, Inc. logo are trademarks of Digitech Systems, Inc.
PaperVision Enterprise, ImageSilo, and PaperFlow are registered trademarks of Digitech Systems, Inc.
Microsoft, Windows, SQL Server, Access, and .NET Framework are either registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries.
All other trademarks and registered trademarks are the property of their respective owners. The Microsoft
Office User Interface is subject to protection under U.S. and international intellectual property laws and is
used by Digitech Systems, Inc. under license from Microsoft.
PaperVision Capture contains portions of OCR code owned and copyrighted by OpenText™ Corporation.
All rights reserved.
PaperVision Capture contains portions of OCR code owned and copyrighted by Nuance
Communications, Inc. All rights reserved.
PaperVision Capture contains portions of imaging code owned and copyrighted by EMC Corporation. All
rights reserved.
Digitech Systems, Inc.
8400 E. Crescent Parkway, Suite 500
Greenwood Village, CO 80111
Phone: (303) 493-6900 Fax: (303) 493-6979
www.digitechsystems.com
Table of Contents
Chapter 1 Introduction
PaperVision Capture Terminology
Supported Users in the PaperVision Administration Console
System Requirements
Supported Scanners
Maximum Image Sizes
Initial Log In
Logging Out
Using Online Help
Chapter 2 Global Administration
Automation Service Status
Email Queue
Global Administrators
Licensing
Maintenance Queues and Maintenance Logs
Process Locks
System Settings
Automation Service Scheduling
Chapter 3 Entity Administration
Creating a New Entity
Deleting an Entity
Editing the Properties of an Entity
General Security
Encryption Keys
Security Policy
System Groups
System Users
Current Activity
Chapter 4 Job Creation and Configuration
Opening the Job Definitions Window to Create or Edit a Job
Job Definitions Window Overview
Working with Job Steps
Available Job Steps
Adding Job Steps to a Job
Working with Job Step Links
Editing Job Steps
Moving Job Steps
Specifying the Appearance of Job Steps
Setting Common Job Step Properties
Assigning Users to Manual Job Steps
Working with Jobs
Setting Job Properties
Configuring Detail Sets
Saving Jobs
Validating Jobs
Activating Jobs
Checking in Jobs
Checking out Jobs
Undoing Checkout for Jobs
Deactivating Jobs
Deleting Jobs
Importing Jobs
Exporting Jobs
Cloning Jobs
Customizing the Job Definitions Window
Moving Components
Applying Auto Hide to Components
Viewing/Hiding Components
Using the Main Workspace
Viewing an Open Job
Setting Zoom Options
Sorting Properties
Customizing Columns on the Job Steps Grid
Sorting Columns on the Job Steps Grid
Chapter 5 Capture Step
Configuring a Capture Step
Auto Document Break
Auto Page Rotation
Black and White Image File Type
Color Image File Type
Display Saved Images Only
Max Number Documents Per Batch
Minimum Page Size
New Batch Name (Regular Expression)
Prompt for New Batch Information (Auto)
Rotate Before Barcode
Custom Code Events (Step Level)
Update Detail Sets on Save
Indexes
Manual Barcode and OCR Indexing
Manual QC
Operator Permissions
Scanner Requirements
Manual Barcode and OCR Indexing
Chapter 6 Indexing Configuration
Configuring an Indexing Step
Viewing the Index Configuration Settings
Adding, Removing, and Sorting Indexes
Indexing Properties
Index Configuration - General (Job Level)
Index Configuration - General (Step Level)
Predefined Index Values (Job Level)
Index Types and Formats
General [Step Level] Property Settings
Update Detail Sets on Save
Configuring an Indexing Step to Include Forms Magic QC
Manual Barcode and OCR Indexing
Index Zones
Chapter 7 Barcode Configuration
Auto Document Break
Index - General (Job Level)
Indexes - General (Step Level)
Indexes - Predefined Index (Job Level)
Barcode Zones
Barcode Explorer
Barcode Explorer Properties
Chapter 8 Zonal OCR
Auto Document Break
Indexes - Line Feed Delimiter
Indexes OCR Parsing
Edit OCR Zones Operations
General OCR and Miscellaneous Properties
Nuance Zonal OCR
Nuance OCR Page Properties
Nuance OCR Zone Properties
Open Text Zonal OCR
Open Text OCR: Supported Countries and Languages
Chapter 9 Nuance Full-Text OCR
Configuring a Nuance Full-Text OCR Job Step
Setting the Auto Image Orientation Property
Setting the Outputs Property
Setting the Override Invalid Pages Property
Setting the Timeout (sec) Property
Editing Nuance Full-Text OCR Settings
Configuring Output Types
Testing Full-Text OCR Filters
Nuance Full-Text OCR Output Types
Chapter 10 Open Text Full-Text OCR
Supported Output File Types
Custom Code
Auto Rotate
Brightness Sample Size
Brightness Threshold
Country/Language
Minimum Confidence
Remove Line System
Timeout Value (in seconds)
Compression
PDF Version
Rejection Symbol
Chapter 11 Image Processing
Configuring an Image Processing Job Step
Configuring Image Processing Filters
Image Processing Filters
Background Dropout
Binary Dilation
Binary Erosion
Binary Halftone Removal
Binary Hole Removal
Binary Invert Image
Binary Line Removal
Binary Noise Removal
Binary Skeleton
Binary Smoothing
Black Overscan Removal
Color Adjustments
Color Detection and Conversion
Color Dropout
Crop
Deskew
Image Fit
Page Deletion - Always
Page Deletion - Blank
Page Deletion - Color Content
Page Deletion - Dimensions
Page Deletion - File Size
Redaction
Rotation
Scaling
Threshold
Chapter 12 Custom Code Configuration
Custom Code Generators
Digitech Systems' API
Custom Code Event Arguments
Additional API Functions
Enumerations
Public Properties
Debugging Custom Code
Script Editor
Match and Merge Wizard
Exports
Export Definitions
Job Configuration
Chapter 13 Quality Control (QC)
Automated Quality Control (QC)
Manual Quality Control (QC)
Adding Custom QC Tags
Adding Pass and Fail Links
Custom Code Events (Step Level) Properties
Index
QC Auto Play
Operator Permissions
Chapter 14 Batch Splitting
Configure Batch Splitting
Test Batch Splitting Configurations
Chapter 15 Forms Magic Processing
Configuring a Forms Magic Processing Job Step
Chapter 16 Forms Magic Index Mapping
Configuring a Forms Magic Index Mapping Job Step
Mapping Forms Magic Detail Sets and Fields
Chapter 17 AP Processing
Configuring an AP Processing Job Step
Setting Properties for Multiple PO Processing
Defining Rejection Reasons
Setting Properties for Single PO Processing
Configuring an External Data Source
Chapter 18 Business Rules
Configuring a Business Rules Job Step
Configuring AP (Accounts Payable) Business Rules
Configuring Capture Detail Set Business Rule
Configuring Capture Index Business Rules
Configuring Forms Magic Business Rules
Specifying an External Data Source Provider for Business Rules
Chapter 19 Capture Batches
File Menu
Help Menu
Batch Management
Batch Statistics
Appendix A Additional Help Resources
Technical Support
Contacting Digitech Systems
Community Support
Appendix B Nuance OCR Spelling Languages
Appendix C Modifying the Process Batch Operation
Appendix D Maximum Image Sizes
Appendix E Terminal Services Configuration
Appendix F Open Text Countries and Languages
Appendix G Digitech Logging Utility
Configuring the Digitech Logging Utility
Chapter 1 Introduction
The PaperVision Capture Administration Console provides a single location for global, system, and job
administration. This tool helps you manage jobs, batches, statistics, user and group profiles, and automation
service settings. The Job Definitions window provides detailed control over image-capture settings when you
define PaperVision Capture jobs and job steps, as well as the users and groups who are assigned to these
steps.
PaperVision Capture Terminology
This content includes definitions for terms used in the documentation and online help for PaperVision
Capture. Learning this terminology will help you use the product and its documentation more effectively.
Batch - A batch is a collection of documents and their associated index name-value pairs and statistics that
are moved as a logical unit of work through a job.
Batch Priority - Batch priority refers to the order in which batches awaiting ownership are displayed in the
PaperVision Capture Operator Console and processed by the PaperVision Capture Automation Service. The
following values are assigned by administrators to calculate the overall batch priority.
• Job age priority is a number associated with the job and is multiplied by the number of elapsed minutes
since the batch was created.
• The job step's age priority is a value associated with the current job step and is multiplied by the number of
elapsed minutes the batch has been waiting in the current step.
• The job step priority is a value associated with the current job step and assigned by an administrator.
• Administrative priority is a value associated with each specific batch. To have a significant impact on the
overall calculation, administrators can assign a wide range of values (0-999,999) to this priority.
Administrators assign numbers to indicate batch urgency and assist with scheduling and resource allocation.
The system uses these numbers, which range from 0 (not urgent) to 100 (urgent), to schedule system
resources and assign higher-priority batches to users. Batch priority helps administrators efficiently manage
job loads and enables the system to automatically assign prioritized batches to operators in a round-robin
fashion.
The overall batch priority is calculated as follows:
(job age priority × elapsed minutes since batch was created) + (job step age priority × elapsed minutes batch
has been waiting in current step) + job step priority + administrative priority
NOTE: If all priority values are set to zero, the overall calculated priority in the PaperVision Capture
Operator Console’s batch creation screen will remain at zero (regardless of how long batches await
ownership in the Batches Waiting list).
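The calculation can be illustrated with a minimal Python sketch. This is not product code: the class name, field names, and function name are hypothetical and were chosen only to mirror the four values described above.

```python
from dataclasses import dataclass


@dataclass
class BatchPriorityInputs:
    """Hypothetical container for the four administrator-assigned values."""
    job_age_priority: int         # multiplied by minutes since the batch was created
    step_age_priority: int        # multiplied by minutes waiting in the current step
    job_step_priority: int        # flat value for the current job step
    administrative_priority: int  # per-batch value (documented range 0-999,999)


def overall_batch_priority(p, minutes_since_created, minutes_in_step):
    """Apply the documented formula to one batch (illustrative only)."""
    return (
        p.job_age_priority * minutes_since_created
        + p.step_age_priority * minutes_in_step
        + p.job_step_priority
        + p.administrative_priority
    )


# A batch created 30 minutes ago that has waited 4 minutes in its current step.
example = BatchPriorityInputs(
    job_age_priority=2,
    step_age_priority=5,
    job_step_priority=10,
    administrative_priority=100,
)
print(overall_batch_priority(example, minutes_since_created=30, minutes_in_step=4))  # 190
```

With every input set to zero, the function returns 0 no matter how many minutes elapse, which matches the behavior described in the note above.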
Detail Sets - Detail sets expand the capabilities of standard index fields because they define "many-to-one"
relationships, which allow multiple sets of field data to reference a single document. In a many-to-one
relationship, an index field contains a value that references another field or set of fields that contain unique
values.
Document - A document is the equivalent of a file folder within a filing cabinet. A document holds all of the
pages for a given set of index values.
Image - An image is a visual representation of a picture or graphic, such as an electronic file with the
extensions .bmp, .jpg, or .tif.
Index - An index is a value that users apply to a document for reference and retrieval.
Job - A job is a defined process comprised of one or more job steps through which batches are processed. At
a minimum, each job must contain a start step. Each job is unique by name within an entity.
Job Step - A job step is an automated or manual operation that is performed on a batch. Manual job steps are
performed by assigned users through the PaperVision Capture Operator Console. Automated job steps are
completed by the PaperVision Capture Automation Service, and require no user intervention.
Master Batch Repository - The Master Batch Repository is the centralized storage area where PaperVision
Capture stores all captured images. When installing PaperVision Capture in an environment containing
multiple PaperVision Capture Gateway or PaperVision Capture Automation Servers, this location should be a
network accessible location (for example, \\SERVER\SHARE).
Page - One or more images (files with extensions .bmp, .jpg, and .tif) comprise a single page within a
document. For example, a page can include the originally captured image and a manipulated image after
noise removal.
PaperVision Capture Administration Console - The PaperVision Capture Administration Console
provides administration and job configuration capabilities.
PaperVision Capture Automation Service - The PaperVision Capture Automation Service is a Microsoft®
Windows service that performs automated tasks and batch processing at specified time intervals. Examples
of work performed by the PaperVision Capture Automation Service include the compilation of statistics when
an operator completes a batch and the processing of automated job steps. Multiple Automation Services can
be installed on distinct machines, or multiple PaperVision Capture Automation Service processes can be
configured to run on the same machine.
PaperVision Capture Data Transfer Agent Service - The PaperVision Capture Data Transfer Agent
Service is a Microsoft® Windows service that moves batches in local temporary batch repositories to/from
the Master Batch Repository.
PaperVision Capture Gateway Server - The PaperVision Capture Gateway Server is an application server
that enables communication between PaperVision Capture modules and provides access to databases and
the Master Batch Repository in distributed deployment scenarios.
PaperVision Capture Operator Console - The PaperVision Capture Operator Console provides scanning,
indexing, and batch processing capabilities.
Supported Users in the PaperVision Administration Console
The robust security architecture in PaperVision Capture gives an entity administrator control of nearly every
aspect of access to your entity. The entity administrator configures the entity's users and groups, security
policies, and PaperVision Capture jobs and job steps.
Entity-level security - Entity-level security policies in PaperVision Capture grant entity administrators the
ability to define general security settings for system users and groups.
User-level and group-level security - User- and group-level security defines which jobs and job steps users
can access and which tasks they can perform within those jobs and job steps. Additionally, administrators
can limit which data users can search for, view, and alter.
The PaperVision Capture Administration Console supports the following types of users:
• Global administrators can configure all settings for all entities.
• System administrators can administer all settings for a particular entity.
• Capture administrators can administer an entity's job settings, including the configuration of jobs and
job steps within the entity.
System Requirements
This section describes the minimum software and hardware requirements for PaperVision Capture.
Minimum Software Requirements
The following table outlines the minimum software requirements for the PaperVision Capture application.
Software                     Version
Operating Systems            Windows® XP Pro SP3 or later (both 32- and 64-bit operating systems are supported)
Microsoft® .NET Framework    Version 4.0 or later (included on the installation media)
Windows Installer            Version 3.1 or later (included on the installation media)
Microsoft® SQL Server        SQL Server 2005 or later
NOTE: SQL Server 2008 R2 Express Edition is included on the installation media.
Minimum Hardware Requirements
Most enterprise software can operate on a basic hardware configuration that includes a current
processor and 4 GB of memory for desktops or 8 GB of memory for servers. However, each organization's
intended use of PaperVision Capture is unique. The intended workload (including the maximum number of
users and the quantity and types of operations performed within a given period), coupled with security
and redundancy requirements, will dictate the hardware requirements for each implementation.
PaperVision Capture has the distinct capability to scale both up and out. You can configure most of the functions
performed by PaperVision Capture to take advantage of powerful hardware configurations, such as those with
many processor cores and hundreds of GB of memory (scaling up). Additionally, PaperVision Capture can spread
its processing requirements across numerous computers (scaling out).
PaperVision Capture products are designed and tested for specific operating systems, not hardware
environments. Numerous customers successfully run PaperVision Capture in virtual environments, including
VMware® and Microsoft® Hyper-V. While this technology has matured over the years, issues have occurred
with common software (other than PaperVision Capture) not operating properly or efficiently because of the
virtual environment. In the cases that Digitech Systems' Technical Support has witnessed, the issue was
with the virtual environment, not our software. If our technical support believes that the hardware environment
(including virtual environments) is contributing to an operational or performance issue, they may request that
you verify that the issue also occurs in a different (or non-virtual) environment.
If you intend to use a virtual environment for your PaperVision Capture implementation, carefully consider the
implications of running in a shared environment. Remember, you are not just sharing processors and memory.
You are also sharing network and disk resources with the other virtual environments on the same hardware.
Supported Scanners
PaperVision Capture supports more than 300 ISIS-compatible scanners. If you need additional scanner
drivers, please contact Digitech Systems’ Technical Support at support@digitechsystems.com or by phone
at (877) 374-3569. If the driver is available, our support personnel will assist you in obtaining the driver.
PaperVision Capture also offers the ability to use TWAIN scanners. TWAIN scanners are generally
intended for extremely low-volume scanning, because ISIS drivers are available for most scanners on the market.
Maximum Image Sizes
This topic outlines the approximate limits in image sizes that can be imported into PaperVision Capture and
processed through the Nuance Full-Text OCR, Zonal OCR, and Image Processing steps. The Thumbnails
windows, found in both the Administration and Operator Consoles, can handle substantially larger images.
Additionally, images only stored in memory or simply ingested by PaperVision Capture (therefore not viewed
in Thumbnails windows or processed through the Nuance Full-Text OCR, Zonal OCR, or Image Processing
steps), can also be significantly larger in size.
DISCLAIMER – PLEASE READ: These dimensions are provided only as estimates to identify
size limits in importing, viewing, and processing images in PaperVision Capture. Variations in technical
environments may cause maximum image sizes to fluctuate across systems.
Maximum Image Sizes (in Pixels)
Stored Images                            10,000 x 10,000*
Image Processing                         10,000 x 10,000*
Nuance Full-Text OCR and Zonal OCR       8400 x 8400
Open Text Full-Text OCR and Zonal OCR    32000 x 24000
* These dimensions can be greater in bitonal images.
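As a rough illustration of how these estimates might be applied before importing an image, the following Python sketch compares an image's pixel dimensions with the approximate limits listed above. The dictionary keys and the function name are hypothetical, and treating each entry as width x height is an assumption. The limits are estimates only, and bitonal images may exceed the stored-image and image-processing values.

```python
# Approximate limits from the table above, as (width, height) in pixels.
# The key names are illustrative; they are not PaperVision Capture identifiers.
MAX_PIXELS = {
    "stored": (10_000, 10_000),
    "image_processing": (10_000, 10_000),
    "nuance_ocr": (8_400, 8_400),
    "open_text_ocr": (32_000, 24_000),
}


def steps_exceeding_limit(width, height):
    """Return the (hypothetical) step names whose estimated limit is exceeded."""
    return sorted(
        name
        for name, (max_w, max_h) in MAX_PIXELS.items()
        if width > max_w or height > max_h
    )


# A 9,000 x 9,000 image fits every estimate except the Nuance OCR limit.
print(steps_exceeding_limit(9_000, 9_000))  # ['nuance_ocr']
```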
Initial Log In
When you open the PaperVision Capture Administration Console, a login prompt appears. If this is your first
time logging in, the user name is ADMIN and the password is ADMIN.
NOTE: Passwords are case-sensitive.
Logging Out
To log out of the PaperVision Capture Administration Console, click File, and then click Exit. If you have any
unsaved changes, you are prompted to save those changes before you are logged out.
Using Online Help
To effectively use the online Help, you should be familiar with your computer and know how to perform basic
tasks such as running applications; opening, saving, and closing files; using menus, dialog boxes, and
windows; and using the mouse, keyboard commands, and modifier keys. If you have questions about
performing any of these tasks, consult the documentation provided with your computer.
If you are a new user of PaperVision Capture, see the Welcome to the PaperVision Capture Administration
Console topic in the online Help to familiarize yourself with PaperVision Capture terminology and supported
users.
Accessing Online Help
To access online help from the PaperVision Capture Administration Console, click Help or press the F1
key to open a topic related to the screen you are currently viewing.
Navigating Online Help
The online help is specific to your location in the application. To view other topics, click the topic on the
Contents pane.
• Contents displays the contents for the online Help. Click a book to display the pages that are associated
with each topic, and then click a page to display the corresponding topic on the right pane.
• Index lets you search for specific words or phrases and lets you select from a list of index keywords. Click
the keyword to display the corresponding topic on the right pane.
• Search allows you to locate words or phrases within the content of your topics. Type the word or phrase to
search for in the text field, press the Enter key, and then select your topic from the list.
• Glossary displays a list of definitions for words and short phrases related to PaperVision Capture. When
you select a term, its corresponding definition displays.
• Print allows you to print the topic currently viewed or all topics within the main Table of Contents book.
This Help file employs the following types of navigation aids:
• Click the green hyperlinks to open another topic, additional instructions, or supplemental information within
the topic currently viewed.
• Click the orange hyperlinks to get information on a web site. These navigational features are underlined and
display in a different color so that you can easily locate them.
• Click the Related Topics button to display a list of associated topics, and then click the topic you want to
open.
• Click the Back and Forward arrows on the toolbar to page backward and forward through your browsing
history.
You can also press BACKSPACE to return to your previously viewed topic.
Or, right-click within the topic, and then select Back or Forward to move through your browsing history.
• Click Hide on the toolbar to close the Table of Contents pane. To display it again, click Show
or press F6.
Understanding Online Help Conventions
To help you find information quickly, the online Help uses the following conventions:
Bold type style - The names of all dialog boxes, fields, and other controls appear in bold type. For example,
“Click the Properties tab to view the properties of a job.”
Notes and Tips - Notes cover additional information about particular features or concepts. Tips contain
suggestions to help you perform a step more efficiently.
Printing Topics in the Online Help
1. On the Contents tab, click the topic you want to print.
2. On the toolbar, click Print.
3. In the Print Topics dialog box, select one of the following options.
• Print the selected topic prints only the topic currently open.
• Print the selected heading and all subtopics prints all of the topics in the book on the Contents
tab where the topic is located.
To include the text accessed from the green underlined links, expand the link(s) before
printing.
4. In the Print dialog box, select the appropriate printer, and then click Print.
Downloads
The Downloads directory contains updates for PaperVision Enterprise or downloads to install other Digitech
Systems tools and software. You can install the available tools and software directly from the web site, or
you can download and install them at a later time. You must have appropriate permissions granted by the
administrator to perform the tasks that are available in this directory. You must also have administrative
rights in Windows to successfully install the Document Viewer plug-in.
NOTE: Windows® XP Pro Service Pack 3 (SP3) is the minimum supported operating system for
PaperVision Enterprise. For more information on installing Windows XP Pro SP3, see the Microsoft
TechNet web site at http://technet.microsoft.com/en-us/library/cc507836.aspx.
Installing Applications
1. From the Downloads page, choose the application, and then click Install Now.
2. If a security dialog box appears, click Install.
3. On the InstallShield Wizard window, click Next.
4. Click Yes to accept the end-user license agreement.
5. Click Finish when the installer notifies you that the software is installed.
Downloading Applications for Future Installation
1. From the Downloads page, choose the application to download, and then click Download.
2. When the File Download dialog box appears, click Save.
3. Choose the file location.
4. Click Save.
Chapter 2 Global Administration
Global administration encompasses the overall functionality of PaperVision Capture that affects all entities. To
access global administration settings, select the Global check box when you log in to the PaperVision Capture
Administration Console with global administrator credentials. After you are logged in as a global
administrator, you can access the following global administration settings for all entities.
• Automation Service Status displays the current status of all automation servers connected to the
PaperVision Capture database.
• Email Queue holds email notifications until the automation service actually processes (sends) the email.
• Global Administrators contains PaperVision Capture's global administrators.
• Licensing allows global administrators to manage PaperVision Capture licenses for each entity.
• Maintenance lists maintenance jobs to be processed by the PaperVision Capture Automation Service and
logs of completed maintenance jobs.
• Process Locks contains a list of operations currently locked by the system to prevent attempts to run the
same operations simultaneously.
• System Settings contains configuration items for Automation Service Scheduling that automates the
execution of certain operations on timed intervals. This screen also contains configuration items for system
settings such as general utilization limits, email settings, and local settings.
Automation Service Status
Automation Service Status displays the current status of all automation servers connected to the
PaperVision Capture database. More than one automation server process may be running on a single
computer. You can start and stop automation service operations for any process.
Opening the Automation Service Status Pane
• After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Automation Service Status. The Automation Service Status pane
appears on the right side of the window.
Starting an Automation Service Process
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Automation Service Status.
2. On the right pane, select the server.
3. On the toolbar, click Start.
Stopping an Automation Service Process
Stopping the service operations does not stop the process itself; rather, the process receives a command to
not perform further processing after it has finished its current operation.
To stop the service
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Automation Service Status.
2. On the right pane, select the server.
3. On the toolbar, click Stop.
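The stop behavior described above, in which the process finishes its current operation and then performs no further processing, resembles a cooperative stop flag. The following Python sketch illustrates that general pattern only. It is not how the PaperVision Capture Automation Service is implemented, and every class and method name in it is hypothetical.

```python
import threading


class AutomationWorker:
    """Hypothetical worker that honors a stop request between operations."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self.completed = []

    def request_stop(self):
        # Comparable to clicking Stop: the flag is set, but nothing is interrupted.
        self._stop_requested.set()

    def run(self, operations):
        for operation in operations:
            if self._stop_requested.is_set():
                break  # No further processing; the worker itself still exists.
            self.completed.append(operation())  # The current operation always finishes.


def demo():
    worker = AutomationWorker()

    def second():
        worker.request_stop()  # The stop request arrives while this operation runs.
        return "second"

    worker.run([lambda: "first", second, lambda: "third"])
    return worker.completed


print(demo())  # ['first', 'second']
```

In the demo, the second operation completes even though the stop request arrives while it is running, and the third operation never starts.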
Deleting an Automation Service Process
This command does not delete the process itself; rather, the status of the process is deleted from the database.
To delete the service
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Automation Service Status.
2. On the right pane, select the server.
3. On the toolbar, click Delete.
Email Queue
When email notifications are sent via custom code, they are placed into a holding queue which is then
processed by the automation service (to perform the actual sending of the email). The Email Queue
automation service operation must be scheduled before it can process the email. The Email Queue is a list of
emails that are waiting to be sent.
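The holding-queue arrangement can be pictured with a short Python sketch. This is a conceptual illustration under stated assumptions, not the product's implementation: the QueuedEmail and EmailQueue names, their fields, and the send callback are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedEmail:
    """Hypothetical queued notification."""
    to: str
    subject: str


class EmailQueue:
    """Hypothetical holding queue: items wait here until a scheduled run sends them."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, email):
        # Custom code only adds the email to the queue; nothing is sent yet.
        self._pending.append(email)

    def process(self, send):
        # Stands in for the scheduled automation service operation:
        # each email is sent in order and removed from the queue.
        sent = 0
        while self._pending:
            send(self._pending.popleft())
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


queue = EmailQueue()
queue.enqueue(QueuedEmail("operator@example.com", "Batch ready"))
queue.enqueue(QueuedEmail("admin@example.com", "Batch failed QC"))

delivered = []
print(len(queue))                       # 2 (still waiting; nothing sent)
print(queue.process(delivered.append))  # 2 (both sent by the scheduled run)
print(len(queue))                       # 0
```

Until process runs, queued emails simply wait, which is why the Email Queue operation must be scheduled before any notification is delivered.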
Opening the Email Queue
• After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Email Queue. The Email Queue appears on the right pane.
Viewing an Email Queue Item
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Email Queue. The Email Queue appears on the right pane.
2. On the right pane, double-click the item you want to view. An Email Queue Properties dialog box displays
the contents of the email.
Deleting an Email Queue Item
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Email Queue.
2. On the right pane, select the item you want to delete.
3. On the toolbar, click Delete.
NOTE: If Email Queue items are deleted (either via sending the email or deleting them manually), their
attachments are removed from the Attachment Path, which is assigned in the Email System Settings
screen.
Global Administrators
As a global administrator, you can configure any system setting for all PaperVision Capture entities. You can
also access the settings for each job and job step for all entities. You can create and manage global
administration accounts on the Global Administrators pane.
Viewing Global Administrators
• After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Global Administrators. A list of global administrators appears on the right
pane.
Creating a New Global Administrator
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Global Administrators.
2. On the toolbar, click Create New Global Administrator. The New Global Administrator dialog box
appears.
3. In the User Name box, type the name that will be used to log in to PaperVision Capture.
4. In the Full Name box, type the user's full name (optional). The full name is used for PaperVision Capture
reporting.
5. In the Email Address box, type the email address (optional). This specifies where system notifications
should be sent.
6. In the Password box, type the initial password to access the system.
7. In the Confirm Password box, enter the password again to confirm it.
8. Click OK.
Setting a Password
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Global Administrators.
2. On the right pane, select the global administrator.
3. On the toolbar, click Set Password. The Set Password dialog box appears.
4. In the New Password box, type the new password.
5. In the Confirm Password box, type the password again.
6. Click OK.
Deleting a Global Administrator
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Global Administrators.
2. On the right pane, select the global administrator you want to delete.
3. On the toolbar, click Delete.
4. Click Yes to proceed with the deletion.
Editing Properties of a Global Administrator
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Global Administrators.
2. On the right pane, double-click the global administrator.
3. In the Global Administrator Properties dialog box, make the necessary modifications to the account.
4. Click OK.
NOTE: Modifications are applied the next time the global administrator logs in to the PaperVision
Capture Administration Console.
Licensing
PaperVision Capture provides Entity, Concurrent, and Named licenses. Entity licenses are assigned to an
entity and are available to any users for that entity. Concurrent licenses are assigned to a specific entity and
are restricted to a single user at any given time. Concurrent licenses provide the greatest flexibility since a
license is only consumed when a user is logged on to the PaperVision Capture Operator Console. If no
licenses have been added on the Administration Console, the user is prompted that none are available for the
session in the Operator Console.
Named licenses are assigned per machine or per process and not to individual users. Named licenses are
consumed only by the machine or process to which they are assigned. To ensure that a specific machine is
always available to process automated jobs, assign a named license to your automation server. In this case,
a named license is required for each instance of an automation server.
When an automation service process is executing custom code that adds new documents to a batch, then
the process requires the appropriate licenses based on job configuration. You can configure multiple
automation service processes to run on a single physical machine. When named licenses are used, each
automation server process consumes a license. For example, if three automation service processes were
running on a machine named WINXP, you would need three named licenses as follows:
1. WINXP_0
2. WINXP_1
3. WINXP_2
Conversely, for concurrent licensing, each automation service process still requires a license, but the naming
scheme is not relevant.
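The named-license example above suggests a machine-name-plus-index pattern. The following minimal sketch assumes that this pattern (machine name, underscore, zero-based index) holds in general; the function name is hypothetical, and the sketch is only a planning aid for listing the names you would need.

```python
def named_license_systems(machine_name, process_count):
    """Return one name per automation service process, for example WINXP_0."""
    if process_count < 1:
        raise ValueError("process_count must be at least 1")
    # Assumption: each process is identified by a trailing zero-based index.
    return [f"{machine_name}_{index}" for index in range(process_count)]


names = named_license_systems("WINXP", 3)
print(names)  # ['WINXP_0', 'WINXP_1', 'WINXP_2']
```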
In most scenarios, a license is consumed when a user works on a manual step in the PaperVision Capture
Operator Console. A license is released after a user logs off. A license is also released when a user session
has timed out or when a user session is “killed” via the Current Sessions window in the PaperVision Capture
Administration Console.
Getting a Demonstration License
If you want to run PaperVision Capture in demonstration mode, please contact Digitech Systems’ Technical
Support to get a demonstration license key. The demonstration license includes all functionality within
PaperVision Capture, including global administration features. The demonstration license cannot be
combined with the Concurrent or Named license types.
If you add the demonstration license, a watermark is applied on all images that are scanned or imported into
the PaperVision Capture Operator Console for the entire duration of the batch process. Because the
application writes a watermark onto each captured image, non-repudiation is not supported in demonstration
mode. PaperVision Capture’s demonstration license is designed specifically to show the features and
functionality of the product, and is not designed for high-volume performance testing. To access
non-repudiation technology and remove watermarks or to perform high-volume testing, you must purchase a
license of PaperVision Capture.
WARNING: All images will be watermarked if a demonstration license is present. Removing the
watermark is a violation of the PaperVision Capture End User License Agreement (EULA).
Creating a New License
If you are integrating with PaperVision Enterprise, a global administrator can also add licenses in the "thick"
PaperVision Administration Console.
To create a new license
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click Licensing. The Licensing pane appears on the right side of the window.
2. On the toolbar, click Create New License. The New License dialog box appears.
3. In the License Code boxes, type the license code that was included with your product documentation and
media.
4. Do one of the following:
• Click Phone Authorization, and then contact Digitech Systems' Technical Support toll-free at (877)
374-3569 or direct at (402) 484-7777 to get your license key.
• Click Web Authorization to get the license key online.
NOTE: You must provide the serial number and identifier code before the license key will be given to
you.
5. In the Obtain Authorization Code dialog box, type the Authorization Code.
6. Click OK. The new license appears in the Licensing screen.
7. To assign the license to an entity, double-click the license to open its properties.
8. Select the entity from the Assigned-To list, and then click OK.
Bulk Importing Licenses
If you have many licenses to add, you can do a bulk import of them from a text file to save time. If you want to
use this feature, contact customer support to get a text file that contains your licensing information, and then
perform the following procedure.
To bulk import licenses
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click Licensing. The Licensing pane appears on the right side of the window.
2. On the toolbar, click Bulk Import Licenses.
3. In the Open dialog box, select the text file that contains the license codes you want to import, and then
click Open.
Each license code that was successfully imported appears on the Licensing pane. If a license code does not
import successfully, a prompt appears asking if you would like to do a phone authorization. If you click Yes,
the Obtain Authorization Code dialog box appears with the information you need to complete the
authorization by phone.
Deleting a License
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click Licensing.
2. On the right pane, select the license you want to delete. You can also delete multiple licenses at one time.
3. On the toolbar, click Delete.
4. Click Yes to confirm the deletion.
Editing the Properties of a License
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click Licensing.
2. On the right pane, select the license for which you want to edit the properties.
3. On the toolbar, click Properties. The License Properties dialog box appears.
4. To assign a license to an entity, click the Assigned To list to select another entity.
5. To assign a license to a specific computer, type the machine name in the Named System box, or click the
Browse button to locate the machine name.
6. Click OK.
Maintenance Queues and Maintenance Logs
The Maintenance Queue lists batches that are submitted and other tasks that are queued for processing by
the PaperVision Capture Automation Service. Once a task has been completed, it is automatically removed
from the queue.
Opening the Maintenance Queue
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Queue. The Maintenance Queue appears on the right pane.
Deleting Maintenance Queue Items
Before you delete an item from the maintenance queue, view the Maintenance Logs and Windows Event
Viewer to identify and troubleshoot any processing errors.
If you delete a Submit Batch queue item, the batch will remain waiting for automated processing. To remedy
this, access Batch Management to change the status of the batch to 'Not Owned.' (See "Batch Management" on
page 408 for more information.) Changing the batch status allows another operator to assume ownership of
the batch and to repeat the current job step.
NOTE: When a job step is repeated for a batch, some changes made by the previous operator may be
retained, but batch statistics for the previous operator’s work will be deleted.
To delete a Maintenance Queue item
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Queue.
3. On the right pane, select the item(s) that you want to delete.
4. On the toolbar, click Delete.
WARNING: Deleting a maintenance queue item can cause unexpected data integrity results and
should be used only as a last resort. Before proceeding, you may want to consult with Digitech Systems'
Technical Support.
5. To proceed with the deletion, click Yes.
Viewing a Maintenance Log Entry
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Logs. A listing of all log entries appears on the right pane.
3. On the right pane, double-click the maintenance log entry that you want to view. The Maintenance Log
Properties dialog box appears, where you can view the log entry.
4. When you are finished viewing the log entry, click Close.
Exporting Maintenance Logs
You can export maintenance logs to an XML file.
To export maintenance logs
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Logs. A listing of all log entries appears on the right pane.
3. On the right pane, select the log item(s) to be exported, and then click Export. The Export
Maintenance Logs dialog box appears.
4. From the Save In list, select the directory to which you want the log exported.
5. In the File Name box, type the file name for the exported log, and then click Save.
Deleting Maintenance Logs
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Logs. A listing of all log entries appears on the right pane.
3. On the right pane, select the log item(s) for deletion, and then click Delete.
4. To proceed with the deletion, click Yes.
Filtering Maintenance Logs
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then expand Maintenance.
2. Click Maintenance Logs. A listing of all log entries appears on the right pane.
3. On the toolbar, click Filter. The Maintenance Log Filter dialog box appears.
4. In the Maximum Record Count box, type the maximum number of log entries to display on the screen.
5. Click OK.
Process Locks
Process locks prevent multiple systems from simultaneously processing the same task. When a system
attempts to run a process, it creates a "lock" that prevents any other system from starting the same work. For
example, when System A attempts to run a task that System B is currently processing, System A verifies
that a process lock has not been placed before it sets its own lock.
If a system encounters a failure during processing (for example, a power failure), the process lock may not be
released. In this case, you may have to manually release or delete the lock.
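The check-then-lock behavior described above can be sketched as follows. This is a conceptual illustration, not the product's implementation: the ProcessLockTable class, its method names, and the task label are hypothetical, and an in-memory dictionary stands in for whatever shared store the systems actually use.

```python
import threading


class ProcessLockTable:
    """Hypothetical in-memory stand-in for the shared list of process locks."""

    def __init__(self):
        self._locks = {}                 # task name -> owning system
        self._guard = threading.Lock()   # makes the check-and-set step atomic

    def try_acquire(self, task, system):
        """Set a lock only if no other system already holds one for the task."""
        with self._guard:
            if task in self._locks:
                return False
            self._locks[task] = system
            return True

    def release(self, task, system):
        """Normal release by the system that owns the lock."""
        with self._guard:
            if self._locks.get(task) == system:
                del self._locks[task]
                return True
            return False

    def force_delete(self, task):
        """Administrator action for a lock left behind by a failed system."""
        with self._guard:
            return self._locks.pop(task, None) is not None


locks = ProcessLockTable()
b_got_lock = locks.try_acquire("Process Batch 42", "System B")  # B starts work
a_got_lock = locks.try_acquire("Process Batch 42", "System A")  # A is refused
# System B fails without releasing, so an administrator deletes the lock.
deleted = locks.force_delete("Process Batch 42")
a_retry = locks.try_acquire("Process Batch 42", "System A")     # A can proceed
```

The force_delete step corresponds to manually deleting a lock after a failure such as a power outage.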
Viewing Process Locks
• After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Process Locks. A listing of locked processes appears on the right pane.
Deleting Process Locks
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click Process Locks.
2. On the right pane, select the item you want to delete.
3. On the toolbar, click Delete.
4. Click Yes to confirm the deletion.
System Settings
In the System Settings dialog box, you can configure utilization limits, client ping communication
increments, and Email settings.
General System Settings
You can configure the following items on the General tab of the System Settings dialog box.
Max Full Text Database Results Per Query
The Max Full Text Database Results Per Query option is currently not used in PaperVision Capture.
Max Global Session Idle Time
The Max Global Session Idle Time specifies the number of minutes that a user can remain idle before the
Automation Service automatically terminates the user session. For sessions, each entity can have a
customized setting that is specified in the entity’s security policy. However, the global value found in System
Settings determines the maximum value that can be configured for each entity.
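The relationship between the global and per-entity idle times can be expressed in a few lines. This is a sketch under an assumption: it caps an oversized entity value at the global maximum, whereas the product may instead reject such a value when it is entered. The function name is hypothetical.

```python
def effective_idle_minutes(entity_minutes, global_max_minutes):
    """Return the idle timeout an entity can actually use."""
    if entity_minutes <= 0 or global_max_minutes <= 0:
        raise ValueError("idle times must be positive")
    # The global value is the ceiling for every entity's setting.
    return min(entity_minutes, global_max_minutes)


print(effective_idle_minutes(30, 60))   # 30
print(effective_idle_minutes(120, 60))  # 60
```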
Max Maintenance Log Age
The Max Maintenance Log Age specifies the number of minutes that maintenance logs can remain in the
system before the Automation Service automatically deletes them (provided that the Maintenance Log
Cleanup operation has been scheduled for completion).
Client Ping Increment
If you are running PaperVision Capture in a client/server configuration, when a user logs in, the server
instructs the client application to send a keep-alive “ping” at the interval specified in the Client Ping
Increment box. This background “ping” is used to verify that the client application is still open. If a client has
not sent the expected “ping” within the specified number of seconds, the session is automatically terminated.
Ping validation is useful for freeing up licenses used by applications that were closed without the user logging
off. Without pinging, the session will not terminate until the session times out.
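The ping check described above amounts to comparing each session's most recent ping against the configured interval. The following minimal sketch assumes the server records a last-ping time for each session; the function name, session IDs, and the 60-second interval are hypothetical.

```python
from datetime import datetime, timedelta


def expired_sessions(last_ping_by_session, ping_increment_seconds, now):
    """Return the IDs of sessions whose last ping is older than the interval."""
    limit = timedelta(seconds=ping_increment_seconds)
    return sorted(
        session_id
        for session_id, last_ping in last_ping_by_session.items()
        if now - last_ping > limit
    )


now = datetime(2014, 11, 3, 9, 0, 0)
last_pings = {
    "operator-1": now - timedelta(seconds=20),  # client is still pinging
    "operator-2": now - timedelta(seconds=95),  # client was closed without logoff
}
stale = expired_sessions(last_pings, ping_increment_seconds=60, now=now)
print(stale)  # ['operator-2']
```

Terminating the stale session is what frees its license without waiting for the idle timeout.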
To configure the general system settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure System Settings. The System Settings dialog box appears.
3. In the Max Global Session Idle Time (minutes) box, type the number of minutes that a user session can
remain idle before the user is logged off.
4. In the Max Maintenance Log age (minutes) box, type the number of minutes that maintenance logs can
remain in the system before the Automation Service automatically deletes them.
NOTE: The default setting of 40320 in the Max Maintenance Log Age box equates to 28 days.
5. Click OK.
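Because the Max Maintenance Log Age value is entered in minutes, a quick conversion helps when you think in days. The following short sketch uses hypothetical helper names and reproduces the 28-day default noted above.

```python
MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days):
    """Convert a retention period in days to the minutes value to type."""
    return days * MINUTES_PER_DAY


def minutes_to_days(minutes):
    """Convert a configured minutes value back to days."""
    return minutes / MINUTES_PER_DAY


print(days_to_minutes(28))     # 40320
print(minutes_to_days(40320))  # 28.0
```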
Email Settings
On the Email tab of the System Settings dialog box you can configure email settings. Within any custom
code step, you can call the "TrySendEmail" API method (documented in the PVCaptureBatchAPI.chm help
file) and assign specific parameters such as senders and recipients. For example, this method can be used to
email error notifications or batch statistics to an administrator. Emails generated through custom code will be
placed in the Email Queue global settings screen until the Email Queue automation service operation is
executed.
To configure Email settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure System Settings.
3. In the System Settings dialog box, click the Email tab. The Email Settings dialog box appears.
4. In the SMTP Server Name/IP box, type the server name or IP address for the server the system will use to
send emails.
5. In the SMTP Server Port box, type the port number to be used to communicate with the SMTP server (the
default value is 25).
6. In the SMTP User Name box, type a valid user name if the specified SMTP server requires authentication
prior to sending an email.
7. In the SMTP Password box, type a password for the SMTP user if the specified SMTP server requires
authentication prior to sending an email.
8. In the Send Email From box, type the email address from which automated emails are sent. Examples of
valid addresses include:
• “Friendly Name” <address@company.com>
• address@company.com
9. In the Attachment Path box, type a path where the automated email engine can temporarily store email
attachments until the email is sent. This path should be accessible from all PaperVision Capture servers
sending and generating (through automated custom code tasks) emails.
NOTE: If Email Queue items are deleted (either via sending the email or deleting them manually), their
attachments are removed from the Attachment Path.
10. To verify that your settings are correct, click Send Test Message, type a valid email address to
which you want the test message sent, and then click OK.
11. When you are finished specifying the Email settings, click OK.
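Before typing a value in the Send Email From box, you can check that it matches one of the two address forms shown in step 8. The following minimal sketch uses Python's standard email.utils module; the parse_sender function is hypothetical, the test for "@" is only a rough sanity check, and none of this reflects how PaperVision Capture itself validates the box.

```python
from email.utils import parseaddr


def parse_sender(value):
    """Split a sender value into (friendly name, address)."""
    name, address = parseaddr(value)
    # Rough sanity check only; this is not full address validation.
    if "@" not in address:
        raise ValueError(f"not a usable sender address: {value!r}")
    return name, address


with_name = parse_sender('"Friendly Name" <address@company.com>')
plain = parse_sender("address@company.com")
print(with_name)  # ('Friendly Name', 'address@company.com')
print(plain)      # ('', 'address@company.com')
```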
Automation Service Scheduling
Automation Service Scheduling allows you to configure services that automate the execution of certain
operations on timed intervals within PaperVision Capture. If these services are not set up, no automated
processes will run and back-end work, such as processing submitted batches, will not be completed. The
following list describes the provided automated services and what they do.
• Maintenance Queue processes any maintenance items listed in the queue. Maintenance queue items
involve one-time operations such as processing completed batches on the server or updating a specific
job step’s list of predefined index values.
• Maintenance Log Cleanup automatically deletes maintenance logs older than the value specified in the
Max Maintenance Log Age setting. (You can modify this setting under Global Administration >
System Settings > Configure System Settings > General tab.)
• Process Batch executes automated PaperVision Capture job steps.
• Destroy Batch automatically deletes batches that are scheduled for destruction.
• Session Grant Cleanup removes sessions that have remained idle longer than the value specified in
the Max Session Idle Time setting for the entity. (You can modify this setting under Entities > Entity
Name > General Security > Security Policy > General tab.)
• Email Queue holds email notifications until the automation service actually processes (sends) the
email.
Opening the Automation Service Scheduling Dialog Box
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure Automation Service Scheduling. The Automation Service
Scheduling dialog box appears.
NOTE: You can configure multiple automation servers to run on a single PC. You can specify the
number of automation servers in the PaperVision Capture Setup Tool. (You can find this tool at Start
> Programs > Digitech Systems > PaperVision Capture Setup Tool.) Automation servers on the
same PC are identified by a trailing index (0, 1, 2, etc.) in the automation server name.
Adding a New Automation Service Schedule
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure Automation Service Scheduling. The Automation Service
Scheduling dialog box appears.
3. From the Automation Server list, select the automation server on which you want to schedule services.
4. Click Add. The New Automation Service Schedule dialog box appears.
5. From the Operation list, select one of the following:
• Maintenance Queue processes any maintenance items listed in the queue. Maintenance queue items
involve one-time operations such as processing completed batches on the server or updating a specific
job step’s list of predefined index values.
• Maintenance Log Cleanup automatically deletes maintenance logs older than the value specified in the
Max Maintenance Log Age setting. (You can modify this setting under Global Administration >
System Settings > Configure System Settings > General tab.)
• Process Batch runs automated PaperVision Capture job steps.
• Destroy Batch automatically deletes batches that are scheduled for destruction.
• Session Grant Cleanup removes sessions that have remained idle longer than the value specified in
the Max Session Idle Time setting for the entity. (You can modify this setting under Entities > Entity
Name > General Security > Security Policy > General tab.)
• Email Queue holds email notifications until the automation service sends the email.
6. In the Start Time box, type the date and time when the operation will begin. By default, the current date and
time appear.
7. From the Schedule list, select the unit of time you want to use.
8. In the Repetition Schedule area, type how often you want the operation to run.
9. Click OK.
10. On the Automation Service Scheduling dialog box, click Save.
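The Start Time, Schedule, and Repetition Schedule values in the procedure above combine into a series of run times. The following sketch shows that arithmetic under stated assumptions: the unit names "minutes", "hours", and "days" are hypothetical (the Schedule list may offer different choices), and the sketch ignores how the service handles missed or overlapping runs.

```python
from datetime import datetime, timedelta

# Hypothetical unit names; the Schedule list in the dialog box may differ.
UNIT_TO_STEP = {
    "minutes": lambda n: timedelta(minutes=n),
    "hours": lambda n: timedelta(hours=n),
    "days": lambda n: timedelta(days=n),
}


def next_runs(start_time, unit, every, count):
    """Return the first `count` run times for a repeating operation."""
    if every < 1:
        raise ValueError("the repetition value must be at least 1")
    step = UNIT_TO_STEP[unit](every)
    return [start_time + i * step for i in range(count)]


start = datetime(2014, 11, 3, 8, 0)
runs = next_runs(start, "minutes", every=15, count=3)
print([run.strftime("%H:%M") for run in runs])  # ['08:00', '08:15', '08:30']
```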
Editing an Automation Service Schedule
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure Automation Service Scheduling. The Automation Service
Scheduling dialog box appears.
3. From the Automation Server list, select the automation server on which you want to edit services.
4. Under the Operation column, click the operation you want to edit.
5. Click Edit. The Edit Automation Service Schedule dialog box appears.
6. Edit the values as needed, and then click OK.
7. On the Automation Service Scheduling dialog box, click Save.
Removing an Automation Service Schedule
1. After you have logged on to the PaperVision Capture Administration Console, expand Global
Administration, and then click System Settings.
2. On the right pane, double-click Configure Automation Service Scheduling. The Automation Service
Scheduling dialog box appears.
3. From the Automation Server list, select the automation server on which you want to remove services.
4. Under the Operation column, click the operation you want to remove.
5. Click Remove, and then click Yes to confirm the removal.
6. On the Automation Service Scheduling dialog box, click Save.
Chapter 3 Entity Administration
An entity is a body (for example, a corporation or organization) that provides its own administration. Only
global and system administrators can configure an entity's properties. Each entity contains its own users,
groups, and jobs that are not shared among entities. You can perform entity administration remotely or from a
direct database connection.
In general, most PaperVision Capture installations, including large enterprise installations, will not need more
than one entity. However, you can configure two entities for a distributed, multiple user scenario. For
example, one office (entity) is located in Denver, Colorado, and the other is located in Lincoln, Nebraska.
Each entity has a separate database, and manages jobs, users, and batches solely for that entity. Both
locations are monitored by a single global administrator. This scenario can alleviate network congestion since
each location is a separate entity. If the Denver office becomes inundated with work and needs assistance
from Lincoln, you can create Lincoln user accounts for the Denver entity so users can be assigned to Denver
jobs. As a result, Lincoln users can log in to the Denver entity and process jobs for Denver.
To view existing entities
• After you have logged in to the PaperVision Capture Administration Console, click the Entities folder.
A listing of entities appears on the right pane.
The need for multiple entities can arise in the following specific circumstances.
• In a hosting environment where an on-demand provider is hosting data for multiple companies and each
company wants to administrate itself and its users.
• In a large enterprise that has different departments or cost centers that want to administrate themselves
(separately from other departments) without having to involve a central IT organization.
Entity properties dictate how the server will handle system-level functions relating to that entity. Global and
system administrators can configure entity properties, as well as create, edit, and delete entities.
Creating a New Entity
1. After you have logged in to the PaperVision Capture Administration Console, click Entities.
2. On the toolbar, click New Entity. The New Entity dialog box appears.
3. In the Entity Name box, type the name you want to use for the entity, for example, the name of your
company or organization.
4. In the Database Settings area, click Configure. The SQL Data Source Information dialog box appears.
NOTE: Database settings include configuration settings for the database where the entity resides. Only
under special circumstances (that is, moving the database to a different server) should these settings
ever be changed once the entity is created. Changing these settings to another database or server for an
existing entity will NOT create new entity tables. The server will expect them to already exist.
5. In the Server IP/Name box, type the IP address or the server name where the database resides.
6. In the Database Name box, type the name of the database.
7. In the User Name box, type the user account for the SQL database.
8. In the Password box, type the password for the user.
NOTE: If you leave the User Name and Password boxes blank, the database connection will use
Windows Authentication credentials. Setting a user name and password for the database will supersede
the Windows Authentication credentials.
9. From the Connection Type list, select the type of connection used to access the database. The default
value is TCP/IP.
10. If you chose TCP/IP in the previous step, in the TCP/IP Port box, type the applicable port number.
11. Click OK.
12. In the Entity Paths area, specify the following paths. You can type the path, or click the ellipsis button
to browse to and select the location. The following paths are also used by PaperVision Enterprise.
• Data Group Path specifies the location where PaperVision Enterprise copies data groups and writes
new documents and versions.
• Migration/Backup Path specifies the location where migration jobs or backup packages are processed.
• Full-Text Path specifies the location where full-text database indexes are stored.
• Batch Path specifies the location for the batch repository where batches created by PaperVision
Capture are stored.
13. If you want to prevent all users, including administrators, from logging in to the entity, select the Disable
Entity check box.
14. Click OK. The entity you added appears in the entity list.
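The database settings in the procedure above (server, database, optional credentials, and TCP/IP port) map naturally onto a SQL Server connection string. The following sketch is an assumption-laden illustration, not a description of how PaperVision Capture stores or uses these settings: it builds an ODBC-style string, the server and database names are hypothetical, and port 1433 is used only as an example value. It mirrors the note above by falling back to Windows Authentication when no user name or password is supplied.

```python
def build_connection_string(server, database, user="", password="", port=None):
    """Assemble an ODBC-style SQL Server connection string."""
    host = f"{server},{port}" if port else server
    parts = [f"Server={host}", f"Database={database}"]
    if user or password:
        # Explicit credentials take precedence over Windows Authentication.
        parts += [f"UID={user}", f"PWD={password}"]
    else:
        # Blank user name and password: use Windows Authentication.
        parts.append("Trusted_Connection=yes")
    return ";".join(parts)


# Hypothetical server and database names, for illustration only.
print(build_connection_string("dbserver01", "PVCapture", port=1433))
print(build_connection_string("dbserver01", "PVCapture", "pvuser", "secret"))
```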
Deleting an Entity
Deleting an entity removes it from the database. Additionally, deleting an entity removes any full-text
databases and data groups from PaperVision Enterprise (depending on global system settings).
To delete an entity
1. After you have logged in to the PaperVision Capture Administration Console, click Entities. On the
right pane, a listing of entities appears.
2. On the right pane, click the entity you want to delete.
3. On the toolbar, click Delete.
4. Click Yes to confirm the deletion.
Editing the Properties of an Entity
1. After you have logged in to the PaperVision Capture Administration Console, click Entities. On the
right pane, a listing of entities appears.
2. On the right pane, click the entity you want to edit.
3. On the toolbar, click Properties. The Entity Properties dialog box appears.
4. When you are finished making changes, click OK to save the new settings.
NOTE: Changing database settings to a new or different database does not create entity tables in the
new database. You must create a new entity to create new entity tables in the database.
General Security
From the General Security screen, you can access settings for encryption keys, security policy, system
groups, system users, and passwords.
To access the General Security screen
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click General Security. The following options appear on the right pane. Double-click the item you want to
access.
• Encryption Keys - See "Encryption Keys" on page 32 for more information.
• Security Policy - See "Security Policy" on page 34 for more information.
• System Groups - See "System Groups" on page 37 for more information.
• System Users - See "System Users" on page 39 for more information.
Encryption Keys
You can create encryption keys to protect your data while it resides within PaperVision Capture. Once
created, you can use an encryption key to encrypt batches, images, indices, and full-text OCR data. After a
batch is encrypted, its data is accessible from within PaperVision Capture (even when the encryption key is
modified or deleted), but you can’t open batch images with any viewer. When encryption is enabled, images,
indices, and full-text OCR data that are exported from PaperVision Capture are decrypted during the export.
Generally, encrypted batches may impact overall system performance.
NOTE: Encryption keys created in PaperVision Capture can be used in PaperVision Enterprise and vice
versa.
PaperVision Capture’s encryption process uses the following design:
• Algorithm: Rijndael – AES (256-bit)
• Encryption Mode: CBC (Cipher Block Chaining)
• Padding Method: FIPS81 (Federal Information Processing Standards 81) scheme (ISO10126)
• Secret Key Generation: User-defined pass phrase is passed through the SHA-2 algorithm (Secure Hashing
Algorithm) to generate a 256-bit hash
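Two parts of the design listed above, key generation and padding, can be illustrated with the Python standard library alone. This is a sketch under stated assumptions, not the product's code: it assumes the SHA-2 variant is SHA-256 applied to the UTF-8 bytes of the pass phrase, and it implements ISO 10126-style padding (random filler bytes followed by a length byte). The function names are hypothetical, and the AES-CBC cipher step itself is omitted because it requires a third-party library.

```python
import hashlib
import os

BLOCK_SIZE = 16  # AES block size in bytes


def derive_key(pass_phrase):
    """Hash a pass phrase to a 256-bit (32-byte) key."""
    # Assumption: SHA-256 over UTF-8 bytes; the actual encoding may differ.
    return hashlib.sha256(pass_phrase.encode("utf-8")).digest()


def pad_iso10126(data):
    """Pad to a whole number of blocks: random bytes, then the pad length."""
    pad_len = BLOCK_SIZE - (len(data) % BLOCK_SIZE)
    return data + os.urandom(pad_len - 1) + bytes([pad_len])


def unpad_iso10126(padded):
    """Remove padding by reading the length stored in the final byte."""
    if not padded:
        raise ValueError("nothing to unpad")
    pad_len = padded[-1]
    if not 1 <= pad_len <= BLOCK_SIZE or pad_len > len(padded):
        raise ValueError("invalid padding")
    return padded[:-pad_len]


key = derive_key("example pass phrase")
padded = pad_iso10126(b"index value")
restored = unpad_iso10126(padded)
```

Because the same pass phrase always produces the same key, a key's pass phrase cannot be changed without making previously encrypted data unreadable.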
Viewing Encryption Keys
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Encryption Keys. A listing of encryption keys appears on the
right pane.
Adding Encryption Keys
After new encryption keys are added to the system, only their descriptions can be edited.
To add a new encryption key
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Encryption Keys.
3. On the toolbar, click Add Key. The Add Encryption Key dialog box appears.
4. In the Key Name box, type a name to identify the key.
5. From the Key Type list, select the type of encryption for the key.
6. In the Pass Phrase box, type a pass phrase for generating the key.
7. Optionally, in the Description box you can type a description of the key.
8. Click OK to save the new encryption key.
Editing an Existing Encryption Key
To prevent any previously-encrypted data from becoming unreadable, you can modify only the description of
the encryption key.
NOTE: You must restart the IIS service after changing the encryption key.
To edit an existing encryption key
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Encryption Keys. A listing of encryption keys appears on the
right pane.
3. Select the encryption key you want to edit, and then click Edit Key.
4. In the Edit Encryption Key dialog box, make your changes to the description, and then click OK. Your
changes will take effect the next time a process loads the key values.
Deleting Encryption Keys
Warning! Data that has been encrypted with an encryption key may become unreadable if that
encryption key is deleted.
To delete an existing encryption key:
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Encryption Keys. A listing of encryption keys appears on the
right pane.
3. Select the encryption key you want to delete, and then click Delete Encryption Key.
4. Click OK to confirm the deletion.
Security Policy
The Security Policy for entities allows you to specify options for general system settings, authentication,
account lockout, and passwords.
To specify general system settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Security Policy.
3. On the right pane, double-click Security Policy. The Entity Security Policy dialog box appears with the
General System Settings displayed on the right pane where you can specify the following options.
Require all session requests to originate from same source
When you select this check box, all session activity is required to come from the same source as the
original log in. This ensures that someone cannot take over a session to gain access to data. When a user
logs in, PaperVision Capture records the IP address (or computer name) that was used to gain access.
Each time information is requested from the server, PaperVision Capture verifies that the request originated
from the same IP address (or computer name). If not, PaperVision Capture denies the request and sends a notification to
the system administrators.
Enable Integrated Windows Authentication
You can use this option only when PaperVision Capture is connected directly to the client database from a
remote station. When you select this check box, PaperVision Capture users can authenticate using their
Windows domain and user name, thus eliminating the need for them to log in to PaperVision Capture. This
requires a PaperVision Capture user account that is in the “Domain\User” format for the Windows user
attempting to log in. You must complete the following steps before you select this check box.
a. Define the Master Batch Path as a UNC path, for example,
\\ServerName\MasterBatchPathFolder, in the entity’s general properties.
b. Share the Master Batch Path folder with the appropriate users on the network.
c. Ensure that the PaperVision Data Transfer Agent service on the client workstation has
access to both the Master Batch Path and the Local Batch Path. If these paths do not
reside on the same machine, a domain account is recommended.
d. Ensure that the user specified in the previous step has full control (permissions) over the
Master Batch Path folder.
Max session idle time (minutes)
In this box, you can specify the number of minutes that a user session can remain idle before the
automation service automatically terminates the user session. This value cannot exceed the value
assigned in the General Systems Settings by the global administrator.
4. After you are finished specifying general system settings, click OK.
Specifying Authentication Settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Security Policy.
3. On the right pane, double-click Security Policy. The Entity Security Policy dialog box appears.
4. On the left pane, click Authentication. The Authentication Settings appear on the right pane where you
can specify the following options.
Allow logins from any source
When you select this option, users can log in to PaperVision Capture from any IP address.
Allow logins from only these IP addresses/subnets
When you select this option, users can access PaperVision Capture only from the IP addresses or
address ranges that you specify. Enabling this option ensures that access to PaperVision Capture is
limited to specific locations. To enable this option, complete the following steps.
a. Select Allow logins from only these IP addresses/subnets.
b. Click Add. The Add IP Address dialog box appears.
c. In the IP address/subnet box, you can type an exact address, or you can specify a
subnet. To specify a subnet, type only the first X octets of the network. For example,
typing 10.1 or 10.1.0.0 specifies the entire 10.1 class B address space.
d. Ensure that the PaperVision Data Transfer Agent service on the client workstation has
access to both the Master Batch Path and the Local Batch Path. If these paths do not
reside on the same machine, a domain account is recommended.
e. Ensure that the user specified in the previous step has full control (permissions) over the
Master Batch Path folder.
Remote Authentication Gateway Encryption
This feature is currently not used in PaperVision Capture. If you are sharing a database with PaperVision
Enterprise, in the Key Name box, you can specify the encryption key whose value matches the
encryption key value specified in the PaperVision Gateway Settings application located on the installed
PaperVision Authentication Gateway Server.
5. After you are finished specifying authentication settings, click OK.
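The IP address/subnet entries described above can be read as octet prefixes. The following Python sketch
shows one way such a rule list could be evaluated. It is an illustration only, not the product's
implementation: the function names are hypothetical, and it assumes that trailing zero octets mark a subnet
(so that 10.1.0.0 is equivalent to 10.1, as in the example above).

```python
from typing import List


def rule_prefix(rule: str) -> List[str]:
    """Return the leading octets that an address must match for this rule."""
    octets = rule.strip().split(".")
    # Assumption: trailing zero octets act as wildcards (10.1.0.0 means 10.1).
    while len(octets) > 1 and octets[-1] == "0":
        octets.pop()
    return octets


def is_allowed(address: str, rules: List[str]) -> bool:
    """Check whether an IPv4 address matches any exact address or subnet rule."""
    address_octets = address.strip().split(".")
    for rule in rules:
        prefix = rule_prefix(rule)
        # Compare whole octets so that 10.1 does not match 10.10.x.x.
        if address_octets[:len(prefix)] == prefix:
            return True
    return False


rules = ["10.1", "192.168.1.5"]
print(is_allowed("10.1.44.7", rules))    # True
print(is_allowed("192.168.1.6", rules))  # False
```

Comparing whole octets rather than raw text keeps a rule such as 10.1 from matching unrelated networks
such as 10.10 or 10.100.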
Specifying Account Lockout Settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Security Policy.
3. On the right pane, double-click Security Policy. The Entity Security Policy dialog box appears.
4. On the left pane, click Account Lockout. The Account Lockout Settings appear on the right pane where
you can specify the following options.
Accounts never lockout
When you select this option, user accounts are never locked out regardless of how many unsuccessful
log in attempts occur.
Accounts lockout after X invalid attempts
When you select this option, you can specify the number of times an invalid log in attempt can occur
before the user is locked out of PaperVision Capture.
Reset locked accounts after X minutes
When you select this option, you can specify the number of minutes that must elapse before locked
accounts are automatically reset. Otherwise, an administrator must manually unlock the account.
Clear lockout counters after X minutes
When you select this option, you can specify the number of minutes that must elapse before lockout
counters are cleared. Each time a user attempts to log in to PaperVision Capture with an invalid
password, a counter is incremented to record the number of failed attempts since the last successful log
in. Usually, the counter is reset to zero only when the user successfully logs in. However, if no
unsuccessful log in attempts occur during the time period you specify, then the counter is automatically
reset.
5. After you are finished specifying account lockout settings, click OK.
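The interaction between the three lockout options above can be easier to follow as code. The Python sketch
below models the behavior as this section describes it. The class and function names are hypothetical, time
is expressed as plain minute counts, and the exact boundary conditions (for example, whether a lockout
resets at exactly X minutes) are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockoutPolicy:
    max_invalid_attempts: int            # Accounts lockout after X invalid attempts
    reset_locked_after: Optional[int]    # minutes; None means an administrator must unlock
    clear_counter_after: Optional[int]   # minutes; None means only a successful log in clears


@dataclass
class Account:
    failed: int = 0
    last_failure: Optional[int] = None
    locked_at: Optional[int] = None


def attempt_login(acct: Account, policy: LockoutPolicy, now: int, password_ok: bool) -> str:
    """Return "ok", "denied", or "locked" for a log in attempt at minute `now`."""
    if acct.locked_at is not None:
        can_reset = policy.reset_locked_after is not None
        if can_reset and now - acct.locked_at >= policy.reset_locked_after:
            acct.locked_at = None   # locked account is reset automatically
            acct.failed = 0
        else:
            return "locked"
    # Clear the counter when no failed attempt occurred during the configured period.
    if (acct.failed and policy.clear_counter_after is not None
            and now - acct.last_failure >= policy.clear_counter_after):
        acct.failed = 0
    if password_ok:
        acct.failed = 0             # a successful log in always resets the counter
        return "ok"
    acct.failed += 1
    acct.last_failure = now
    if acct.failed >= policy.max_invalid_attempts:
        acct.locked_at = now
        return "locked"
    return "denied"
```

For example, with a policy of 3 invalid attempts, a 30-minute reset, and a 10-minute counter clear, three
failures in a row lock the account, and a correct password is accepted again only after the 30 minutes
have elapsed.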
Specifying Password Settings
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click Security Policy.
3. On the right pane, double-click Security Policy. The Entity Security Policy dialog box appears.
4. On the left pane, click Passwords. The Password Settings appear on the right pane where you can
specify the following options.
User passwords never expire
When you select this option, user passwords never expire for the entity.
User passwords expire in X days
When you select this option, user passwords automatically expire after the number of days you specify.
When users with expired passwords attempt to log in to PaperVision Capture, they must change their
passwords.
Expire All Passwords
When you select this option, all user passwords are immediately expired so that all users must change
their passwords when they attempt to log in to PaperVision Capture.
No minimum password length
When you select this option, user passwords can be any length, including blank.
Minimum password length X characters
When you select this option, users who reset their passwords must choose passwords that contain at least
the number of characters you specify.
Password Complexity
When you select a check box in this area, users who reset their passwords must choose passwords that
meet the criteria you selected.
5. After you are finished specifying password settings, click OK.
System Groups
After you have created user accounts, you can define groups of users who have similar access and
functionality needs. Creating system groups lets you assign access more efficiently, as you specify it only
once for the group, rather than for each user. You can create, modify, and delete system groups. You can also
assign groups to job steps on the Job Definitions window.
Adding a New System Group
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Groups. If any groups exist, a list of them appears on
the right pane.
3. On the toolbar, click New Group. The New Group dialog box appears.
4. In the Group Name box, type a name for the group.
5. In the Available Users list, select the users you want to include in the group (to include all available users,
select Select All).
6. Click the right arrow. The users you selected appear in the Group Users list.
7. To remove users from the group, in the Group Users list, select the users you want to remove, and then
click the left arrow.
8. To remove all users from the group, under the Group Users list, select Select All, and then click the left
arrow.
9. After you have defined the group, click OK. The group you created appears on the right pane of the
Administration Console.
Deleting a System Group
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Groups. A list of existing groups appears on the right
pane.
3. On the right pane, select the group you want to delete.
4. On the toolbar, click Delete.
5. Click Yes to proceed with the deletion.
Changing the Members of a System Group
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Groups. A list of existing groups appears on the right
pane.
3. On the right pane, select the group you want to edit.
4. On the toolbar, click Properties. The Group Properties dialog box appears.
5. To add users to the group, in the Available Users list, select the users you want to add, and then click the
right arrow.
6. To add all available users, under the Available Users list, select Select All, and then click the right arrow.
The users you selected appear in the Group Users list.
7. To remove users from the group, in the Group Users list, select the users you want to remove, and then
click the left arrow.
8. To remove all users from the group, under the Group Users list, select Select All, and then click the left
arrow.
9. After you have modified the group, click OK.
NOTE: You cannot modify group names. You can modify only group members.
System Users
You can create, modify, and delete system users who have access to PaperVision Capture. You can also
assign and reset passwords for users.
Viewing System Users
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users. If any users exist, a list of them appears on the
right pane.
Creating a New User
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the toolbar, click Create New User. The New User dialog box appears.
4. In the User Name box, type the name that will be used to log in to PaperVision Capture.
5. In the Full Name box, type the user’s full name (optional). The user’s full name is used for some of
PaperVision Capture's reporting capabilities.
6. In the Email Address box, type the user’s email address (optional).
7. In the Password box, type the user’s password.
8. In the Confirm Password box, type the password again to confirm it.
9. To force the user to change the password the next time they log in, select User must change password at
next login.
10. To allow the user to change the password at any time, select User can change password when desired.
11. In the User Type area, select the appropriate user type. To create a regular user, clear all of the check
boxes. You can specify the following options.
System Administrator - When you select this check box, the user can configure all administrative settings
for a particular entity.
NOTE: If you select System Administrator, the other user types are automatically assigned to the
user.
WorkFlow Administrator - When you select this check box, the user can log in to the PaperVision
Capture Administration Console but cannot perform any functions. This setting is used for PaperVision
Enterprise.
Capture Administrator - When you select this check box, the user can configure jobs and job steps
with the entity.
E-Form Administrator - When you select this check box, the user can create E-Forms in PaperVision
Enterprise. This setting is not used in PaperVision Capture.
12. Click OK.
Setting a User Password
Before changing a password for a system administrator or user, ensure that all batches submitted by the user
have transitioned to the next job step and that the user is not currently logged in to the Operator Console.
To set the user password
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the right pane, select the user whose password you will set.
4. On the toolbar, click Set Password. The Set Password dialog box appears.
5. In the New Password box, type the new password for the user.
NOTE: Passwords are case-sensitive.
6. In the Confirm Password box, type the new password again to confirm it.
7. Click OK to set the new password, and then click OK in the confirmation message.
Deleting a User
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the right pane, select the user you want to delete.
4. On the toolbar, click Delete. The Delete User dialog box appears.
5. Click Yes.
Importing Users
You can import a list of users, along with most of their configuration data, from a pipe-delimited ( | ) or
tab-delimited text file. Each line of the text file can contain the following information in this order:
• User Name
• Password
• Full Name
• Email Address
• System Administrator (if value is 1)
• Other Administrator (if value is 1, 2, or 3)
NOTE: In the Other Administrator column, a WorkFlow Administrator has a value of 1; a Capture
Administrator has a value of 2; and a WorkFlow and Capture Administrator has a value of 3.
• User must change password at the next log in (if value is 1)
• User can change password when desired (if value is 1)
Only the first two fields (user name and password) are required on each line of text. If fields are not specified,
the default values are used. The following lines show sample content of an import file.
user1|password1|Test|test@test.com|0|1|1|1
user2|password2|Test2|test2@test.com|0|3|1|1
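To check an import file before loading it, the field order above can be decoded with a short script. The
following Python sketch parses one line of the format described in this section. The function name and the
dictionary keys are hypothetical, and the defaults used for omitted fields (empty text and cleared flags) are
assumptions, because this guide does not list the default values.

```python
def parse_user_line(line: str, delimiter: str = "|") -> dict:
    """Decode one import line into a dictionary of user settings."""
    parts = line.rstrip("\r\n").split(delimiter)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("user name and password are required")
    # Pad omitted trailing fields (assumed defaults: empty text, flags off).
    parts += [""] * (8 - len(parts))
    other_admin = int(parts[5] or 0)  # 1 = WorkFlow, 2 = Capture, 3 = both
    return {
        "user_name": parts[0],
        "password": parts[1],
        "full_name": parts[2],
        "email": parts[3],
        "system_admin": parts[4] == "1",
        "workflow_admin": other_admin in (1, 3),
        "capture_admin": other_admin in (2, 3),
        "must_change_password": parts[6] == "1",
        "can_change_password": parts[7] == "1",
    }


user = parse_user_line("user2|password2|Test2|test2@test.com|0|3|1|1")
print(user["workflow_admin"], user["capture_admin"])  # True True
```

For a tab-delimited file, pass delimiter="\t" instead of the default pipe character.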
To import users
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the toolbar, click Import Users. The Import User(s) dialog box appears.
4. Locate the text file that contains the user information, and then select it.
5. Click Open.
6. Click OK in the confirmation message. The imported user information appears on the right pane of the
Administration Console.
NOTE: Existing users are not recreated during the import process.
Exporting Users
You can export user configuration data into a pipe-delimited ( | ) or tab-delimited text file. Each line of the text file
can contain the following information in this order:
• User Name
• Password
• Full Name
• Email Address
• System Administrator (if value is 1)
• Other Administrator (if value is 1, 2, or 3)
NOTE: In the Other Administrator column, a WorkFlow Administrator has a value of 1; a Capture
Administrator has a value of 2; and a WorkFlow and Capture Administrator has a value of 3.
• User must change password at next log in (if value is 1)
• User can change password when desired (if value is 1)
User passwords are not exported and appear as empty strings in the text file. The following lines are some
sample text of an export file.
user1||Test|test@test.com|0|1|1|1
user2||Test2|test2@test.com|0|3|1|1
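The export layout can be reproduced in a few lines, which may help when comparing an export file against
another user list. The Python sketch below builds one pipe-delimited line in the field order listed above and
always leaves the password field empty. The function name and dictionary keys are hypothetical, and the
sketch is not the product's export routine.

```python
def format_export_line(user: dict, delimiter: str = "|") -> str:
    """Build one export line; the password field is always written as an empty string."""
    # Other Administrator: 1 = WorkFlow, 2 = Capture, 3 = both, 0 = neither.
    other_admin = (1 if user.get("workflow_admin") else 0) + (2 if user.get("capture_admin") else 0)
    fields = [
        user["user_name"],
        "",  # passwords are not exported
        user.get("full_name", ""),
        user.get("email", ""),
        "1" if user.get("system_admin") else "0",
        str(other_admin),
        "1" if user.get("must_change_password") else "0",
        "1" if user.get("can_change_password") else "0",
    ]
    return delimiter.join(fields)


line = format_export_line({
    "user_name": "user2",
    "full_name": "Test2",
    "email": "test2@test.com",
    "workflow_admin": True,
    "capture_admin": True,
    "must_change_password": True,
    "can_change_password": True,
})
print(line)  # user2||Test2|test2@test.com|0|3|1|1
```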
To export all users
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the toolbar, click Export Users.
4. In the Export User(s) dialog box, select the directory where the text file will be saved.
5. In the File Name box, type a name for the file.
6. Click Save.
7. Click OK in the confirmation message.
NOTE: User passwords are not exported from PaperVision Capture and appear as empty strings in
the text file. Consequently, exported users are required to change their passwords the next time they
log in to the Operator Console.
Editing User Properties
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand General Security, and then click System Users.
3. On the right pane, select the user.
4. On the toolbar, click Properties.
5. In the User Properties dialog box, you can edit the following items.
• Full Name - Specifies the user’s full name that is used for some of PaperVision Capture's reporting
capabilities.
• Email Address - Specifies the user’s email address.
• User must change password at next login - When this option is selected, it forces the user to change
the password the next time they log in.
• User can change password when desired - When this option is selected, it allows the user to change
the password at any time.
• User Type area - Specifies the user type. To create a regular user, clear all of the check boxes. You can
specify the following options.
System Administrator - When you select this check box, the user can configure all administrative
settings for a particular entity.
WorkFlow Administrator - When you select this check box, the user can log in to the PaperVision
Capture Administration Console but cannot perform any functions. This setting is used for
PaperVision Enterprise.
Capture Administrator - When you select this check box, the user can configure jobs and job steps
with the entity.
E-Form Administrator - When you select this check box, the user can create E-Forms in
PaperVision Enterprise. This setting is not used in PaperVision Capture.
6. When you are finished editing the user properties, click OK.
Current Activity
You can use the Current Activity section in the PaperVision Capture Administration Console to monitor and
manage user sessions. Each time a user logs in to the PaperVision Capture Operator Console, a session is
started. Every time a user accesses the server, PaperVision Capture verifies that the user session is still
valid, performs the requested operation, and then updates the date and time of the user’s last activity in the
Sessions list.
If you are running PaperVision Capture in a client/server configuration, when a user logs in, the server
instructs the client application to send a keep-alive “ping” at the interval specified in the Client Ping
Increment box. (You can modify this setting under Global Administration > Systems Settings >
Configure System Settings > General tab.) This background “ping” is used to verify that the client
application is still open. If a client has not sent the expected “ping” within the specified number of seconds,
the session is automatically terminated. Ping validation is useful for freeing up licenses used by applications
that were closed without the user logging off. Without pinging, the session will not terminate until the session
times out.
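The two ways a session can end automatically, idle timeout and a missed ping, can be summarized in a short
Python sketch. The function and parameter names are hypothetical. The sketch also assumes that a session
is ended as soon as the time since the last ping exceeds the configured number of seconds; the server may
allow additional tolerance that this guide does not describe.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_terminate(now: datetime,
                     last_activity: datetime,
                     last_ping: datetime,
                     max_idle_minutes: int,
                     ping_timeout_seconds: Optional[int] = None) -> bool:
    """Decide whether a session should be ended automatically."""
    # Idle timeout: applies to every session (Max session idle time).
    if now - last_activity > timedelta(minutes=max_idle_minutes):
        return True
    # Ping validation: only relevant in a client/server configuration.
    if ping_timeout_seconds is not None:
        if now - last_ping > timedelta(seconds=ping_timeout_seconds):
            return True
    return False
```

Passing None for ping_timeout_seconds models a configuration without pinging, in which a closed client is
detected only when the idle timeout is reached.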
Opening the Sessions and Licenses List
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand Current Activity, and then click Current Sessions. The Sessions and Available and
Consumed Concurrent Licenses lists appear on the right pane.
For each user logged in to PaperVision Capture, the Sessions list shows the following information.
• User Name shows the log in name for the session.
• Login Time shows the date and time when the user logged in to PaperVision Capture.
• Last Activity shows the date and time of the user’s last activity in the PaperVision Operator
Console. If a session remains idle longer than the value specified in the Max Session Idle
Time setting for the entity, it is terminated. (You can modify this setting under Entities >
Entity Name > General Security > Security Policy > General tab.)
• Last Ping shows the date and time the last ping was sent from the client application to the
server. (You can modify the ping increment under Global Administration > Systems
Settings > Configure System Settings > General tab.)
• Licenses shows the licenses being used by the session.
The Available and Consumed Concurrent Licenses list shows the following information.
• Concurrent License shows the type of concurrent licenses assigned to the entity.
• Number Available shows the total number of licenses available for use by the entity. This
value is the number of licenses purchased, not the number currently available for use.
• Number Used shows the number of licenses currently in use. This value is the number of
current user sessions that are using that type of license.
Killing a User Session
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand Current Activity, and then click Current Sessions. The Sessions list appears on the right pane.
3. On the right pane, in the Sessions list, click the user session you want to kill.
4. On the toolbar, click Kill Session.
5. Click Yes to confirm the session termination.
Opening the Lockouts List
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Expand Current Activity, and then click Lockouts. The Lockouts list appears on the right pane.
The Lockouts list shows users that have been locked out of PaperVision Capture based on the entity’s
account lockout security settings. These settings determine whether user accounts are locked after a set
number of unsuccessful attempts to log in. (You can modify this setting under Entities > Entity Name >
General Security > Security Policy > Account Lockout tab.) For each user, the Lockouts list shows
the following information.
• User ID shows the ID assigned to the locked out user.
• User Name shows the log in name for the locked out user.
• Lockout Time shows the date and time when the lockout occurred.
• Source shows the source from which the user was attempting to access PaperVision
Capture.
Chapter 4 Job Creation and Configuration
In PaperVision Capture, a job is a defined work flow consisting of one or more job steps. For example, you
can set up a job to scan documents, index documents automatically, and then export documents. For
batches to be processed in the PaperVision Capture Operator Console, you must set up at least one job in the
PaperVision Capture Administration Console. Every job must contain, at the minimum, a Capture start step.
Setting up a job requires that you perform the following steps.
1. Define the work flow that you want to use to process batches, and determine which job steps are needed.
(See "Available Job Steps" on page 48 for a brief description of the available job steps.)
2. Create a new job, add the needed job steps, and then link them so that the batch flows logically through the
work flow. (See "Adding Job Steps to a Job" on page 49 and "Working with Job Step Links" on page 50 for
more information.)
3. Set properties for each job step, and assign users to manual job steps. (See "Setting Common Job Step
Properties" on page 53 and "Assigning Users to Manual Job Steps" on page 56 for more information.)
4. Set properties for the job, and then save, validate, activate, and check it in so that the job is available for
use in the PaperVision Capture Operator Console. (See "Working with Jobs" on page 57 for more
information.)
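Conceptually, a job is a set of named job steps plus the links between them. The Python sketch below models
that idea and checks the one rule stated above, that every job must contain a Capture step. It is a mental
model only: the class, its methods, and the link check are hypothetical and do not represent the validation
that the Administration Console performs.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    steps: dict = field(default_factory=dict)  # step name -> job step type
    links: list = field(default_factory=list)  # (source step, target step) pairs

    def add_step(self, step_name: str, step_type: str) -> None:
        self.steps[step_name] = step_type

    def link(self, source: str, target: str) -> None:
        self.links.append((source, target))

    def problems(self) -> list:
        """Return a list of problems; an empty list means the checks passed."""
        found = []
        if "Capture" not in self.steps.values():
            found.append("job has no Capture step")
        # Hypothetical extra check: every link must connect steps that exist.
        for source, target in self.links:
            for end in (source, target):
                if end not in self.steps:
                    found.append(f"link refers to unknown step: {end}")
        return found


job = Job("Invoices")
job.add_step("Scan", "Capture")
job.add_step("Index", "Indexing")
job.link("Scan", "Index")
print(job.problems())  # []
```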
You use the Job Definitions window to create and configure new jobs and modify existing ones. Use the
following procedure to open the Job Definitions window.
Opening the Job Definitions Window to Create or Edit a Job
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click
OK.
4. If necessary, click Check Out Job so you can edit it. The Job Definitions window opens.
If you are ready to create a new job, go to "Working with Job Steps" on page 47.
Job Definitions Window Overview
The Job Definitions window contains the following items.
• The Job Step Toolbox tab holds the steps that you can use to create jobs.
• The Properties tab displays the settings for each job and job step.
• The main workspace contains the job steps you select to build the job. Each open job appears on its own
tab.
• The Job Steps grid summarizes key information and lets you assign users and groups to
each job step, specify the order of job steps, and use basic editing features (cut, copy, paste, and delete).
(See "Customizing the Job Definitions Window" on page 64 for more detailed information about using the Job
Definitions window.)
Job Definitions Window
Working with Job Steps
You use job steps to define the work flow for processing batches. Each job step is an automated or manual
operation that is performed on a batch. Manual job steps are performed by assigned users through the
PaperVision Capture Operator Console. Automated job steps are completed by the PaperVision Capture
Automation Service and require no user involvement. You work with job steps on the Job Definitions
window. (See "Opening the Job Definitions Window to Create or Edit a Job" on page 46 and "Customizing the
Job Definitions Window" on page 64 for more information.)
As you work with job steps to create a new job, you will perform the following tasks.
1. Determine what job steps are needed for the job. (See "Available Job Steps" on page 48 for a brief
description of the available job steps.)
2. Add and link the required job steps. (See "Adding Job Steps to a Job" on page 49 and "Working with Job
Step Links" on page 50 for more information.)
3. Set job step properties. (See "Setting Common Job Step Properties" on page 53 for more information.)
4. For manual job steps, assign the users that will perform them. (See "Assigning Users to Manual Job Steps"
on page 56 for more information.)
After you have completed the listed tasks, you must save, validate, activate, and then check in the job so
that it is available for use in the PaperVision Capture Operator Console. (See "Working with Jobs" on page 57
for more information.)
Available Job Steps
A job step is an automated or manual operation that is performed on a batch. Manual job steps are performed by
assigned users through the PaperVision Capture Operator Console. Automated job steps are completed by the
PaperVision Capture Automation Service and require no user involvement. On the Job Definitions window, the
Job Step Toolbox contains the following job steps that you can use to create jobs. (See "Opening the Job
Definitions Window to Create or Edit a Job" on page 46 if you need help accessing the Job Step Toolbox.)
• Capture - This manual job step defines parameters for how documents are captured in the PaperVision
Capture Operator Console. For example, you can specify settings for scanning options, when document
breaks should occur, and the maximum number of documents per batch. Every job must contain a Capture
job step to be valid.
• Indexing - This manual job step defines how index values are populated and validated in the PaperVision
Capture Operator Console.
• Barcode - This automated job step defines how barcodes are processed. You can use barcodes to
populate index values and insert document breaks.
• Nuance Zonal OCR and Open Text Zonal OCR - These automated job steps use the Nuance® and
OpenText™ Optical Character Recognition (OCR) engines to extract information from the zones you
define.
• Nuance Full-Text OCR and Open Text Full-Text OCR - These automated job steps use the Nuance®
and OpenText™ Optical Character Recognition (OCR) engines to extract pages of text, and then convert
the recognized results into one or multiple file types. File types for the Nuance OCR engine include: .txt,
.rtf, .csv, .pdf, .doc (and .docx), .htm, .xls (and .xlsx), and others. File types for the OpenText OCR engine
include: .pdf, .txt, PaperVision Enterprise (.txt), and PaperFlow (.txt).
• Image Processing - This automated job step applies the image processing filters you specify. For
example, filters can remove any unwanted noise, lines, borders, and other extraneous objects from images.
Additional filters identify color within images and delete or retain colors and pages as your specified criteria
are met.
• Custom Code - This automated job step runs custom code. You can use this flexible job step to perform
whatever action you define in the custom code.
• Manual QC - This manual job step defines parameters for performing quality control tasks from the
PaperVision Capture Operator Console. Operators can inspect images and index values, and then apply
QC tags to batches, documents, pages, and index fields. The applied QC tags are used to determine what
action (specified in the job and job step configuration) should be taken next for tagged items.
• Automated QC - This automated job step defines parameters for performing quality control tasks on
batches, documents, pages, and index fields without requiring any user action from the PaperVision
Capture Operator Console. This job step can enhance the accuracy and efficiency of batch processing.
• Batch Splitting - This automated job step defines parameters for dividing a batch. The splitting of the batch
occurs when the conditions you define are met. The batch can be split so that it flows into another job step
or job.
• FM Processing - This automated job step lets you configure a connection to a Forms Magic database so
that the job can access and process data from Forms Magic.
• FM Index Mapping - This automated job step can map index fields set up in Forms Magic and process
them. To use this job step, you must first establish a connection to a Forms Magic database using the
FM Processing job step.
• AP Processing - This manual job step defines parameters for accounts payable tasks, such as matching
purchase orders to invoices.
• Business Rules - This automated job step performs tasks based on predefined business rules. These
business rules perform complex tasks for which there is a common business need, such as verifying that
invoice totals and date ranges are correct, confirming that specified field values are populated, and
performing various comparison, matching, merging, and validation operations on indexes.
Adding Job Steps to a Job
The Job Step Toolbox contains the steps that you can use to create jobs. (See "Available Job Steps" on
page 48 for a brief description of what each job step does.)
To add a step to a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Click the Job Step Toolbox tab.
3. Add a step to the job using one of the following methods.
• If you want to automatically create a link between an existing job step and the one you are adding, on the
workspace, select the existing job step that you want the step you are adding to follow. On the Job Step
Toolbox tab, double-click the step you want to add. The added step appears on the workspace linked to
the previous step.
• On the Job Step Toolbox tab, drag the step you want to add onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select the step you want to add.
If you did not link job steps as you added them (as described under the first bullet of the previous list), you must
create links between job steps so that batches move logically through the work flow. (See "Working with Job Step
Links" on page 50 for more information.) You also must set properties for job steps and assign the users who
will perform the manual job steps. (See "Setting Common Job Step Properties" on page 53 and "Assigning Users
to Manual Job Steps" on page 56 for more information.)
Working with Job Step Links
This content describes options for job step links. Links define how batches will flow through work steps. You
must have job steps on the workspace of the Job Definitions window to work with links. (See "Adding Job
Steps to a Job" on page 49 for more information.) If you are working with a Manual QC or Automated QC job
step, you can define Pass and Fail links that determine how items are routed when they pass or fail the
quality control inspection. (See "Chapter 13 Quality Control (QC)" on page 355 for more information.)
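As a rough mental model, job step links behave like directed connections between steps. The following Python sketch is illustrative only: the Workflow class and its method names are hypothetical and are not part of PaperVision Capture. It simply mirrors the Add Link, Flip Link Direction, and Remove Link commands described in the procedures in this section.

```python
class Workflow:
    """Hypothetical model of job steps joined by directed links."""

    def __init__(self):
        # Each link is stored as a (from_step, to_step) pair.
        self.links = set()

    def add_link(self, from_step, to_step):
        self.links.add((from_step, to_step))

    def flip_link(self, step_a, step_b):
        # Reverse whichever direction currently exists between the two steps.
        if (step_a, step_b) in self.links:
            self.links.remove((step_a, step_b))
            self.links.add((step_b, step_a))
        elif (step_b, step_a) in self.links:
            self.links.remove((step_b, step_a))
            self.links.add((step_a, step_b))

    def remove_link(self, step_a, step_b):
        # Remove the link between the two steps regardless of its direction.
        self.links.discard((step_a, step_b))
        self.links.discard((step_b, step_a))

    def next_steps(self, step):
        # Steps that a batch can flow to after the given step.
        return sorted(to for frm, to in self.links if frm == step)


wf = Workflow()
wf.add_link("Capture", "Indexing")
wf.add_link("Indexing", "Manual QC")
```

In this model, selecting two steps and flipping their link only swaps the order of the pair; the steps themselves are unchanged.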
To add a link between job steps from the workspace
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Ensure the job steps you want to work with appear on the workspace. (See "Adding Job Steps to a Job" on
page 49 if you need to add a job step.)
3. On the workspace, hold down the Ctrl key, and then click the two job steps you want to link.
4. On the toolbar or Job Steps menu, click Add Link.
To add a link between job steps from the Job Steps grid
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Ensure the job steps you want to work with appear on the workspace. (See "Adding Job Steps to a Job" on
page 49 if you need to add a job step.)
3. On the Job Steps grid, click the job step for which you want to create a link.
4. From the list in the Next column, select the job step that you want to come next in the work flow. The link
between the job steps appears on the workspace.
To change the direction of a job step link
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, hold down the Ctrl key, and then click the two job steps for which you want to change
link direction.
3. On the toolbar or Job Steps menu, click Flip Link Direction.
To remove a link between job steps
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, hold down the Ctrl key, and then click the two job steps for which you want to remove
the link.
3. On the toolbar or Job Steps menu, click Remove Link.
Editing Job Steps
After job steps are placed on the workspace of the Job Definitions window, you can use the standard Cut,
Copy, Paste, and Delete commands to edit them. After you select a job step, you can access these
commands from the Job Steps menu, from the main and Job Steps grid toolbars, or by right-clicking the
workspace. (If you need to add job steps, see "Adding Job Steps to a Job" on page 49 for more information.)
To edit a job step
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace or the Job Steps grid, click the job step you want to edit. You can select multiple steps
by holding down the Ctrl key, and then clicking the job steps.
3. On the Job Steps menu, the main or Job Steps grid toolbar, or shortcut menu, click one of the following:
• Cut to place the selected job step(s) on the clipboard. A gray grid appears on the job step(s).
• Copy to copy the selected job step(s) to the clipboard.
• Paste to paste the job steps that are on the clipboard.
• Delete to delete the selected job steps.
Moving Job Steps
After job steps are placed on the workspace of the Job Definitions window, you can use the following
procedures to move them. (See "Adding Job Steps to a Job" on page 49 if you need to add job steps.)
To move a job step
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, drag the job step to where you want it.
To move multiple job steps
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, hold down Shift while clicking the left mouse button.
3. Drag the pointer to create a boundary around the job steps you want to move. A blue boundary appears
around the area you define, and the steps in the boundary will move as one unit.
4. Release the mouse button, and then rest the pointer on one of the selected job steps.
5. When the pointer becomes two double-headed arrows, drag the steps to where you want them.
Specifying the Appearance of Job Steps
After a job step is placed on the workspace of the Job Definitions window, you can specify its appearance. (If
you need to add job steps, see "Adding Job Steps to a Job" on page 49.) In addition to the options described in this
content, you can specify appearance settings on the Properties tab. See "To set job step Appearance properties"
on page 53 for more information.
The commands referenced in the following procedures appear on the Format menu and the Alignment toolbar.
To view the Alignment toolbar
• On the Job Definitions window, click the View menu, point to Toolbars, and then check the
Alignment check box.
Alignment Toolbar
NOTE: When you are applying an alignment or size attribute to multiple job steps, it is based on the job
step that was selected last. For example, if you want to make multiple job steps have the same width, the
resizing is based on the width of the last job step that you select.
Aligning Job Steps
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, hold down the Ctrl key, and then click the job steps that you want to align.
3. On the Format menu, point to Align, and then click one of the following commands.
• Left aligns the left borders of the job steps to the left border of the job step selected last.
• Center aligns the center point of the top and bottom borders of the job steps to the center of the job
step selected last.
• Right aligns the right borders of the job steps to the right border of the job step selected last.
• Top aligns the top borders of the job steps to the top border of the job step selected last.
• Middle aligns the middle point of the left and right borders of the job steps to the middle of the job
step selected last.
• Bottom aligns the bottom borders of the job steps to the bottom border of the job step selected last.
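The "selected last" rule can be pictured as simple rectangle arithmetic. The following Python sketch is an illustration under stated assumptions, not the product's implementation: StepBox and align are hypothetical names, coordinates are assumed to be pixels measured from the top-left corner of the workspace, and the last item in the list stands in for the job step selected last.

```python
from dataclasses import dataclass


@dataclass
class StepBox:
    name: str
    x: int       # left border
    y: int       # top border
    width: int
    height: int


def align(steps, mode):
    """Align every step to the last step in the list (the one selected last)."""
    ref = steps[-1]
    for s in steps[:-1]:
        if mode == "left":
            s.x = ref.x
        elif mode == "center":
            s.x = ref.x + (ref.width - s.width) // 2
        elif mode == "right":
            s.x = ref.x + ref.width - s.width
        elif mode == "top":
            s.y = ref.y
        elif mode == "middle":
            s.y = ref.y + (ref.height - s.height) // 2
        elif mode == "bottom":
            s.y = ref.y + ref.height - s.height
        else:
            raise ValueError(f"unknown alignment: {mode}")
    return steps


capture = StepBox("Capture", x=10, y=20, width=100, height=40)
indexing = StepBox("Indexing", x=200, y=90, width=60, height=40)
align([capture, indexing], "left")  # Capture moves; Indexing (selected last) stays put
```

Only the steps selected earlier move; the reference step keeps its position, which matches the NOTE above about the job step selected last.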
Sizing Job Steps
You can size job steps individually, or you can select multiple job steps and modify their size based on the width
and/or height of the job step selected last. Use the following procedures to size job steps.
To size a job step
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, click the job step you want to size, and then rest the pointer on a border of the job step.
3. When the pointer becomes a double-headed arrow, drag the border to size the job step.
To make job steps the same size
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, hold down the Ctrl key, and then click the job steps that you want to make the same
size.
3. On the Format menu, point to Make Same Size, and then click one of the following commands.
• Width resizes the width of the selected job steps so that they are all the same. The resizing is
based on the width of the job step selected last.
• Height resizes the height of the selected job steps so that they are all the same. The resizing is
based on the height of the job step selected last.
• Both resizes the width and height of the selected job steps so that they are all the same. The
resizing is based on the width and height of the job step selected last.
Setting Common Job Step Properties
After job steps are placed on the workspace of the Job Definitions window, you can set properties for them.
(If you need to add job steps, see "Adding Job Steps to a Job" on page 49.) Job step properties let you define
how each job step appears and functions. There are numerous properties available for each job step. The
content in this section describes the Appearance and General properties that are common to all job steps.
For information about properties specific to a job step, see the content for that job step. You use the
Properties tab on the Job Definitions window to set job step properties.
To set job step Appearance properties
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, double-click the job step for which you want to set Appearance properties.
3. On the Properties tab, expand Appearance, and then click the property you want to set. If an arrow or
ellipsis button appears in the column next to the selected property, click the button to access available
options.
4. You can set the following properties.
• Automated Image Alignment - This property is available for automated job steps only, and specifies
how images are aligned. The default setting is Bottom-Right.
• Color - This property lets you change the color of the job step. You can use custom, web, or system
colors. The default setting is the web color, beige. To change the color, click the button next to the
property, select the Custom, Web, or System tab, and then choose the color you want to use.
• Font - This property lets you change the font that appears on the job step. Click the button next to the
property to access the Font dialog box where you can select font options.
• Location - This property displays the current location of the job step based on its X and Y axis
coordinates on the workspace. The displayed values automatically update when a job step is dragged to
a new location on the workspace. You can also change the position of the job step by expanding
Location, and then typing values in the X and Y boxes.
• Shape - This property lets you change the shape of the job step. The default setting is Rounded
Rectangle.
• Size - This property displays the current width and height (in pixels) of the job step. The displayed values
automatically update when a job step is sized on the workspace. You can also change the size of the job
step by expanding Size, and then typing values in the Width and Height boxes.
To set job step General properties
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the workspace, double-click the job step for which you want to set General properties.
3. On the Properties tab, expand General, and then click the property you want to set. If an arrow or
ellipsis button appears in the column next to the selected property, click the button to access available
options.
NOTE: Only the properties that apply to the selected job step are available for you to set. Other properties
may appear, but are unavailable for editing.
4. The following General properties appear.
• Age Priority - The value of this property is used (with other values) to calculate the overall batch priority
in the PaperVision Capture Operator Console. For details on batch priority calculation, see Batch
Priority under "PaperVision Capture Terminology" on page 9. A value of 0 indicates that this property is
not defined. To define this property, you can set a value from 1 to 100 by typing the value in the box, or
by clicking the button next to the box to access the slider.
• Assigned To - This property is available for manual job steps only, and specifies the users who can
perform the job step. Click the button next to the property to access the Job Step Assignment dialog box,
select the check boxes for the users and/or groups you want to assign, and then click OK. For additional
information, see "Assigning Users to Manual Job Steps" on page 56.
• Batch Destruction Offset - This property is available for any job step, and specifies the amount of time
that must elapse after the job step is completed before the batch is destroyed. The system begins timing
after the operator submits the batch upon completion of the job step. For example, suppose a Capture step
has a Batch Destruction Offset of 1 hour, and the operator creates a new batch, scans documents, and
then submits the batch. The next time the PaperVision Capture Automation Service runs (provided that an
hour has passed and the Batch Destruction operation has been scheduled to run), the applicable batch is
destroyed. Click the button next to the property to access the Destruction Offset dialog box, and then
set values in the Days, Hours, and Minutes boxes. These values represent the duration after which any
batches that complete the step are to be destroyed. If you want to keep the statistics for the batch,
select Retain Statistics, and then click OK.
• Is Start Step - By default, this property is available and editable for Capture steps. For a job to be valid, it
must have a Capture step specified as the start step. You can set this property to True or False.
• License Requirements - This read-only property displays the software licenses required for the job
step. In most scenarios, a license is consumed when a user works on a manual step in the PaperVision
Capture Operator Console. Automated steps generally do not consume licenses when they run, and will
not display a license in this property. If an automated step requires a license, it will appear under the
License Requirements property for the manual step that precedes it. Barcode and OCR licenses are
only required once those operations are configured. Business Rules job steps are licensed on an entity
basis, so license requirements are not displayed. The system verifies that the entity license exists when
the Business Rules job step is run. For more information, see "Licensing" on page 19.
• Merge Like Documents - You can use this property to merge pages from multiple documents with the
same index values into a single document. The merge process is performed on all documents in a batch,
but documents that are not indexed are not included. Use the following procedure to set this property.
1. Click the button next to the property to access the Merge Like Documents Configuration dialog box.
2. Specify the page order for the merged document by selecting or clearing the Merge In Reverse
Direction check box. Select the check box for pages to appear in the reverse order from which they
are merged. Clear the check box for pages to appear in the order in which they are merged.
3. From the Available list, select the indexes to include in the merge process. This list displays all
indexes that are defined for the job. To select multiple indexes, hold down the Ctrl key. To select all
index values, select the Select All check box.
4. Click the right arrow. The indexes you selected appear in the Selected list.
5. (Optional) To remove indexes from the Selected list, select them, and then click the left arrow. To
remove all indexes, select the Select All check box, and then click the left arrow.
6. For indexes that appear in the Selected list, you can specify whether to include blank values for that
index in the merge process by selecting or clearing the Allow Blank check box. For example, if you
select the Allow Blank check box for an Invoice Number index, all documents must contain blank
index values to be merged into one document. If at least one Invoice Number index value is defined
and the remaining index values are blank (or vice versa), the documents will not be merged. By
default, blank index values are not included in the merged document.
7. Click OK to save your settings.
• Mode - This read-only property displays whether the job step is manual or automated.
• Name - This property displays the default name for the job step, but you can edit it.
• Pre-Caching - This property is available for manual job steps only. When you enable this property, the
number of pages you specify is downloaded before the remaining pages in the document are
downloaded. You can use this property to increase productivity in the PaperVision Capture Operator
Console. For example, if an operator manually indexes only the first page of every 10-page document,
you can enable the Pre-Caching property for the Indexing step, and then set Number Pages to 1 so that
when an operator opens a batch, only the first page is downloaded (before the remaining pages) from
each document. This allows the operator to begin working, instead of waiting for the contents of an entire
document or batch to download before they can get started.
• Source Image Step - This property specifies a job step from which images are displayed in the
PaperVision Capture Operator Console. For example, you can select the Capture step's images to
display in the PaperVision Capture Operator Console for the Indexing step so that when the operator
opens the Indexing step, images from the Capture step appear. To set this property, click the button next
to the property, and then select the job step to use as the source for images.
• Step Priority - The value of this property is used (with other values) to calculate the overall batch priority
in the PaperVision Capture Operator Console. For details on batch priority calculation, see Batch
Priority under "PaperVision Capture Terminology" on page 9. A value of 0 indicates that this property is
not defined. To define this property, you can set a value from 1 to 100 by typing the value in the box, or
by clicking the button next to the box to access the slider.
• Type - This read-only property displays the type of job step.
• Use Non-Repudiation - When this property is set to True, images are captured and the SHA-512 hash
value is calculated and stored for each image. The hash value can be exported to content management
systems so that when a user retrieves an image, the hash value is recalculated for the retrieved image
and then verified against the stored hash value to validate that the image was not tampered with.
WARNING: When running a demo license, the application writes a watermark onto each
captured image. Therefore, non-repudiation is not supported in demo mode.
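The verification idea behind the Use Non-Repudiation property can be sketched with Python's standard hashlib module. This is a minimal illustration only: how PaperVision Capture stores and exports the hash is not described here, so the function names and the sample bytes below are assumptions.

```python
import hashlib


def sha512_hex(image_bytes: bytes) -> str:
    """Return the SHA-512 hash of the image contents as a hex string."""
    return hashlib.sha512(image_bytes).hexdigest()


def is_unaltered(image_bytes: bytes, stored_hash: str) -> bool:
    """Recalculate the hash for a retrieved image and compare it to the stored value."""
    return sha512_hex(image_bytes) == stored_hash


# Hash calculated and stored at capture time (sample bytes stand in for a real image).
original = b"example image bytes"
stored_hash = sha512_hex(original)

# Later, at retrieval time, any change to the bytes produces a different hash.
retrieved_ok = is_unaltered(original, stored_hash)
retrieved_tampered = is_unaltered(original + b"watermark", stored_hash)
```

The same reasoning explains the demo-mode warning above: a watermark written onto the image changes its bytes, so a hash taken of the original image would no longer match.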
Assigning Users to Manual Job Steps
After a manual job step is placed on the workspace of the Job Definitions window, you can assign which
user(s) will perform it from the PaperVision Capture Operator Console. (If you need to add job steps, see
"Adding Job Steps to a Job" on page 49.)
You can assign manual job steps to single or multiple users, or to a defined group of users. You can create
user groups based on the tasks they perform. For example, you could create a group called “Scan” that
contains all of the users authorized to perform scanning operations, and then assign that group to the
Capture job step. (For information about creating users and groups, see "System Users" on page 39 and
"Adding a New System Group" on page 37.)
You can assign job steps from the Job Steps grid or from the Properties tab. The following content
describes each method. You can use the Job Steps grid to determine which job steps are manual by
checking the Mode column.
Job Steps Grid
To assign users to manual job steps
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Ensure the job steps you want to work with appear on the workspace. (See "Adding Job Steps to a Job" on
page 49 if you need to add a job step.)
Click the Mode column heading to sort the job steps so that all of the Manual job steps are next to
each other. Now you can assign users more quickly.
3. Select the job step to which you want to assign users.
4. Do one of the following:
• On the Job Steps grid, under the Assigned To column, click the button in that column.
• Click the Properties tab, and then expand General. Click Assigned To, and then click the button next to
the property.
5. On the Job Step Assignment dialog box, select the check boxes for the users and/or groups you want to
assign, and then click OK.
Working with Jobs
This content describes job-related tasks. (If you need help creating a job, see "Chapter 4 Job Creation and
Configuration" on page 46 to get started.) After a job is created, you can set properties for the job, and then
you must save, validate, activate, and check it in so that it is available for use in the PaperVision Capture
Operator Console. The following procedures will walk you through these and other job-related tasks.
The following procedures apply to the Job Definitions window. However, the same functionality and toolbar
buttons are also available on the Capture Jobs pane. You can access the Capture Jobs pane from the
Administration Console by expanding Entities, Entity Name, and then clicking Capture Jobs. A list of jobs
appears on the right pane, where you can select a job and perform the tasks described in this content. The
following procedures describe using the toolbar, but you can find the same commands on the Job menu on the Job
Definitions window and the Action menu on the Capture Jobs pane.
Setting Job Properties
Job properties let you define various settings that apply to the entire job. The content in this section describes
the General properties that apply to the job as a whole. For information about properties specific to a job
step, see the content for that job step. You use the Properties tab on the Job Definitions window to set job
properties.
To set job properties
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Ensure that no job steps are selected by clicking on a blank area of the workspace.
3. Click the Properties tab.
4. If necessary, expand General, and then click the property you want to set. If an arrow or ellipsis button
appears in the column next to the selected property, click the button to access available options.
To clear a field configured by clicking the ellipsis button, right-click the button, and then
click Reset.
5. You can set the following job properties.
• Active - This property specifies whether the job is active. If the property is set to True, the job is
activated. If the property is set to False, the job is not activated. In the PaperVision Capture Operator
Console, users can create batches only for active jobs that have been checked into the server. (See
"Activating Jobs" on page 61 for more information.)
• Age Priority - The value of this property is used (with other values) to calculate the overall batch priority
in the PaperVision Capture Operator Console. See Batch Priority under "PaperVision Capture
Terminology" on page 9 for details on batch priority calculation. A value of 0 indicates that this property is
not defined. To define this property, you can set a value from 1 to 100 by typing the value in the box, or
by clicking the button next to the box to access the slider.
• Comments - This property contains any comments that you want to add about the job. Click the box
next to the Comments property, type your text, and then press Enter. To view a comment, point to the
Comments box.
• Custom QC Tags - This property lets you define custom QC tags that appear in the PaperVision
Capture Operator Console. See "Adding Custom QC Tags" on page 362 for more information.
• Detail Set - This property defines detail sets for the job. See "Configuring Detail Sets" on page 59 for
more information.
• Encryption Key - This property lets you select an encryption key to encrypt batches, images, and
indices. Encryption keys must be defined for them to appear in the list. See "Encryption Keys" on page
32 for more information.
WARNING: Using encryption in batches can increase the size of batches and may
adversely affect performance.
• Entity - This read-only property displays the name of the current entity.
• Name - This property displays the name of the open job. You can edit this value.
• Number Steps - This read-only property displays the number of steps in the job.
Configuring Detail Sets
A detail set is a collection of indexes that defines "many-to-one" relationships, which allow multiple sets of
field data to reference a single document. For example, in an accounts payable project, index fields are set up
for the check number, check date, payee, invoice number, and invoice date. If the same check is used to pay
multiple invoices from the same vendor, a single document may be represented as follows:
Check Number    Check Date    Payee       Invoice Number    Invoice Date
12345           09/30/2014    ABC Corp    A0001             09/01/2014
12345           09/30/2014    ABC Corp    A0002             09/02/2014
12345           09/30/2014    ABC Corp    A0003             09/03/2014
The first three index fields (check number, check date, and payee) are duplicated per changing invoice
number. Rather than duplicating the information in the first three fields, you can represent the first three fields
as index fields and assign the remaining two fields, invoice number and invoice date, as detail sets.
Index Fields
Check Number    Check Date    Payee       Document ID (system-generated)*
12345           09/30/2014    ABC Corp    654
* This system Document ID is generated behind the scenes, hidden from your view.
Detail Sets
Invoice Number    Invoice Date    Document ID (system-generated)*
A0001             09/01/2014      654
A0002             09/02/2014      654
A0003             09/03/2014      654
You configure detail sets at the job level. After you define detail sets, you can apply them to specific job steps. The
process for configuring detail sets for jobs follows the same general steps as configuring indexes for a job step.
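The many-to-one relationship in the accounts payable example can be sketched as a small data transformation. The following Python code is illustrative only: the function and field names are hypothetical, and the way PaperVision Capture actually stores index fields and detail sets is not described in this guide. The sample values come from the check and invoice example above.

```python
# Flat rows: the check fields repeat once per invoice.
flat_rows = [
    ("12345", "09/30/2014", "ABC Corp", "A0001", "09/01/2014"),
    ("12345", "09/30/2014", "ABC Corp", "A0002", "09/02/2014"),
    ("12345", "09/30/2014", "ABC Corp", "A0003", "09/03/2014"),
]


def split_into_detail_set(rows, document_id):
    """Separate the repeated index fields from the per-invoice detail rows."""
    shared = {row[:3] for row in rows}
    if len(shared) != 1:
        raise ValueError("rows do not share the same index field values")
    check_number, check_date, payee = shared.pop()
    index_fields = {
        "document_id": document_id,
        "check_number": check_number,
        "check_date": check_date,
        "payee": payee,
    }
    detail_set = [
        {"document_id": document_id, "invoice_number": row[3], "invoice_date": row[4]}
        for row in rows
    ]
    return index_fields, detail_set


index_fields, detail_set = split_into_detail_set(flat_rows, document_id=654)
```

The check fields are stored once, and each detail row refers back to the same document through the shared document ID.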
To configure a detail set
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Ensure that no job steps are selected by clicking on a blank area of the workspace.
3. Click the Properties tab.
4. If necessary, expand General, and then click Detail Set.
5. Click the ellipsis button to open the Detail Set Configuration dialog box.
6. To add an index value, click Add. (See "Adding, Removing, and Sorting Indexes" on page 82 if you need
help.)
7. Set the properties under General [Job Level]. (See "Index Configuration - General (Job Level)" on page 86
for a description of each property.)
8. Set the properties under Predefined Index Values [Job Level]. (See "Predefined Index Values (Job Level)"
on page 93 for a description of each property.)
9. When you are finished configuring index properties, click OK.
To clear a configured detail set, on the Properties tab, click Detail Set, right-click the ellipsis
button, and then click Reset.
Saving Jobs
When you are working on the Job Definitions window, an unsaved job shows an asterisk ( * ) next to its
name.
To save jobs
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Do one of the following:
• On the toolbar, click Save Job to save the current job.
• On the toolbar, click Save All to save all jobs that are open in the workspace.
Validating Jobs
Before you can make a job available for use in the PaperVision Capture Operator Console, you must validate
it. When you validate a job, the system verifies that all job steps are configured correctly. For example, the
system checks that all manual job steps have users assigned and that required properties are set for all job
steps. The system also verifies that the job step work flow is configured correctly. Jobs can have complex
work flows that include multiple start steps, and steps (such as a QC step with pass and fail links, or a Batch
Splitting step) that can cause some documents in the batch to split from the main work flow. For a job step
work flow to be valid:
• All start steps must end at a single job step, regardless of how documents are routed.
• All Batch Splitting job steps must have target jobs/steps configured. (See "Chapter 14 Batch Splitting"
on page 369 for more information.)
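The first rule can be pictured as a check on a graph of job steps. The following Python sketch is illustrative only and is not part of PaperVision Capture. It assumes a hypothetical representation in which each step name maps to the list of steps its links point to, treats steps that no link points to as start steps, and treats steps with no outgoing links as end steps. The step names are invented for the example.

```python
def all_steps(links):
    """Collect every step named as a source or a target of a link."""
    steps = set(links)
    for targets in links.values():
        steps.update(targets)
    return steps


def start_steps(links):
    """Steps that no link points to (assumed here to be the start steps)."""
    targeted = {t for targets in links.values() for t in targets}
    return all_steps(links) - targeted


def reachable_end_steps(links, start):
    """Follow every link from `start`; return the steps with no outgoing links."""
    seen, stack, ends = set(), [start], set()
    while stack:
        step = stack.pop()
        if step in seen:
            continue
        seen.add(step)
        targets = links.get(step, [])
        if not targets:
            ends.add(step)
        stack.extend(targets)
    return ends


def ends_at_single_step(links):
    """True when all start steps lead to exactly one shared end step."""
    ends = set()
    for start in start_steps(links):
        ends |= reachable_end_steps(links, start)
    return len(ends) == 1


# Hypothetical work flows: two start steps that converge on Export,
# and one flow whose QC step routes some documents to a second end step.
converging = {
    "Capture": ["QC"],
    "Import": ["QC"],
    "QC": ["Indexing", "Rescan"],
    "Rescan": ["Indexing"],
    "Indexing": ["Export"],
}
diverging = {
    "Capture": ["QC"],
    "QC": ["Indexing", "Archive"],
    "Indexing": ["Export"],
}

print(ends_at_single_step(converging))
print(ends_at_single_step(diverging))
```

In this sketch the converging flow passes the check and the diverging flow does not, because documents routed to the second end step never rejoin the main work flow.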
To validate a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Validate Job. If the job is invalid, a window displays the items that must be
fixed, and the borders of invalid job steps are red.
If you place the pointer on an invalid job step, the reason for the error appears.
3. Do one of the following:
• If the job is not valid, fix the errors and then click Validate Job. Repeat this process until the job is
valid.
• If the job is valid, click OK. The job is ready to be activated and checked in.
Activating Jobs
Before you can make a job available for use in the PaperVision Capture Operator Console, you must activate
it. You cannot activate an invalid job, so the system will verify that the job is valid when you activate it. If the
job is invalid, you must fix the errors. (See "Validating Jobs" on page 60 for more information.)
To activate a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Activate Job.
Checking in Jobs
Before you can make a job available for use in the PaperVision Capture Operator Console, you must check it
in. To check in a job, it must be valid and active. When you check in a job, the system automatically saves it.
To check in a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Check In Job.
Checking out Jobs
Before you can edit a job, you must first check it out. Only one administrator can check out a job at a time;
multiple administrators cannot work on the same job simultaneously.
To check out a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Check Out Job.
Undoing Checkout for Jobs
If you make changes to a job and do not want to save the changes, you can use the Undo Checkout
command.
To undo a job checkout
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Undo Checkout.
3. Click Yes to confirm that you want to discard your changes. The job is automatically checked back in.
Deactivating Jobs
A job must be checked out and active before you can deactivate it.
To deactivate a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Deactivate Job.
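The preconditions described in the preceding sections (validate, activate, check in, check out, deactivate) can be summarized as a small state model. The Python sketch below is a teaching aid under stated assumptions, not a PaperVision Capture API: the class and method names are hypothetical, a new job is assumed to start checked out, and availability in the Operator Console is simplified to "valid, active, and checked in".

```python
class Job:
    """Toy model of the job states described in this chapter (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.valid = False
        self.active = False
        self.checked_out = True  # assumption: a new job starts checked out

    def validate(self, errors):
        """Record a validation result; `errors` lists the problems found."""
        self.valid = not errors
        return self.valid

    def activate(self):
        # An invalid job cannot be activated.
        if not self.valid:
            raise ValueError("Fix validation errors before activating the job.")
        self.active = True

    def check_in(self):
        # A job must be valid and active to be checked in.
        if not (self.valid and self.active):
            raise ValueError("A job must be valid and active to be checked in.")
        self.checked_out = False

    def check_out(self):
        # Only one administrator can hold the checkout at a time.
        if self.checked_out:
            raise ValueError("The job is already checked out.")
        self.checked_out = True

    def deactivate(self):
        # A job must be checked out and active before it can be deactivated.
        if not (self.checked_out and self.active):
            raise ValueError("A job must be checked out and active to deactivate it.")
        self.active = False

    @property
    def available_to_operators(self):
        # Simplification: valid, active, and checked in.
        return self.valid and self.active and not self.checked_out


job = Job("Invoices")  # hypothetical job name
job.validate(errors=[])
job.activate()
job.check_in()
print(job.available_to_operators)
```

The order of the calls mirrors the order of the procedures: a job that skips validation or activation raises an error at the step whose precondition is not met.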
Deleting Jobs
Jobs can be deleted regardless of their status, including when they are active and checked in. If you delete a job
that is active and checked in, it will no longer appear in the PaperVision Capture Operator Console.
To delete a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Delete Job.
3. To proceed with the deletion, click Yes.
Importing Jobs
You can import existing jobs for an entity. To use this feature, you must have an XML file that was created by
successfully exporting a job from the Job Definitions window.
NOTE: Users specified in the Assigned To property are removed when a job is exported. When an
exported job is subsequently imported back into the Job Definitions window, the Assigned To
property will be blank and must be reassigned.
When a job that contains a Batch Splitting step is exported, any configured Target Jobs or Steps are
removed, and must be reconfigured when the job is subsequently imported back into the Job Definitions
window.
To import a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Import Job.
3. In the Open dialog box, locate the XML file that contains the job information, select it, and then click Open.
Exporting Jobs
You can export existing jobs into an XML file.
NOTE: Users specified in the Assigned To property are removed when a job is exported. When an
exported job is subsequently imported back into the Job Definitions window, the Assigned To
property will be blank and must be reassigned.
When a job that contains a Batch Splitting step is exported, any configured Target Jobs or Steps are
removed, and must be reconfigured when the job is subsequently imported back into the Job Definitions
window.
To export a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Export Job.
3. In the Save As dialog box, navigate to the location where you want to save the XML file.
4. In the File Name box, type a name for the job, and then click Save.
Cloning Jobs
Cloning a job copies the components of the open job, including its steps, configurations, and assigned users, into a
new job.
To clone a job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, click Clone Job.
3. In the Name box, type a name for the new job.
4. Click OK. The cloned job appears on a new tab.
Customizing the Job Definitions Window
This content describes the components of the Job Definitions window and how you can customize them.
(See "Opening the Job Definitions Window to Create or Edit a Job" on page 46 if you need help accessing this
window.)
The Job Definitions window contains the following components:
• Main workspace
• Job Step Toolbox tab
• Properties tab
• Job Steps grid
This content describes the features available for working with each component.
Moving Components
You can move components on the Job Definitions window to accommodate your preferences. You can move
the Properties and Job Step Toolbox tabs, and the Job Steps grid.
To move a component
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Place the pointer on the title bar of the component you want to move.
3. Drag the item to the new location. A purple translucent rectangle indicates where the item will be placed.
4. (Optional) To anchor the component to a side of the Job Definitions window, drag it toward a side until four
arrows in boxes appear. Place the pointer on the box where you want to anchor the component.
Applying Auto Hide to Components
You can apply Auto Hide to the Properties and Job Step Toolbox tabs, and the Job Steps grid. On the
upper-right corner of these components, there is a small button that looks like a pin. If you click this button,
the component is hidden (or “pinned” to the edge of the Job Definitions window). However, you can still see
the title of the component along the edge of the Job Definitions window. When you point to the title, the
component temporarily displays again until you move the pointer.
To apply Auto Hide to a component
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the upper-right corner of the component, click the pin.
Auto-Hide disabled
The tab for the component appears on the side nearest to its original location. For example, if the Job Step
Toolbox tab was anchored on the left side, it now appears as a tab on the left side. The pin appears on its side,
indicating that the Auto Hide feature is enabled. Click the button again to “un-pin” the component.
Auto-Hide enabled
Viewing/Hiding Components
You can view and hide screen components and toolbars on the Job Definitions window.
To view/hide windows
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the View menu, point to Windows.
3. Select the check boxes for the components you want to appear. Clear the check boxes for the components
you do not want displayed.
To view/hide toolbars
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the View menu, point to Toolbars.
3. Select the check boxes for the toolbars you want to appear. Clear the check boxes for the toolbars you do
not want displayed.
Using the Main Workspace
The main workspace makes up most of the Job Definitions window. The main workspace is where you build
jobs. This area always appears on the Job Definitions window, and cannot be hidden or moved like other
components.
Main Workspace
Viewing an Open Job
You can have multiple jobs open on the main workspace of the Job Definitions window. Each open job appears
on its own tab. If the job has unsaved changes, an asterisk ( * ) appears next to the name of the job.
To view an open job
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Do one of the following:
• Click the tab of the job you want to view.
• Click the arrow on the upper right side of the workspace to display a list of open jobs, and then click the
job you want to view.
Open Jobs
Setting Zoom Options
As you work with jobs on the main workspace, you may find it helpful to use the zoom options. For example, if you
have a complex job with many steps, you can zoom out so that you can see all of the job steps without scrolling.
To set zoom options
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar, you can click the following zoom options:
• Zoom In zooms in on the workspace.
• Zoom Out zooms out of the workspace.
• Zoom Reset restores the zoom setting to the default.
Sorting Properties
You can sort the properties that appear on the Properties tab alphabetically or by category.
To sort properties
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Click the Properties tab, and then do one of the following:
• Click Categorized to sort the properties by category.
• Click Alphabetical to remove the categories and have the properties appear in alphabetical order.
Customizing Columns on the Job Steps Grid
You can specify which columns display on the Job Steps Grid.
To customize columns
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. On the toolbar of the Job Steps grid, click Show/Hide Columns.
3. In the Select Columns dialog box, select the check boxes for the columns that you want to appear on the
grid. Clear the check boxes for the columns that you do not want displayed.
4. To reorder the columns, select the column you want to move, and then click Move Up or Move Down.
5. When you are finished, click OK.
Sorting Columns on the Job Steps Grid
You can sort some columns alphabetically on the Job Steps grid to help you complete tasks. For example,
you could sort the Mode column so that all manual job steps are together, making it easier to assign users to
them. If sorting is available for a column, an arrow appears in the column heading when you click it.
To sort columns on the Job Steps grid
1. Open the Job Definitions window. (See "Opening the Job Definitions Window to Create or Edit a Job" on
page 46 if you need help.)
2. Click the heading for the column you want to sort.
Chapter 5 Capture Step
The manual Capture step contains scanning options so you can customize PaperVision Capture to meet the
scanning needs for any task. You can also configure index values within the Capture step so operators can
hand-key index values in the PaperVision Capture Operator Console as documents are scanned. Auto
Document Break settings allow you to automatically insert document breaks based on page count, file size,
barcode content, and OCR text. Additionally, you can configure custom code events that the operator can
manually execute while scanning.
This content describes how to configure the Capture properties that you can use when you add a Capture
job step. See "Chapter 4 Job Creation and Configuration" on page 46 for information about general job setup
and the properties that apply to all job steps.
Configuring a Capture Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click
OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. If necessary, add the Capture job step to the workspace using one of the following methods.
• Select the job step that you want Capture to follow. On the Job Step Toolbox tab, double-click
Capture.
• On the Job Step Toolbox tab, drag Capture onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select Capture.
7. Double-click the Capture step to display the Properties tab on the left pane.
8. On the Properties tab, you can expand a category, and then click the property you want to set. If an
arrow or ellipsis button appears in the column next to the selected property, click it to access available
options. The properties you can set are described in the subsequent content.
To clear a field configured by clicking the ellipsis button, right-click the button, and then click
Reset.
Auto Document Break
While scanning documents, you can determine where one document ends and the next document begins
using the Auto Document Break properties. Although you can separate documents manually, you can select
from options that are described below.
Click Mode, and then click the down arrow to select one of the following options:
• None: This is the default auto-document break type for a newly created step. When set to None, the
system will expect you to manually separate new documents. No options are available for this setting.
• Number of Pages Per Document: To assign a fixed number of pages per document, enter the number of
pages that PaperVision Capture will scan before starting a new document. Additionally, you can set the
Prompt Operator property to True to display a message that asks the operator for a fixed number of pages
before breaking to a new document. If you set this property to False, the operator is not prompted.
• Barcode: If you select the Barcode mode, click the ellipsis button to the right of the Barcode Zone field to
define the zone. See Barcode Zones for more information. For the Save Page property, select True to leave
the page with the barcode in the batch, or select False to remove the page with the barcode from the batch.
• Blank Page: To automatically insert document breaks based on the file size of the image, select Blank
Page. Enter the size (in kilobytes) of images to be considered blank. You can enter the file size as a
number with up to two decimal places. Select True to leave the blank page in the batch, or select False to
remove the blank page from the batch.
NOTE: A job validation error will appear if both the Auto Document Break and Minimum Page Size
Detection properties are enabled.
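As an illustration of the Blank Page mode, the following Python sketch splits a run of scanned pages into documents wherever a page's file size marks it as blank. It is a simplified model, not the product's implementation. The function name, the "at or below the threshold" comparison, and the choice to place a retained blank page at the start of the next document are all assumptions made for the example.

```python
def break_on_blank_pages(page_sizes_kb, blank_threshold_kb, save_page=False):
    """Group page indexes into documents, breaking at each blank page.

    page_sizes_kb      -- file size of each scanned page, in kilobytes
    blank_threshold_kb -- pages at or below this size count as blank (assumed)
    save_page          -- True keeps the blank page, False drops it
    """
    documents, current = [], []
    for index, size_kb in enumerate(page_sizes_kb):
        if size_kb <= blank_threshold_kb:
            # A blank page closes the current document, if it has any pages.
            if current:
                documents.append(current)
            # Assumption: a retained blank page opens the next document.
            current = [index] if save_page else []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


# Hypothetical scan: pages 2 and 4 (zero-based) are blank separator sheets.
sizes = [40.0, 38.5, 1.2, 41.0, 0.9, 39.0]
print(break_on_blank_pages(sizes, 2.0, save_page=False))
print(break_on_blank_pages(sizes, 2.0, save_page=True))
```

With the blank pages removed, the six scanned pages become three documents of content pages only. With the blank pages kept, the same three documents each begin with their separator sheet, except the first.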
Auto Page Rotation
The Auto Page Rotation setting allows you to configure how pages are rotated as images are scanned in
PaperVision Capture.
To assign the page rotation settings:
1. In the Auto Page Rotation field, click the ellipsis button in the right column to open the Auto Page
Rotation dialog box.
2. Select the page rotation setting from the Apply Rotation To list.
• None disables the automatic page rotation feature.
• All Pages automatically rotates all pages in a document by the specified rotation value as the
documents are scanned.
• Even Pages Only automatically rotates only the even numbered pages in a document by the specified
rotation value as the documents are scanned.
• Odd Pages Only automatically rotates only the odd numbered pages in a document by the specified
rotation value as the documents are scanned.
• Even Pages/Odd Pages automatically rotates the odd and even numbered pages in a document by the
specified rotation values as the documents are scanned. Even pages and odd pages can be assigned
different rotation values.
• First Page Only automatically rotates the first page of a document by the specified rotation value as
the documents are scanned.
• All Pages Except First automatically rotates all pages except the first page of a document by the
specified rotation value as the documents are scanned.
• First Page Only/All Pages Except First automatically rotates the first page of a document by the
specified rotation value as the documents are scanned. The remaining pages can be assigned a
different rotation value.
3. Select the rotation value from the list that appears under the value you selected in the previous step. You
can select from 90°, 180°, or 270°.
4. Click OK.
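The Apply Rotation To options can be read as a rule that maps a page's position in its document to a rotation angle. The Python sketch below expresses that rule for illustration only. The function and parameter names are hypothetical, pages are assumed to be numbered from 1 within each document, and in the two combined modes `angle` applies to the first-named group and `second_angle` to the other.

```python
def rotation_for_page(page_number, mode, angle=0, second_angle=0):
    """Return the rotation (in degrees) for a 1-based page number."""
    is_first = page_number == 1
    is_even = page_number % 2 == 0

    if mode == "None":
        return 0
    if mode == "All Pages":
        return angle
    if mode == "Even Pages Only":
        return angle if is_even else 0
    if mode == "Odd Pages Only":
        return 0 if is_even else angle
    if mode == "Even Pages/Odd Pages":
        # Even pages use `angle`; odd pages use `second_angle`.
        return angle if is_even else second_angle
    if mode == "First Page Only":
        return angle if is_first else 0
    if mode == "All Pages Except First":
        return 0 if is_first else angle
    if mode == "First Page Only/All Pages Except First":
        # The first page uses `angle`; the remaining pages use `second_angle`.
        return angle if is_first else second_angle
    raise ValueError(f"Unknown rotation mode: {mode}")


# Rotations for a hypothetical four-page document scanned duplex, where the
# back sides (even pages) arrive upside down.
print([rotation_for_page(n, "Even Pages Only", 180) for n in range(1, 5)])
```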
Black and White Image File Type
You can specify the file type for storing black and white images. Click Black and White Image File Type,
and then click the down arrow to select the file type. You can select TIF or PNG. The default setting is TIF.
• TIF files are compressed using Group 4 compression, which treats an image as a series of horizontal black
strips on a white page.
• PNG files are compressed, so the file size is smaller. However, the applied compression is exactly
reversible, so the image is recovered exactly.
If you change this property after images are scanned or imported into the batch, the file type will change for
only those images subsequently added to the batch. For example, if you change the Black and White Image
File Type property setting from TIF to PNG after scanning or importing 10 out of 20 images in the batch, then
images 1-10 will be TIF file types, and images 11-20 will be PNG file types.
Color Image File Type
You can specify the file type for storing grayscale and color images. Click Color Image File Type, and then
click the down arrow to select the file type. You can select BMP, JPG, or PNG. The default setting is JPG.
• BMP files are not compressed and can be large. These files contain pixels and can degrade when you
increase resolution.
• JPG images are compressed, so they contain less data and have smaller file sizes than other image types.
• PNG files are compressed, so the file size is smaller. However, the applied compression is exactly
reversible, so the image is recovered exactly.
If you change this property after images are scanned or imported into the batch, the file type will change for only
those images subsequently added to the batch. For example, if you change the Color Image File Type property
from BMP to JPG after scanning or importing 10 out of 20 images in the batch, then images 1-10 will be BMP file
types, and images 11-20 will be JPG file types.
Display Saved Images Only
If you select True, PaperVision Capture only displays the images that are saved (in the manner that they are
being saved). For example, if images are rotated as they are scanned, only the correct rotation orientation will
display. If you select True and you have specified a minimum page size detection, blank pages will not
display. If you select False, all images will display, including blank images.
Max Number Documents Per Batch
You can limit the number of documents that comprise a batch. In the Max Number Documents Per Batch
box, enter the maximum number of documents that are added to a batch before it is considered complete.
This setting applies to typical scanning and importing operations in the PaperVision Capture Operator
Console. For example, as users scan documents, when the number of documents you specify here is
reached, a new batch is created. However, this setting is ignored when documents are manually added in the
PaperVision Capture Operator Console by selecting Add Document from the Edit menu. In this case, the
new document is added to the end of the batch, regardless of the number of documents.
Minimum Page Size
Blank pages can be scanned accidentally or as the blank side of a duplex page. The Minimum Page Size
Detection setting allows you to delete blank pages as they are scanned. In the Minimum Page Size field,
enter the minimum page size (in kilobytes) used to detect the blank pages to be deleted. You can enter the size
as a number with up to two decimal places.
NOTE: Deleting blank pages as they are scanned could make the Number of Pages Per Document Auto
Document Break setting unusable.
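The interaction described in this note can be seen with a small worked example. The Python sketch below is illustrative only: the page names, file sizes, and helper functions are hypothetical, and it assumes that pages smaller than the minimum size are the ones deleted. Each original document has two pages, and the back of the second document happens to be blank.

```python
def drop_small_pages(pages, minimum_kb):
    """Keep only pages whose size is at least `minimum_kb` (assumed comparison)."""
    return [(name, size_kb) for name, size_kb in pages if size_kb >= minimum_kb]


def group_by_page_count(pages, pages_per_document):
    """Start a new document after every `pages_per_document` pages."""
    names = [name for name, _ in pages]
    return [
        names[i:i + pages_per_document]
        for i in range(0, len(names), pages_per_document)
    ]


# Three two-page documents; page B2 is a blank back side (about 1 KB).
scanned = [
    ("A1", 42.0), ("A2", 40.5),
    ("B1", 39.0), ("B2", 1.1),
    ("C1", 41.2), ("C2", 38.7),
]

without_filter = group_by_page_count(scanned, 2)
with_filter = group_by_page_count(drop_small_pages(scanned, 2.0), 2)

print(without_filter)
print(with_filter)
```

Once B2 is deleted, every later page shifts forward by one position, so a fixed two-page break pairs B1 with C1 and leaves C2 on its own.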
New Batch Name (Regular Expression)
The New Batch Name is a regular expression that you can define that validates the batch name entered by
the operator in the PaperVision Capture Operator Console.
To assign a regular expression to batch names
1. Click the ellipsis button in the right column next to the New Batch Name field.
2. In the Regular Expression dialog box, enter the regular expression.
3. Enter the text to validate. Your entry will automatically be validated.
• A successful validation displays with an icon that indicates success.
• Invalid entries display with an icon that indicates an error.
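Before entering an expression in the dialog box, it can help to try it against a few sample batch names. The Python sketch below shows the idea with a hypothetical naming convention (a three-letter prefix, an eight-digit date, and a three-digit sequence number). It uses Python's `re` module for illustration; the regular expression engine in PaperVision Capture may differ in syntax details, and whether it requires the whole name to match is an assumption here.

```python
import re

# Hypothetical convention: three uppercase letters, an 8-digit date,
# and a 3-digit sequence number, separated by hyphens.
BATCH_NAME_PATTERN = re.compile(r"[A-Z]{3}-\d{8}-\d{3}")


def is_valid_batch_name(name):
    """Return True when the entire batch name matches the pattern."""
    return BATCH_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["INV-20141103-001", "inv-20141103-001", "INV-20141103-1"]:
    print(candidate, is_valid_batch_name(candidate))
```

Only the first sample name satisfies the pattern; the second uses lowercase letters and the third has a sequence number that is too short.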
Prompt for New Batch Information (Auto)
If you enable this setting, the operator will be prompted for batch information once the maximum number of
documents per batch has been reached when a batch is imported or scanned.
Rotate Before Barcode
If you enable this setting, the Auto Page Rotation setting is applied to the image before barcoding is
performed to read index values.
NOTE: This setting does not apply to the Auto Document Break setting; images are not rotated before
barcode document breaks are inserted.
Custom Code Events (Step Level)
You can configure custom code that operators can execute in the PaperVision Capture Operator Console.
Click the ellipsis button next to the appropriate event to select the scripting language and to configure the
custom code. Some events contain code-handling arguments that you can modify; these arguments define
what actions are triggered after an operator executes the custom code (see the Custom Code Configuration
topic's section on Digitech Systems' API for more information).
Add Page
Add Page executes custom code just before images are appended to the batch, including rotation or barcode
indexing. When the script is enabled for this option, it will be executed for all images that the operator scans
in or when the operator imports a batch. This script is not executed if the operator performs the Import Images
command.
Barcode Detected
The Barcode Detected event executes custom code after a barcode's value, location, size, orientation, and
type have been successfully read during scanning. When a script is enabled for this option, it will be executed
every time a barcode is successfully read during scanning (multiple barcodes can be read per page). This
event can also be used to apply a page-level custom tag. The script is not executed if a barcode cannot be
successfully read.
Batch Opened
Batch Opened executes custom code when the operator opens a batch in the Operator Console. The
following sample is a custom code event handler that can be inserted into the code to display a message box,
allowing the user to cancel the open batch operation:
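// MessageBox, MessageBoxButtons, MessageBoxIcon, and DialogResult
// are defined in System.Windows.Forms.
CCustomCodeBatchOpeningEventArgs eventArgs =
    (CCustomCodeBatchOpeningEventArgs)Parameter;
if (MessageBox.Show("Open Batch?", "Capture", MessageBoxButtons.OKCancel,
        MessageBoxIcon.Question) == DialogResult.Cancel)
{
    // The user clicked Cancel, so stop the batch from opening.
    eventArgs.CancelOpen = true;
}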
NOTE: The Batch Opened event will not execute if you have enabled the Max Documents per Batch
property and the user completes the Submit and Create New Batch operation.
Batch Submitted
Batch Submitted executes custom code when the operator submits a batch in the Operator Console. The
following sample is a custom code event handler that can be inserted into the code to display a message box,
allowing the operator to cancel the submit batch operation:
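// MessageBox, MessageBoxButtons, MessageBoxIcon, and DialogResult
// are defined in System.Windows.Forms.
CCustomCodeBatchSubmittingEventArgs eventArgs =
    (CCustomCodeBatchSubmittingEventArgs)Parameter;
if (MessageBox.Show("Submit Batch?", "Capture", MessageBoxButtons.OKCancel,
        MessageBoxIcon.Question) == DialogResult.Cancel)
{
    // The operator clicked Cancel, so stop the batch from being submitted.
    eventArgs.CancelSubmit = true;
}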
Custom Code Execution
Custom Code Execution executes when the operator clicks the Execute Custom Code button in the
PaperVision Capture Operator Console.
Match and Merge
Match and Merge executes when the operator clicks the Match and Merge button in the PaperVision Capture
Operator Console.
Saving Indexes
Saving Indexes executes prior to the operator saving the index values in the PaperVision Capture Operator
Console.
To prevent the programming language prompt from appearing each time you configure custom
code events, right-click the ellipsis button, and select Custom Code Options. Select either the C# or
Visual Basic programming language to use by default, and then choose the option to suppress the dialog
when creating new custom code.
Update Detail Sets on Save
When set to True (the default value), changes to detail sets that occur in the PaperVision Capture Operator
Console are retained when indexes are saved. When set to False, detail fields can be edited in the
PaperVision Capture Operator Console, but the changes are not saved. If index fields only are available for
editing in the PaperVision Operator Console, and detail sets exist but are not available for editing, you must
set this property to False so that the detail fields are not overwritten with blank values when indexes are
saved.
Indexes
You can configure index values in the Capture step if you enable the Allow Hand-Key Indexing option. For
information on the general indexing settings, see the Indexing Configuration topic.
Allow Hand-Key Indexing
To maximize scanning and indexing efficiency within one step, you can enable this setting to allow operators
to enter index values while they scan documents in the Capture step. If you enable this setting, you must
define at least one index field.
NOTE: Enabling this property will cause the Capture step to also consume a Capture Index license (in
addition to the Capture Scan license).
Manual Barcode and OCR Indexing
You can configure the Capture and Indexing steps so that scanning operators tasked with indexing can apply
barcode or OCR zones directly on images in order to populate index fields. By manually applying barcode or
OCR zones, operators can easily extract and index text or barcode data that may shift across pages and
documents. When you enable the Allow Barcode Indexing property, a Capture Barcode license (1D or 2D,
depending on the selected barcode type) is also required in addition to the Capture Scan or Capture Indexing
license. Similarly, when you enable the Allow OCR Indexing property, a Capture OCR license is also
required in addition to the Capture Scan or Capture Indexing license.
During configuration, you only need to draw one barcode or OCR zone to define the applicable properties.
Operators are restricted only by the properties you define for the zone, such as supported barcode types and
OCR recognition languages, but they can apply any number of zones on an image. Similar to the
configuration of the automated barcode and OCR steps, you can test the zone to ensure its contents can be
read successfully. For more information on manual barcode and OCR configuration, see the Manual Barcode
and OCR Indexing topic.
Manual QC
If you require Indexing operators to review and apply QC tags in the Indexing step, the following Manual QC
properties are available for configuration.
Allow Manual QC
You can enable this setting to allow operators to add your selected QC tags within the Indexing job step.
NOTE: When you enable this property, the Indexing step also consumes a Capture QC Manual license
(in addition to the Capture Index license).
Allow Review QC Tags
Applicable to manual job steps, this property allows the operator to view the Browse QC Tags window in the
PaperVision Capture Operator Console. Select True to allow the operator to view the Browse QC Tags
window. Select False to prevent the operator from viewing the Browse QC Tags window.
NOTE: The Capture QC Manual license is not required for the operator to review QC tags.
QC Auto Play
When the Allow Manual QC property is enabled in the Capture step, you can define how long (in
milliseconds) each image appears on screen so operators can perform visual inspections. Click the ellipsis
button next to the QC Auto Play field to configure the auto play settings.
• The Delay (ms) property determines how long (in milliseconds) each image or group of images remains on
screen at a time in the Manual QC step.
• The Skip Mode determines whether auto play skips batches or documents:
1. If you select the Batch skip mode, then you can define how pages are skipped. For page skipping, you can
require that operators inspect all pages (None), by page number (Number, such as 1, 5, 10, etc.), or by a
random number of pages (Random).
2. If you select the Document skip mode, you can define how documents and pages are skipped.
   • For document skipping, you can require that operators inspect all documents (None), by document
   number (Number, such as 1, 5, 10, etc.), or by a random number of documents (Random).
   • For page skipping, you can require that operators inspect all pages (None), by page number (Number,
   such as 1, 5, 10, etc.), or by a random number of pages (Random).
When you select the Random option, auto play skips an arbitrary number of pages or documents (between
zero and your assigned number). For example, if you enter “10,” then three pages/documents may be skipped
during the first auto play; nine pages/documents during the second auto play; ten pages/documents during
the third auto play; etc.
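The Random option can be sketched as follows. This Python example is a simplified model rather than the product's algorithm: the function name is hypothetical, and it assumes that auto play shows a page and then skips a random number of pages between zero and the assigned number before showing the next one.

```python
import random


def pages_to_inspect(page_count, max_skip, rng=None):
    """Return the 1-based page numbers shown when skipping 0..max_skip pages."""
    rng = rng or random.Random()
    shown, page = [], 1
    while page <= page_count:
        shown.append(page)
        # Skip between zero and max_skip pages (inclusive), then show the next.
        page += 1 + rng.randint(0, max_skip)
    return shown


# With an assigned number of 10, consecutive pages shown are at most
# 11 positions apart. A fixed seed makes the run repeatable.
shown = pages_to_inspect(page_count=100, max_skip=10, rng=random.Random(1))
print(shown)
```

Setting the assigned number to zero in this model shows every page, which corresponds to inspecting all pages.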
Operator Permissions
By default, operators can perform most document and page operations while scanning in the Capture step.
You can determine whether operators can import batches and images in the Capture step. In addition, you
can determine whether operators can view the Browse Batch window in the Operator Console.
Browse Batch
When set to True, the operator can view the Browse Batch window.
Import Batch
When set to True, the operator can import batches into the PaperVision Capture Operator Console.
Import Images
When set to True, the operator can import images into a document.
NOTE: When you enable this property, the Indexing step also consumes a Capture Scan license (in
addition to the Capture Index license).
Scanner Requirements
You can assign specific scanner requirements for a Capture step including color format, minimum and
maximum DPI, and scan type settings. As a result, your specified requirements will be enforced in the
Operator Console’s scanner settings and the operator will not be able to edit these requirements.
NOTE: Some settings may not be available for your scanner. If you select an unavailable option, the
property will become disabled and an error will be logged in the Windows Event Viewer.
Color Format
You can select the scanner’s color format requirements, such as true color, grayscale, and black and white.
To select the color format
1. Click the ellipsis button next to the Color Format field.
2. In the Select Required Color Format Options dialog box, select the appropriate options from the list, and
then click OK.
Vertical and Horizontal Resolution
You can assign the minimum and maximum vertical and horizontal resolution settings for the scanner, such
as 200 DPI, 1200 DPI, etc. As a result, the operator will not be able to assign a value above or below your
specified values.
Scan Type
You can select the scan type, such as duplex, back-only, front-only, and others. The available scan types
include the following:
• Transparency
• Flatbed
• Front-Only
• Duplex
• Back-Front
• Back-Only
Manual Barcode and OCR Indexing
You can configure the Capture and Indexing steps so that indexing operators (or scanning operators tasked
with indexing) can apply barcode or OCR zones directly on images to populate index fields. By manually
applying barcode or OCR zones, operators can easily extract and index text or barcode data that may shift
across pages and documents. When you enable the Allow Barcode Indexing property, a Capture Barcode license
(1D or 2D, depending on the selected barcode type) is required in addition to the Capture Scan or Capture
Indexing license. Similarly, when you enable the Allow OCR Indexing property, a Capture Nuance Zonal
OCR, Nuance OCR Handwriting (depending on selected Recognition Module), or Capture Open Text Zonal
OCR license is required in addition to the Capture Scan or Capture Indexing license.
During configuration, you only need to draw one barcode or OCR zone to define the applicable properties.
Operators are restricted only to the properties you define for the zone, such as supported barcode types and
OCR recognition languages, but they can apply any number of zones on an image. Similar to the
configuration of the automated barcode and OCR steps, you can test the zone to ensure its contents can be
read successfully.
Configuring Manual Barcode Indexing
When you enable manual barcode indexing, the operator can apply barcode zones on an image to populate
required index values. During configuration, you only need to draw one barcode zone to define the
applicable properties. Similar to the automated Barcode step, you can test the zone to ensure barcodes can
be read successfully prior to activating and checking in the job.
To configure manual barcode indexing in the Capture or Indexing step
1. On the Properties tab, expand Manual Barcode Indexing.
2. Select True in the Allow Barcode Indexing drop-down list.
3. Click the ellipsis button in the Barcode Indexing field. The Configure Manual Barcode Indexing screen
appears.
Configure Manual Barcode Indexing
4. Draw the zone, and then configure the applicable barcode zone properties (see the Barcode Zones topic for
details on each property).
5. Click the Save Barcode Zones icon.
NOTE: For more information on the available operations in the Configure Manual Barcode Indexing
screen, see the Barcode Zones topic.
Configuring Manual OCR Indexing
When you enable manual OCR indexing, the operator can apply OCR zones on an image to populate required
index values. During configuration, only one OCR zone is required to define the applicable properties. Similar
to the automated OCR step, you can test the zone to ensure that text is read successfully prior to activating
and checking in the job.
To configure manual OCR indexing in either the Capture or Indexing step
1. On the Properties tab, expand Manual OCR Indexing.
2. Select the zonal OCR engine from the Engine drop-down list.
3. Click the ellipsis button in the OCR Indexing field. On the Configure Manual OCR Indexing window,
properties specific to your engine selection are available for configuration.
Configure Manual OCR Indexing (Nuance Zonal OCR)
4. Draw the zone, and then configure the applicable OCR properties.
5. Click the Save OCR Zones icon.
NOTE: For more information on the available operations in the Configure Manual OCR Indexing
screen, see the Zonal OCR topic.
Chapter 6 Indexing Configuration
The Indexing job step allows you to customize PaperVision Capture to meet the indexing needs of any task.
Configuration properties (such as predefined index values, auto-carry/auto-increment, and detail sets) for the
Indexing job step are designed to enhance productivity in the PaperVision Capture Operator Console. You
can configure additional properties to monitor and verify operator indexing entries, such as blind index
verification, regular expressions, and re-key verification. The Allow Manual QC property enables operators to
add your selected QC tags while they hand-key index values in the Operator Console. Index zones that can
be configured in the Indexing job step will help you define areas on the image that will be zoomed into view
when operators hand-key index values. When you configure individual indexes, four categories of settings are
available, including Custom Code Events (Step Level), General (Job Level), General (Step Level), and
Predefined Index Values (Job Level).
NOTE: Enabling the Allow Manual QC property will cause the Indexing step to use a Capture QC Manual
license (in addition to the Capture Index license).
This content describes how to configure indexing properties. See "Chapter 4 Job Creation and Configuration"
on page 46 for information about general job setup and the properties that apply to all job steps.
Configuring an Indexing Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click
OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. If necessary, add the Indexing job step to the workspace using one of the following methods.
• Select the job step that you want Indexing to follow. On the Job Step Toolbox tab, double-click
Indexing.
• On the Job Step Toolbox tab, drag Indexing onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select Indexing.
7. Double-click the Indexing step to display the Properties tab on the left pane.
8. On the Properties tab, you can expand a category, and then click the property you want to set. If an
arrow or ellipsis button appears in the column next to the selected property, click the arrow or ellipsis
button to access available options. The properties you can set are described in the subsequent content.
To clear a field configured by clicking the ellipsis button, right-click the button, and then click Reset.
Viewing the Index Configuration Settings
1. On the workspace of the Job Definitions window, select the Indexing job step. (See "Configuring an
Indexing Step" on page 81 if you need help.)
2. On the Properties tab, expand Indexes.
3. Click the Indexes property, and then click the ellipsis button in the right column to access the Index
Configuration dialog box.
4. In the Indexes list, click the index to view its properties.
Adding, Removing, and Sorting Indexes
You can add a new index, an existing index, all indexes (including or excluding those defined in detail fields),
or a job detail set.
To add an index
1. On the workspace of the Job Definitions window, select the Indexing job step. (See "Configuring an
Indexing Step" on page 81 if you need help.)
2. On the Properties tab, expand Indexes.
3. Click the Indexes property, and then click the ellipsis button in the right column to access the Index
Configuration dialog box.
4. Click Add.
5. In the Add Index dialog box, do one of the following:
• To add a new index, select New Index, and then type its field name in the box.
• To add an existing index, select Existing Index. From the Field Name list, you can select an individual
index or all indexes (including or excluding those defined in detail fields).
NOTE: If you are using Forms Magic, the Field Name list includes two “system fields” that are used for
Forms Magic. The Classification [SYSTEM] field will populate the Classification value as defined in
Forms Magic. The Content Type [SYSTEM] field will populate the Content Type as defined in Forms
Magic. These “system fields” are populated only after Forms Magic processes a batch using the FM
Processing step. See "Chapter 15 Forms Magic Processing" on page 383 for information about setting
up this step.
• To add a new detail set for the job, select Job Detail Set. You can then create and configure each index
comprising the detail set. See "Configuring Detail Sets" on page 59 for more information.
6. Click OK. The Index Configuration dialog box displays your new index along with its associated
properties that you can configure.
To remove an existing index
1. Select the appropriate index in the Indexes list.
2. Click Remove.
To move an index up or down in the Indexes list
• Select the index, and then click the up or down arrow to the right of the list.
Indexing Properties
To view the properties by category, click the Categorized button. To view the properties in alphabetical
order, click the Alphabetical button.
Index properties consist of four categories that are summarized below.
Custom Code Events (Step Level)
In the Properties grid for the Indexing job step, the Index Populated and Index Validate events let you
write Visual Basic or C# code that runs immediately after an index field is populated (and the operator
returns to re-enter the index value) or validated by the system. The Index Validate event is triggered after
the operator returns to edit an index value, re-enters the index value, and then proceeds to a subsequent
index field (or saves the edited index value).
If you use either of these Custom Code Events to change an index value, use the UIRefreshLevel property to
keep the Operator Console's Index Manager synchronized (for example, "base.UIRefreshLevel =
UIRefreshLevel.Index"). See the section on API Functions in the Custom Code Configuration topic for a
list of API functions and associated enumerations that can be used within Custom Code.
To configure the code
1. Click the ellipsis button in the right column of the Index Populated or Index Validate field.
To prevent the programming language prompt from appearing each time you configure custom
code events, right-click the ellipsis button, and select Custom Code Options. Select either the C# or
Visual Basic programming language to use by default, and then choose the option to suppress the dialog
when creating new custom code.
2. Select either the Visual Basic or C# scripting language. The Script Editor opens.
Batch Opened - Batch Opened executes custom code when the operator opens a batch in the Operator
Console. The following sample is a custom code event handler that can be inserted into the code to display a
message box, allowing the user to cancel the open batch operation:
    CCustomCodeBatchOpeningEventArgs eventArgs =
        (CCustomCodeBatchOpeningEventArgs)Parameter;
    if (MessageBox.Show("Open Batch?", "Capture", MessageBoxButtons.OKCancel,
            MessageBoxIcon.Question) == DialogResult.Cancel)
    {
        eventArgs.CancelOpen = true;
    }
NOTE: The Batch Opened event will not execute if you have enabled the Max Documents per Batch
property and the user completes the Submit and Create New Batch operation.
Batch Submitted - Batch Submitted executes custom code when the operator submits a batch in the
Operator Console. The following sample is a custom code event handler that can be inserted into the code to
display a message box, allowing the operator to cancel the submit batch operation:
    CCustomCodeBatchSubmittingEventArgs eventArgs =
        (CCustomCodeBatchSubmittingEventArgs)Parameter;
    if (MessageBox.Show("Submit Batch?", "Capture", MessageBoxButtons.OKCancel,
            MessageBoxIcon.Question) == DialogResult.Cancel)
    {
        eventArgs.CancelSubmit = true;
    }
Custom Code Execution - Custom Code Execution executes when the operator clicks the Execute Custom
Code button in the PaperVision Capture Operator Console.
Match and Merge - Match and Merge executes when the operator clicks the Match and Merge button in the
PaperVision Capture Operator Console.
Saving Indexes - Saving Indexes executes prior to the operator saving the index values in the PaperVision
Capture Operator Console.
General (Job Level) - The General (Job Level) settings allow you to configure auto-carry and auto-increment
values, index types, and regular expressions. The settings allow you to configure a job's indexing settings for
users who will index documents within the PaperVision Capture Operator Console.
General (Step Level) - The General (Step Level) settings for each index value enable you to configure
settings for operators who will index documents within the PaperVision Capture Operator Console.
Predefined Index Values (Job Level) - The Predefined Index Values (Job Level) settings allow you to
define index field values that can be used for the Auto-Complete feature that finishes information as operators
type in the PaperVision Capture Operator Console.
Manual Barcode/OCR Indexing - If you are requiring Indexing operators to manually apply Barcode and/or
OCR zones in order to populate index fields, see the Manual Barcode and OCR Indexing topic.
Manual QC - If you require Indexing operators to review and apply QC tags in the Indexing step, the
following Manual QC properties are available for configuration. For more information, see the Manual Quality
Control (QC) topic.
Allow Manual QC - You can enable this setting to allow operators to add your selected QC tags within the
Indexing job step.
NOTE: When you enable this property, the Indexing step also consumes a Capture QC Manual license
(in addition to the Capture Index license).
Allow Review QC Tags - Applicable to manual job steps, this property allows you to choose whether the
operator can view the Browse QC Tags window in the PaperVision Capture Operator Console.
• Select True to allow the operator to view the Browse QC Tags window.
• Select False to prevent the operator from viewing the Browse QC Tags window.
QC Auto Play - When the Allow Manual QC property is enabled in the Indexing step, you can define how
long (in milliseconds) each image appears on screen so operators can perform visual inspections. Click the
ellipsis button on the right to configure the auto play settings.
• The Delay (ms) property determines how long (in milliseconds) each image or group of images remains on
screen at a time in the Manual QC step.
• The Skip Mode determines whether auto play skips batches or documents:
1. If you select the Batch skip mode, then you can define how pages are skipped. For page skipping, you can
require that operators inspect all pages (None), by page number (Number, such as 1, 5, 10, etc.), or by a
random number of pages (Random).
2. If you select the Document skip mode, you can define how documents and pages are skipped.
• For document skipping, you can require that operators inspect all documents (None), by document
number (Number, such as 1, 5, 10, etc.), or by a random number of documents (Random).
• For page skipping, you can require that operators inspect all pages (None), by page number (Number,
such as 1, 5, 10, etc.), or by a random number of pages (Random).
When you select the Random option, auto play skips an arbitrary number of pages or documents (between
zero and your assigned number). For example, if you enter “10,” then three pages/documents may be skipped
during the first auto play; nine pages/documents during the second auto play; ten pages/documents during
the third auto play; etc.
Operator Permissions
You can assign specific permissions that allow operators to perform operations on documents and pages. In
addition, you can determine whether operators can view the Browse Batch window in the Operator Console.
Import Images is the only operation that requires an additional Capture Scan license (in
addition to the Capture Index license). The remaining permissions do not require an additional license and are
enabled by default to give operators flexibility in manipulating documents and pages when indexing in
the Operator Console.
Add Documents - When set to True, the operator can append a blank document to the end of the batch.
Allow Browse Batch - When set to True, the operator can view the Browse Batch window.
Copy Documents - When set to True, the operator can copy all pages and append the new document after
the selected document.
Copy/Move Pages - When set to True, the operator can copy/paste and cut/paste consecutive or nonconsecutive pages in one document or across multiple documents. The operator can also drag and drop
pages from one location to another in the Thumbnails window or multiple-display view.
Delete Documents - When set to True, the operator can delete a document and its associated images.
Delete Pages - When set to True, the operator can delete one or multiple page(s) within one document or
across multiple documents.
Extract and Copy Regions - When set to True, the operator can extract a region of an image and copy it to
the next page of the document.
Import Images - When set to True, the operator can import images into a document.
NOTE: When you enable this property, the Indexing step also consumes a Capture Scan license (in
addition to the Capture Index license).
Insert Document Breaks - When set to True, the operator can insert a document break within a document.
Invert and Save Pages - When set to True, the operator can invert the polarity of one or multiple pages, and
then save the pages.
Remove Document Breaks - When set to True, the operator can remove an existing document break within
a document.
Re-Save Pages - When set to True, the operator can save a page that has been rotated or whose polarity has
been inverted.
Rotate and Save Pages - When set to True, the operator can rotate one or multiple pages and then save the
pages.
Shuffle Documents to Duplex - When set to True, the operator can shuffle documents to duplex.
Index Configuration - General (Job Level)
These settings allow you to configure auto-carry and auto-increment values, index types, and regular
expressions. To access these settings, expand the General (Job Level) node within the Index
Configuration dialog box.
Auto-Carry/Auto-Increment
The Auto-Carry/Auto-Increment settings can greatly increase operator productivity while hand-keying
repetitive or incremental values or characters. Both tools operate during scanning (optional) and hand-keying.
To configure these settings, click the ellipsis button in the Auto-Carry/Auto-Increment field.
Auto-Carry/Auto-Increment
NOTE: Auto-Carry settings only apply when the operator saves index values in the Operator Console.
Auto-Carry Entire Index Value - This setting allows you to carry all characters from an index in one
document to the corresponding index in the next document. You can then enable Overwrite Existing Values
and/or Carry Values to Copied Document.
Auto-Carry Characters Preceding Number - This setting allows you to define the number of characters
that precede a number. Your specified number of characters will carry from an index in one document to the
corresponding index in the next document. For example, if you have an index that is always (or nearly always)
the letters ABC followed by a number, you may not want to continuously re-enter ABC on each index value.
You could set the number of characters to carry to 3. When the operator is keying the information, ABC would
automatically get carried forward to the next document and they would only have to enter the numeric portion
of the index.
Auto-Carry Characters Following Number - This setting allows you to define the number of characters
that follow a number. Your specified number of characters will carry from an index in one document to the
corresponding index in the next document. For example, if you have an index that is always (or nearly always)
a number followed by the letters ABC, you may not want to continuously re-enter ABC on each index value.
You could set the number of characters to carry to 3. When the operator is keying the information, ABC would
automatically get carried forward to the next document and they would only have to enter the numeric portion
of the index.
Auto-Increment Number - Auto-Increment takes Auto-Carry one step further. For example, if the numeric
portion of the value was an incremental numeric value, you could set Auto-Carry to 3 and Auto-Increment to
1. This would increment the numeric value of any characters remaining after the first three characters by a
value of one.
• The Auto-Increment Number can also be used without Auto-Carry if the value is completely numeric.
• The value entered in the Minimum Number Digits field allows you to pad the new value with zeros.
• The Preview section displays the original value and displays a preview of the carried value.
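The combined effect of the Auto-Carry and Auto-Increment settings can be modeled in a few lines. This Python sketch is a simplified illustration under stated assumptions, not the product's implementation: the function carry_and_increment and its parameter names are hypothetical, and it covers only the case of carried characters that precede a number.

```python
import re

def carry_and_increment(previous, carry_chars=0, increment=0, min_digits=0):
    """Build the next index value from the previous one.

    Hypothetical model: keep the first `carry_chars` characters, then add
    `increment` to the number that follows and left-pad the result with
    zeros to at least `min_digits` digits.
    """
    prefix, rest = previous[:carry_chars], previous[carry_chars:]
    if increment == 0 or re.fullmatch(r"[0-9]+", rest) is None:
        # No increment requested, or nothing numeric to increment:
        # only the carried characters are pre-filled.
        return prefix
    return prefix + str(int(rest) + increment).zfill(min_digits)

print(carry_and_increment("ABC123", carry_chars=3))                             # ABC
print(carry_and_increment("ABC0041", carry_chars=3, increment=1, min_digits=4)) # ABC0042
print(carry_and_increment("99", increment=1))                                   # 100
```

The first call mirrors the ABC example, where the operator still keys the numeric portion; the second mirrors setting Auto-Carry to 3 and Auto-Increment to 1 with zero padding.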
Overwrite Existing Values - By default, Auto-Carry and Auto-Increment do not fill in an index value if there
is already information in the index. Selecting this check box will force Auto-Carry and Auto-Increment to
update the index regardless of whether information previously existed.
Carry Values to Copied Document - By default, when documents are copied, no index values are carried
through to the copies. This setting allows you to specify that the current index should also be copied, leaving
the other indexes blank.
Auto-Fill Cursor Location - If you enable this setting, operators are allowed to append to an existing index
value. The setting places the cursor's focus at the end of the original index value so the original value is
retained.
NOTE: This determines whether data will be highlighted or the cursor will be placed at the end of the data
when hand-keying an index that has the Auto-Carry or Auto-Fill option selected.
Index Masking Regular Expression
The Index Masking Regular Expression property lets you predefine a specific format for index values
entered during hand-key indexing. As operators enter index values, their entries will be formatted (masked)
automatically. For example, you can predefine social security numbers to automatically insert dashes; as a
result, operators only have to hand-key the 9-digit social security numbers and not the dashes.
Configuring this property does not validate the operator's index value entries. Validation is
performed as operators enter index values in the Index Manager in the PaperVision Capture Operator
Console.
To configure index masking
1. In the Index Configuration dialog box, select the appropriate index, and then expand General (Job
Level).
2. Click the Index Masking Regular Expression property, and then click the ellipsis button to open the
Regular Expression Mask dialog box.
3. If you select a Predefined Value, select from the Masking drop-down list, and then proceed to step 6.
4. If you select a Custom mask, enter the Pattern Expression. The Pattern Expression is a regular
expression that you define for the index mask. For example, for 5 + 4-digit zip codes such as 80111-2841,
type the following:
(\d{5})(\d{4})
5. If necessary, you can define a Replace Expression that will automatically format the operator’s entry. To
format an operator’s 9-digit entry to appear as 80111-2841, type the following:
$1-$2
NOTE: If you do not define a Replace Expression, the operator’s entry will not be formatted.
6. To preview how masking formats the number, enter a sample index value that an operator would hand-key
in the Input Text field. The resulting masked index value appears in the Mask Result field.
7. Click OK.
NOTE: Only the Text, Long Text, and Text (900) index types apply to the Index Masking Regular
Expression property.
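To check a Pattern Expression and Replace Expression pair outside the dialog box, the masking steps above can be approximated in Python. This is a sketch under assumptions: apply_mask is a hypothetical helper, and it translates the $1-style group references shown in this guide into Python's form because Python's re module does not accept them directly. The Mask Result field in the product remains the authoritative preview.

```python
import re

def apply_mask(pattern, replace_expr, entry):
    """Apply a pattern/replace pair to a hand-keyed entry.

    Returns the entry unchanged when no replace expression is given
    or when the pattern does not match.
    """
    if not replace_expr:
        return entry
    # Translate "$1-$2" into Python's "\g<1>-\g<2>" replacement syntax.
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace_expr)
    return re.sub(pattern, py_replace, entry)

ZIP_PATTERN = r"(\d{5})(\d{4})"

print(apply_mask(ZIP_PATTERN, "$1-$2", "801112841"))  # 80111-2841
print(apply_mask(ZIP_PATTERN, "", "801112841"))       # 801112841
```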
Date Regular Expression Mask
The following pattern expression formats either a one- or two-digit month and day followed by a two- or
four-digit year:
(^\d{1,2})(\d{1,2})(\d{2,4}$)
Enter the following replace expression to separate the month, day, and year with a dash:
$1-$2-$3
To separate the month, day, and year with a slash mark, enter:
$1/$2/$3
The same pattern expression formats a one-digit month and day followed by a two-digit year.
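The date pattern can be tried against sample entries before it is assigned to an index. The Python sketch below is an illustration only; mask_date is a hypothetical helper, and Python writes group references as \1 where this guide uses $1. In Python the groups match greedily, so an entry such as 112014 is split as 11, 20, and 14 rather than 1, 1, and 2014. The product's own preview should be used to confirm how such entries are treated there.

```python
import re

DATE_PATTERN = r"(^\d{1,2})(\d{1,2})(\d{2,4}$)"

def mask_date(entry, separator="-"):
    """Insert a separator between the month, day, and year groups."""
    return re.sub(DATE_PATTERN, rf"\1{separator}\2{separator}\3", entry)

print(mask_date("01052014"))    # 01-05-2014
print(mask_date("1514", "/"))   # 1/5/14
print(mask_date("112014"))      # 11-20-14 (greedy split of an ambiguous entry)
```

Asking operators to key two-digit months and days avoids the ambiguous case.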
Credit Card Regular Expression Mask
The following pattern expression formats a 16-digit credit card number:
(\d{4})(\d{4})(\d{4})(\d{4}$)
Enter the following replace expression to separate the digits with a dash:
$1-$2-$3-$4
Index Types and Formats
Document index fields contain values that enable you to identify key elements of documents within a project
during the capture process. For more information, see Index Types and Formats.
Index Verification Regular Expression
You can create a regular expression to validate operator data entry. A regular expression is a pattern of text
that consists of ordinary characters (for example, letters A through Z) and special characters, known as
metacharacters. The pattern describes one or more strings to match when searching a body of text. The
regular expression serves as a template for matching a character pattern to the string being searched.
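As an illustration of validating an entry against a pattern, the following Python sketch checks hand-keyed values against a hypothetical verification expression for dash-separated social security numbers. The pattern, the helper name is_valid_entry, and the choice to require the whole entry to match are assumptions made for this example, not settings taken from the product.

```python
import re

# Hypothetical verification pattern: three digits, two digits, and four
# digits separated by dashes.
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"

def is_valid_entry(pattern, entry):
    """Return True only when the whole entry matches the pattern."""
    return re.fullmatch(pattern, entry) is not None

print(is_valid_entry(SSN_PATTERN, "123-45-6789"))  # True
print(is_valid_entry(SSN_PATTERN, "123456789"))    # False
```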
Name
This editable field contains the name of the index value.
Index Configuration - General (Step Level)
The General (Step Level) settings enable you to configure settings for operators who will index documents
within the PaperVision Capture Operator Console. To access these settings, expand the General (Step
Level) node within the Index Configuration dialog box.
Blind Index Verification
If you enable this setting, configure at least two Indexing job steps. This setting ensures the index entry of the
first operator matches the second entry (or your specified number of subsequent index entries).
For example, you assign the following for index field SSN:
1. For the first Indexing step, you select False.
2. You assign True for the second Indexing step.
3. You assign User 1 to the first Indexing step.
4. You assign User 2 to the second Indexing step.
5. User 1 enters 1 in the field and submits the batch.
6. User 2 enters 2 in the field, which differs from the first entry.
• Since Blind Index Verification has been enabled for the second Indexing step, the original index value
for this field is not visible to User 2.
• An error message notifies User 2 that the index values do not match.
NOTE: Blind index verification is not applicable during Detail Set configuration.
Font Color Customization
You can customize the font characteristics to modify how each index value and label displays in the Operator
Console. You can also change the cell color for each index value to emphasize certain index values and
assist operators who are visually challenged.
To customize the font and cell color
1. Expand the FontColor/Customization node.
2. By default, each background cell color is white. To select another color, click the BackgroundColor
drop-down list.
3. To change the label font for the index value, expand the Label node.
4. Click the ellipsis button next to the Label property. You can configure the following font properties in the
Font dialog box or in the Index Configuration dialog box:
• Font or Name: This property indicates the name of the font, such as Microsoft Sans Serif (default), Arial,
Times New Roman, etc.
• Font Style: The font style defaults to Regular, but you can select from Italic, Bold, or Bold Italic.
• Size: The font size defaults to 8 point, but you can select a larger font size.
• Effects: To emphasize the font, you can enable the Strikeout and/or the Underline effect.
• Unit: This is the unit of measurement for the font size, which defaults to Point. Not all units are available
for all fonts.
• Bold: This property is False by default and indicates whether boldface type has been applied to the font.
• Script: Western script is selected by default, but you can select other scripts such as Arabic, Baltic,
Greek, Vietnamese, etc.
• GDICharSet: Depending on the selected font, this byte value specifies the GDI character set that the
font uses.
• GDIVerticalfont: This property indicates whether the selected font originates from a GDI vertical font.
• Italic: This property is false by default and indicates whether the font is italic.
• Strikeout: This property is false by default and indicates whether the font displays with a horizontal line
running through it.
• Underline: This property is false by default and indicates whether the font is underlined.
5. To change the font appearance of the operator’s index value entry, expand the Value Font node. See the
previous step for descriptions of each customizable property.
6. After you have finished configuring the font characteristics, click OK.
Hot Key Default Value
When operators press the assigned hot key while keying index fields, the specified default value
populates the index field.
Ignore Indexing Errors
If this setting is True, incorrect operator input will be ignored and no prompt will appear for the operator. If this
setting is False, the operator will be notified of an incorrect indexing entry.
No Hand Key Indexing
If this setting is True, the operator will not be allowed to enter index values. If this setting is False, the
operator will be allowed to enter index values.
Re-Key Verification Count
To ensure indexing accuracy, this value forces the operator to enter the index value a specified number of
times, which can range from 0 to 99.
Valid Field Required
If this setting is True, the operator will be required to enter a valid index value for the field type, such as a
date-formatted value for a date field. If this setting is False, the operator will be allowed to continue and keep
the invalid value.
Verification Search Strings
The Verification Search Strings setting is used to validate index values when the operator saves index
values, tabs to the next field, submits the batch, or executes the Verify Index Values operation. To ensure the
accuracy of hand-key indexing, you can define multiple search strings that can be verified when the operator
executes the Verify Index Values command. For example, you can assign individual characters or numbers to
search for during the index verification process. By default, the verification process will highlight the first
document in the batch that contains a blank value. However, you can exclude blank values from the index
verification process by removing <Blank> from the list of search strings.
Depending on the operator’s index verification settings in Tools > Options > Display Preferences (Verify
Starts from Current Document Forward or Verify Starts at the Beginning of the Batch), the index verification
process starts with the appropriate document in the batch and will highlight the next document that contains
your defined search strings.
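The verification pass described above can be pictured with a short sketch. This is an illustration only, not PaperVision Capture code: the function name, the substring matching rule, and the treatment of <Blank> as "empty or whitespace-only" are assumptions made for the example.

```python
# Hypothetical sketch of an index verification pass; not product code.
BLANK = "<Blank>"  # placeholder shown in the guide; how it matches is assumed


def find_next_match(index_values, search_strings, start=0):
    """Return the position of the first document at or after `start` whose
    index value matches any search string, or None if nothing matches."""
    for position in range(start, len(index_values)):
        value = index_values[position]
        for needle in search_strings:
            if needle == BLANK:
                # Assumed: <Blank> matches empty or whitespace-only values.
                if value.strip() == "":
                    return position
            elif needle in value:
                # Assumed: other search strings match as substrings.
                return position
    return None


batch = ["INV-1001", "", "INV-10?3", "INV-1004"]
print(find_next_match(batch, [BLANK, "?"]))           # 1
print(find_next_match(batch, ["?"]))                  # 2
print(find_next_match(batch, [BLANK, "?"], start=3))  # None
```

Passing start=0 corresponds to verifying from the beginning of the batch, while passing the current document's position corresponds to verifying from the current document forward. Removing <Blank> from the list, as in the second call, skips documents with empty values.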
To assign verification search strings
1. For the appropriate index, click the ellipsis button to the right of the Verification Search Strings field. The
Verification Search Strings dialog box appears.
2. By default, a <blank> search string appears in the first row. If applicable, enter another search string in the
second row.
3. Enter any subsequent search strings, if necessary.
4. To remove a search string, select it, and then click Remove.
5. Click OK.
Zoom Zone
This setting allows you to assign an area of the image that will be zoomed into view when operators hand-key
this index field.
If the Automatic Page Location setting is enabled, you can specify the page of the document that is
displayed when index values are entered, which is useful if index values are located on different pages of the
document. This value has to be greater than zero. If you enter a page index value greater than the number of
pages in the document, the last page will display.
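The page rule above amounts to a simple clamp. The following sketch assumes 1-based page numbers and uses a hypothetical function name; it is meant only to make the rule concrete.

```python
# Hypothetical sketch of the page rule; page numbers are assumed to be 1-based.
def page_to_display(requested_page, page_count):
    """Return the page shown for a requested Automatic Page Location value."""
    if requested_page < 1:
        raise ValueError("the page value must be greater than zero")
    # A value past the end of the document falls back to the last page.
    return min(requested_page, page_count)


print(page_to_display(2, 5))  # 2
print(page_to_display(9, 5))  # 5
```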
NOTE: Details on how to draw index zones are found in Index Zone Configuration.
Predefined Index Values (Job Level)
These settings allow you to predefine index field values at the job level. You can predefine these values for
the job as you configure the index field or you can allow operators' entries to be added to the predefined
values list. Your specified predefined values are used for the Auto-Complete feature that finishes information
as operators type.
Add New Values
If this setting is True, all new operator-entered values can be added to the Predefined Values list.
Auto-Complete
If this setting is True, the index field will automatically be completed as the operator types.
If this setting is True, the operator can only select from your predefined index values. If the entered data is
not one of the predefined values, the operator will be alerted. If this setting is False, the operator will be
allowed to enter any value in the index field.
Assigning Predefined Values
In addition to adding predefined index values, you can also import and export the index values as text (.txt)
files.
To assign predefined values
1. Click the Predefined Values property, and then click the ellipsis button.
2. In the Predefined Values dialog box, type the values directly in the grid.
3. When you are finished entering all values, click OK.
To import a list of predefined index values
1. To import index values, click Import.
2. Select the text document to import.
3. Click Open. A text file is imported that contains any predefined values; each line of the text file is imported
as a separate value.
To export a list of predefined values
1. Click Export.
2. Enter the name of the text file.
3. Click Save. A text file is exported that contains all predefined values; each line of the text file is exported as
a separate value.
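Because the import and export files hold one value per line, a values list can be prepared or inspected outside the Administration Console with any text tool. The sketch below is one way to do that in Python. The file name, the UTF-8 encoding, and the choice to skip blank lines when reading are assumptions for the example and are not documented behavior of PaperVision Capture.

```python
# Hypothetical helper for one-value-per-line text files; not product code.
import tempfile
from pathlib import Path


def export_values(path, values):
    """Write each predefined value on its own line."""
    Path(path).write_text("\n".join(values) + "\n", encoding="utf-8")


def import_values(path):
    """Read one value per line, skipping blank lines (an assumption)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "predefined_values.txt"
    export_values(target, ["Invoice", "Purchase Order", "Receipt"])
    print(import_values(target))  # ['Invoice', 'Purchase Order', 'Receipt']
```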
To delete a value
1. Select the value.
2. Click Delete.
3. Click OK.
Index Types and Formats
Document indices contain values that enable you to identify key elements of documents within a project
during the capture process.
PaperVision Capture supports the following types of index fields:
- Boolean stores Boolean values such as yes/no, on/off, and true/false.
- Currency stores currency (monetary) values.
- Date stores date/time values ranging from 12:00:00 midnight, January 1, 0001 through 11:59:59 P.M.,
December 31, 9999 A.D. This index type also supports searches on date ranges.
- Double Number represents a double-precision 64-bit number with values ranging from -1.79769E+308 to
1.79769E+308.
- Long Text stores textual data that exceeds 255 characters in length (up to approximately 64,000
characters in total).
- Number stores whole-number values between -2,147,483,648 and 2,147,483,647. This index type
supports hyphens or dashes at the beginning of the number to indicate a negative value, but it does not
support hyphens or dashes within the number, such as dashes within a social security number
(555-55-5555). This index excludes these dashes from the number.
- Text stores textual data up to 255 characters in length. This type of index is the most common.
- Text(900) stores textual data up to 900 characters in length.
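The limits listed above can be checked before values are loaded, for example when preparing a predefined values file. The sketch below covers only the Number, Text, and Text(900) types, using the ranges and lengths stated in the list. The function name is hypothetical, and the way the product handles dashes inside a Number value is not modeled.

```python
# Hypothetical pre-check against the documented limits; not product code.
NUMBER_MIN, NUMBER_MAX = -2_147_483_648, 2_147_483_647
TEXT_LIMITS = {"Text": 255, "Text(900)": 900}


def fits_index_type(index_type, value):
    """Return True if the string `value` fits the named index type."""
    if index_type in TEXT_LIMITS:
        return len(value) <= TEXT_LIMITS[index_type]
    if index_type == "Number":
        try:
            number = int(value)  # accepts a leading minus sign
        except ValueError:
            return False
        return NUMBER_MIN <= number <= NUMBER_MAX
    raise ValueError(f"index type not covered by this sketch: {index_type}")


print(fits_index_type("Number", "-2147483648"))  # True
print(fits_index_type("Number", "2147483648"))   # False
print(fits_index_type("Text", "x" * 256))        # False
print(fits_index_type("Text(900)", "x" * 256))   # True
```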
Date/Time Formatting
When you select a date index type, you can select from a predefined date/time format or you can customize a
date/time format.
To define the date/time format
1. Click the ellipsis button in the right column of the Index Format field, which opens the Date/Time
Formatting dialog box.
2. Select either a Predefined Format (proceed to the next step) or a Custom Format (proceed to step 5).
3. If you select a Predefined Format, select from the following Date/Time Order options:
- Date Only
- Time Only
- Date/Time
- Time/Date
4. Depending on your Date/Time Order selection, you can choose from the Date/Time Format drop-down
menus.
5. If you select a Custom Format, enter the format in the blank field.
6. To preview a Predefined or Custom format, click the Format button in the Preview section.
7. If you need to preview a calendar, click the Date drop-down menu.
8. If you need to set the time, enter it in the Time field or use the up or down arrows to set the time.
9. Click OK.
Double Number Formatting
When you select a Double Number index type, you can select a predefined or custom format.
To define the double number format
1. Click the ellipsis button in the right column of the Index Format field, which opens the Field Formatting
dialog box.
2. Select either a Predefined Format (proceed to the next step) or a Custom Format (proceed to the fourth
step).
3. If you select a Predefined Format, select from the following format types:
l
Currency
l
Fixed
l
General
l
Percent
l
Scientific
l
Standard
4. If you select a Custom Format, enter the format in the blank field.
5. Click OK when finished.
General [Step Level] Property Settings
On the Properties tab, expand General [Step Level] to access the Update Detail Sets on Save property.
Update Detail Sets on Save
When set to True (the default value), changes to detail sets that occur in the PaperVision Capture Operator
Console are retained when indexes are saved. When set to False, detail fields can be edited in the
PaperVision Capture Operator Console, but the changes are not saved. If only index fields are available for
editing in the PaperVision Capture Operator Console, and detail sets exist but are not available for editing,
you must set this property to False so that the detail fields are not overwritten with blank values when
indexes are saved.
Configuring an Indexing Step to Include Forms Magic QC
If you are importing documents from Forms Magic, you can configure an Indexing job step to include a Forms
Magic QC option.
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job for which you want to include a Forms Magic QC option, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace, double-click the Indexing job step to display the Properties tab on the left pane.
6. On the Properties tab, expand Handkey Indexing Step, and then click Forms Magic QC.
7. Click the ellipsis button to open the Forms Magic QC Configuration dialog box.
Forms Magic QC Configuration
8. To allow users to view confidence levels in the Forms Magic QC step from the PaperVision Capture
Operator Console, select Enable Forms Magic QC In Operator Console.
9. In the Forms Magic Confidence Levels area, you can define values to indicate the level of confidence
that the incoming value was read correctly by the OCR engine. Values range from zero (the lowest
confidence level) to 100 (the highest confidence level). You can use the default values provided, or you can
set your own values. If you want to reset the confidence levels to the default values, click Reset.
10. In the High box, type or select the start of the high confidence range.
11. In the Medium box, type or select the start of the medium confidence range.
12. In the Low box, type or select the start of the low confidence range.
13. (Optional) If you want to change the colors for the various ranges, click the Change link in the Color box to
open the Color dialog box where you can specify or define a color.
14. From the Next Field Level list, select the confidence level that will determine what field appears when a
user clicks Next FM Field on the Index Manager to move through index fields in the PaperVision Capture
Operator Console. Each time the user clicks Next FM Field, the next index field tagged with a confidence
level that is equal to or lower than the one you select will appear. For example, if you select Low, when the
user clicks Next FM Field, the operator console will go to the next index value that is tagged with a low or
poor confidence level. This allows users to quickly move to the next field that needs attention.
15. In the Detail Fields Flyout Display area, select one of the following options to specify how detail field
information will appear in the operator console.
- Single Field - This option will show the zone for the detail field.
- Single Row - This option will show all fields on the row for the detail field.
- All Detail Fields (Within Current Page) - This option shows all fields within a detail set per page.
16. Click OK.
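The relationship between the confidence ranges and the Next Field Level setting in the steps above can be sketched as follows. The threshold values (90, 70, 50), the field names, and the function names are hypothetical, since the guide does not list the default values. The Poor level is assumed to cover any score below the start of the low range.

```python
# Hypothetical sketch of confidence tagging and Next FM Field navigation.
LEVELS = ["Poor", "Low", "Medium", "High"]  # ordered lowest to highest


def classify(confidence, high_start=90, medium_start=70, low_start=50):
    """Tag a 0-100 confidence score using the start of each range."""
    if confidence >= high_start:
        return "High"
    if confidence >= medium_start:
        return "Medium"
    if confidence >= low_start:
        return "Low"
    return "Poor"  # assumed: anything below the low range


def next_fm_field(fields, current, next_field_level):
    """Return the name of the next field after position `current` whose level
    is equal to or lower than `next_field_level`, or None if there is none."""
    limit = LEVELS.index(next_field_level)
    for position in range(current + 1, len(fields)):
        name, confidence = fields[position]
        if LEVELS.index(classify(confidence)) <= limit:
            return name
    return None


fields = [("Name", 95), ("Date", 72), ("Total", 55), ("Account", 30)]
print(next_fm_field(fields, 0, "Low"))     # Total
print(next_fm_field(fields, 2, "Low"))     # Account
print(next_fm_field(fields, 0, "Medium"))  # Date
```

Selecting a lower level makes navigation stricter: with Low selected, the Date field (tagged Medium under these example thresholds) is skipped.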
Manual Barcode and OCR Indexing
You can configure the Capture and Indexing steps so that indexing operators (or scanning operators tasked
with indexing) can apply barcode or OCR zones directly on images to populate index fields. By manually
applying barcode or OCR zones, operators can easily extract and index text or barcode data that may shift
across pages and documents. When you enable the Allow Barcode Indexing property, a Capture Barcode (1D
or 2D, depending on the selected barcode type) is also required in addition to the Capture Scan or Capture
Indexing license. Similarly, when you enable the Allow OCR Indexing property, a Capture Nuance Zonal
OCR, Nuance OCR Handwriting (depending on selected Recognition Module), or Capture Open Text Zonal
OCR license is also required in addition to the Capture Scan or Capture Indexing license.
During configuration, it is only required to draw one barcode or OCR zone to define the applicable properties.
Operators are only restricted to the properties you define for the zone, such as supported barcode types and
OCR recognition languages, but they can apply an infinite number of zones on an image. Similar to the
configuration of the automated barcode and OCR steps, you can test the zone to ensure its contents can be
read successfully.
Configuring Manual Barcode Indexing
When you enable manual barcode indexing, the operator can apply barcode zones on an image to populate
required index values. During configuration, you need to draw only one barcode zone to define the
applicable properties. Similar to the automated Barcode step, you can test the zone to ensure barcodes can
be read successfully prior to activating and checking in the job.
To configure manual barcode indexing in the Capture or Indexing step
1. On the Properties tab, expand Manual Barcode Indexing.
2. Select True in the Allow Barcode Indexing drop-down list.
3. Click the ellipsis button in the Barcode Indexing field. The Configure Manual Barcode Indexing screen
appears.
Configure Manual Barcode Indexing
4. Draw the zone, and then configure the applicable barcode zone properties (see the Barcode Zones topic for
details on each property).
5. Click the Save Barcode Zones icon.
NOTE: For more information on the available operations in the Configure Manual Barcode Indexing
screen, see the Barcode Zones topic.
Configuring Manual OCR Indexing
When you enable manual OCR indexing, the operator can apply OCR zones on an image to populate required
index values. During configuration, only one OCR zone is required to define the applicable properties. Similar
to the automated OCR step, you can test the zone to ensure that text is read successfully prior to activating
and checking in the job.
To configure manual OCR indexing in either the Capture or Indexing step
1. On the Properties tab, expand Manual OCR Indexing.
2. Select the zonal OCR engine from the Engine drop-down list.
3. Click the ellipsis button in the OCR Indexing field. On the Configure Manual OCR Indexing window,
properties specific to your engine selection are available for configuration.
Configure Manual OCR Indexing (Nuance Zonal OCR)
4. Draw the zone, and then configure the applicable OCR properties.
5. Click the Save OCR Zones icon.
NOTE: For more information on the available operations in the Configure Manual OCR Indexing
screen, see the Zonal OCR topic.
Index Zones
Index zones help you define areas on the image that will be zoomed into view when operators hand-key index
values.
To access the index zone settings
1. In the Index Configuration dialog box, expand General (Step Level) Settings.
2. Click Zoom Zone, and then click the ellipsis button in the right column to open the Index Zone dialog box.
Drawing Index Zones
To draw an Index Zone
1. In the Index Configuration dialog box, expand General (Step Level) Settings.
2. Click Zoom Zone, and then click the ellipsis button in the right column to open the Index Zone dialog box.
3. Click Draw Zone to open the Select Index Zone window. The following table describes the commands on
the Index Zone toolbar.
Select Index Zone Commands
- Scanner Setup: Allows you to set up the scanner's settings.
- Scan Image: Allows you to scan an image into the Select Index Zone screen.
- Open Image: Enables you to select a test image from disk that will open in the window.
- Reset Image: Reverts to the original view of the image.
- Rotate Image: Rotates the image 90 degrees clockwise.
- Zoom In: Zooms in the view of the image.
- Zoom Out: Zooms out the view of the image.
- Zoom In Region: Zooms in on the boundary of your specified region.
- Move, Zoom, or Region: Equips the left mouse button with the Zoom, Move, or Region command.
    - Zoom enlarges your specified area.
    - Move enables you to pan around your zoomed area.
    - Region allows you to define a boundary for barcodes or OCR regions.
4. To scan a sample image, click the Scan Image icon.
5. To open an existing image, click the Open icon.
6. On the toolbar, select Region from the drop-down list.
7. Click the left mouse button and drag the cursor around the region.
8. If necessary, widen or narrow the boundaries of the index zone.
9. When you are satisfied with the index zone, click OK.
10. Click OK in the Index Zone dialog box.
Scanner Setup
PaperVision Capture supports more than 300 ISIS-compatible scanners. The PaperVision Capture
installation media contains most of the currently available ISIS scanner drivers. However, as this list is
ever-growing, some newer drivers may not be available at the time of distribution. If you need additional drivers,
please contact Digitech Systems’ Technical Support at support@digitechsystems.com or by phone at (877)
374-3569. If the driver is available, our support personnel will assist you in obtaining the driver.
PaperVision Capture also offers the ability to use TWAIN scanners. The use of TWAIN scanners is generally
intended for extremely low-volume scanning, as ISIS drivers are available for most scanners on the market.
In the PaperVision Capture Administration Console, you can test and save scanner settings during index, barcode,
and OCR zone configuration. Black and white images are saved in an industry standard Group IV TIFF file format,
while color or grayscale images are saved in a standard JPG or BMP file format.
Settings in the Scanner Settings dialog box can be accessed during index, barcode, and OCR zone configuration.
To access these settings, click Configure Scanner on the toolbar of the step’s configuration window.
NOTE: Depending on the type of scanner that is used, some scanner options may be disabled, and the
number of options available in the drop-down menus may vary.
Scanner Settings Dialog Box
Saved Settings
This drop-down list displays any scanner settings that were previously saved.
To save a new scanner setting
1. Enter the name in the Saved Settings field.
2. Click Apply.
To remove a setting
1. Select the setting from the Saved Settings drop-down list.
2. Click Delete.
Scanner Name
Click the Scanner Name drop-down menu to select a scanner that has been installed and detected by
PaperVision Capture. The Properties menu allows you to configure scanner and file import devices.
Depending on the type of scanner, the menu options will display different settings.
The Properties drop-down list contains the following options:
- More Settings may contain additional scanner settings that are available for configuration.
- About displays the driver's version, copyright, and other information specific to the scanner.
- Area Settings allow you to assign the scanning area.
- Extended Settings may contain additional scanner settings that are available for configuration.
- Windows Image Acquisition may contain additional settings if your scanner supports Windows Image
Acquisition.
- Calibrate allows you to calibrate the scanner driver.
- Configure allows you to configure the scanner driver settings.
Color Format
Also known as the mode, this setting lets you select from options such as black and white or color.
Dither
Dithering converts and simulates unavailable colors. When dithering is turned on, the system combines two
or more colors to approximate the unavailable color.
Horizontal Resolution
Select the horizontal dots-per-inch resolution setting to apply during the scanning process.
Vertical Resolution
Select the vertical dots-per-inch resolution setting to apply during the scanning process.
Page Size
This setting determines the default page size of the image as it is scanned.
Scan Type
This setting determines if scanning should be two-sided (duplex), one-sided (simplex), etc.
Brightness
Brightness defines a pixel's lightness value from black (darkest) to white (brightest). Select the brightness
level to be applied during the scanning process and whether it should be applied manually or automatically. If
applying the brightness manually, use the slider to increase or decrease its amount.
Contrast
Contrast is a measure of the rate of change of brightness in an image. A high-contrast image contains defined
transitions from black to white. Select the contrast level to be applied during the scanning process and
whether it should be applied manually or automatically. If applying the contrast manually, use the slider to
increase or decrease its amount.
Chapter 7 Barcode Configuration
You can use barcodes to populate index values and insert document breaks. PaperVision Capture recognizes
one-dimensional and two-dimensional, black and white, and color barcodes. The Barcode job step allows you
to configure a barcode reading process that executes automatically in the PaperVision Capture Operator
Console or by the PaperVision Capture Automation Service.
NOTE: Use of the scaling image processing filter can improve the recognition rate of barcode detection.
See "Image Processing Filters" on page 242 for more information.
To view the properties of the Barcode job step
1. On the Job Definitions window, select the Barcode job step in the workspace.
2. On the Properties tab, expand the Auto Document Break, General, and Indexes nodes.
Auto Document Break
While scanning documents, you can determine where one document ends and the next document begins
using the Auto Document Break properties. Although you can separate documents manually, you can select
from options that are described below:
- By default, no auto-document breaks are inserted. When set to None, the system will expect you to
manually separate new documents. No options are available for this setting.
- If you select the Barcode mode, click the ellipsis button to the right of the Barcode Zone field to define the
zones in the Edit Document Break Barcodes screen (see the Barcode Zones topic for more information). For
the Save Page property, select True to leave the page with the barcode in the batch, or select False to
remove that page from the batch.
Index - General (Job Level)
You can configure additional index values and barcode zones for the Barcode step.
To configure index values for the Barcode step:
1. In the Properties grid for the Barcode step, click the ellipsis button next to the Indexes field. The Index
Configuration dialog box appears.
2. Click the Add button to add a new index. The Add Index dialog box appears.
3. To add a new index, select New Index, and then enter its field name. Proceed to step 5.
4. To add a new detail set for the job, select Job Detail Set. You can then create and configure each
individual index comprising the detail set. See "Configuring Detail Sets" on page 59 for more information.
5. Click OK. The Index Configuration dialog box will display your new index along with its associated
properties that you can configure.
Auto-Carry/Auto-Increment
The Auto-Carry/Auto-Increment settings can greatly increase operator productivity while hand-keying
repetitive or incremental values or characters. Both tools operate during scanning (optional) and hand-keying.
To configure these settings, click the ellipsis button in the Auto-Carry/Auto-Increment field.
NOTE: Auto-Carry settings only apply when the operator saves index values in the Operator Console.
Auto-Carry Entire Index Value
This setting allows you to carry all characters from an index in one document to the corresponding index in
the next document. You can then enable Overwrite Existing Values and/or Carry Values to Copied
Document.
Auto-Carry Characters Preceding Number
This setting allows you to define the number of characters that precede a number. Your specified number of
characters will carry from an index in one document to the corresponding index in the next document. For
example, if you have an index that is always (or nearly always) the letters ABC followed by a number, you
may not want to continuously re-enter ABC on each index value. You could set the number of characters to
carry to 3. When the operator is keying the information, ABC would automatically get carried forward to the
next document and they would only have to enter the numeric portion of the index.
Auto-Carry Characters Following Number
This setting allows you to define the number of characters that follow a number. Your specified number of
characters will carry from an index in one document to the corresponding index in the next document. For
example, if you have an index that is always (or nearly always) a number followed by the letters ABC, you
may not want to continuously re-enter ABC on each index value. You could set the number of characters to
carry to 3. When the operator is keying the information, ABC would automatically get carried forward to the
next document and they would only have to enter the numeric portion of the index.
Auto-Increment Number
Auto-Increment takes Auto-Carry one step further. For example, if the numeric portion of the value was an
incremental numeric value, you could set Auto-Carry to 3 and Auto-Increment to 1. This would increment the
numeric value of any characters remaining after the first three characters by a value of one.
- The Auto-Increment Number can also be used without Auto-Carry if the value is completely numeric.
- The value entered in the Minimum Number Digits field allows you to pad the new value with zeros.
- The Preview section displays the original value and displays a preview of the carried value.
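The combination of Auto-Carry Characters Preceding Number, Auto-Increment Number, and Minimum Number Digits described above can be sketched as a small function. This is an interpretation for illustration, not the product's implementation: the function and parameter names are hypothetical, and the sketch assumes that everything after the carried characters is numeric whenever an increment is applied.

```python
# Hypothetical sketch of Auto-Carry with Auto-Increment; not product code.
def next_index_value(previous, carry_chars=0, increment=0, min_digits=0):
    """Build the value proposed for the next document from the previous one."""
    prefix = previous[:carry_chars]
    remainder = previous[carry_chars:]
    if increment == 0:
        # Carry only: the operator keys the rest of the value by hand.
        return prefix
    number = int(remainder) + increment  # raises ValueError if not numeric
    # Minimum Number Digits pads the incremented number with leading zeros.
    return prefix + str(number).zfill(min_digits)


print(next_index_value("ABC0041", carry_chars=3, increment=1, min_digits=4))  # ABC0042
print(next_index_value("ABC0041", carry_chars=3))                            # ABC
print(next_index_value("99", increment=1))                                   # 100
```

The third call shows Auto-Increment used without Auto-Carry on a completely numeric value.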
Overwrite Existing Values
By default, Auto-Carry and Auto-Increment do not fill in an index value if there is already information in the
index. Selecting this check box will force Auto-Carry and Auto-Increment to update the index regardless of
whether information previously existed.
Carry Values to Copied Document
By default, when documents are copied, no index values are carried through to the copies. This setting allows
you to specify that the current index should also be copied, leaving the other indices blank.
Auto-Fill Cursor Location
If you enable this setting, operators are allowed to append to an existing index value. The setting places the
cursor's focus at the end of the original index value so the original value is retained.
NOTE: This determines whether data will be highlighted or the cursor will be placed at the end of the data
when hand-keying an index that has the Auto-Carry or Auto-Fill option selected.
Index Type
Document index fields contain values that enable you to identify key elements of documents within a project
during the capture process. For more information, see Index Types and Formats.
Name
This editable field contains the name of the index value.
Indexes - General (Step Level)
Barcode Parsing
During indexing configuration in a Barcode step, you can configure a text delimiter or a regular expression to
parse specific index fields from a barcode. You can then specify which field’s index is parsed from the
barcode (e.g., you can select the third field's index so only the last four digits of a social security number are
parsed). Optionally, you can verify that an exact number of index fields results from the parse operation (e.g.,
three index fields indicative of a social security number in the format xxx-xx-xxxx).
NOTE: The Verify Number of Fields setting is intended to verify that an exact number of index fields
(two or more) results from the parse operation.
If errors occur during barcode parsing, such as when the parsed number of index fields differs from your
specified number of fields, you can select one of three subsequent actions. First, the entire index value can
be skipped (therefore, no barcode parsing occurs). In the second option, the entire barcode value is used
(therefore, no barcode parsing occurs). In the last option, you can specify the text used as the parsed value
(e.g., you can enter “unknown value”).
To configure barcode parsing
1. In the Properties grid for the Barcode step, click the ellipsis button to the right of the Indexes row.
2. In the Index Configuration dialog box, expand the General (Step Level) node.
3. Click the ellipsis button to the right of the Barcode Parsing row. The Configure Barcode Parsing dialog
box appears.
4. In the Delimiter section, select whether to use a text delimiter or regular expression to split the original
index value into fields. If you enter an invalid text delimiter or regular expression, the error symbol will
appear to the right of the field.
NOTE: Additional information on regular expressions can be located at: http://msdn.microsoft.com/en-us/library/vstudio/28hw3sce%28v=vs.100%29.aspx
5. In the Field Parsing section, specify the field index position from which to parse data.
6. Optionally, you can verify that an exact number of index fields (two or more) results from the parse
operation.
For example, you can set the Field Index value to “3” to parse only the last four digits of a social
security number that exists in the format xxx xx xxxx. You can then select the Verify Number of
Fields option to verify that three index fields (indicative of a social security number) result from the
parse operation.
7. In the Parsing Errors section, select the action that will be executed if parsing errors occur:
- Skip Index Value: The entire index value is skipped, so no barcode parsing occurs.
- Use Complete Barcode Value: The complete barcode value is used, so no barcode parsing occurs.
- Use Text: Your specified text is used as the parsed value.
8. In the Preview section, you can enter a sample index value to ensure the text delimiter or regular
expression parses the value correctly.
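The parsing behavior configured in the steps above can be approximated in a few lines, which may help when choosing a delimiter or regular expression before testing it in the Preview section. The sketch is an assumption-laden illustration: the function and parameter names are hypothetical, the field index is treated as 1-based (consistent with the "3" example above), and Python's regular expression syntax stands in for the .NET syntax the product references.

```python
# Hypothetical sketch of barcode parsing; not product code.
import re


def parse_barcode(value, delimiter, field_index, *, use_regex=False,
                  expected_fields=None, on_error="skip", error_text=""):
    """Split `value` into fields and return the field at `field_index`
    (1-based). On a parsing error, apply the selected error action."""
    fields = re.split(delimiter, value) if use_regex else value.split(delimiter)
    ok = 1 <= field_index <= len(fields)
    if expected_fields is not None and len(fields) != expected_fields:
        ok = False  # Verify Number of Fields failed
    if ok:
        return fields[field_index - 1]
    if on_error == "skip":      # Skip Index Value
        return None
    if on_error == "complete":  # Use Complete Barcode Value
        return value
    if on_error == "text":      # Use Text
        return error_text
    raise ValueError(f"unknown error action: {on_error}")


print(parse_barcode("123-45-6789", "-", 3, expected_fields=3))   # 6789
print(parse_barcode("123 45 6789", r"\s+", 3, use_regex=True))   # 6789
print(parse_barcode("123456789", "-", 3, expected_fields=3,
                    on_error="text", error_text="unknown value"))  # unknown value
```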
Barcode Zones
For detailed information on the configuration of barcode zones, see the Barcode Zones topics.
Indexes - Predefined Index (Job Level)
Add New Values
If this setting is True, all new operator-entered index values are added to the Predefined Values list.
Barcode Zones
During index value configuration for a Capture step, you can configure barcode zones to be recognized during
the scanning process in the PaperVision Capture Operator Console.
To view the barcode zone settings
1. In the Index Configuration dialog box, expand the General (Step Level) Settings node for the appropriate
index value.
2. Click the ellipsis button to the right of the Barcode Zones field. The Edit Barcode Zones screen opens.
Edit Barcode Zones
NOTE: If you define more than one barcode zone in a multiple-page document, the last barcode value that
is read on the last page overrides all others and populates the index. If you define more than one barcode
zone in a single-page document, the last barcode value that passes through the system populates the
index.
The Edit Barcode Zones screen contains the following components:
• The main window, where you draw the barcode zones, displays the individual images. To draw a barcode
zone, press the left mouse button while you drag a rectangular region around the barcode. You can then
widen and narrow the boundaries of the barcode zone region to adjust its size.
• The Barcode Explorer provides an expandable view of each defined barcode zone, its dimensions, and
test results.
• The Properties grid, viewable when you highlight a zone in the Barcode Explorer tree, displays all
properties associated with the selected barcode zone.
• Thumbnails windows are found in the Edit Barcode Zones, Edit OCR Zones, Edit Nuance Full-Text OCR,
Edit Open Text Full-Text OCR, and Edit Image Processing Filters screens. You can right-click within any
Thumbnails window to perform basic operations on images, such as the cut/paste, copy/paste, delete, or
select all operations. The cut, copy, paste, and delete operations can be performed on consecutive or
non-consecutive images. Additionally, you can select multiple images and simultaneously rotate them. The
scrolling capability, displayed with up/down or left/right arrows as you drag and drop images, allows you to
quickly scroll through remaining images not shown in the current window.
NOTE: Images viewed as thumbnails can have maximum dimensions of 32,768 x 32,768 pixels.
• The status bar at the bottom of the screen displays each image’s page number, page size (in KB), and page
dimensions (in mm).
NOTE: The page dimensions 215 x 279 mm are approximately equivalent to 8.5 x 11 inches.
Saving Barcodes
To save all defined barcode zones and return to index configuration, click the Save Barcodes icon.
Configuring the Scanner
The Configure Scanner command allows you to assign scanner settings for barcode zone recognition. To access
these settings, click the Configure Scanner icon. See the Scanner Setup topic for details on each setting.
Starting the Scanning Process
After loading images, you can scan them to ensure the barcode zones are being read successfully. To start
the scanning process, click the Start Scanning icon.
Stopping the Scanning Process
To stop the scanning process, click the Stop Scanning icon.
Deleting a Single Image
To delete a single image
1. In the Thumbnails section, select the image to delete.
2. Click the Delete Single Image icon.
3. Click Yes to the confirmation message.
Rotating an Image 90° Counter-Clockwise
To rotate the image 90 degrees counter-clockwise, click the Rotate Image 90° Counter-Clockwise icon.
Rotating an Image 90° Clockwise
To rotate the image 90 degrees clockwise, click the Rotate Image 90° Clockwise icon.
Removing All Images
The Remove All Images command removes all current images from the main scanning window and the
Thumbnails section.
To remove all images
1. Click the Remove All Images icon.
2. Click Yes to the confirmation message. If you have defined barcode zones prior to clearing all images,
these barcode zones are retained.
Importing Images
To import images
1. Click the Import Images icon.
2. Locate the directory of the image(s).
3. Click Open.
The Test All Barcode Zones operation verifies that all defined barcode zone regions read barcodes
successfully.
NOTE: If you test multiple barcode zones that exist for the same index, the last barcode read by the
system overrides the others. Results for every barcode will then populate the Results row in the Barcode
Explorer.
To test all barcodes
1. After you insert all barcode zones and assign properties to each, click the Test All Barcode Zones icon.
• The Barcode Explorer tree updates the Results row for each page that contains the defined barcodes.
• A successful reading, indicated with a green check mark, populates the Results row in the Barcode
Explorer tree.
2. If you do not receive a successful test result, select more barcode types, enable decoding, and/or enable
checksum reading, and run the test once again.
NOTE: Poor image quality might result in an unsuccessful reading, so try importing a clearer barcode
image.
Zooming In/Zooming Out/Resetting the Zoom
• To zoom in on an area of the image, click the Zoom In icon.
• To zoom out of the current view of the image, click the Zoom Out icon.
• To reset the image to its original view, click the Zoom Reset icon.
Exiting the Barcode Zones Screen
To close and exit the Edit Barcode Zones screen
1. Click the Exit icon.
2. Click Yes if you want to save all barcode changes.
Barcode Explorer
The Barcode Explorer summarizes your defined barcode zones per page and allows you to add, remove, test,
and modify each barcode zone.
• To view the properties of a barcode zone, highlight the Zone node in the tree, and its properties appear in
the grid below.
• Expand the Zone node to view a barcode zone's X and Y coordinates, dimensions (in millimeters),
orientation, and test results.
Barcode Explorer
Adding a Barcode Zone (Selected Page/New Page)
You can add a new barcode zone to the current page or a new page. The Barcode Explorer tree updates with
each addition or modification.
To add a new barcode zone to the selected page
1. Click the down arrow in the Add Zone icon, and select Add Zone (Selected Page).
2. With the left mouse button, drag a rectangular region around a barcode.
3. Move and/or edit the barcode zone if necessary.
To add a new barcode zone to a new page
1. Click the down arrow in the Add Zone icon, and select Add Zone (New Page).
2. In the Page Index dialog box, enter the page number where the new barcode zone will reside.
NOTE: If you enter a page that already exists or if you enter an invalid character, a reminder message
appears.
3. Enter the new page where the barcode will reside.
4. Move and/or edit the barcode zone if necessary.
Removing a Barcode Zone
To remove a barcode zone
1. In the tree, highlight the zone(s) to remove.
2. Click the Remove Zone icon.
3. Click OK to confirm the removal.
Removing All Barcode Zones on a Page
To remove all barcode zones on a page
1. In the Barcode Explorer tree, highlight the page where the zones will be removed.
2. Click the Remove All Zones On This Page icon.
3. Click OK to confirm the removal.
Testing a Barcode Zone
This operation verifies that individual barcode zones can be read successfully. If more than one barcode
exists in one zone, the engine returns the value read from the first barcode.
To test a barcode zone
1. Highlight the zone in the Barcode Explorer.
2. Click the Test Barcode Zone icon. A successful reading, indicated with a green check mark,
populates the Results row in the Barcode Explorer tree.
3. If you do not receive a successful test result, select more barcode types, enable decoding, and/or enable
checksum reading, and run the test once again.
NOTE: Poor image quality might result in an unsuccessful reading, so try importing a clearer barcode
image.
Expanding All/Collapsing All Barcode Zones
• To expand all barcode zones, click the Expand All icon.
• To collapse all barcode zones, click the Collapse All icon.
Barcode Explorer Properties
The following properties can be configured in the Barcode Explorer:
Image Size
This field is read-only; if no barcode zone is defined, the page size appears in this field. If a barcode zone is
defined, the size of the zone and the page size display in this field. All sizes appear in millimeters.
Barcode Type
The following two-dimensional (2D) barcode types are supported in PaperVision Capture:
• AZTEC
• DataMatrix
• PDF417
• QR Code
• Royal Post
• Australian Post - This is a 4-state barcode that supports the following four different formats:
  ◦ Standard Customer Barcode - The barcode is 37 bars long and contains no customer
    information.
  ◦ Reply Paid Barcode - The barcode is 37 bars long and is similar to the Standard Customer
    barcode.
  ◦ Customer Barcode 2 - The barcode includes customer information.
  ◦ Customer Barcode 3 - The barcode includes customer information.
  NOTE: Currently, only the Standard Customer and Reply Paid barcode formats are supported.
• Intelligent Mail
The following one-dimensional (1D) barcode types are supported in PaperVision Capture:
• Addon2
• Addon5
• BCDMATRIX
• Codabar
• Code25_Datalogic
• Code25_IATA
• Code25_Industrial
• Code25_Interleaved
• Code25_Invert
• Code25_Matrix
• Code32
• Code39
• Code93
• EAN13
• EAN8
• Postnet
• Type128
• UCC128
• UPC_A
• UPC_E
To select the barcode types
1. Click the ellipsis button in the Barcode Types field in the Properties grid.
2. Select the barcode types to be recognized.
3. Click the Select All button if you want PaperVision Capture to recognize all types.
4. Click OK.
Decode
Some barcode types, such as Code 128, do not represent their data as ASCII characters. Other barcode
types, such as Code 3 of 9, use special characters to extend the basic character set to include the entire
ASCII set. When this setting is enabled, barcode values are converted into human-readable ASCII strings.
For example, if the barcode uses escape characters, as in "*%K123%M?*", and the Decode property is True,
then "[123]" will be returned. If the Decode property is False, the raw barcode is returned.
NOTE: You should enable this setting unless the barcode results should not be converted into ASCII
strings. For example, this setting should be disabled if you are detecting Code 3 of 9 barcodes that
represent dates using the slash mark "/" character (e.g. 01/01/1999). If this setting is enabled, no results
are returned because "/0" and "/1" are not valid ASCII escape sequences.
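The effect of the Decode setting can be illustrated with a short Python sketch. It is not PaperVision Capture's own code; it assumes the commonly published Code 3 of 9 "Full ASCII" convention, in which the characters +, $, %, and / act as shift characters that combine with the following letter. The function names are hypothetical.

```python
# Illustrative only: a simplified Code 39 Full ASCII decoder.
# The tables follow the commonly published Full ASCII convention; they are
# an assumption about what the Decode setting does, not the product's code.

# Characters produced by "%A" through "%Z" (26 entries).
_PERCENT = "\x1b\x1c\x1d\x1e\x1f;<=>?[\\]^_{|}~\x7f\x00@`\x7f\x7f\x7f"


def _decode_pair(shift, letter):
    """Decode one two-character escape pair, or return None if it is invalid."""
    if not "A" <= letter <= "Z":
        return None
    idx = ord(letter) - ord("A")
    if shift == "+":            # +A..+Z map to a..z
        return chr(ord("a") + idx)
    if shift == "$":            # $A..$Z map to control characters 1..26
        return chr(1 + idx)
    if shift == "%":
        return _PERCENT[idx]
    # shift == "/": /A../O map to "!" through "/", and /Z maps to ":"
    if idx <= 14:
        return chr(ord("!") + idx)
    return ":" if letter == "Z" else None


def decode_full_ascii(raw):
    """Return the decoded text, or None when an escape pair is invalid."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch in "+$%/":
            if i + 1 >= len(raw):
                return None
            decoded = _decode_pair(ch, raw[i + 1])
            if decoded is None:
                return None
            out.append(decoded)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(decode_full_ascii("%K123%M"))     # [123]
print(decode_full_ascii("01/01/1999"))  # None
```

In this sketch the date value fails because "/0" is not a defined escape pair, which mirrors the reason given above for disabling Decode when slashes are literal data.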
Orientation
PaperVision Capture detects horizontal and vertical barcodes with skew angles of no more than fifteen degrees
from the horizontal and vertical axes, respectively. Horizontal barcode detection is slightly faster than vertical
barcode detection. If you are unsure of the expected barcode orientation or if the documents might contain
barcodes with different orientations, select Both from the drop-down menu.
Required for Delete (for Auto Document Breaks)
This property is applicable when you define Auto Document Breaks with barcodes. When set to True, the
break page will be deleted when all defined barcode zones are read successfully.
Region
The Region property displays a barcode zone's X and Y coordinates and its height and width.
To change the dimensions of the barcode zone
1. Click the ellipsis button in the right column next to the Region field.
2. In the Zone Rectangle dialog box, select Whole Page if you want the barcode zone to comprise the entire
height and width of the page.
3. To specify the dimensions of the barcode zone, enter the left, top, width, and height (in millimeters) of the
zone rectangle.
4. Click OK.
NOTE: To clear the Region field, right-click the ellipsis button and select Reset.
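Because zone rectangles are entered in millimeters while images are measured in pixels, it can help to convert between the two when planning a zone. The following Python sketch shows the arithmetic only; the function names are hypothetical, and the scan resolution (DPI) is an assumed input that you must know for your images.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(mm, dpi):
    """Convert a length in millimeters to pixels at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


def region_to_pixels(left_mm, top_mm, width_mm, height_mm, dpi):
    """Convert a zone rectangle (left, top, width, height) from mm to pixels."""
    return tuple(
        mm_to_pixels(value, dpi)
        for value in (left_mm, top_mm, width_mm, height_mm)
    )


# A whole 8.5 x 11 inch page (215.9 x 279.4 mm) scanned at 300 DPI.
print(region_to_pixels(0, 0, 215.9, 279.4, 300))  # (0, 0, 2550, 3300)
```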
Regular Expression Verification (for Auto Document Breaks)
This field is applicable when you define Auto Document Breaks with barcodes. If you enter an exact value or
regular expression into the Regular Expression Verification field, a document break is only inserted when
the system reads barcodes matching your exact value or regular expression. If you leave this field blank, any
barcode read by the system will cause a document break to be inserted. A regular expression is a pattern of
text that consists of ordinary characters (for example, letters A through Z) and special characters, known as
metacharacters. The pattern describes one or more strings to match when searching a body of text. The
regular expression serves as a template for matching a character pattern to the string being searched.
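The break decision described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: PaperVision Capture evaluates the expression itself, its regular expression dialect may differ from Python's in details, and the pattern `^INV-\d{6}$` and the function name are hypothetical. The sketch assumes a "match anywhere" test, so the example pattern is anchored with ^ and $ to require that the whole value match.

```python
import re


def should_insert_break(barcode_value, verification=""):
    """Return True if a document break would be inserted for this barcode.

    A blank verification value means any barcode that is read causes a break.
    """
    if not verification:
        return True
    return re.search(verification, barcode_value) is not None


pattern = r"^INV-\d{6}$"  # hypothetical: "INV-" followed by exactly six digits
print(should_insert_break("INV-004213", pattern))  # True
print(should_insert_break("PO-77", pattern))       # False
print(should_insert_break("PO-77"))                # True (field left blank)
```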
To configure a regular expression
1. Click the ellipsis button in the right column next to the Regular Expression Verification field to enter a
regular expression.
2. In the Regular Expression box, enter the regular expression.
3. Enter the text to validate.
• A successful validation displays with a green icon.
• Invalid entries display with a red icon.
NOTE: To clear the Regular Expression Verification property, right-click the ellipsis button and select
Reset.
Use Checksum
A checksum is an error detection process where additional characters are appended to a barcode to ensure
more accurate readings. Enable this setting if you want the checksum to be recognized during the scanning
process.
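As an illustration of how a checksum detects misreads, the following Python sketch computes the standard UPC-A check digit (the twelfth digit of a UPC-A barcode). It is only an example of the general idea; each barcode type defines its own checksum scheme, and this sketch does not describe how PaperVision Capture applies the Use Checksum setting internally.

```python
def upc_a_check_digit(first_eleven):
    """Compute the UPC-A check digit from the first eleven digits."""
    digits = [int(c) for c in first_eleven]
    if len(digits) != 11:
        raise ValueError("expected exactly eleven digits")
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, ... (1-based)
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, ...
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def is_valid_upc_a(code):
    """Return True if the twelve-digit code carries a correct check digit."""
    return (
        len(code) == 12
        and code.isdigit()
        and upc_a_check_digit(code[:11]) == int(code[11])
    )


print(is_valid_upc_a("036000291452"))  # True
print(is_valid_upc_a("036000291453"))  # False: last digit misread
```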
Chapter 8 Zonal OCR
PaperVision Capture enables you to customize Optical Character Recognition (OCR) settings for individual
index fields and pages of text that you define within zones. The Nuance and Open Text OCR job steps allow
you to configure an OCR process that executes automatically in the PaperVision Capture Operator Console
or by the PaperVision Capture Automation Service. You can also configure OCR zones to insert document
breaks.
PaperVision Capture recognizes OCR zones that you define in Job Definitions. During index value
configuration for the Nuance OCR or Open Text OCR job step, you can define the OCR zones that will be
recognized during OCR processing. Your selected step determines the properties available for zonal OCR
configuration. For more information on specific settings for each step, see the Nuance Zonal OCR and Open
Text Zonal OCR topics.
To view the properties for the Nuance OCR or Open Text OCR job step
1. On the workspace of the Job Definitions window, select the Nuance Zonal OCR or Open Text Zonal
OCR job step.
2. On the Properties tab, expand Auto Document Break, General, and Indexes.
The following settings are available for configuration for both job steps.
Auto Document Break
While scanning documents, you can determine where one document ends and the next document begins
by inserting an Auto Document Break. Although you can separate documents manually, you can also select
from the options described below.
Select an option in the drop-down list found in the right column of the Mode field:
• None - This is the default auto-document break type for a newly created step. When set to None, the
system will expect you to manually separate new documents. No options are available for this setting.
• OCR - If you select the OCR mode, click the ellipsis button to the right of the OCR Zones property to
define the zones in the Edit OCR Zones window (see OCR Zones for more information). For the Save
Page property, select True to leave the page with the auto-document break in the batch, or select False to
remove the auto-document break page from the batch.
Indexes - Line Feed Delimiter
You can configure OCR zones specific to each index. The Line Feed Delimiter property, specific to OCR
zones, allows you to define extra spaces, characters, etc. that will replace carriage returns located during
OCR processing. To access settings for an index, click the ellipsis button next to the Indexes row on the
Properties tab.
To define the line feed delimiter for the OCR Zone
1. On the Properties tab for the OCR step, click the ellipsis button to the right of the Indexes row.
2. In the Index Configuration dialog box, expand General (Step Level).
3. Click Line Feed Delimiter, and then click the ellipsis button.
4. In the OCR Line Feed dialog box, select the Replace check box.
5. In the Delimiter box, type the delimiter that will be used to replace the OCR line feed.
6. Click OK.
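The effect of the Line Feed Delimiter can be pictured with a small Python sketch. It only illustrates the replacement described above; the function name is hypothetical, and the exact line-break characters that the OCR engine emits are an assumption.

```python
import re


def replace_line_feeds(ocr_text, delimiter):
    """Replace each line break in the OCR text with the chosen delimiter."""
    # A function is used as the replacement so the delimiter is inserted
    # literally, even if it contains backslashes.
    return re.sub(r"\r\n|\r|\n", lambda match: delimiter, ocr_text)


zone_text = "ACME Corp\r\n12 Main St\nDenver"
print(replace_line_feeds(zone_text, " | "))  # ACME Corp | 12 Main St | Denver
```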
Indexes - OCR Parsing
During indexing configuration in an OCR step, you can configure a text delimiter or a regular expression to
parse specific index fields from OCR text. You can then specify which field’s index is parsed (for example,
the fourth field's index from a credit card number). Optionally, you can verify that an exact number of index
fields results from the parse operation (for example, four index fields indicative of a complete credit card
number).
NOTE: The Verify Number of Fields setting is intended to verify that an exact number of index fields
(two or more) results from the parse operation.
If errors occur during OCR parsing, such as when the parsed number of index fields differs from your
specified number of fields, you can select one of three subsequent actions. First, the entire index value can
be skipped (therefore, no OCR parsing occurs). In the second option, the entire OCR value is used (therefore,
no OCR parsing occurs). In the last option, you can specify the text used as the parsed value (e.g., you can
enter “unknown value”).
To configure OCR parsing
1. On the Properties tab for the OCR step, click the ellipsis button to the right of the Indexes row.
2. In the Index Configuration dialog box, expand General (Step Level).
3. Click the ellipsis button to the right of the OCR Parsing row. The Configure OCR Parsing dialog box
appears.
4. In the Delimiter section, select whether to use a text delimiter or regular expression to split the original
value into fields. If you enter an invalid text delimiter or regular expression, the error symbol will appear
to the right of the field.
NOTE: Additional information on regular expressions can be located at: http://msdn.microsoft.com/en-us/library/vstudio/28hw3sce%28v=vs.100%29.aspx
5. In the Field Parsing section, specify the field index position from which to parse data.
6. Optionally, you can verify that an exact number of index fields (two or more) results from the parse
operation. For example, you can set the Field Index value to "4" to parse only the last four digits of a credit
card number. You can then select the Verify Number of Fields option to verify that four index fields
(indicative of a complete credit card number) result from the parse operation.
7. In the Parsing Errors section, select the action that will be executed if parsing errors occur.
• Skip Index Value: The entire index value is skipped, so no OCR parsing occurs.
• Use Complete OCR Value: The complete OCR value is used, so no OCR parsing occurs.
• Use Error Text: Your specified text is used as the parsed value.
8. In the Preview section, you can enter a sample index value to ensure the text delimiter or regular
expression parses the value correctly.
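The parsing procedure above can be sketched in Python to show how the settings interact. This is a minimal illustration, not the product's implementation: the function and parameter names are hypothetical, the field position is assumed to be counted from 1 (as the credit card example suggests), and the three `on_error` values stand in for the Skip Index Value, Use Complete OCR Value, and Use Error Text options.

```python
import re


def parse_index(ocr_value, delimiter, field_index, expected_fields=None,
                use_regex=False, on_error="skip", error_text=""):
    """Split an OCR value into fields and return the requested field.

    on_error: "skip" returns None, "complete" returns the whole OCR value,
    and "error_text" returns the supplied error_text.
    """
    if use_regex:
        fields = re.split(delimiter, ocr_value)
    else:
        fields = ocr_value.split(delimiter)

    count_ok = expected_fields is None or len(fields) == expected_fields
    index_ok = 1 <= field_index <= len(fields)
    if count_ok and index_ok:
        return fields[field_index - 1]  # field positions assumed 1-based

    if on_error == "skip":
        return None
    if on_error == "complete":
        return ocr_value
    return error_text


# Fourth field of a four-part number, verifying that exactly four fields exist.
print(parse_index("4111-1111-1111-1234", "-", 4, expected_fields=4))  # 1234

# Only three fields were read, so the configured error text is used instead.
print(parse_index("4111-1111-1234", "-", 4, expected_fields=4,
                  on_error="error_text", error_text="unknown value"))
```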
To view OCR Zone settings
1. In the Job Definitions workspace, select the Nuance OCR or Open Text OCR job step.
2. In the Properties grid, expand the Indexes node, and then click the ellipsis button next to the Indexes row.
3. In the Index Configuration dialog box, highlight the index in the Indexes section.
4. Under the Index Properties section, expand the General (Step Level) node.
5. Click the ellipsis button to the right of the OCR Zones field. The Edit OCR Zones screen appears.
Edit OCR Zones (Nuance OCR)
The Edit OCR Zones screen contains the following components:
• The main window, where you draw the OCR zones, displays the individual images. To draw an OCR zone,
hold the left mouse button while you drag a rectangular region around the OCR region. You can then widen
and narrow the region's boundaries to adjust its size.
• OCR Explorer provides an expandable view of each defined OCR zone, its dimensions, and test results.
• The Properties grid, viewable when you highlight a zone in the OCR Explorer tree, displays all properties
associated with the selected OCR zone.
• Thumbnails windows are found in the Edit Barcode Zones, Edit OCR Zones, Edit Full-Text OCR, and Edit
Image Processing Filters screens. You can right-click within any Thumbnails window to perform basic
operations on images, such as the cut/paste, copy/paste, delete, or select all operations. The cut, copy,
paste, and delete operations can be performed on consecutive or non-consecutive images. Additionally,
you can select multiple images and simultaneously rotate them. The scrolling capability, displayed with
up/down or left/right arrows as you drag and drop images, allows you to quickly scroll through remaining
images not shown in the current window.
NOTE: Images viewed as thumbnails can have maximum dimensions of 32,768 x 32,768 pixels.
• The status bar at the bottom of the screen displays each image’s page number, page size (in KB), and page
dimensions (in mm).
NOTE: The page dimensions 215 x 279 mm are approximately equivalent to 8.5 x 11 inches.
Edit OCR Zones Operations
The Edit OCR Zones screens for the Nuance Zonal OCR and Open Text Zonal OCR steps contain
the following operations:
Saving All OCR Zones
To save all defined OCR zones and return to index configuration, click the Save All OCR Zones icon.
Configuring the Scanner
To configure the scanner settings, click the Configure Scanner icon. See Scanner Setup for details on
each setting.
Starting the Scanning Process
After loading images, scan them to ensure OCR zones are being read successfully. To scan the images,
click the Start Scanning icon.
Stopping the Scanning Process
To stop the scanning process, click the Stop Scanning icon.
Removing a Single Image
1. In the Thumbnails section, select the image to delete.
2. Click the Delete Single Image icon.
3. Click Yes to the confirmation message.
Removing All Images
This command removes all current images from the main scanning window and the Thumbnails section.
To remove all images
1. Click the Remove All Images icon.
2. Click Yes to the confirmation message.
NOTE: If you have defined OCR zones prior to clearing all images, these zones are retained.
Rotate 90° Counter-Clockwise
To rotate the image 90 degrees counter-clockwise, click the Rotate Image 90° Counter-Clockwise icon.
Rotate 90° Clockwise
To rotate the image 90 degrees clockwise, click the Rotate Image 90° Clockwise icon.
Importing Images
1. Click the Import Images icon.
2. Locate the directory of the image(s).
3. Click Open, and the image appears in the main window.
Testing All OCR Zones
This OCR Explorer command verifies that all defined OCR zone regions will recognize OCR characters.
To test all OCR Zones
1. After you insert all OCR zones and assign properties to each, click the Test All OCR Zones icon.
• The OCR Explorer updates the Results row for each page containing your defined zones.
• A successful reading, indicated with a green check mark, populates the Results row.
NOTE: Poor image quality might result in an unsuccessful reading, so try importing a clearer
image.
2. If you do not receive a successful test result, select a different recognition module, or adjust other
properties if necessary, and run the test once again.
Zooming In/Zooming Out/Resetting Zoom
• To zoom in on an area of the image, click the Zoom In icon.
• To zoom out of the current view of the image, click the Zoom Out icon.
• To reset the image to its original view, click the Zoom Reset icon.
Exiting the OCR Zone Screen
To close and exit the Edit OCR Zones screen
1. Click the Exit icon.
2. Click Yes if you want to save all changes.
OCR Explorer
The OCR Explorer summarizes your defined OCR zones per page and allows you to add, remove, test, and
modify each OCR zone.
OCR Explorer (Nuance OCR)
• To view the properties of an OCR zone, highlight the Zone node in the tree, and its properties appear in the
grid below.
• Expand the Zone node to view an OCR zone's X and Y coordinates, dimensions (in millimeters),
orientation, and test results.
Adding an OCR Zone (Selected Page/New Page)
You can add a new OCR zone to the current page or a new page. The OCR Explorer tree updates with each
addition or modification.
To add a new OCR zone to the selected page
1. Click the down arrow next to the Add Zone icon, and select Add Zone (Selected Page).
2. With the left mouse button, drag a rectangular region around the text.
3. Move and/or edit the OCR zone if necessary.
To add a new OCR zone to a new page
1. Click the down arrow in the Add Zone icon, and select Add Zone (New Page).
2. In the Page Index dialog box, enter the page number where the new OCR zone will reside.
NOTE: If you enter a page that already exists or if you enter an invalid character, a reminder message
appears.
3. Enter the new page where the OCR zone will reside.
4. Move and/or edit the OCR zone if necessary.
Removing an OCR Zone
1. In the tree, highlight the zone(s) to remove.
2. Click the Remove Zone icon.
3. Click OK to confirm the removal.
Removing All OCR Zones on a Page
1. Select the page where the zones will be removed.
2. Click the Remove All Zones On This Page icon.
3. Click OK to confirm the removal.
Testing OCR Zones
The Test OCR Zone command verifies that individual fields can be read successfully.
To test an OCR zone
1. Highlight the zone.
2. Click the Test OCR Zone icon. A successful reading, indicated with a green check mark, populates
the Results row in the OCR Explorer tree.
3. If you do not receive a successful test result, adjust the properties if necessary; or, resize the zone, and run
the test once again.
NOTE: Poor image quality might result in an unsuccessful reading, so try importing a clearer image.
Expanding All/Collapsing All OCR Zones
• To expand all OCR zones, click the Expand All icon.
• To collapse all OCR zones, click the Collapse All icon.
General OCR and Miscellaneous Properties
Region Size
This field is read-only; the OCR zone's X and Y coordinates are displayed along with its height and width in
millimeters.
Image Size
This field is read-only; if no OCR zone is defined, the page size appears in this field. If an OCR zone
is defined, the zone and page size display (in millimeters) in this field.
Regular Expression Verification
A regular expression is a pattern of text that consists of ordinary characters (for example, letters A through Z)
and special characters, known as metacharacters. The pattern describes one or more strings to match when
searching a body of text. The regular expression serves as a template for matching a character pattern to the
string being searched.
Regular expressions are applied on a per-zone basis. When you define Auto Document Breaks using OCR
zones, you can assign an exact value or regular expression, and a document break will only be inserted when
the system reads an OCR zone matching that exact value or regular expression. If you leave this field blank,
any OCR zone read by the system will cause a document break to be inserted.
To assign a search value
1. Click the ellipsis button next to the Regular Expression Verification field.
2. Enter the regular expression or exact value.
3. Enter the text to validate.
• A successful validation displays with a green icon.
• Invalid entries display with a red icon.
NOTE: To clear the field, right-click the ellipsis button and select Reset.
Nuance Zonal OCR
PaperVision Capture lets you customize Optical Character Recognition (OCR) settings for individual index
fields. The Nuance OCR job step lets you configure an OCR process that executes automatically in the
PaperVision Capture Operator Console or by the PaperVision Capture Automation Service. You can also
configure OCR to insert document breaks. Character recognition options let you customize how values are
recognized by processes such as OCR, Intelligent Character Recognition (ICR), and Magnetic Ink Character
Recognition (MICR).
NOTE: The Nuance OCR engine supports incoming images ranging from 75 to 2400 dots per inch (DPI).
In pixels, this ranges from 16 x 16 to 8400 x 8400 pixels. Larger images can be ingested into PaperVision
Capture provided that no Full-Text OCR will be performed on the images (unless they are processed
using the Image Fit filter and cropped to meet size requirements); no image processing will be performed
on the images (unless they are processed using the Image Fit filter and cropped to meet size
requirements); and, images will not be viewed as thumbnails. Additionally, if you process multiple pages
containing large amounts of text, testing and executing the Nuance Full-Text OCR step may take a few
minutes.
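If you pre-screen images before sending them to a Nuance OCR step, the limits stated in the note above can be expressed as a simple check. The following Python sketch is only an illustration; the function name is hypothetical, and it assumes you already know each image's pixel dimensions and resolution.

```python
# Limits taken from the note above.
MIN_DPI, MAX_DPI = 75, 2400
MIN_SIDE_PX, MAX_SIDE_PX = 16, 8400


def within_nuance_limits(width_px, height_px, dpi):
    """Return True if the image falls inside the stated DPI and pixel ranges."""
    return (
        MIN_DPI <= dpi <= MAX_DPI
        and MIN_SIDE_PX <= width_px <= MAX_SIDE_PX
        and MIN_SIDE_PX <= height_px <= MAX_SIDE_PX
    )


print(within_nuance_limits(2550, 3300, 300))     # True: letter page at 300 DPI
print(within_nuance_limits(10200, 13200, 1200))  # False: exceeds 8400 pixels
```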
To view Nuance Zonal OCR zone settings
1. On the workspace of the Job Definitions window, select the Nuance Zonal OCR job step.
2. On the Properties tab, expand Indexes.
3. Click the Indexes property, and then click the ellipsis button.
4. In the Index Configuration dialog box, select the index in the Indexes list.
5. In the Index Properties section, expand General (Step Level).
6. Click the OCR Zones property, and then click the ellipsis button to access the Edit OCR Zones window
where you can set the properties described in the subsequent content.
Nuance OCR Page Properties
The Nuance OCR settings described in this section can be configured for each page. Some of the settings
refer to the temporary black and white image that is created during OCR processing.
Additional Characters
This setting allows you to define additional characters to recognize during OCR processing. Characters that
you define here are processed when you have selected the Plus or Number Character Filter setting.
Additional Language Filters
You can assign additional characters to increase the number of acceptable characters as determined by your
selected spelling language.
Brightness
You can assign the brightness value (between 0 and 100) for the page. A value of 0 is lightest; 100 results in
the darkest image. The default value is 50.
Brightness Threshold
You can assign a brightness threshold value (between 0 and 255) when converting an image to black and
white. The default value is 128.
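The role of a brightness threshold in black and white conversion can be shown with a short Python sketch. It illustrates the general thresholding technique only; the assumption that pixels at or above the threshold become white (and darker pixels become black) is ours, and the OCR engine's actual conversion may differ.

```python
def to_black_and_white(gray_rows, threshold=128):
    """Convert rows of grayscale pixels (0-255) to pure black (0) or white (255).

    Assumption: a pixel at or above the threshold becomes white.
    """
    return [
        [255 if pixel >= threshold else 0 for pixel in row]
        for row in gray_rows
    ]


print(to_black_and_white([[0, 127, 128, 255]]))  # [[0, 0, 255, 255]]
```

Under this assumption, raising the threshold turns more mid-gray pixels black, and lowering it turns more of them white.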
Enable Fax-Handling (Omnifont Multi-Lingual)
You should enable this setting if you are processing a scanned image that was faxed in draft mode (200 x 100
dpi).
Hand-Printed Character Height
You can assign the expected character height (in 1/1200 of an inch) for the Constrained Handprint
Recognition (Numeric) module. The default value is 0.
NOTE: 1/1200 of an inch is equivalent to approximately 0.021 mm.
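Because the hand-printed settings are expressed in units of 1/1200 of an inch, a small conversion helper can be useful when you measure characters in millimeters. The following Python sketch shows the arithmetic; the function names are hypothetical.

```python
UNITS_PER_INCH = 1200
MM_PER_INCH = 25.4


def mm_to_units(mm):
    """Convert millimeters to whole units of 1/1200 of an inch."""
    return round(mm / MM_PER_INCH * UNITS_PER_INCH)


def units_to_mm(units):
    """Convert units of 1/1200 of an inch to millimeters."""
    return units * MM_PER_INCH / UNITS_PER_INCH


print(mm_to_units(5))            # 236: a 5 mm tall character
print(round(units_to_mm(1), 3))  # 0.021
```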
Hand-Printed Character Width
You can assign the expected character width (in 1/1200th of an inch) for the Constrained Handprint
Recognition (Numeric) module. The default value is 0.
Hand-Printed Detect Spaces
If this setting is enabled, the Constrained Handprint Recognition (Numeric) module will detect spaces
between characters.
Hand-Printed Leading Spaces
You can assign the expected leading spaces (in 1/1200th of an inch) for the Constrained Handprint
Recognition (Numeric) module. The default value is 0.
Hand-Printed Style
You can select either the European or U.S. writing style of the Constrained Handprint (Numeric) module. For
example, the number seven is crossed in European style and uncrossed in American style.
Recognition Language
The default recognition language is English, and any combination of recognition languages can be selected.
You can increase the number of recognized characters by assigning the Additional Language Filter property,
and you can narrow them by selecting from the Character Filter list.
To select the Recognition Languages
1. Click the ellipsis button to the right of the Recognition Language field.
2. Select the languages to include during the OCR process. Characters from your selected language will be
recognized during OCR.
3. Click OK.
NOTE: A faster reading will result if you match the Spelling Language to your selected Recognition
Language.
Recognition Process Setting
The Recognition Process Setting is applied at the page level during OCR and involves a trade-off between
accuracy and speed.
• Accurate, the default setting, results in the most accurate recognition.
• Balanced applies average accuracy and speed recognition.
• Fast results in the fastest recognition, but accuracy may be compromised.
Rejection Symbol
This property represents rejected characters in output documents. A rejected character is not recognized by
the active OCR recognition engine configuration. The default value is the Tilde character ( ~ ). Only a single
character can be entered in this field.
To prevent unrecognized characters from appearing in output documents, leave this field blank.
Spelling Language
This property accepts all possible recognition languages. The Auto setting matches the recognition language
with the corresponding spelling language. Only one spelling language can be selected at a time.
Vertical Dictionaries
By default, Vertical Dictionaries are disabled; however, you can select any combination of dictionaries to
include in the OCR process. PaperVision Capture supports the following dictionaries:
• Dutch Legal Professional Dictionary
• Dutch Medical Professional Dictionary
• English Financial Professional Dictionary
• English Legal Professional Dictionary
• English Medical Professional Dictionary
• French Legal Professional Dictionary
• French Medical Professional Dictionary
• German Legal Professional Dictionary
• German Medical Professional Dictionary
Nuance OCR Zone Properties
The following Nuance OCR settings can be configured for each OCR zone:
Capitalize Proper Names
If this setting is enabled, the correction feature of the recognition subsystem will capitalize names inside
recognized text.
Character Filter
Character filters that are defined at the zone level will narrow the search for only your specified sets of
characters. By default, all character filters are selected, but you can select a specific set of characters that
will be recognized during OCR processing.
Your selected recognition module may restrict which character filters are recognized during OCR processing. For
example, the Constrained Handprint Recognition (Numeric) module supports only numerals and four other characters, so
if you select the Alphanumeric character filter with that module, the filter will not be recognized. All character
filters are supported by the Omnifont Multi-Lingual, Constrained Handprint Recognition (Alphanumeric), Omnifont
Multi-Lingual (FRX), and Draft Dot-Matrix modules.
The table below describes each character filter that you can define for the zone:
Character Filter | Description
All | Since all filters are enabled, no filtering is applied
Alpha | Recognizes upper- and lower-case letters only
Default | Causes the zone to be handled globally; do not combine with any other filter
Digit | Recognizes only numerals (e.g., 1, 2, 3)
Lower-case | Recognizes only lower-case letters (e.g., a, b, c), including accented letters
Miscellaneous | Recognizes only other miscellaneous characters (e.g., +, -)
Numbers | Recognizes only the digits and any values defined in the Additional Character Filters field for the page
Plus | Enables the use of the defined Additional Character Filters; these characters are added after all other filters
Punctuation | Recognizes only punctuation signs (e.g., !, @, #)
Upper-case | Recognizes only upper-case letters (e.g., A, B, C), including accented letters
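The way several character filters combine can be pictured as a union of character sets. The sketch below is illustrative only: the sets are small ASCII approximations (the engine's real sets also include accented letters and more symbols), the function names are hypothetical, and the All and Default filters are deliberately left out.

```python
import string

# Illustrative ASCII-only sets; the engine's real sets are larger (assumption).
BASE_FILTERS = {
    "Digit": set(string.digits),
    "Upper-case": set(string.ascii_uppercase),
    "Lower-case": set(string.ascii_lowercase),
    "Alpha": set(string.ascii_letters),
    "Punctuation": set("!@#"),
    "Miscellaneous": set("+-"),
}


def allowed_characters(selected, additional=""):
    """Union of the selected filters.

    "Numbers" adds the digits plus the Additional Character Filters value;
    "Plus" adds only the Additional Character Filters value.
    """
    allowed = set()
    for name in selected:
        if name == "Numbers":
            allowed |= BASE_FILTERS["Digit"] | set(additional)
        elif name == "Plus":
            allowed |= set(additional)
        else:
            allowed |= BASE_FILTERS[name]
    return allowed


def outside_filter(text, allowed):
    """Return the characters of text that the combined filter does not permit."""
    return [ch for ch in text if ch not in allowed]
```

For instance, `outside_filter("12A4", allowed_characters(["Digit"]))` reports the letter A as falling outside the Digit filter.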
Filling Method
This setting is based on the selected recognition module and contains the filling method for the specified
OCR zone. The filling method corresponds with the zone’s contents. If an incorrect filling method is chosen
for the zone, its contents will not be recognized. The following table displays the filling methods, their
descriptions, and the supported recognition modules.
Filling Method | Description | Supported Recognition Modules
Default | The filling method to be used, acquired from the page's Default Filling Method property | N/A
Omnifont | (Default setting) Indicates machine-printed text with any typeface | Omnifont Plus (2W); Omnifont Plus (3W); Omnifont Multi-Lingual; Omnifont Multi-Lingual (FRX); Omnifont Matrix
Draft-Dot 9 | 9-pin draft dot-matrix printout | Draft Dot-Matrix; Omnifont Matrix
Handprinted | Hand-printing within the zone | Constrained Handprint Recognition (Numeric); Constrained Handprint Recognition (Alphanumeric)
Draft-Dot 24 | 24-pin draft dot-matrix printout | Omnifont Multi-Lingual; Omnifont Matrix
OCR-A | OCR-A filling method | Omnifont Multi-Lingual; Omnifont Matrix; Matrix Matching Recognition
OCR-B | OCR-B filling method | Omnifont Multi-Lingual; Omnifont Matrix; Matrix Matching Recognition
Magnetic Ink Character Recognition | Magnetic ink character filling method | Matrix Matching Recognition
Dash-Digit | Dash-digit zone filling method | Matrix Matching Recognition
Dot-Digit | Dot-digit zone filling method | Matrix Matching Recognition
Ignore Blank Spaces
If this setting is enabled, white space characters (including white space created by the Spacebar and Tab
keys) will be excluded (ignored) during OCR processing.
Ignore Character Case
If this setting is enabled, differences between upper- and lower-case characters will be ignored during OCR processing. If this
setting is disabled, upper- and lower-case characters will be distinguished during OCR processing.
Included Punctuation
If this setting is enabled, punctuation will be recognized during OCR processing.
Recognition Module
All zones must have a recognition module assigned before OCR processing can be successfully completed.
See OCR Recognition Modules for a complete list and descriptions for each module.
Verify Complete Lines
If you enable this setting, entire lines of text (instead of individual words) will be processed through OCR.
Select False to pass individual words through OCR processing.
Zone Type
This setting describes the area inside the OCR zone, and whether that area should be recognized or ignored.
You can assign zone types to be treated as text, a table, or a form.
• Auto automatically performs a parsing algorithm, and may create several OCR zone types including Flow,
  Table, and Form.
• Flow contains flowed text without a table structure inside the zone.
• Form represents an unfilled form.
• Table contains a table with rows and columns, with or without a grid.
Nuance OCR Recognition Modules
A Nuance OCR license includes all recognition modules except the Constrained Handprint Recognition
(Numeric) and Constrained Handprint Recognition (Alphanumeric) modules, which require a separate Intelligent
Character Recognition (ICR) license.
Omnifont Matrix
The Omnifont Matrix recognition module recognizes machine-printed text from printed publications, laser and
ink-jet printers, and electric typewriters. Mechanical typewriters may also produce readable output. This
module can also be used with Letter or Near Letter Quality (NLQ) output from dot-matrix printers, and can also
be used for Draft Quality.
Omnifont Matrix detects and transmits bold, italic, and underlined text (including combinations). This module
also detects and transmits character size and classifies font types into the serif, sans serif, and monospaced categories.
Supported Filling Methods
• Omnifont
• Draft-Dot 9
• Draft-Dot 24
• OCR-A
• OCR-B
Supported Filter Types
• All
• Digit
• Alphanumeric
Supported Recognition Processing Settings
• Fast
• Balanced and Accurate (merged into one value)
Omnifont Multi-Lingual
The Omnifont Multi-Lingual module recognizes machine-printed text from printed publications, laser and
ink-jet printers, and electric typewriters. Mechanical typewriters may produce readable output. Additionally,
dot-matrix printers with NLQ and LQ output may produce readable results. Use the Draft-Dot 24 (DRAFTDOT24)
filling method for draft-quality 24-pin dot-matrix documents; NLQ and LQ output is recognized better without
this filling method. A maximum of 500 OCR zones can be defined on one image for this module.
Omnifont Multi-Lingual detects and transmits bold, italic, and underlined text (including combinations). This
module also detects and transmits character size and classifies font types into serif, sans serif, and
monospaced categories.
Character Range
• Latin, Greek, and Cyrillic alphabets and accented letters
• 500 characters
Supported character set includes:
Characters | Non-accented | Accented
Latin alphabet upper-case letters | 26 | 89
Latin alphabet lower-case letters | 26 | 91
Digits | 10 | -
Punctuation | 29 | -
Miscellaneous symbols | 55 | -
Cyrillic upper-case letters | 33 | 14
Cyrillic lower-case letters | 33 | 14
Greek upper-case letters | 24 | 9
Greek lower-case letters | 25 | 11
OCR (OCR-A and MICR) characters | 3 | -
Supported Filling Methods
• Omnifont
• Draft-Dot 24
• OCR-A
• OCR-B
Supported Filter Types
• Default
• Digit
• Upper-Case
• Lower-Case
• Punctuation
• Miscellaneous
• Plus
• All
• Alphanumeric
• Numbers
Supported Recognition Process Settings
• Fast
• Balanced
• Accurate
Draft Dot-Matrix
The Draft Dot-Matrix recognition module is designed only for draft-quality 9-pin dot-matrix text. No
Recognition Processing settings are supported, but all filters are supported in the module. Expanded
characters are not recognized, but condensed characters can be (although their accuracy may be low).
For NLQ or LQ text, the following Omnifont modules produce better results:
• Omnifont Plus (2W)
• Omnifont Plus (3W)
• Omnifont Matrix
• Omnifont Multi-Lingual
Supported Character Range (Accented)
Upper and Lower Case | Lower Case Only
A Acute (A') | A Circumflex (a^)
AE (Ae) | A Macron (a-)
A Ring (Ao) | A Grave (a`)
A Umlaut (A:) | E Umlaut (e:)
A Tilde (A~) | E Circumflex (e^)
C Cedilla (C,) | E Grave (e`)
E Acute (E') | I Umlaut (I:)
I Acute (I') | I Circumflex (I^)
N Tilde (N~) | I Grave (I`)
O Double Acute (O") | O Circumflex (O^)
O Acute (O') | O Macron (O-)
O Umlaut (O:) | O Grave (O`)
O Tilde (O~) | S Hacek (Sv)
O Slash (O/) | U Circumflex (U^)
OE (OE) | U Grave (U`)
U Double Acute (U") |
U Acute (U') |
U Umlaut (U:) |
Optical Mark Recognition
The Optical Mark Recognition (OMR) module recognizes optical marks, such as check marks used in
educational tests, voting ballots, questionnaires, and ordering sheets. The processed documents are
considered form-like and are usually filled out by hand. No filters are recognized, and no Recognition Process
Settings are supported in this module. Only the OMR filling method is supported in this module.
Although the optical marks may be surrounded by visible frames on the input document, these frames may
not be visible on the document passed through OMR because of the use of dropout colors during scanning.
OMR processing requires a high degree of accuracy; it is recommended that documents display a solid
design with clear instructions for respondents. Dark blue or black pens are recommended; pencils should be
avoided in documents.
Constrained Handprint Recognition (Numeric)
The Constrained Handprint Recognition (Numeric) module recognizes hand-printed numeric characters and
four calculation signs. You can use the Constrained Handprint Recognition (Alphanumeric) module in conjunction
with this module since both modules are included with the ICR license.
For better recognition, characters should not touch one another, and each character must be between 30 and 180
pixels in height. Well-formed numbers written in pen are best recognized; pencil and felt-tip pens result in
poorer recognition.
A zone can contain a maximum of 3000 characters, 40 lines, and 600 characters per line. An OCR zone can contain
a single character or several lines of characters. Optimally, the OCR zone region should be 5 mm x 6 mm separated
by 3 mm.
Character range includes:
• Digits (0-9)
• Plus sign (+)
• Minus sign (-)
• Period or full-stop (.)
• Comma (,)
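The limits and character range described for this module can be expressed as a simple check on the text returned for a zone. This is a minimal sketch under stated assumptions: the function name is hypothetical, spaces are tolerated because the module can be set to detect them, and nothing here reflects how the engine enforces its limits internally.

```python
ALLOWED = set("0123456789+-.,")  # digits plus the four signs listed above
MAX_CHARS_PER_ZONE = 3000
MAX_LINES_PER_ZONE = 40
MAX_CHARS_PER_LINE = 600


def check_numeric_zone(text: str) -> list[str]:
    """Return a list of problems found in a numeric handprint zone's text."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES_PER_ZONE:
        problems.append(f"{len(lines)} lines exceeds the limit of {MAX_LINES_PER_ZONE}")
    total = sum(len(line) for line in lines)
    if total > MAX_CHARS_PER_ZONE:
        problems.append(f"{total} characters exceeds the limit of {MAX_CHARS_PER_ZONE}")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line {number} exceeds {MAX_CHARS_PER_LINE} characters")
    # Spaces are tolerated here (assumption); anything else outside the range is flagged.
    unexpected = sorted(set("".join(lines)) - ALLOWED - {" "})
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems
```

An empty list means the text fits the documented limits; a value such as "12A" would be flagged because the letter A is outside the module's character range.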
Supported Filter Types
• All
• Digit
• Punctuation
• Miscellaneous
NOTE: You can use the Digit filter to exclude the Plus Sign, Minus Sign, Period, and Comma during
processing.
Supported Recognition Processing Settings
• Fast
• Balanced and Accurate (merged into one value)
Constrained Handprint Recognition (Alphanumeric)
The Constrained Handprint Recognition (Alphanumeric) module recognizes hand-printed alphanumerical
characters such as upper- and lower-case letters, digits, and others. You can use the Constrained Handprint
Recognition (Numeric) module in conjunction with this module since both modules are provided under one
license. This module can read flowed text, but is applied mainly in hand-printed forms.
The Constrained Handprint Recognition (Alphanumeric) module differentiates over 150 characters, including
digits, punctuation marks, miscellaneous characters, English alphabet letters, and accented characters.
NOTE: Cyrillic and Greek languages are not supported in this module.
The only supported Filling Method is Handprint, but all filter types are supported. Hand-printed text is more
difficult to recognize, but enhanced character quality can improve recognition. Structured forms and zone
filters can improve OCR processing for this module.
• For better recognition, characters should not touch one another.
• Each character must be between 30 and 180 pixels in height.
• Well-formed characters written in pen are best recognized.
• Pencil and felt-tip pens result in poorer recognition.
• The maximum number of characters per line is 200.
• An unlimited number of lines can be assigned per zone.
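The 30 to 180 pixel height requirement depends on the scan resolution, so the same handwriting may qualify at one resolution and not at another. The sketch below shows the conversion; the function names are hypothetical and the calculation assumes equal horizontal and vertical resolution.

```python
MM_PER_INCH = 25.4
MIN_HEIGHT_PX = 30
MAX_HEIGHT_PX = 180


def height_range_mm(dpi: int) -> tuple[float, float]:
    """Physical character heights (in mm) that map to 30-180 pixels at this resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (MIN_HEIGHT_PX * MM_PER_INCH / dpi, MAX_HEIGHT_PX * MM_PER_INCH / dpi)


def height_in_range(char_height_mm: float, dpi: int) -> bool:
    """True if a character of this physical height falls within 30-180 pixels."""
    pixels = char_height_mm * dpi / MM_PER_INCH
    return MIN_HEIGHT_PX <= pixels <= MAX_HEIGHT_PX
```

At 300 dpi, for example, the range works out to roughly 2.5 mm to 15.2 mm.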
Recognized Punctuation and Miscellaneous Characters
• Exclamation Mark (!)
• Question Mark (?)
• Apostrophe or Single Quote (')
• Quotation Mark (")
• Semicolon (;)
• Comma (,)
• Colon (:)
• Period or full-stop (.)
• Hyphen or Minus Sign (-)
• Opening and Closing Parentheses ( )
• Opening and Closing Square Brackets [ ]
• Opening and Closing Curly Brackets { }
• Number Sign (#)
• Percent Sign (%)
• At (@)
• Ampersand (&)
• Vertical Bar (|)
• Dollar Sign ($)
• Asterisk (*)
• Plus Sign (+)
• Equals Sign (=)
• Underscore (_)
• Slash (/)
• Backslash (\)
• Less Than (<)
• Greater Than (>)
Supported Recognition Process Settings
• Fast
• Balanced
• Accurate
Matrix Matching Recognition
The Matrix Matching Recognition module reads groups of fixed-font characters designed specifically for OCR
or imaging applications in which no two characters have similar shapes. Relevant applications include
banking, check handling, product distribution, and document validation, where accuracy is critical. Each
character group has its own filling method. Additionally, some non-fixed print styles are also recognized. No
recognition processing settings are supported, but all filters (except the Lower-Case filter) are supported in
the module.
Character Range
Character Type | Characters Included
OCR-A* | Uppercase English letters; digits; some punctuation; OCR symbols (Chair, Hook, and Fork)
OCR-B | Uppercase English letters; digits; some punctuation
Magnetic Ink Character* | Digits; some punctuation; Magnetic Ink Character symbols (OCR Branch Bank, OCR Amount of Check, OCR Dash, and OCR Customer Account Number)
Dot-Digit Zone | Ten digits and period; commas are read, but converted to periods
Dash-Digit Zone | Ten digits and period; commas are read, but converted to periods
* Only recognized when enabled in the Additional Language Filter
Supported Filling Methods
• OCR-A
• OCR-B
• Magnetic Ink Character Recognition
• Dot-Digit
• Dash-Digit
Omnifont Plus (2W) and (3W)
The Omnifont Plus (2W) and (3W) modules recognize machine-printed text from printed publications, laser
and ink-jet printers, and electric typewriters. Mechanical typewriters may also produce good output. These
modules provide improved recognition by combining results from other modules: Omnifont Plus (2W) combines
the Omnifont Multi-Lingual and Omnifont Matrix modules, and Omnifont Plus (3W) combines the Omnifont
Multi-Lingual, Omnifont Matrix, and Omnifont Multi-Lingual (FRX) modules. Only the Omnifont filling method
is supported in these modules. Both modules detect
and transmit bold, italic, and underlined text (including combinations). They also detect and transmit
character size and classify font types into serif, sans serif, and mono-spaced categories.
The supported character set includes:
Characters | Non-accented | Accented
Latin alphabet upper-case letters | 26 | 89
Latin alphabet lower-case letters | 26 | 91
Digits | 10 | -
Punctuation | 29 | -
Miscellaneous symbols | 55 | -
Cyrillic upper-case letters | 33 | 14
Cyrillic lower-case letters | 33 | 14
Greek upper-case letters | 24 | 9
Greek lower-case letters | 25 | 11
OCR (OCR-A and MICR) characters | 3 | -
Supported Filters
• All
• Digit
• Alpha
Supported Recognition Processing Settings
• Fast
• Balanced
• Accurate
Omnifont Multi-Lingual (FRX)
The Omnifont Multi-Lingual (FRX) module recognizes machine-printed text from printed publications, laser
and ink-jet printers, and electric typewriters. Mechanical typewriters may produce readable output.
Additionally, dot-matrix printers with NLQ and LQ output may produce readable results. No Recognition
Processing Settings are supported, but all filters are supported in this module. Only the Omnifont filling
method is supported in this module.
This module supports Latin, Greek, and Cyrillic alphabets with accented letters. Omnifont Multi-Lingual
(FRX) detects and transmits bold, italic, and underlined text (including combinations). This module also
detects and transmits character size and classifies font types into serif, sans serif, and mono-spaced
categories.
You can select multiple languages for OCR recognition, but languages are only recognized if they belong to
the same code page. For example, OCR can process English, Spanish, and French since they belong to the
Latin 1 code page. OCR may fail to recognize both English and Russian since they belong to different code
pages. The following table outlines the supported languages for each code page:
Code Page | Supported Languages
Latin 1 | English, German, French, Spanish, Italian, Dutch, Swedish, Norwegian, Finnish, Danish, Portuguese, Portuguese Brazilian, Catalan, Afrikaans, Aymara, Basque, Breton, Faroese, Friulian, Gaelic, Galician, Eskimo, Icelandic, Indonesian, Latin, Malaysian, Pidgin English, Swahili, Tahitian, Welsh, Frisian, Zulu
Latin 2 | Polish, Czech, Hungarian, Romanian, Albanian, Croatian, Wend (Sorbian), Slovak, Slovenian
Cyrillic | Russian, Ukrainian, Byelorussian, Bulgarian, Macedonian, Serbian
Greek | Greek
Turkish | Turkish, Kurdish (written in Latin alphabet)
Baltic | Estonian, Hawaiian, Latvian, Lithuanian
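The same-code-page rule can be checked before a combination of languages is selected. The sketch below copies the table into a lookup and reports whether a selection shares one code page; the names `CODE_PAGES` and `shared_code_page` are hypothetical, and the lookup is only a model of the rule, not the engine's own logic.

```python
from typing import Iterable, Optional

# Languages per code page, taken from the table above.
CODE_PAGES = {
    "Latin 1": {
        "English", "German", "French", "Spanish", "Italian", "Dutch", "Swedish",
        "Norwegian", "Finnish", "Danish", "Portuguese", "Portuguese Brazilian",
        "Catalan", "Afrikaans", "Aymara", "Basque", "Breton", "Faroese", "Friulian",
        "Gaelic", "Galician", "Eskimo", "Icelandic", "Indonesian", "Latin",
        "Malaysian", "Pidgin English", "Swahili", "Tahitian", "Welsh", "Frisian",
        "Zulu",
    },
    "Latin 2": {
        "Polish", "Czech", "Hungarian", "Romanian", "Albanian", "Croatian",
        "Wend (Sorbian)", "Slovak", "Slovenian",
    },
    "Cyrillic": {
        "Russian", "Ukrainian", "Byelorussian", "Bulgarian", "Macedonian", "Serbian",
    },
    "Greek": {"Greek"},
    "Turkish": {"Turkish", "Kurdish"},
    "Baltic": {"Estonian", "Hawaiian", "Latvian", "Lithuanian"},
}

# Reverse lookup: language name -> code page name.
LANGUAGE_TO_CODE_PAGE = {
    language: page for page, languages in CODE_PAGES.items() for language in languages
}


def shared_code_page(languages: Iterable[str]) -> Optional[str]:
    """Return the single code page shared by all languages, or None if they differ.

    Raises KeyError for a language that is not in the table.
    """
    pages = {LANGUAGE_TO_CODE_PAGE[language] for language in languages}
    return pages.pop() if len(pages) == 1 else None
```

With this lookup, English, Spanish, and French resolve to Latin 1, while English combined with Russian returns None.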
Nuance OCR Spelling Languages
Your selected language comprises the language environment of the character set. If you narrow the search
for specific spelling languages, the Nuance OCR engine will process more rapidly during OCR recognition.
The following table displays the supported Nuance spelling languages available in PaperVision Capture.
Spelling Languages
Afrikaans - spoken in South Africa
Albanian
Automatic language selection for spell-checking only
Aymara - spoken in Bolivia and Peru
Basque
Byelorussian (Cyrillic) - includes the characters of the English language; other spellings are Belarusian
and White Russian
Bemba - alternate names are Chibemba, Ichibemba, Wemba, Chiwemba; spoken in Zambia and
Democratic Republic of Congo
Blackfoot - alternate names are Blackfeet, Siksika, and Pikanii; spoken in Canada and USA
Portuguese (Brazilian)
Breton
Bugotu - spoken in Solomon Islands
Bulgarian (Cyrillic) - includes the characters of the English language
Catalan
Chamorro - spoken in Guam and Northern Mariana Islands
Chechen
Chuana or Tswana - spoken in Botswana and South Africa
Corsican
Croatian
Crow - spoken in USA
Danish
Dutch
English
Eskimo
Esperanto
Estonian
Faroese
Fijian
Danish
French
Frisian - macrolanguage of three Frisian languages in Germany
Friulian - spoken in Italy
Galician (alternate names Gallegan and Gallego) - spoken in Spain and Portugal
Ganda or Luganda - spoken in Uganda
German
Gaelic Irish
Gaelic Scottish
Greek - includes the characters of the English language
Guarani (macrolanguage of the Chiripa and some Guarani languages) - spoken in Paraguay, Argentina,
Bolivia, and Brazil
Hani (alternate names are Hanhi, Haw, and Hani Proper) - spoken in China, Laos, and Vietnam
Hawaiian
Hungarian
Icelandic
Ido - constructed language
Finnish
Indonesian
Interlingua - constructed language
Italian
Kabardian (alternate name is Beslenei) - spoken in Russia and Turkey
Kashubian - spoken in Poland
Kawa (alternate names are Wa, Va, Vo, Wa Pwo, and Wakut) - spoken in China
Kikuyu - spoken in Kenya
Kongo (macrolanguage of Laari and Kongo languages) - spoken in the Democratic Republic of the Congo,
Angola, and Congo
Kpelle (macrolanguage of Kpelle languages) - spoken in Liberia and Guinea
Kurdish (if written in the Latin alphabet) - macrolanguage of the Kurdish languages
Latvian
Lithuanian
Latin
Luba (alternate names are Luba-Lulua, Luba-Kasai, Tshiluba, Luva, and Western Luba) - spoken in the
Democratic Republic of the Congo
Luxembourgian (alternate names are Luxembourgeois and Letzburgish) - spoken in Luxembourg
Macedonian (Cyrillic) - includes the characters of the English language
Maltese
Maori - spoken in New Zealand
Mayan
Miao (macrolanguage of Hmong languages and alternate name is Hmong) - spoken in China, Laos,
Thailand, Myanmar, and Vietnam
Minankabaw
Malagasy (macrolanguage of Malagasy languages) - spoken in Madagascar
Malinke (alternate names are Western Maninkakan, Malinka, and Maninga) - spoken in Senegal, Gambia,
and Mali
Malay
Mohawk - spoken in Canada and USA
Moldavian (Cyrillic) - includes the characters of the English language
Nahuatl
No language selection (for spell checking only) - this value can be used to specify that the checking
module will not use the Language dictionary
Norwegian
Nyanja (alternate names are Chichewa and Chinyanja) - spoken in Malawi, Mozambique, Zambia, and
Zimbabwe
Occidental - constructed language
Ojibway (macrolanguage of Ojibwa, Chippewa, and Ottawa languages and alternate names are Ojibwa
and Ojibwe) - spoken in Canada and USA
Papiamento - spoken in Netherlands Antilles, Aruba
Pidgin English (alternate names are Tok Pisin, Neomelanesian, and New Guinean Pidgin English) - spoken in Papua New Guinea
Polish
Portuguese
Provencal (alternate name is Occitan) - spoken in France, Italy, and Monaco
Quechua (macrolanguage of the Quechua languages) - spoken in Peru
Rhaetic (alternate names are Romansch and Rhaeto-Romance) - spoken in Switzerland
Romanian
Romany - spoken all over Europe
Ruanda (alternate names are Kinyarwanda and Rwanda) - spoken in Rwanda, the Democratic Republic
of Congo, and Uganda
Rundi - spoken in Burundi and Uganda
Russian (Cyrillic) - includes the characters of the English language
Samoan - spoken in Samoa and American Samoa
Sardinian - macrolanguage of the Sardinian languages
Shona - spoken in Zimbabwe, Botswana, and Zambia
Sioux (alternate name is Dakota) - spoken in USA and Canada
Slovak
Slovenian
Sami - combination of the Sami language family
Lule Sami
Northern Sami
Southern Sami
Somali
Sotho, Suto, or Sesuto language selection - spoken in Lesotho and South Africa
Spanish
Serbian (Cyrillic)
Serbian (Latin)
Sundanese (alternate names are Sunda and Priangan) - spoken in Java and Bali in Indonesia
Swahili (macrolanguage of the Swahili languages) - spoken in the Democratic Republic of the Congo,
Tanzania, Kenya, and Somalia
Swedish
Swazi (alternate names are Swati, Siswati, and Tekela) - spoken in Swaziland, Lesotho, Mozambique,
and South Africa
Tagalog - spoken in Philippines
Tahitian
Tinpo
Tongan (alternate names are Tonga, Siska, and Nyasa) - spoken in Malawi
Tun (alternate names are Tunia and Tunya) - spoken in Chad
Turkish
Ukrainian (Cyrillic) - includes the characters of the English language
Visayan consists of Cebuano, Hiligaynon, Samaran, or Waray-waray languages - spoken in the
Philippines
Welsh
Wend or Sorbian
Wolof - spoken in Senegal and Mauritania
Xhosa - spoken in South Africa and Lesotho
Zapotec (macrolanguage of the Zapotec languages) - spoken in Mexico
Zulu - spoken in South Africa, Lesotho, Malawi, Mozambique, and Swaziland
Open Text Zonal OCR
PaperVision Capture enables you to customize Optical Character Recognition (OCR) settings for individual
index fields. The Open Text OCR job step allows you to configure an OCR process that executes
automatically in the PaperVision Capture Operator Console or by the PaperVision Capture Automation
Service. You can also configure OCR to insert document breaks.
The Open Text Zonal OCR step has its own set of properties available for configuration. Open Text®
OCR processing recognizes machine-printed text only; handwritten text is not recognized. New line
characters are removed during Open Text OCR processing. Properties available for configuration in the Open
Text Zonal OCR step are described below.
Maximum Supported Image Sizes
The maximum supported image dimensions that can be processed through the Open Text engine vary with
resolution. The maximum width is approximately 32,000 pixels, and the maximum height is
approximately 24,000 pixels. For example, the maximum supported image dimensions at 300 dpi are
approximately 106 inches x 80 inches. Images that are processed through the Open Text OCR engine must
have matching horizontal and vertical resolutions.
DISCLAIMER: These dimensions are provided only as estimates to identify size limits when processing
images in PaperVision Capture. Variations in technical environments may cause maximum image sizes
to fluctuate across systems.
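The relationship between the pixel limits and physical page size is a simple division by resolution. The sketch below reproduces the 300 dpi example and adds a pre-check for an image; the function names are hypothetical, and the limits are the approximate figures quoted above rather than exact engine constants.

```python
MAX_WIDTH_PX = 32_000   # approximate limit quoted in this guide
MAX_HEIGHT_PX = 24_000  # approximate limit quoted in this guide


def max_size_inches(dpi: int) -> tuple[float, float]:
    """Approximate maximum width and height in inches at the given resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (MAX_WIDTH_PX / dpi, MAX_HEIGHT_PX / dpi)


def within_limits(width_px: int, height_px: int, x_dpi: int, y_dpi: int) -> bool:
    """True if the resolutions match and the image is within the approximate pixel limits."""
    return x_dpi == y_dpi and width_px <= MAX_WIDTH_PX and height_px <= MAX_HEIGHT_PX
```

At 300 dpi this gives about 106.7 inches by 80 inches, which matches the example above. A letter-size page scanned at 300 x 200 dpi would fail the check because its resolutions do not match.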
To configure Open Text OCR zones
1. In the Job Definitions workspace, select the Open Text Zonal OCR job step.
2. In the Properties grid, expand the Indexes node, and then click the ellipsis button next to the Indexes field.
Proceed to step 4.
3. Or, expand the Auto Document Break node to configure OCR zones that will automatically break
documents. Proceed to step 7.
4. In the Index Configuration dialog box, click the Add button.
5. Under the Index Properties section, expand the General (Step Level) node.
6. Click the ellipsis button to the right of the OCR Zones field. The Edit OCR Zones screen appears.
7. Drag the cursor around the OCR zone on the image, and the properties appear in the grid.
Open Text OCR Zone Properties
The following Open Text OCR settings can be configured for each OCR zone.
OCR Statistics
You can configure custom code that reports specific OCR statistics when an OCR zone is processed through
the Open Text OCR engine. For example, you can configure custom code to record statistics when an OCR
zone populates an index value by using the OCRIndexZoneStatistics sample script. Custom code samples
are located in the Library\Samples directory (as text or XML files), where PaperVision Capture was
installed. The following OCR sample scripts are available for configuration:
• OCRFullTextPageStatistics
• OCRIndexZoneStatistics
• OCRMarkSenseZoneStatistics
For more information, see the Custom Code Configuration topic.
To configure custom code OCR statistics
1. In the Edit OCR Zones screen, click the ellipsis button next to the OCR Statistics field. The Select
Custom Code Generator dialog appears.
2. Select the Basic custom code generator, and then click OK. The Script Editor opens.
3. If desired, you can import code from the OCRIndexZoneStatistics or OCRMarkSenseZoneStatistics script into the
Script Editor. Click the Import icon, and then browse to the Library\Samples directory where
PaperVision Capture was installed.
4. Otherwise, insert your custom code into the Script Editor.
5. Click OK.
Auto Rotate
By default, this property is set to True, and the Open Text Zonal OCR engine will attempt to recognize text in
all orientations (vertically and horizontally) within the zone. To restrict recognition to a single orientation
(vertically only), set this property to False.
Brightness Sample Size
This value (indicating both width and height) specifies the rectangle size used to calculate the brightness
threshold. You can specify a value between 1 and 32, and the default value is 15.
NOTE: Smaller brightness sample sizes may cause the OCR engine to recognize extraneous noise on
the image.
Brightness Threshold
You can assign a brightness threshold value (between 0 and 255) for the image. The default value is 75.
Country/Language
When you select from the Country/Language property, your selection may reflect not only a country or
language, but also country groups (e.g., Western Europe), language groups (e.g., Latin), and character sets (e.g.,
OCR). Each country corresponds to one or more languages, and countries are automatically expanded into
language sets (e.g., Germany corresponds to the German language; Switzerland corresponds to the German,
French, Italian, and Rhaeto-Romance languages).
Specific languages are also available for selection under the Country/Language property (e.g., English,
German, Dutch, Italian, etc.). It is recommended to narrow your selection as much as possible since OCR
recognition may become slower with a greater number of selected countries or languages. It is also
recommended to select a country rather than a language or country group (e.g., Western Europe, South
America, Scandinavia) since the recognition of certain types of addresses and money transfer forms may
improve.
NOTE: You cannot select the OCR character set individually; it must be selected with another language,
language group, country, or country group. For a complete list of Open Text supported countries,
languages, country groups, and character sets, see the Open Text OCR Supported Countries/Languages
(Groups)/Character Sets topic.
Language Groups
If you select a language group, it is recommended to select only one, since they encompass multiple
languages, countries, and code pages:
1. Cyrillic: Code page 1251
2. Greek: Code page 1253
3. Latin: Code pages 1250, 1252, 1254 and 1257 (i.e. Central Europe, Western Europe, Turkey, Baltic)
4. Azerbaijanian
NOTE: For language groups, recognition results are always represented by Unicode characters. The
English character set (A-Z, a-z) is implicitly available with all country-language selections, even Greek
or Cyrillic.
To select a country or language for full-text OCR output
1. After selecting an output type, click the ellipsis button to the right of the Country/Language property. The
Country/Language dialog box appears.
2. Highlight one or more countries/languages from the Available list, and then click the right arrow.
NOTE: If a country or language appears crossed out, it does not belong to the same code page as the
selected country or language. Therefore, countries or languages containing strikethroughs cannot be
added to the Selected list.
3. To remove one or more selections from the Selected list, highlight the countries/languages, and then click
the left arrow.
4. When finished with your selections, click OK.
Minimum Confidence
The confidence level reflects the reliability of the OCR recognition results. Values range from zero (the
default setting), the lowest confidence level, to 255, the highest confidence level indicating the most reliable
recognition results. Characters with lower confidence levels than your specified value will display as the
rejection symbol, which is the tilde (~) character by default.
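As a rough illustration of how a minimum confidence value interacts with the rejection symbol, the following Python sketch replaces low-confidence characters in a recognized string. The function name and the (character, confidence) input format are assumptions made for this example; they are not part of the PaperVision Capture or Open Text interfaces.

```python
def apply_minimum_confidence(chars, minimum_confidence=0, rejection_symbol="~"):
    """Replace characters whose confidence is below the minimum.

    chars is an iterable of (character, confidence) pairs, where confidence
    is an integer from 0 (lowest) to 255 (highest). This input format is
    hypothetical and used only for illustration.
    """
    if not 0 <= minimum_confidence <= 255:
        raise ValueError("minimum_confidence must be between 0 and 255")
    return "".join(
        ch if confidence >= minimum_confidence else rejection_symbol
        for ch, confidence in chars
    )


recognized = [("I", 240), ("n", 231), ("v", 90), ("o", 250)]
print(apply_minimum_confidence(recognized, minimum_confidence=128))  # In~o
```

With the default minimum of zero, no character is rejected, because no confidence value can fall below zero.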
Timeout Value (sec)
This property allows you to define the maximum amount of time that the Open Text OCR engine processes a
single image before it fails. By default, this property is set to 180 seconds (3 minutes). You can assign a
timeout between one second and 3,600 seconds (1 hour).
NOTE: Raising the timeout setting may increase the amount of time to process all images.
Reader Engine
Two internal OCR reader engines, RecoStar and AEGReader, are available for selection in the Open Text
Zonal OCR step. Document content may cause one engine to generate more accurate recognition results, so
the Voter option is selected by default. The Voter option automatically "votes" between the recognition
results of both engines, and then generates results from the engine with the highest confidence level.
Rejection Symbol
This property specifies the character that represents rejected characters in output documents. A rejected
character is one that is not recognized by the active OCR recognition engine configuration. The default value
is the tilde character (~). Only a single character can be entered in this field.
Tip: To prevent unrecognized characters from appearing in output documents, leave this field blank.
Syntax Mode
When you assign the syntax mode to alphanumerical, the default character set is alphanumeric. If a character
is ambiguous, the OCR engine will attempt to process the character as a letter before a number. For example,
the OCR engine will process a "G" before "6", "S" before "5", etc.
When you assign the syntax mode to numerical, the default character set is numeric. If a character is
ambiguous, the OCR engine will attempt to process the character as a number before a letter. For example,
the OCR engine will process a "6" before "G", "5" before "S", etc.
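The effect of the two syntax modes on an ambiguous character can be sketched as follows. This is only a conceptual model: the function name, the mode strings, and the idea of a candidate list are assumptions for illustration and do not describe the OCR engine's internal logic.

```python
def resolve_ambiguous(candidates, syntax_mode):
    """Pick one candidate for an ambiguous character.

    candidates is a list of possible readings, for example ["6", "G"].
    syntax_mode is "alphanumerical" (letters tried first) or
    "numerical" (digits tried first). Both names are hypothetical.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if syntax_mode not in ("alphanumerical", "numerical"):
        raise ValueError("unknown syntax mode: " + syntax_mode)
    letters = [c for c in candidates if c.isalpha()]
    digits = [c for c in candidates if c.isdigit()]
    others = [c for c in candidates if not (c.isalpha() or c.isdigit())]
    if syntax_mode == "alphanumerical":
        ordered = letters + digits + others
    else:
        ordered = digits + letters + others
    return ordered[0]


print(resolve_ambiguous(["6", "G"], "alphanumerical"))  # G
print(resolve_ambiguous(["S", "5"], "numerical"))       # 5
```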
Open Text OCR: Supported Countries and Languages
The table in this topic shows the supported Open Text countries, languages, country groups, language
groups, and character sets available in PaperVision Capture. If you narrow the selection to specific languages
or countries, the Open Text OCR engine processes more rapidly during OCR recognition.
Each language, country, language group, country group, and character set is compatible with specific code
pages. When you select from the Country/Language property, you can only select combinations of countries,
languages, etc. within the same code page or code page group (i.e., Latin). For example, a valid Latin
combination can be Poland, Hungary, and Germany. A valid Cyrillic combination can be Bulgaria and Russia.
A valid Greek combination can be Greek and OCR.
1. Cyrillic: Code page 1251
2. Greek: Code page 1253
3. Latin: Code pages 1250, 1252, 1254 and 1257 (i.e. Central Europe, Western Europe, Turkey, Baltic)
4. Azerbaijanian
Note: Code page 0 (OCR) can be added to any combination above.
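The compatibility rule can be expressed as a small check. The sketch below is an illustration only: the function names and dictionaries are hypothetical, the entries are a handful of rows taken from the table in this topic, and the Azerbaijanian group is omitted because this section does not state its code page.

```python
# Code page groups as listed above. Code page 0 (OCR) can join any group.
CODE_PAGE_GROUPS = {
    "Cyrillic": {1251},
    "Greek": {1253},
    "Latin": {1250, 1252, 1254, 1257},
}

# A few entries copied from the supported countries and languages table.
CODE_PAGES = {
    "Poland": 1250, "Hungary": 1250, "Germany": 1252,
    "Bulgaria": 1251, "Russia": 1251, "Greek": 1253, "OCR": 0,
}


def group_of(code_page):
    """Return the name of the group that contains the given code page."""
    for name, pages in CODE_PAGE_GROUPS.items():
        if code_page in pages:
            return name
    raise ValueError("unknown code page: %d" % code_page)


def is_valid_combination(selections):
    """Return True if all selections fall within one code page group.

    The OCR character set (code page 0) is ignored when grouping, but it
    cannot be the only selection.
    """
    non_ocr = [s for s in selections if CODE_PAGES[s] != 0]
    if not non_ocr:
        return False
    groups = {group_of(CODE_PAGES[s]) for s in non_ocr}
    return len(groups) == 1


print(is_valid_combination(["Poland", "Hungary", "Germany"]))  # True
print(is_valid_combination(["Russia", "Germany"]))             # False
```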
Supported Open Text Countries and Languages  Code Page
Australia                                    1252
Austria                                      1252
Azerbaijan                                   1254
Baltic                                       1257
Belgium                                      1252
Brazil                                       1252
Bulgaria                                     1251
Canada                                       1252
Central America                              1252
Central Europe                               1250
Croatia                                      1250
Cyrillic                                     1251
Czech                                        1250
Denmark                                      1252
Estonia                                      1257
Finland                                      1252
France                                       1252
Germany                                      1252
Great Britain                                1252
Greece                                       1253
Hungary                                      1250
Ireland                                      1252
Italy                                        1252
Liechtenstein                                1252
Lithuania                                    1257
Luxembourg                                   1252
Netherlands                                  1252
New Zealand                                  1252
Norway                                       1252
Poland                                       1250
Portugal                                     1252
Romania                                      1250
Russia                                       1251
Scandinavia                                  1252
Slovakia                                     1250
Slovenia                                     1250
South Africa                                 1252
South America                                1252
South America Spanish                        1252
Spain                                        1252
Sweden                                       1252
Switzerland                                  1252
Turkey                                       1254
USA                                          1252
Western Europe                               1252
OCR                                          0
Afrikaans                                    1252
Albanian                                     1250
Azerbaijani Latin                            1254
Basque                                       1252
Bosnian Latin                                1250
Bulgarian                                    1251
Catalan                                      1252
Croatian                                     1250
Czech Language                               1250
Danish                                       1252
Dutch                                        1252
English                                      1252
Estonian                                     1257
Faroese                                      1252
Finnish                                      1252
French                                       1252
Frisian                                      1252
German                                       1252
Greek                                        1253
Guarani                                      1252
Hani                                         1252
Hungarian                                    1250
Icelandic                                    1252
Indonesian                                   1252
Irish                                        1252
Italian                                      1252
Kirundi                                      1252
Latin                                        1252
Latvian                                      1257
Lithuanian                                   1257
Luxembourgish                                1252
Malay                                        1252
Norwegian                                    1252
Polish                                       1250
Portuguese                                   1252
Quechua                                      1252
Rhaeto Romanic                               1252
Romanian                                     1250
Russian                                      1251
Rwanda                                       1252
Serbian Latin                                1250
Shona                                        1252
Slovak                                       1250
Slovenian                                    1250
Somali                                       1252
Sorbian                                      1250
Spanish                                      1252
Swahili                                      1252
Swedish                                      1252
Turkish                                      1254
Wolof                                        1252
Xhosa                                        1252
Zulu                                         1252
Chapter 9 Nuance Full-Text OCR
The Nuance Full-Text OCR job step lets you configure an automated process that reads pages of text, and
then converts recognized results into one or multiple file types. Once configured, this step runs automatically
in the PaperVision Capture Automation Service. A Capture Full-Text license is required to run the Nuance
Full-Text OCR step.
The Nuance Full-Text OCR step converts extracted text into various file types such as .txt, .rtf, .csv, .pdf,
.doc (and .docx), .htm, .xls (and .xlsx), and others. Each converter output type contains unique settings that
you can configure to support your full-text OCR requirements. Prior to activating the job, you can test and
preview the full-text OCR results. When the Nuance Full-Text OCR step runs, each full-text output file contains
a maximum of 500 pages; any further pages in the same document are written to a subsequent full-text output file.
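To estimate how many full-text output files a long document produces, the 500-page limit can be applied as in the following sketch. The function name and the returned page-range format are assumptions for illustration; the guide does not describe how the output files are named or numbered.

```python
MAX_PAGES_PER_OUTPUT = 500  # limit stated in this chapter


def split_into_outputs(page_count):
    """Return 1-based (first_page, last_page) ranges, one per output file."""
    return [
        (start, min(start + MAX_PAGES_PER_OUTPUT - 1, page_count))
        for start in range(1, page_count + 1, MAX_PAGES_PER_OUTPUT)
    ]


print(split_into_outputs(1200))  # [(1, 500), (501, 1000), (1001, 1200)]
```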
NOTE: The Nuance OCR engine supports incoming images ranging from 75 to 2400 dots per inch (DPI).
In pixels, this ranges from 16 x 16 to 8400 x 8400 pixels. Larger images can be ingested into PaperVision
Capture provided that no Full-Text OCR will be performed on the images (unless they are processed
using the Image Fit filter and cropped to meet size requirements); no image processing will be performed
on the images (unless they are processed using the Image Fit filter and cropped to meet size
requirements); and, images will not be viewed as thumbnails. Additionally, if you process multiple pages
containing large amounts of text, testing and running the Nuance Full-Text OCR step may take a few
minutes.
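A pre-check along the following lines can flag images that fall outside the stated limits before they reach the Nuance Full-Text OCR step. This is a minimal sketch: the function name is hypothetical, and it assumes that the pixel limits apply to width and height independently.

```python
MIN_DPI, MAX_DPI = 75, 2400
MIN_PIXELS, MAX_PIXELS = 16, 8400


def within_nuance_limits(width_px, height_px, dpi):
    """Return True if the image is inside the limits stated in the note above."""
    return (
        MIN_DPI <= dpi <= MAX_DPI
        and MIN_PIXELS <= width_px <= MAX_PIXELS
        and MIN_PIXELS <= height_px <= MAX_PIXELS
    )


# A letter-size page scanned at 300 DPI is 2550 x 3300 pixels.
print(within_nuance_limits(2550, 3300, 300))   # True
# The same page at 1200 DPI is 10200 x 13200 pixels, which exceeds the limit.
print(within_nuance_limits(10200, 13200, 1200))  # False
```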
Configuring a Nuance Full-Text OCR Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
   • To edit an existing job, select it, and then click Edit Job.
   • To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. Add the Nuance Full-Text OCR step to the job using one of the following methods.
   • Select the job step that you want Nuance Full-Text OCR to follow. On the Job Step Toolbox tab,
     double-click Nuance Full-Text OCR.
   • On the Job Step Toolbox tab, drag Nuance Full-Text OCR on to the workspace.
   • On the workspace, right-click, point to Insert Job Step, and then select Nuance Full-Text OCR.
7. Double-click the Nuance Full-Text OCR step to display the Properties tab on the left pane.
8. On the Properties tab, expand Full-Text OCR Step.
Full-Text OCR Step Properties
9. For information about each Full-Text OCR Step property, see the following sections:
   • "Setting the Auto Image Orientation Property" on page 155.
   • "Setting the Outputs Property" on page 156.
   • "Setting the Override Invalid Pages Property" on page 156.
   • "Setting the Timeout (sec) Property" on page 157.
Setting the Auto Image Orientation Property
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Auto Image Orientation.
3. From the list, select one of the following options.
   • True - When this property is set to True, it allows the Nuance Full-Text OCR engine to automatically
     rotate some images so it can recognize text. The resulting output images may also be rotated.
   • False - When this property is set to False, it will not allow the Nuance Full-Text OCR engine to
     automatically rotate images.
Setting the Outputs Property
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window.
4. To learn about using the Edit Nuance Full-Text OCR Settings window, go to the following sections:
   • "Editing Nuance Full-Text OCR Settings" on page 157.
   • "Configuring Output Types" on page 159.
Setting the Override Invalid Pages Property
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Override Invalid Pages.
3. From the list, you can select True or False. These options consider the values of the Recognition
Process Setting and the Timeout (sec) properties. The Recognition Process Setting property
determines whether accuracy or speed is a priority during processing. You can set the value to Speed,
Accuracy, or Balanced. (See "Setting OCR Page Properties" on page 159 for more information.) The
Timeout (sec) property determines the number of seconds that the OCR engine can spend processing a
single image before it fails. (See "Setting the Timeout (sec) Property" on page 157 for more information.)
   • True - When you select True, the Nuance Full-Text OCR engine processes each image using the priority
     specified in the Recognition Process Setting property within the time specified in the Timeout (sec)
     property. If the image cannot be processed with the specified priority, then PaperVision Capture
     attempts to process the image using the remaining values. If it still cannot process the image, the page is
     processed as a picture for image-based outputs or a blank page for text-based outputs. (In both cases,
     these pages are tagged for future review with the "Skipped Full Text Processing" QC tag.) As a result,
     the remaining documents are processed. If an error occurs during the conversion to the selected output
     format, the entire batch will be processed as images and not full-text (therefore, no error will be returned).
     In this scenario, the Override Invalid Pages property must also be set to True. As a result, all batches
     will be processed through the Nuance Full-Text OCR step without requiring any user intervention.
   • False - When you select False, the Nuance Full-Text OCR engine processes each image using the
     priority specified in the Recognition Process Setting property within the time specified in the Timeout
     (sec) property. If the image cannot be processed with the specified priority, then PaperVision Capture
     attempts to process the image using the remaining values. If it still cannot process the image, a timeout
     error appears in the Administration Console and is logged in the Event Viewer. As a result, the remaining
     documents are not processed.
NOTE: A batch can potentially stop processing in a full-text OCR step only if the Override Invalid
Pages property is set to False.
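The fallback behavior described for the True and False values can be modeled with the following sketch. It is a conceptual illustration only: the recognize callable, the dictionary result, and the order in which the remaining settings are tried are all assumptions, and the sketch covers the text-based (blank page) case rather than image-based outputs.

```python
RECOGNITION_SETTINGS = ("Accuracy", "Balanced", "Speed")


def process_page(page, recognize, preferred, override_invalid_pages):
    """Try the preferred setting first, then the remaining settings.

    recognize(page, setting) is a hypothetical callable that returns the
    recognized text or raises TimeoutError when the timeout is exceeded.
    """
    order = [preferred] + [s for s in RECOGNITION_SETTINGS if s != preferred]
    for setting in order:
        try:
            return {"text": recognize(page, setting), "qc_tag": None}
        except TimeoutError:
            continue  # fall through to the next setting
    if override_invalid_pages:
        # Text-based outputs receive a blank page, tagged for later review.
        return {"text": "", "qc_tag": "Skipped Full Text Processing"}
    # With Override Invalid Pages set to False, processing stops here.
    raise TimeoutError("page could not be processed with any setting")
```

In this model, a caller that loops over the pages of a batch continues past a failed page only when override_invalid_pages is True, which mirrors the note above.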
Setting the Timeout (sec) Property
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Timeout (sec).
3. In the box to the right, type the maximum number of seconds that the OCR engine can spend processing a
single image before it fails. By default, this property is set to 180 seconds (3 minutes). You can assign a
value from one second to 86,400 seconds (24 hours).
NOTE: Raising the timeout setting may increase the amount of time to process all images.
Editing Nuance Full-Text OCR Settings
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window similar to the
following.
Edit Nuance Full-Text OCR Settings
4. On the Edit Nuance Full Text OCR Settings window, you can use the components described in the
following table to complete tasks.
Component
Description
Thumbnails pane
You can right-click on the Thumbnails pane to access the Cut, Copy,
Paste, Delete, and Select All commands. You can drag-and-drop
thumbnails to a different location. Images viewed as thumbnails can
have maximum dimensions of 32,768 x 32,768 pixels.
Click Save Full-Text OCR Configuration to save the full-text OCR
configuration for the job step.
Click Exit to close the Edit Nuance Full-Text OCR Settings window.
Click Configure Scanner to open the Scanner Settings dialog box.
Click Start Scanning to begin the scanning process.
Click Stop Scanning to stop the scanning process.
Click Rotate Image 90° Counter-Clockwise to rotate the selected image
90 degrees counterclockwise.
Click Rotate Image 90° Clockwise to rotate the selected image 90
degrees clockwise.
Click Delete Single Image to delete the selected image.
Click Remove All Images to remove all loaded images. If you have defined
OCR zones prior to clearing all images, these zones are retained.
Click Import Images to access the Open dialog box where you can locate
the file you want to import.
Click Test Full-Text OCR (Selected Filter, Current Page Only) to test the selected filter on the current page
only. (See "Testing Full-Text OCR Filters" on page 161 for more information.)
Click Test Full-Text OCR (Selected Filter, All Pages) to test the selected filter on all pages. (See
"Testing Full-Text OCR Filters" on page 161 for more information.)
Click Zoom In to zoom in on the selected image.
Click Zoom Out to zoom out on the selected image.
Click Zoom Reset to reset the selected image to the original view.
Configuring Output Types
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window similar to the
following.
4. On the Output Configuration pane, from the Available Outputs list select an output type.
Output Configuration
5. Click the right arrow to move your selection to the Selected Outputs list. On the right side of the Output
Configuration pane, you can set properties for the output type you selected and the OCR page.
   • Go to "Setting OCR Page Properties" on page 159 for a description of each property.
   • Go to "Nuance Full-Text OCR Output Types" on page 163 for property descriptions for each output type.
6. Go to "Testing Full-Text OCR Filters" on page 161 if you want to test the output type(s) you selected.
Setting OCR Page Properties
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window.
4. On the Output Configuration pane, from the Available Outputs list select an output type.
5. Click the right arrow to move your selection to the Selected Outputs list.
NOTE: You must select an output type for the OCR Page Properties to appear.
6. On the right side of the Output Configuration pane, expand OCR Page Properties.
OCR Page Properties
7. Click the property that you want to set, and then select or type its value in the right column. The following
table describes each property.
OCR Page Property
Description
Additional Character Filters
This property lets you define additional characters to recognize during OCR
processing that are not included in other character filters.
Additional Language Filter
This property lets you assign additional characters not included in your
selected language (as specified by the Spelling Language property).
Brightness
This property lets you assign the brightness value (between 0 and 100) for
the page. A value of 0 is lightest, and a value of 100 results in the darkest
image. The default value is 50.
Brightness Threshold
This property lets you assign a brightness threshold value (between 0 and
255) when converting an image to black and white. The default value is 128.
Enable Fax Handling (MOR)
This property enables fax handling. Set this property to True if you are
processing a scanned image that was faxed in draft mode (200 x 100 dpi).
Hand-Printed Character Height
This property lets you specify the expected character height (in 1/1200 of an
inch) for the Hand-Printed Numerals module. The default value is 0. 1/1200
of an inch is equivalent to approximately 0.021 mm.
Hand-Printed Character Width
This property lets you specify the expected character width (in 1/1200 of an
inch) for the Hand-Printed Numerals module. The default value is 0. 1/1200
of an inch is equivalent to approximately 0.021 mm.
Hand-Printed Detect Spaces
This property determines whether the Hand-Printed Numerals module
detects spaces between characters. Select True to detect spaces.
Hand-Printed Leading Spaces
This property lets you specify the expected leading spaces (in 1/1200 of an
inch) for the Hand-Printed Numerals module. The default value is 0. 1/1200
of an inch is equivalent to approximately 0.021 mm.
Hand-Printed Style
This property lets you specify the writing style for the Hand-Printed (HNR)
Recognition module. For example, the number seven is crossed in
European style and uncrossed in American style. You can choose US or
European from the list.
Recognition Languages
This property lets you specify the recognition language. Click the ellipsis
button to open the Select OCR Recognition Languages dialog box
where you can select the languages to include during the OCR process. To
get faster readings, set the language for this property to match the one
specified in the Spelling Language property.
Recognition Process Setting
This property lets you specify whether speed or accuracy is a priority during
the OCR process. You can choose one of the following options:
• Accuracy - This option specifies that accuracy is most important and
  produces the most accurate recognition.
• Balanced - This option applies average accuracy and speed
  recognition.
• Speed - This option specifies that speed is most important and
  produces the fastest recognition, but accuracy may be
  compromised.
Rejection Symbol
This property lets you specify the rejection symbol used in output documents. The
rejection symbol represents a character that is not recognized by the active OCR
recognition engine configuration. The default value is the tilde character (~). To prevent
unrecognized characters from appearing in output documents, leave this
value blank.
Spelling Language
This property lets you specify the spelling language. When set to
Automatic, it matches the language specified in the Recognition
Language property.
Vertical Dictionaries
This property lets you specify vertical dictionaries. Click the ellipsis button
to open the Select OCR Vertical Dictionaries dialog box where you
can select the vertical dictionaries to include during the OCR process.
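The Hand-Printed Character Height, Character Width, and Leading Spaces properties are expressed in units of 1/1200 of an inch. The following sketch converts between those units and millimeters using the exact definition of the inch (25.4 mm). The function names are hypothetical and are provided only as a calculation aid.

```python
MM_PER_INCH = 25.4
UNITS_PER_INCH = 1200  # the properties use 1/1200 of an inch


def units_to_mm(units):
    """Convert a value in 1/1200-inch units to millimeters."""
    return units * MM_PER_INCH / UNITS_PER_INCH


def mm_to_units(mm):
    """Convert millimeters to the nearest whole 1/1200-inch unit."""
    return round(mm * UNITS_PER_INCH / MM_PER_INCH)


print(round(units_to_mm(1), 3))  # 0.021
print(mm_to_units(3))            # 142
```

For example, characters expected to be about 3 mm tall correspond to a value of roughly 142.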
Testing Full-Text OCR Filters
After you select an output type, you can test it to verify that the loaded pages can be read successfully and that the
output file can be opened correctly.
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window.
4. Scan or import the documents you want to test.
5. On the Output Configuration pane, from the Available Outputs list, select the output type you want to
test.
6. Click the right arrow to move your selection to the Selected Outputs list.
7. Do one of the following:
   • If you want to test the selected output type on the current page, click Test Full-Text OCR (Selected
     Filter, Current Page Only).
   • If you want to test the selected output type on all pages, click Test Full-Text OCR (Selected Filter, All
     Pages).
The Specify Output Files dialog box appears.
Specify Output Files
8. Click the ellipsis button to open the Save As dialog box.
9. Specify the location and name for the test file, and then click Save.
10. Or, click the ellipsis button to browse to the location. Proceed to the next step.
11. If you browsed to the file location, enter the file name in the Save As dialog box, and then click Save.
12. To view the generated file, select the Open check box.
13. Click OK. The full-text OCR engine will process the results. If you opted to open the resulting output file, it
will open in its respective application or editor.
Nuance Full-Text OCR Output Types
Each full-text OCR output type has unique properties that you can set. The tables in this section describe
each OCR output type and its associated properties. To access the available OCR output types, use the
following procedure.
To access Nuance Full Text OCR output types
1. If you haven’t already done so, complete the procedure under "Configuring a Nuance Full-Text OCR Job
Step" on page 154.
2. On the Properties tab, under Full-Text OCR Step, click Outputs.
3. Click the ellipsis button to open the Edit Nuance Full-Text OCR Settings window.
4. On the Output Configuration pane, from the Available Outputs list select an output type.
5. On the right side of the Output Configuration pane, expand the output type you selected to access its
properties.
OCR Output Type Properties
6. Click the property that you want to set, and then select or type its value in the right column.
The following tables describe the properties for each item that appears in the Available Outputs list on the Output
Configuration pane.
eBook
The eBook output type generates the eBook output (.opf file type) packaged in a .zip file that you can upload
to hand-held devices.
eBook Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
• Select Ignore Headers/Footers to disregard header and footer
  text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to
  columns separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
HTML 3.2
The HTML 3.2 output type creates a clear, small, HTML file format that is supported by many HTML editors.
After it is processed, the HTML output is packaged in a .zip file for easy transmission.
HTML 3.2 Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
• Select Ignore Headers/Footers to disregard header and footer
  text.
Horizontal Rule Line
This property determines whether to insert a horizontal rule line between
sections. Select True to place a horizontal rule line between sections.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Index Page
This property determines how an index page is created in the output file.
• Select In Frame for the index page to appear in a separate column
  on the same page as the full-text output file.
• Select None for no index page.
• Select Simple HTML for the index page to display as a thumbnail
  preview with a hyperlink to the full-text output file.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Navigation (Next)
This property determines what displays for the “Next” navigation text for
Simple HTML or In Frame index pages. Type the text you want to appear.
Navigation (Previous)
This property determines what displays for the “Previous” navigation text
for Simple HTML or In Frame index pages. Type the text you want to
appear.
Navigation (TOC)
This property determines what displays for the “Table of Contents”
navigation text for Simple HTML or In Frame index pages. Type the text
you want to appear.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Spreadsheet to export results in a tabular form suitable for
  spreadsheet use. Each document is placed in a separate worksheet.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
HTML 4.0
The HTML 4.0 output type uses Cascading Style Sheet technology for absolutely positioned, box-like objects,
for styles, and for manipulating all paragraph and character attributes. After it is processed, the HTML output
is packaged in a .zip file for easy transmission.
HTML 4.0 Property
Description
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
CSS (External)
This property determines whether an external Cascading Style Sheet is
enabled. Select True to use an external Cascading Style Sheet.
File (Subdirectory)
This property determines whether files are placed in a sub-directory. Select
True to place every file into a sub-directory.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
• Select Ignore Headers/Footers to disregard header and footer
  text.
Horizontal Rule Line
This property determines whether to insert a horizontal rule line between
sections. Select True to place a horizontal rule line between sections.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Index Page
This property determines how an index page is created in the output file.
• Select In Frame for the index page to appear in a separate column
  on the same page as the full-text output file.
• Select None for no index page.
• Select Simple HTML for the index page to display as a thumbnail
  preview with a hyperlink to the full-text output file.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Name (Output File)
This property determines the name of the output file. Type the name you
want to use.
Navigation (Next)
This property determines what displays for the “Next” navigation text for
Simple HTML or In Frame index pages. Type the text you want to appear.
Navigation (Previous)
This property determines what displays for the “Previous” navigation text
for Simple HTML or In Frame index pages. Type the text you want to
appear.
Navigation (TOC)
This property determines what displays for the “Table of Contents”
navigation text for Simple HTML or In Frame index pages. Type the text
you want to appear.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Ignore All to disregard all format styles in the original file.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Styles
This property determines whether styles are retained in the output file.
Select True to include styles from the original document.
InfoPath
The InfoPath output type supports the saving of various form elements such as check boxes and input lines.
It generates a Microsoft InfoPath file (.xsn file type).
InfoPath Property
Cross-References
Description
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
plain text.
• Select Ignore Headers/Footers to disregard header and footer
text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select True Page to retain the original page and column layout (with
text, pictures, table boxes, and frames).
• Select Ignore All to disregard all format styles in the original file.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to
columns separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
Microsoft Excel 2007
The Microsoft Excel 2007 output type generates a Microsoft Excel 2007 file (.xlsx file type).
Microsoft Excel 2007
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
plain text.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form to place tab stops between header and
footer text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Leader Dots
This property determines whether leader dots are retained in the output file.
Select True to include leader dots.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.) Each detected table or
spreadsheet is saved to a separate worksheet. Remaining content
is placed on the last worksheet, representing an index. Each table is
subsequently replaced by a hyperlink to its own worksheet.
• Select Spreadsheet to export results in a tabular form suitable for
spreadsheet use. Each document is placed in a separate worksheet.
• Select Ignore All to disregard all format styles in the original file.
Overview Sheet Name
This property determines the name of the overview sheet. Type the name
you want to use.
Overview Sheet Name (Include)
This property determines whether to include an overview sheet as the last
sheet when the Output Format property is set to Formatted Text. Select
True to include the overview sheet.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
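The worksheet layout described for the Formatted Text output format (one worksheet per detected table, with the remaining content and table links on a final overview sheet) can be pictured with a small sketch. This is only an illustration of the layout as the guide describes it, not PaperVision Capture's implementation; the Worksheet class, the function name, the "Table1" style sheet names, and the link text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worksheet:
    """Hypothetical stand-in for one sheet in the output workbook."""
    name: str
    rows: list = field(default_factory=list)


def plan_formatted_text_workbook(blocks, overview_name="Overview",
                                 include_overview=True):
    """Distribute recognized blocks across worksheets.

    blocks is a list of ("table", rows) or ("text", string) tuples in
    reading order. Sheet names such as "Table1" are assumptions made
    for this illustration.
    """
    sheets = []
    overview = Worksheet(overview_name)
    for kind, payload in blocks:
        if kind == "table":
            # Each detected table gets its own worksheet.
            sheet = Worksheet(f"Table{len(sheets) + 1}", list(payload))
            sheets.append(sheet)
            # On the overview sheet, a link stands in for the table.
            overview.rows.append([f"#'{sheet.name}'!A1"])
        else:
            # Remaining content stays on the overview sheet.
            overview.rows.append([payload])
    if include_overview:
        sheets.append(overview)  # the overview sheet comes last
    return sheets


blocks = [("text", "Intro"), ("table", [["a", "b"]]), ("text", "End")]
for sheet in plan_formatted_text_workbook(blocks, overview_name="Index"):
    print(sheet.name, sheet.rows)
```

In this sketch, overview_name plays the role of the Overview Sheet Name property and include_overview the role of Overview Sheet Name (Include).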
Microsoft Excel 97
The Microsoft Excel 97 output type generates a Microsoft Excel 97 binary file (.xls file type).
Microsoft Excel 97
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
plain text.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form to place tab stops between header and
footer text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.) Each detected table or
spreadsheet is saved to a separate worksheet. Remaining content
is placed on the last worksheet, representing an index. Each table is
subsequently replaced by a hyperlink to its own worksheet.
• Select Spreadsheet to export results in a tabular form suitable for
spreadsheet use. Each document is placed in a separate worksheet.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Microsoft Excel XP
The Microsoft Excel XP output type generates a Microsoft Excel XP binary file (.xls file type).
Microsoft Excel XP
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Convert to Plain Text to convert headers and footers to
plain text.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form to place tab stops between header and
footer text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.) Each detected table or
spreadsheet is saved to a separate worksheet. Remaining content
is placed on the last worksheet, representing an index. Each table is
subsequently replaced by a hyperlink to its own worksheet.
• Select Spreadsheet to export results in a tabular form suitable for
spreadsheet use. Each document is placed in a separate worksheet.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Read-Only
This property determines whether the output file is marked as read-only.
Select True to make the output file read-only.
Microsoft PowerPoint 2007
The Microsoft PowerPoint 2007 output type generates a Microsoft PowerPoint 2007 file (.pptx file type).
Microsoft PowerPoint 2007
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form In Box to place tab stops between header
and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
footers to match the original style.
• Select Tabulated Form to place tab stops between header and
footer text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with
text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Margins
This property determines whether to retain the page margins in the output
file. Select True to retain the page margins.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Title
This property determines the title for the output file. Type the title you want
to use.
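The Character Spacing description above says that roughly two spaces between words yield a single space and that four or five spaces yield a tab. A minimal sketch of that rule follows, assuming the gaps have already been measured in space-widths. The function name and the tab_threshold parameter are hypothetical, and the handling of gaps of exactly three spaces or of more than five is an assumption, because the guide does not specify them.

```python
import re


def collapse_gaps(line, tab_threshold=4):
    """Map runs of spaces to a single space or a tab.

    Runs shorter than tab_threshold become one space; runs of
    tab_threshold or more become a tab. The guide only documents the
    "about two" and "four or five" cases; everything else here is an
    assumption made for illustration.
    """
    def replace(match):
        return "\t" if len(match.group(0)) >= tab_threshold else " "

    # Only runs of two or more spaces are rewritten.
    return re.sub(r" {2,}", replace, line)


print(repr(collapse_gaps("Invoice  Number")))
print(repr(collapse_gaps("Total     125.00")))
```

The same description applies to the other output types that list a Character Spacing property.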
Microsoft PowerPoint 97
The Microsoft PowerPoint 97 output type generates a Microsoft PowerPoint 97 file (.ppt file type).
Microsoft PowerPoint 97
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Microsoft Publisher 98
The Microsoft Publisher 98 output type generates a Microsoft Publisher file (.rtf file type).
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
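The page-size limit in the note above can be checked before a job is submitted. The following is a minimal sketch of such a pre-check, assuming the bounds are inclusive (the note does not say) and that page dimensions are already known in inches. The function and constant names are hypothetical and are not part of PaperVision Capture.

```python
MIN_INCHES = 0.1
MAX_INCHES = 22.0

# Output formats for which the note says the limit is enforced.
LAYOUT_FORMATS = {"Flowing Page", "True Page"}


def page_size_ok(width_in, height_in, output_format):
    """Return True if the page should not trigger the size error.

    The limit is applied only for the Flowing Page and True Page
    output formats, as stated in the note. Inclusive bounds are an
    assumption.
    """
    if output_format not in LAYOUT_FORMATS:
        return True
    return all(MIN_INCHES <= side <= MAX_INCHES
               for side in (width_in, height_in))


print(page_size_ok(8.5, 11.0, "True Page"))
print(page_size_ok(24.0, 36.0, "Flowing Page"))
```

A check like this could flag oversized engineering drawings so they can be routed to a different output format.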
Microsoft Publisher 98
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
No Text Box
This property determines whether to exclude text boxes from the output
files. Select True to exclude text boxes.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to
columns separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Microsoft Reader
The Microsoft Reader output type generates a Microsoft Reader file (.lit file type) that you can upload to
Windows-based, hand-held devices.
Microsoft Reader
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to
columns separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
Microsoft Word 2000
The Microsoft Word 2000 output type generates a Microsoft Word 2000 file (.doc file type).
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
Microsoft Word 2000
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form In Box to place tab stops between header
and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
footers to match the original style.
• Select Tabulated Form to place tab stops between header and
footer text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
layout so text flows across columns. Boxes and frames are used
only when necessary.
• Select True Page to retain the original page and column layout (with
text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Microsoft Word 2003 (WordML)
The Microsoft Word 2003 (WordML) output type generates a .doc file type and uses features supported by
Microsoft Word 2003 and later.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
Microsoft Word 2003 (WordML)
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the output
file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image in Text Box
This property determines whether to keep images in text boxes. Select
True to retain images in text boxes.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
layout so text flows across columns. Boxes and frames are used
only when necessary.
• Select True Page to retain the original page and column layout (with
text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
format, font, graphics, table styles, highlights, and strikeouts.
(Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Read-Only
This property determines whether the output file is marked as read-only.
Select True to make the output file read-only.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Microsoft Word 2007
The Microsoft Word 2007 output type generates a .docx file type and uses features supported by Microsoft
Word 2007.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
Microsoft Word 2007
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Columns
This property determines whether columns are retained in the output file.
Select True to retain columns.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
text.
• Select Tabulated Form In Box to place tab stops between header
and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
footers to match the original style.
• Select Tabulated Form to place tab stops between header and
footer text.
• Select Convert to Plain Text to convert headers and footers to
plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image in Text Box
This property determines whether to keep images in text boxes. Select
True to retain images in text boxes.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column layout so
  text flows across columns. Boxes and frames are used only when necessary.
• Select True Page to retain the original page and column layout (with text,
  pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph format,
  font, graphics, table styles, highlights, and strikeouts. (Layout-related
  formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Read-Only
This property determines whether the output file is marked as read-only.
Select True to make the output file read-only.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Styles
This property determines whether styles are retained in the output file.
Select True to include styles from the original document.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to columns
  separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Microsoft Word 97
The Microsoft Word 97 output type generates a .doc file type and uses features supported by Microsoft
Word 97.
NOTE: For all Microsoft Word output types, and for any output type that creates files in the .rtf file
format, the page width and height must each be between 0.1 and 22 inches. If a page falls outside this
range, an error appears when the Output Format property is set to Flowing Page or True Page.
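The size limits in the note above can be checked before a job is run. The following Python sketch is illustrative only and is not part of PaperVision Capture. The function name page_size_ok and the constant names are hypothetical, and treating both limits as inclusive is an assumption, because the guide does not say whether exactly 0.1 or 22 inches is accepted.

```python
# Hypothetical pre-flight check for Word and .rtf output types.
# Assumption: the 0.1 and 22 inch limits are inclusive.
MIN_INCHES = 0.1
MAX_INCHES = 22.0

# Output Format values for which the guide says an error appears.
LAYOUT_FORMATS = {"Flowing Page", "True Page"}


def page_size_ok(width_in, height_in, output_format):
    """Return True if the page should export without the size error."""
    if output_format not in LAYOUT_FORMATS:
        # The note only mentions an error for the layout-preserving formats.
        return True
    return all(MIN_INCHES <= d <= MAX_INCHES for d in (width_in, height_in))
```

For example, page_size_ok(8.5, 11, "True Page") is True, while a 30-inch-long page with Flowing Page is rejected.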
Microsoft Word 97
Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer text.
• Select Tabulated Form In Box to place tab stops between header and footer
  text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and footers to
  match the original style.
• Select Tabulated Form to place tab stops between header and footer text.
• Select Convert to Plain Text to convert headers and footers to plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution, in dots per inch (DPI), of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image in Text Box
This property determines whether to keep images in text boxes. Select
True to retain images in text boxes.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column layout so
  text flows across columns. Boxes and frames are used only when necessary.
• Select True Page to retain the original page and column layout (with text,
  pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph format,
  font, graphics, table styles, highlights, and strikeouts. (Layout-related
  formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
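As a rough guide to the numeric Image DPI options listed in the tables above, the pixel dimensions of an image follow from its physical size multiplied by the resolution. The Python sketch below is plain arithmetic, not a description of how PaperVision Capture resamples images. The function name pixel_size is hypothetical, and the None and Original options are not modeled.

```python
def pixel_size(width_in, height_in, dpi):
    """Return (width, height) in pixels for a physical size given in inches."""
    return round(width_in * dpi), round(height_in * dpi)


# Compare the numeric Image DPI options for a letter-size (8.5 x 11 inch) page.
for dpi in (72, 100, 150, 200, 300):
    width_px, height_px = pixel_size(8.5, 11, dpi)
    print(f"DPI {dpi}: {width_px} x {height_px} pixels")
```

Doubling the DPI doubles each dimension, which means roughly four times as many pixels per image.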
PaperFlow Full-Text
The PaperFlow Full-Text output type generates a .txt file containing the full-text results that you can
subsequently import into the OCRFlow application. You can configure the OCR Page Properties. See
"Setting OCR Page Properties" on page 159 for more information.
PaperVision Full-Text
The PaperVision Full-Text output type generates a .txt file containing the full-text results that you can
subsequently import into the PaperVision Enterprise application. You can configure the OCR Page
Properties. See "Setting OCR Page Properties" on page 159 for more information.
PDF
The PDF output type supports several PDF features and is dependent upon the positions of recognized
characters. When the Output Format property is set to True Page, the resulting PDF is viewable,
searchable and editable in a PDF viewer.
PDF Property
Description
Color Quality
This property determines the color quality in the output file.
• Select Minimum for the least color quality.
• Select Good for better than minimal color quality.
• Select Lossless (Best Quality) for the best possible color quality.
Compress (Contents)
This property determines whether text content and line art are
compressed. Select True to compress contents and line art.
Compress (Embedded Files)
This property determines whether embedded files are compressed. Select
True to compress embedded files.
Compress (Flate)
This property determines whether flate compression is applied. This
compression is suitable for use on images with large areas of single colors
or repeating patterns. Select True to apply flate compression.
Compress (JBIG2)
This property determines whether JBIG2 compression is applied. This
compression is suitable for use on highly-compressed black and white
images or monochrome images. Select True to apply JBIG2 compression.
Compress (JPEG2000)
This property determines whether JPEG2000 compression is applied. This
compression is suitable for photographs or images with gradual color
changes. Select True to apply JPEG2000 compression.
Compress (LZW)
This property determines whether LZW compression is applied. This
compression is suitable for compressing text files. It reduces file size and
is suitable for use with .gif images from web sites and TIFF images. Select
True to apply LZW compression.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Enable Assembly
This property determines whether document assembly (insert, rotate, and
delete pages) is enabled. Select True to enable document assembly.
Enable Commenting
This property determines whether the ability to change or edit comments
and forms fields is enabled. Select True to enable document commenting.
Enable Copying
This property determines whether the ability to copy content is enabled.
Select True to enable content copying.
Enable Extraction
This property determines whether content can be copied for accessibility.
Select True to enable the copying of content for accessibility.
Enable Form Filling
This property determines whether form fields can be filled in. Select True
to enable the filling in of forms.
Enable HQ Print
This property determines whether high-quality printing is enabled. Select
True to enable high-quality printing.
Enable Modification
This property determines whether the document can be changed. Select
True to enable document modification.
Enable Print
This property determines whether the document can be printed. Select
True to enable document printing.
Encryption Level
This property determines the level of encryption.
• Select None to apply no encryption.
• Select 40-bit RC4 to apply the lowest encryption level (used in Adobe
  Acrobat 3.x and 4.x).
• Select 128-bit RC4 to apply a medium encryption level (used in Adobe
  Acrobat 5.x and later).
• Select 128-bit AES to apply the highest encryption level (used in Adobe
  Acrobat 7.x and later).
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer text.
• Select Auto Format to automatically format the headers and footers to
  match the original style.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution, in dots per inch (DPI), of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image Substitutes
This property determines whether suspect words are covered with small
images. Select True to enable image substitutes.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Linearized PDF
This property determines whether PDF files are optimized for efficient web
display. When set to True, the first page will load quickly into a web page,
and the remaining pages will load while the PDF file is being viewed. The
browser determines which page elements appear first (typically, headings
and text), and the elements that follow, for example, larger pictures.
Efficiency is also optimized when you skip to another page in the PDF file.
Mixed Raster Content
This property determines the level of Mixed Raster Content (MRC) in the
output file. (MRC is a process that uses image segmentation methods to
improve contrast resolution of raster images composed of pixels.)
• Select No MRC to apply no compression.
• Select Medium Compression for medium quality.
• Select Lossless Compression (Best Quality) for the best possible quality.
• Select Best Compression (Smallest File Size) to produce the smallest file
  size.
Outline Props
This property determines whether to retain bookmarks for pages. Select
True to retain bookmarks.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with text,
  pictures, table boxes, and frames).
Password (Open)
This property defines a password that must be entered to open the file.
Type the password you want to use.
Password (Permissions)
This property defines a password required to set or change the restricted
features found under the Enable options of the PDF file. Type the
password you want to use.
NOTES for PDF Password Use:
If you apply any password (Open or Permissions), you must also select the appropriate Encryption Level setting.
If you apply both types of passwords, either password will open the PDF. However, only the Permissions
password will allow the recipient to set or change the applicable restricted features. If you apply only a
Permissions password, a password is not required to open the document, but the user must type the
Permissions password to set or change the restricted features. By default, the "Enable" options are set to "True,"
and cannot be set to "False" until a Permissions password is assigned. So, after you assign a Permissions
password, you must then set the appropriate "Enable" options to "False."
PDF Compatibility
This property lets you select the compatible PDF version or optimization
type. You can choose from the following options:
• Optimize for Quality
• Optimize for Size
• PDF 1.0
• PDF 1.1
• PDF 1.2
• PDF 1.3
• PDF 1.4
• PDF 1.5
• PDF 1.6
• PDF-A
PDF Form Visuality
This property determines whether the visual components are displayed for
a PDF form. Select True to display the visual components.
PDF Form Visuality (User Set)
This property determines whether the visual components are set by the
user. Select True to have the PDF form visuality set by the user.
PDF Thumbnails
This property determines whether thumbnail images are created in the
output file. Select True to create thumbnail images.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Signature (Certification Description)
This property defines a description for the signature’s certificate. Type the
description you want to use.
Signature (SHA Thumbprint)
This property defines an SHA thumbprint for the signature. Type the SHA1
thumbprint that you want to use.
Signature Type
This property lets you select a signature handler type. You can choose
from the following options:
• None
• PPKLite
• VeriSign
URL (Highlight)
This property determines whether the URL address is highlighted in the
output file. Select True to highlight the URL.
URL (Underline)
This property determines whether the URL address is underlined in the
output file. Select True to underline the URL.
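The rules in "NOTES for PDF Password Use" in the table above can be expressed as a small consistency check. The Python sketch below is illustrative only. PdfSecuritySettings and check_security are hypothetical names, not part of any PaperVision Capture API, and the sketch covers only the two rules the notes state: a password requires an Encryption Level other than None, and an "Enable" option can be set to False only after a Permissions password is assigned.

```python
from dataclasses import dataclass, field


@dataclass
class PdfSecuritySettings:
    """Hypothetical container mirroring the PDF security properties."""
    encryption_level: str = "None"  # "None", "40-bit RC4", "128-bit RC4", "128-bit AES"
    open_password: str = ""
    permissions_password: str = ""
    # Maps an Enable property name to its value, e.g. {"Enable Print": False}.
    enable_options: dict = field(default_factory=dict)


def check_security(settings):
    """Return a list of problems; an empty list means the rules are satisfied."""
    problems = []
    has_password = bool(settings.open_password or settings.permissions_password)
    if has_password and settings.encryption_level == "None":
        problems.append("A password is set, so select an Encryption Level other than None.")
    disabled = [name for name, enabled in settings.enable_options.items() if not enabled]
    if disabled and not settings.permissions_password:
        problems.append(
            "Assign a Permissions password before setting these to False: "
            + ", ".join(disabled)
        )
    return problems
```

For example, a configuration with an Open password but Encryption Level left at None produces one problem, and adding a Permissions password together with 128-bit AES allows Enable Print to be set to False without complaint.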
PDF Edited
Unlike the PDF output type, the PDF Edited output type does not rely on the position of recognized
characters, so you can insert sections of text in the editor. The PDF Edited output type is recommended if
you have made significant edits in the recognition results. The resulting PDF file is viewable, searchable, and
editable.
PDF Edited Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Color Quality
This property determines the color quality in the output file.
• Select Minimum for the least color quality.
• Select Good for better than minimal color quality.
• Select Lossless (Best Quality) for the best possible color quality.
Compress (Contents)
This property determines whether text content and line art are compressed.
Select True to compress contents and line art.
Compress (Embedded Files)
This property determines whether embedded files are compressed. Select
True to compress embedded files.
Compress (Flate)
This property determines whether flate compression is applied. This
compression is suitable for use on images with large areas of single colors
or repeating patterns. Select True to apply flate compression.
Compress (JBIG2)
This property determines whether JBIG2 compression is applied. This
compression is suitable for use on highly-compressed black and white
images or monochrome images. Select True to apply JBIG2 compression.
Compress (JPEG2000)
This property determines whether JPEG2000 compression is applied. This
compression is suitable for photographs or images with gradual color
changes. Select True to apply JPEG2000 compression.
Compress (LZW)
This property determines whether LZW compression is applied. This
compression is suitable for compressing text files. It reduces file size and
is suitable for use with .gif images from web sites and TIFF images. Select
True to apply LZW compression.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Enable Assembly
This property determines whether document assembly (insert, rotate, and
delete pages) is enabled. Select True to enable document assembly.
Enable Commenting
This property determines whether the ability to change or edit comments
and forms fields is enabled. Select True to enable document commenting.
Enable Copying
This property determines whether the ability to copy content is enabled.
Select True to enable content copying.
Enable Extraction
This property determines whether content can be copied for accessibility.
Select True to enable the copying of content for accessibility.
Enable Form Filling
This property determines whether form fields can be filled in. Select True
to enable the filling in of forms.
Enable HQ Print
This property determines whether high-quality printing is enabled. Select
True to enable high-quality printing.
Enable Modification
This property determines whether the document can be changed. Select
True to enable document modification.
Enable Print
This property determines whether the document can be printed. Select
True to enable document printing.
Encryption Level
This property determines the level of encryption.
• Select None to apply no encryption.
• Select 40-bit RC4 to apply the lowest encryption level (used in Adobe
  Acrobat 3.x and 4.x).
• Select 128-bit RC4 to apply a medium encryption level (used in Adobe
  Acrobat 5.x and later).
• Select 128-bit AES to apply the highest encryption level (used in Adobe
  Acrobat 7.x and later).
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Fonts (External)
This property determines whether external fonts are included in the output
file. Select True to include external fonts.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer text.
• Select Auto Format to automatically format the headers and footers to
  match the original style.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution, in dots per inch (DPI), of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Linearized PDF
This property determines whether PDF files are optimized for efficient web
display. When set to True, the first page will load quickly into a web page,
and the remaining pages will load while the PDF file is being viewed. The
browser determines which page elements appear first (typically, headings
and text), and the elements that follow, for example, larger pictures.
Efficiency is also optimized when you skip to another page in the PDF file.
Mixed Raster Content
This property determines the level of Mixed Raster Content (MRC) in the
output file. (MRC is a process that uses image segmentation methods to
improve contrast resolution of raster images composed of pixels.)
• Select No MRC to apply no compression.
• Select Medium Compression for medium quality.
• Select Lossless Compression (Best Quality) for the best possible quality.
• Select Best Compression (Smallest File Size) to produce the smallest file
  size.
Outline Props
This property determines whether to retain bookmarks for pages. Select
True to retain bookmarks.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with text,
  pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph format,
  font, graphics, table styles, highlights, and strikeouts. (Layout-related
  formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Password (Open)
This property defines a password that must be entered to open the file.
Type the password you want to use.
Password (Permissions)
This property defines a password required to set or change the restricted
features found under the Enable options of the PDF file. Type the
password you want to use.
NOTES for PDF Password Use:
If you apply any password (Open or Permissions), you must also select the appropriate Encryption Level setting.
If you apply both types of passwords, either password will open the PDF. However, only the Permissions
password will allow the recipient to set or change the applicable restricted features. If you apply only a
Permissions password, a password is not required to open the document, but the user must type the
Permissions password to set or change the restricted features. By default, the "Enable" options are set to "True,"
and cannot be set to "False" until a Permissions password is assigned. So, after you assign a Permissions
password, you must then set the appropriate "Enable" options to "False."
PDF Compatibility
This property lets you select the compatible PDF version or optimization
type. You can choose from the following options:
• Optimize for Quality
• Optimize for Size
• PDF 1.0
• PDF 1.1
• PDF 1.2
• PDF 1.3
• PDF 1.4
• PDF 1.5
• PDF 1.6
• PDF-A
PDF Form Visuality
This property determines whether the visual components are displayed for
a PDF form. Select True to display the visual components.
PDF Form Visuality (User Set)
This property determines whether the visual components are set by the
user. Select True to have the PDF form visuality set by the user.
PDF Forms
This property determines whether form layers are shown in the output file.
Select True to show form layers.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Signature (Certification Description)
This property defines a description for the signature’s certificate. Type the
description you want to use.
Signature (SHA Thumbprint)
This property defines an SHA thumbprint for the signature. Type the SHA1
thumbprint that you want to use.
Signature Type
This property lets you select a signature handler type. You can choose
from the following options:
• None
• PPKLite
• VeriSign
Styles
This property determines whether styles are retained in the output file.
Select True to include styles from the original document.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Title
This property defines the title for the output file. Type the title you want to
use.
URL (Highlight)
This property determines whether the URL address is highlighted in the
output file. Select True to highlight the URL.
URL (Underline)
This property determines whether the URL address is underlined in the
output file. Select True to underline the URL.
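For the Signature (SHA Thumbprint) property in the table above, a certificate thumbprint is conventionally the SHA-1 digest of the certificate's DER-encoded bytes, written as hexadecimal. The Python sketch below computes that value with the standard hashlib module. It is an illustration under assumptions: the function name sha1_thumbprint is hypothetical, and the guide does not state the exact text format the property expects (for example, letter case or whether spaces are accepted), so check the result against the thumbprint shown by your certificate tool.

```python
import hashlib


def sha1_thumbprint(cert_der: bytes) -> str:
    """Return the SHA-1 digest of DER-encoded certificate bytes as uppercase hex."""
    return hashlib.sha1(cert_der).hexdigest().upper()


# Hypothetical usage with a DER-encoded certificate file:
#     with open("signing_cert.cer", "rb") as f:
#         print(sha1_thumbprint(f.read()))
```

A PEM-encoded (Base64 text) certificate must be converted to DER before hashing. Hashing the PEM text directly gives a different value.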
PDF Searchable Image
Suitable for archiving and indexing, the PDF Searchable Image output type retains the original image in the
foreground and preserves recognized text in the background. This output type allows the OCR contents of an
image-based PDF to remain searchable without compromising the original (hidden) text layer. Text is
positioned directly behind corresponding image text, making it searchable and selectable in most PDF
viewers. The resulting PDF file is viewable only and cannot be modified in a PDF editor. Words recognized in
a document are highlighted in the image.
PDF Searchable Image Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Color Quality
This property determines the color quality in the output file.
• Select Minimum for the least color quality.
• Select Good for better than minimal color quality.
• Select Lossless (Best Quality) for the best possible color quality.
Compress (Contents)
This property determines whether text content and line art are compressed.
Select True to compress contents and line art.
Compress (Embedded Files)
This property determines whether embedded files are compressed. Select
True to compress embedded files.
Compress (Flate)
This property determines whether flate compression is applied. This
compression is suitable for use on images with large areas of single colors
or repeating patterns. Select True to apply flate compression.
Compress (JBIG2)
This property determines whether JBIG2 compression is applied. This
compression is suitable for use on highly-compressed black and white
images or monochrome images. Select True to apply JBIG2 compression.
Compress (JPEG2000)
This property determines whether JPEG2000 compression is applied. This
compression is suitable for photographs or images with gradual color
changes. Select True to apply JPEG2000 compression.
Compress (LZW)
This property determines whether LZW compression is applied. This
compression is suitable for compressing text files. It reduces file size and
is suitable for use with .gif images from web sites and TIFF images. Select
True to apply LZW compression.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Enable Assembly
This property determines whether document assembly (insert, rotate, and
delete pages) is enabled. Select True to enable document assembly.
Enable Commenting
This property determines whether the ability to change or edit comments
and forms fields is enabled. Select True to enable document commenting.
Enable Copying
This property determines whether the ability to copy content is enabled.
Select True to enable content copying.
Enable Extraction
This property determines whether content can be copied for accessibility.
Select True to enable the copying of content for accessibility.
Enable Form Filling
This property determines whether form fields can be filled in. Select True
to enable the filling in of forms.
Enable HQ Print
This property determines whether high-quality printing is enabled. Select
True to enable high-quality printing.
Enable Modification
This property determines whether the document can be changed. Select
True to enable document modification.
Enable Print
This property determines whether the document can be printed. Select
True to enable document printing.
Encryption Level
This property determines the level of encryption.
• Select None to apply no encryption.
• Select 40-bit RC4 to apply the lowest encryption level (used in Adobe
  Acrobat 3.x and 4.x).
• Select 128-bit RC4 to apply a medium encryption level (used in Adobe
  Acrobat 5.x and later).
• Select 128-bit AES to apply the highest encryption level (used in Adobe
  Acrobat 7.x and later).
Fonts (External)
This property determines whether external fonts are included in the output
file. Select True to include external fonts.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer text.
• Select Auto Format to automatically format the headers and footers to
  match the original style.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution, in dots per inch (DPI), of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Layer Type
This property determines whether the layer type is displayed in the output
file. Select True to display the layer type.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Linearized PDF
This property determines whether PDF files are optimized for efficient web
display. When set to True, the first page will load quickly into a web page,
and the remaining pages will load while the PDF file is being viewed. The
browser determines which page elements appear first (typically, headings
and text), and the elements that follow, for example, larger pictures.
Efficiency is also optimized when you skip to another page in the PDF file.
Mixed Raster Content
This property determines the level of Mixed Raster Content (MRC) in the
output file. (MRC is a process that uses image segmentation methods to
improve contrast resolution of raster images composed of pixels.)
• Select No MRC to apply no compression.
• Select Medium Compression for medium quality.
• Select Lossless Compression (Best Quality) for the best possible quality.
• Select Best Compression (Smallest File Size) to produce the smallest file
  size.
Outline Props
This property determines whether to retain bookmarks for pages. Select
True to retain bookmarks.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with text,
  pictures, table boxes, and frames).
Password (Open)
This property defines a password that must be entered to open the file.
Type the password you want to use.
Password (Permissions)
This property defines a password required to set or change the restricted
features found under the Enable options of the PDF file. Type the
password you want to use.
NOTES for PDF Password Use:
If you apply any password (Open or Permissions), you must also select the appropriate Encryption Level setting.
If you apply both types of passwords, either password will open the PDF. However, only the Permissions
password will allow the recipient to set or change the applicable restricted features. If you apply only a
Permissions password, a password is not required to open the document, but the user must type the
Permissions password to set or change the restricted features. By default, the "Enable" options are set to "True,"
and cannot be set to "False" until a Permissions password is assigned. So, after you assign a Permissions
password, you must then set the appropriate "Enable" options to "False."
PDF Compatibility
This property lets you select the compatible PDF version or optimization
type. You can choose from the following options:
• Optimize for Quality
• Optimize for Size
• PDF 1.0
• PDF 1.1
• PDF 1.2
• PDF 1.3
• PDF 1.4
• PDF 1.5
• PDF 1.6
• PDF-A
PDF Thumbnail
This property determines whether thumbnail images are created in the
output file. Select True to create thumbnail images.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Signature (Certification Description)
This property defines a description for the signature’s certificate. Type the
description you want to use.
Signature (SHA Thumbprint)
This property defines an SHA thumbprint for the signature. Type the SHA1
thumbprint that you want to use.
Signature Type
This property lets you select a signature handler type. You can choose
from the following options:
• None
• PPKLite
• VeriSign
Styles
This property determines whether styles are retained in the output file.
Select True to include styles from the original document.
URL (Highlight)
This property determines whether the URL address is highlighted in the
output file. Select True to highlight the URL.
URL (Underline)
This property determines whether the URL address is underlined in the
output file. Select True to underline the URL.
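The effect of the Linearized PDF property described in the table above can be spot-checked on an output file. The following Python sketch is not part of PaperVision Capture. It relies only on the PDF specification's requirement that a linearized file carry a /Linearized entry within its first 1,024 bytes. The function names and the file path argument are hypothetical, and the check is a heuristic that does not validate the rest of the linearization data.

```python
def looks_linearized(head: bytes) -> bool:
    """Return True if the leading bytes of a PDF contain a /Linearized entry.

    Heuristic only: inspects the first 1,024 bytes and does not validate
    the linearization dictionary or the hint tables.
    """
    window = head[:1024]
    return window.startswith(b"%PDF-") and b"/Linearized" in window


def file_looks_linearized(path: str) -> bool:
    """Read the start of the file at `path` (hypothetical) and apply the check."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))
```

A result of False for a file produced with Linearized PDF set to True would suggest that the setting was not applied or that the file was rewritten by another tool afterward.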
PDF with Image Substitutes
With the PDF with Image Substitutes output type, reject and suspect characters contain image overlays in
the resulting output file, so uncertain characters display as they appeared in the original document. The
resulting PDF file is viewable, editable, and searchable.
PDF with Image Substitutes Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Color Quality
This property determines the color quality in the output file.
• Select Minimum for the least color quality.
• Select Good for better than minimal color quality.
• Select Lossless (Best Quality) for the best possible color quality.
Compress (Contents)
This property determines whether text content and line art are compressed.
Select True to compress contents and line art.
Compress (Embedded Files)
This property determines whether embedded files are compressed. Select
True to compress embedded files.
Compress (Flate)
This property determines whether flate compression is applied. This
compression is suitable for use on images with large areas of single colors
or repeating patterns. Select True to apply flate compression.
Compress (JBIG2)
This property determines whether JBIG2 compression is applied. This
compression is suitable for use on highly-compressed black and white
images or monochrome images. Select True to apply JBIG2 compression.
Compress (JPEG2000)
This property determines whether JPEG2000 compression is applied. This
compression is suitable for photographs or images with gradual color
changes. Select True to apply JPEG2000 compression.
Compress (LZW)
This property determines whether LZW compression is applied. This
compression is suitable for compressing text files. It reduces file size and
is suitable for use with .gif images from web sites and TIFF images. Select
True to apply LZW compression.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Enable Assembly
This property determines whether document assembly (insert, rotate, and
delete pages) is enabled. Select True to enable document assembly.
Enable Commenting
This property determines whether the ability to change or edit comments
and forms fields is enabled. Select True to enable document commenting.
Enable Copying
This property determines whether the ability to copy content is enabled.
Select True to enable content copying.
Enable Extraction
This property determines whether content can be copied for accessibility.
Select True to enable the copying of content for accessibility.
Enable Form Filling
This property determines whether form fields can be filled in. Select True
to enable the filling in of forms.
Enable HQ Print
This property determines whether high-quality printing is enabled. Select
True to enable high-quality printing.
Enable Modification
This property determines whether the document can be changed. Select
True to enable document modification.
Enable Print
This property determines whether the document can be printed. Select
True to enable document printing.
Encryption Level
This property determines the level of encryption.
• Select None to apply no encryption.
• Select 40-bit RC4 to apply the lowest encryption level (used in
  Adobe Acrobat 3.x and 4.x).
• Select 128-bit RC4 to apply a medium encryption level (used in
  Adobe Acrobat 5.x and later).
• Select 128-bit AES to apply the highest encryption level (used in
  Adobe Acrobat 7.x and later).
Fonts (External)
This property determines whether external fonts are included in the output
file. Select True to include external fonts.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image Substitutes
This property determines whether suspect words are covered with small
images. Select True to enable image substitutes.
Layer Type
This property determines whether the layer type is displayed in the output
file. Select True to display the layer type.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Linearized PDF
This property determines whether PDF files are optimized for efficient web
display. When set to True, the first page will load quickly into a web page,
and the remaining pages will load while the PDF file is being viewed. The
browser determines which page elements appear first (typically, headings
and text), and the elements that follow, for example, larger pictures.
Efficiency is also optimized when you skip to another page in the PDF file.
Mixed Raster Content
This property determines the level of Mixed Raster Content (MRC) in the
output file. (MRC is a process that uses image segmentation methods to
improve contrast resolution of raster images composed of pixels.)
• Select No MRC to apply no compression.
• Select Medium Compression for medium quality.
• Select Lossless Compression (Best Quality) for the best
  possible quality.
• Select Best Compression (Smallest File Size) to produce the
  smallest file size.
Outline Props
This property determines whether to retain bookmarks for pages. Select
True to retain bookmarks.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Password (Open)
This property defines a password that must be entered to open the file.
Type the password you want to use.
Password (Permissions)
This property defines a password required to set or change the restricted
features found under the Enable options of the PDF file. Type the
password you want to use.
NOTES for PDF Password Use:
If you apply any password (Open or Permissions), you must also select the appropriate Encryption Level setting.
If you apply both types of passwords, either password will open the PDF. However, only the Permissions
password will allow the recipient to set or change the applicable restricted features. If you apply only a
Permissions password, a password is not required to open the document, but the user must type the
Permissions password to set or change the restricted features. By default, the "Enable" options are set to "True,"
and cannot be set to "False" until a Permissions password is assigned. So, after you assign a Permissions
password, you must then set the appropriate "Enable" options to "False."
PDF Compatibility
This property lets you select the compatible PDF version or optimization
type. You can choose from the following options:
• Optimize for Quality
• Optimize for Size
• PDF 1.0
• PDF 1.1
• PDF 1.2
• PDF 1.3
• PDF 1.4
• PDF 1.5
• PDF 1.6
• PDF-A
PDF Form Visuality
This property determines whether the visual components are displayed for
a PDF form. Select True to display the visual components.
PDF Thumbnail
This property determines whether thumbnail images are created in the
output file. Select True to create thumbnail images.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Signature (Certification Description)
This property defines a description for the signature’s certificate. Type the
description you want to use.
Signature (SHA Thumbprint)
This property defines an SHA thumbprint for the signature. Type the SHA1
thumbprint that you want to use.
Signature Type
This property lets you select a signature handler type. You can choose
from the following options:
• None
• PPKLite
• VeriSign
Styles
This property determines whether styles are retained in the output file.
Select True to include styles from the original document.
URL (Highlight)
This property determines whether the URL address is highlighted in the
output file. Select True to highlight the URL.
URL (Underline)
This property determines whether the URL address is underlined in the
output file. Select True to underline the URL.
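The password notes in the PDF tables above amount to a small set of rules. The following Python sketch restates those rules as a reading aid. It is not PaperVision Capture code, and the class, field, and function names are hypothetical. It models only what the notes state: any password requires an Encryption Level, either password opens the file, and only the Permissions password allows changes to the restricted features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfSecuritySettings:
    """Hypothetical container for the password-related properties."""
    open_password: Optional[str] = None
    permissions_password: Optional[str] = None
    # One of "None", "40-bit RC4", "128-bit RC4", "128-bit AES".
    encryption_level: str = "None"


def validate(settings: PdfSecuritySettings) -> List[str]:
    """Return the problems implied by the password notes (empty if none)."""
    problems = []
    has_password = bool(settings.open_password or settings.permissions_password)
    if has_password and settings.encryption_level == "None":
        problems.append("A password is set, so an Encryption Level must be selected.")
    return problems


def can_open(settings: PdfSecuritySettings, supplied: Optional[str]) -> bool:
    """Either password opens the file; with no Open password, none is needed."""
    if not settings.open_password:
        return True
    accepted = {p for p in (settings.open_password, settings.permissions_password) if p}
    return supplied in accepted


def can_change_restrictions(settings: PdfSecuritySettings, supplied: Optional[str]) -> bool:
    """Only the Permissions password allows changing the restricted features.

    With no Permissions password there are no restrictions to change,
    so this returns False.
    """
    return bool(settings.permissions_password) and supplied == settings.permissions_password
```

For example, with both passwords assigned, can_open accepts either one, while can_change_restrictions accepts only the Permissions password.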
RTF 2000 ExactWord
The RTF 2000 ExactWord output type corrects pagination errors by making minor modifications to spacing
values.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
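The page size limit in the note above can be checked before a batch reaches the conversion step. The following Python sketch is one way to do so. It is not part of PaperVision Capture, the function names are hypothetical, and it assumes that the 0.1-inch and 22-inch limits are inclusive, which the note does not state.

```python
MIN_PAGE_INCHES = 0.1
MAX_PAGE_INCHES = 22.0


def pixels_to_inches(pixels: int, dpi: int) -> float:
    """Convert a scanned dimension in pixels to inches at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels / dpi


def page_size_ok(width_in: float, height_in: float) -> bool:
    """Return True when both dimensions fall within 0.1 to 22 inches.

    Assumption: the limits are treated as inclusive.
    """
    return all(
        MIN_PAGE_INCHES <= dimension <= MAX_PAGE_INCHES
        for dimension in (width_in, height_in)
    )
```

For instance, an image that is 7,200 pixels wide at 300 DPI measures 24 inches and would fall outside the limit for the Flowing Page and True Page output formats.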
RTF 2000 ExactWord Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
No Textbox
This property determines whether to exclude text boxes from the output
files. Select True to exclude text boxes.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
  layout so text flows across columns. Boxes and frames are used
  only when necessary.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Page Margins
This property determines whether to retain the page margins in the output
file. Select True to retain the page margins.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
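The Character Spacing property in the table above describes a gap rule: roughly two spaces between words produce a single space, and four or five spaces produce a tab. The following Python sketch illustrates that rule on plain strings. It is only an illustration, not the OCR engine's logic, which works from measured gaps in the image. The function name and the threshold of four spaces are assumptions, and the source does not say how gaps of three spaces or more than five spaces are handled.

```python
import re


def normalize_gaps(line: str, tab_threshold: int = 4) -> str:
    """Collapse runs of spaces: short runs become one space, long runs a tab.

    Assumption: runs of `tab_threshold` or more spaces become a tab;
    shorter runs of two or more spaces become a single space.
    """
    def replace(match):
        return "\t" if len(match.group()) >= tab_threshold else " "

    return re.sub(r" {2,}", replace, line)
```

With the default threshold, "Name  Value" (two spaces) becomes "Name Value", and the same words separated by five spaces are joined by a tab.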
RTF Word 2000
The RTF Word 2000 output type generates files interpreted by most .rtf readers and uses features only
supported by Word 2000 and later.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
RTF Word 2000 Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
  layout so text flows across columns. Boxes and frames are used
  only when necessary.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
RTF Word 6.0/95
Based on Version 1.3 of the RTF Specification, the RTF Word 6.0/95 output type generates a file that most RTF
editors can interpret, but the file may be significantly larger than files produced by more recent RTF converters.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
RTF Word 6.0/95 Property
Description
Anchor Paragraphs
This property determines whether paragraphs are anchored in the output
file. Select True to anchor paragraphs.
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Image in Text Box
This property determines whether to keep images in text boxes. Select
True to retain images in text boxes.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
  layout so text flows across columns. Boxes and frames are used
  only when necessary.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Title
This property determines the title for the output file. Type the title you want
to use.
Word 2000 and Later
This property determines whether the output file is compatible with Word 2000
and later versions. Select True to make the output file compatible with
Word 2000 and later.
RTF Word 97
The RTF Word 97 output type generates a file that uses features interpreted by Microsoft Word 97 and later
or by RTF readers with similar compatibility.
NOTE: The page width and height must be between 0.1 and 22 inches for all Microsoft Word output
types and those that create files in the .rtf file format. Otherwise, an error will appear if you set the Output
Format property to Flowing Page or True Page output formats with .doc(x) and .rtf file extensions.
RTF Word 97 Property
Description
Anchor Paragraphs
This property determines whether paragraphs are anchored in the output
file. Select True to anchor paragraphs.
Bookmark in Every Paragraph
This property determines whether a bookmark is inserted in every
paragraph. Select True to insert bookmarks.
Box Wrapping
This property determines whether content is wrapped around text boxes.
Select True to wrap content around text boxes.
Boxes
This property determines whether text boxes are included in the output file.
Select True to include text boxes.
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Character Scaling
This property determines whether character scaling is retained in the
output file. Select True to retain character scaling.
Character Spacing
This property determines whether character spacing is retained in the
output file. Select True to retain character spacing. When this property is
set to True, text characters can be expanded or condensed in the output
file. If images contain text with approximately two spaces between words,
a single space is generated; if four or five spaces exist between words, a
tab is generated.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) of
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
  layout so text flows across columns. Boxes and frames are used
  only when necessary.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Page Color
This property determines whether to retain the background color in the
output file. Select True to include the background color.
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Text
The Text output type writes recognized text into a simple text (.txt) file that can be interpreted by most text
editors and word processors.
Text Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
Tabs (Convert to Spaces)
This property determines whether to convert tabs into spaces in the output
file. Select True to convert tabs into spaces.
Text - Comma Separated
The Text - Comma Separated output type writes the recognized text into a comma-delimited .csv file that
can be interpreted by Microsoft Excel. If you enable the List Separator property, you can configure it to
separate the cells in the output file.
Text - Comma Separated Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
List Separator
This property lets you enter a string to separate cells in the .csv file. Enter
the string you want to use, for example, r;\t.
List Separator (Include)
This property determines whether to include the list separator in the output
file. Select True to include the list separator.
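As a rough illustration of how a list separator delimits cells, the following Python sketch writes two rows with a semicolon between cells. The sample rows, the write_cells helper, and the choice of ";" are assumptions made for illustration; this is not how PaperVision Capture itself produces the file.

```python
import csv
import io


def write_cells(rows, separator=";"):
    """Join each row's cells with the separator, quoting a cell only when
    it contains the separator itself (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Two recognized rows; the second has a cell that contains the separator.
sample = write_cells([["Invoice", "1001"], ["Total; net", "42.50"]])
print(sample)
```

A cell that contains the separator is quoted so that a spreadsheet application can still tell where each cell ends.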
Text - Formatted
The Text - Formatted output type writes the recognized text into a text file while attempting to retain the page
layout by inserting extra spaces.
Text - Formatted Property
Description
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
Text with Line Breaks
The Text with Line Breaks output type inserts line breaks at the end of each line, rather than inserting them
at the end of each paragraph.
Text with Line Breaks Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Tabs (Convert to Spaces)
This property determines whether to convert tabs into spaces in the output
file. Select True to convert tabs into spaces.
Unicode Text
The Unicode Text output type writes recognized text into a simple text (.txt) file that can be interpreted by
most text editors and word processors. The Unicode Text output type uses two-byte Unicode characters.
Unicode Text Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Tabs (Convert to Spaces)
This property determines whether to convert tabs into spaces in the output
file. Select True to convert tabs into spaces.
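To make the phrase "two-byte Unicode characters" concrete, the following Python sketch compares a single-byte encoding with a two-byte one. It assumes the two-byte form corresponds to UTF-16 little-endian with a byte order mark, which is common on Windows but is not confirmed by this guide; the encode_two_byte helper is hypothetical.

```python
import codecs


def encode_two_byte(text):
    """Encode text as UTF-16 little-endian, prefixed with a byte order mark
    (an assumed stand-in for the "two-byte Unicode" output form)."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


plain = "OCR".encode("ascii")   # one byte per character
wide = encode_two_byte("OCR")   # two bytes per character, plus a 2-byte mark
print(len(plain), len(wide))
```

Under this assumption, a Unicode text file is roughly twice the size of its single-byte equivalent for the same recognized text.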
Unicode Text - Comma Separated
The Unicode Text - Comma Separated output type writes the recognized text (using two-byte Unicode
characters) into a comma-delimited .csv file that can be interpreted by Microsoft Excel. If you enable the List
Separator property, you can configure it to separate the cells in the output file.
Unicode Text - Comma Separated Property
Description
Application Extension
This property determines the application extension for the output file, for
example, .csv and .txt. Type the application extension you want to use.
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
List Separator
This property lets you enter a string to separate cells in the .csv file. Enter
the string you want to use, for example, r;\t.
List Separator (Include)
This property determines whether to include the list separator in the output
file. Select True to include the list separator.
Output Format
This property determines the format retention for the output file.
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property determines whether page breaks are retained in the output
file. Select True to include page breaks.
Unicode Text - Formatted
The Unicode Text - Formatted output type writes the recognized text (using two-byte Unicode characters)
into a text file while attempting to retain the page layout by inserting extra spaces.
Unicode Text - Formatted Property
Description
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Unicode Text with Line Breaks
The Unicode Text with Line Breaks output type inserts line breaks at the end of each line (using two-byte
Unicode characters), rather than inserting them at the end of each paragraph.
Unicode Text with Line Breaks Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Code Page
This property lets you specify a code page (for example, IBM MultiLingual
and Mac Central EU) whose language will be recognized in the output file.
Select the code page you want to use from the list.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Tabs (Convert to Spaces)
This property determines whether to convert tabs into spaces in the output
file. Select True to convert tabs into spaces.
Wave Audio
The Wave Audio output type generates a Microsoft .wav audio file that reads recognized text aloud with an
English (U.S. or U.K.), French, or German speaking voice.
NOTE: In addition to the Capture Full-Text OCR license, the Wave Audio converter requires an
additional software license to run.
Wave Audio Property
Description
Save Mode
This property determines the mode in which the output file is saved. You can
choose from the following options:
• Separated Pages
• Job Separator
• All Pages
Speech Rate
This property determines the speed of the speaking voice in the output file.
You can choose from the following options:
• Slowest
• Slow
• Normal
• Fast
• Fastest
Voice Name
This property determines the language for the speaking voice. You can
choose from the following options:
• English
• US English
• French
• German
The language used in the Wave Audio speaking voice is determined by
the order in which folders appear in the PaperVision
Capture\OCR\speech\rssolov4 directory where PaperVision Capture was
installed. Folders residing in this directory include the following:
1. eng (English-U.K.)
2. enu (English-U.S.)
3. frf (French)
4. ged (German)
NOTE: Do not rename any language folders in the PaperVision
Capture\OCR\speech\rssolov4 directory. Otherwise, the Wave
Audio output type may not function properly.
WordPad
The WordPad output type generates an .rtf file that can be interpreted by Microsoft WordPad and most other
RTF readers.
WordPad Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Character Colors
This property determines whether character colors are retained in the
output file. Select True to retain character colors.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) for
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
No Textbox
This property determines whether to exclude text boxes from the output
files. Select True to exclude text boxes.
Output Format
This property determines the format retention for the output file.
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
WordPerfect 12
The WordPerfect 12 output type generates a WordPerfect binary file that supports features of WordPerfect
12 and later.
WordPerfect 12 Property
Description
Bullets
This property determines whether bullets are retained in the output file.
Select True to include bullets.
Column Breaks
This property determines whether column breaks are inserted in the output
file. Select True to insert column breaks.
Cross-References
This property determines whether cross references are retained in the
output file. Select True to include cross references.
Drop Caps
This property determines whether drop caps are retained in the output file.
Select True to include drop caps.
Field Codes
This property determines whether field codes are retained in the output file.
Select True to include field codes.
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Image Color
This property lets you assign an image color for the output file. You can
choose from the following options:
• 24-bit Color (True Color)
• Grayscale
• Black and White
• Original
Image DPI
This property lets you select the resolution in Dots Per Inch (DPI) for
images in the output file. You can choose from the following options:
• DPI 72
• DPI 100
• DPI 150
• DPI 200
• DPI 300
• None
• Original
Line Breaks
This property determines whether to insert line breaks in the output file.
Select True to insert line breaks between lines of recognized text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select Flowing Page to preserve the original page and column
  layout so text flows across columns. Boxes and frames are used
  only when necessary.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
• Select Formatted Text to retain text (without columns), paragraph
  format, font, graphics, table styles, highlights, and strikeouts.
  (Layout-related formatting is ignored.)
• Select Ignore All to disregard all format styles in the original file.
Page Breaks
This property lets you specify how page breaks are handled in the output
file. You can choose from the following options:
• Auto
• Always
• Never
Page Consolidation
This property determines whether pages are consolidated in the output file.
Select True to consolidate pages.
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
Tables
This property determines how tables are handled in the output file.
• Select Convert to Separated by Tabs to convert table content to
  columns separated by tab stops.
• Select Retain Tables to retain all tables from the original file.
Tabs
This property determines whether to retain the original tab positions in the
output file. Select True to include the original tab positions.
XML
The XML output type generates a standard, plain-text .xml file.
XML Property
Description
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Tabulated Form In Box to place tab stops between header
  and footer text and encase the headers and footers in text boxes.
• Select In Boxes to encase the headers and footers in text boxes.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
• Select Tabulated Form to place tab stops between header and
  footer text.
• Select Convert to Plain Text to convert headers and footers to
  plain text.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
XSD Schema
This property determines whether to use an XML Schema Definition (XSD)
in the output file. Select True to use an XML Schema Definition (XSD).
XPS
The XPS output type generates a Microsoft XML Paper Specification (XPS) file, yielding the same
appearance on every output device.
NOTE: To view an XPS file, the .NET 3.5 Framework, which is included on the PaperVision Capture
installation media, must be installed.
XPS Property
Description
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Output Format
This property determines the format retention for the output file.
• Select True Page to retain the original page and column layout (with
  text, pictures, table boxes, and frames).
Rule Lines
This property determines whether rule lines are retained in the output file.
Select True to include rule lines.
XPS Searchable Image
The XPS Searchable Image output type generates a Microsoft XML Paper Specification (XPS) file,
yielding all text as searchable.
NOTE: To view an XPS file, the .NET 3.5 Framework, which is included on the PaperVision Capture
installation media, must be installed.
XPS Searchable Image Property
Description
Headers/Footers
This property determines how headers and footers appear in the output file.
• Select Ignore Headers/Footers to disregard header and footer
  text.
• Select Auto Format to automatically format the headers and
  footers to match the original style.
Line Numbering Zones
This property determines whether the line numbering zones are retained in
output files. Select True to include the line numbering zones.
Chapter 10 Open Text Full-Text OCR
In PaperVision Capture, full-text OCR processing can be performed by the Open Text® engine, which
recognizes machine-printed text (handwritten text is not recognized). Additionally, newline characters are
removed during Open Text OCR processing. Within the Open Text Full-Text OCR step, you can configure an
automated process that reads pages of text and converts recognized results to one or multiple file types.
Each output type contains unique settings that you can configure to support your full-text OCR requirements.
During full-text processing, documents can be converted to several PDF versions, including those
compatible with PDF/A, 1.4, 1.5, 1.6, and 1.7. The engine also converts documents to PaperVision
Enterprise, PaperFlow, and text (.txt) output file types.
When you configure full-text OCR outputs and their associated properties, you can preview the full-text OCR
results before you process the batch of documents. Thumbnail previews display the document's images and allow
you to navigate through the document and perform basic operations including the cut/paste, copy/paste, and
delete operations.
To configure the Open Text Full-Text OCR settings
1. On the workspace of the Job Definitions window, select the Open Text Full-Text OCR job step.
2. On the Properties tab, expand Full-Text OCR Step.
3. Click the Outputs property, and then click the ellipsis button. The Edit Open Text Full-Text OCR
Settings screen appears.
Edit Open Text Full-Text OCR Settings
Supported Output File Types
PaperVision Capture supports the following Open Text full-text OCR output file types:
• PaperFlow: The PaperFlow output is a text-based full-text output file that you can subsequently import into
  OCRFlow.
• PaperVision Enterprise: The PaperVision Enterprise output is a text-based full-text output file that you
  can subsequently import into PaperVision Enterprise.
• PDF: The PDF output produces a searchable PDF (.pdf) file compatible with your specified PDF version.
• Text: The Text output produces a text (.txt) file.
Custom Code
You can configure custom code that reports OCR statistics when a page is processed through the Open Text
Full-Text OCR engine. For example, you can configure custom code to record each character's confidence
level by using the OCRFullTextPageStatistics sample script. Other custom code samples are located in the
Library\Samples directory (as text or XML files), where PaperVision Capture was installed.
To configure custom code for Open Text Full-Text OCR statistics
1. In the Edit OCR Zones screen, click the ellipsis button next to the OCR Statistics field. The Select
Custom Code Generator dialog appears.
2. Select the Basic custom code generator, and then click OK. The Script Editor opens.
3. If desired, you can import the OCRFullTextPageStatistics script into the Script Editor. Click the Import
icon, and then browse to the Library\Samples directory where PaperVision Capture was installed.
4. Otherwise, insert your custom code into the Script Editor.
5. Click OK.
Auto Rotate
By default, this property is set to True, and the Open Text Full-Text OCR engine may automatically rotate
some images in order to recognize text. If you do not want the Open Text Full-Text OCR engine to
automatically rotate images prior to text recognition, set this property to False.
NOTE: Since the engine may automatically rotate some images in order to recognize text, the resulting
output images may also be rotated.
Brightness Sample Size
This value (indicating both width and height) specifies the rectangle size used to calculate the brightness
threshold. You can specify a value between 1 and 32, and the default value is 15.
NOTE: Smaller brightness sample sizes may cause the OCR engine to recognize extraneous noise on
the image.
Brightness Threshold
You can assign a brightness threshold value (between 0 and 255) for the image. The default value is 75.
Country/Language
When you select from the Country/Language property, your selection may reflect not only a country or
language, but also country groups (e.g., Western Europe), language groups (e.g., Latin), and character sets
(e.g., OCR). Each country corresponds to one or more languages, and countries are automatically expanded into
language sets (e.g., Germany corresponds to the German language; Switzerland corresponds to the German,
French, Italian, and Rhaeto-Romance languages).
Specific languages are also available for selection under the Country/Language property (e.g., English,
German, Dutch, Italian, etc.). It is recommended to narrow your selection as much as possible since OCR
recognition may become slower with a greater number of selected countries or languages. It is also
recommended to select a country rather than a language or country group (e.g., Western Europe, South
America, Scandinavia) since the recognition of certain types of addresses and money transfer forms may
improve.
NOTE: You cannot select the OCR character set individually; it must be selected with another language,
language group, country, or country group. For a complete list of Open Text supported countries,
languages, country groups, and character sets, see the Open Text OCR Supported Countries/Languages
(Groups)/Character Sets topic.
Language/Groups
If you select a language group, it is recommended to select only one, since each group encompasses multiple
languages, countries, and code pages:
1. Cyrillic: Code page 1251
2. Greek: Code page 1253
3. Latin: Code pages 1250, 1252, 1254, and 1257 (i.e., Central Europe, Western Europe, Turkey, Baltic)
4. Azerbaijanian
NOTE: For language groups, recognition results are always represented by Unicode characters. The
English character set (A-Z, a-z) is implicitly available with all country-language selections, even Greek or
Cyrillic.
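The relationship between language groups and code pages can be sketched in Python, whose standard codecs include the Windows code pages listed above. This is only an illustration of why text from one group cannot be represented in another group's code page, and why the English character set is available in all of them; the CODE_PAGES table and the groups_that_can_encode helper are hypothetical and are not part of PaperVision Capture or the Open Text engine.

```python
# Windows code pages per language group, as listed above (Python codec names).
CODE_PAGES = {
    "Cyrillic": ["cp1251"],
    "Greek": ["cp1253"],
    "Latin": ["cp1250", "cp1252", "cp1254", "cp1257"],
}


def groups_that_can_encode(text):
    """Return the language groups that have at least one code page able to
    represent every character in text."""
    matches = []
    for group, pages in CODE_PAGES.items():
        for page in pages:
            try:
                text.encode(page)
            except UnicodeEncodeError:
                continue  # this code page lacks a character; try the next one
            matches.append(group)
            break
    return matches


print(groups_that_can_encode("Invoice"))  # basic English letters
print(groups_that_can_encode("Счёт"))     # Cyrillic text
```

Plain English letters fit every group, which mirrors the note that the English character set is implicitly available with all selections.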
To select a country or language for full-text OCR output
1. After selecting an output type, click the ellipsis button to the right of the Country/Language property. The
Country/Language dialog box appears.
NOTE: If a country or language appears crossed out, it does not belong to the same code page as the
selected country or language. Therefore, countries or languages containing strikethroughs cannot be
added to the Selected list.
2. Highlight one or more countries/languages from the Available list, and then click the right arrow.
3. To remove one or more selections from the Selected list, highlight the countries/languages, and then click
the left arrow.
4. When finished with your selections, click OK.
Minimum Confidence
The confidence level reflects the reliability of the OCR recognition results. Values range from zero (the
default setting), the lowest confidence level, to 255, the highest confidence level indicating the most reliable
recognition results. Characters with lower confidence levels than your specified value will display as the
rejection symbol, which is the tilde (~) character by default.
NOTE: The Rejection Symbol property is available for configuration in text-based outputs (PaperFlow,
PaperVision Enterprise, and Text).
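The effect of the Minimum Confidence setting can be sketched as follows. This is a minimal illustration in Python, assuming each recognized character arrives with a confidence value from 0 to 255; the apply_minimum_confidence function and the sample data are hypothetical and do not represent the engine's actual interface.

```python
def apply_minimum_confidence(chars, minimum_confidence=0, rejection_symbol="~"):
    """Replace every character whose confidence is lower than the minimum
    with the rejection symbol.

    chars is a list of (character, confidence) pairs, confidence in 0..255.
    """
    if not 0 <= minimum_confidence <= 255:
        raise ValueError("minimum_confidence must be between 0 and 255")
    return "".join(
        ch if confidence >= minimum_confidence else rejection_symbol
        for ch, confidence in chars
    )


# Hypothetical recognition result for the word "Total".
recognized = [("T", 250), ("o", 90), ("t", 240), ("a", 30), ("l", 200)]
print(apply_minimum_confidence(recognized, minimum_confidence=100))
```

With the default minimum of zero, no character is rejected. Passing an empty rejection symbol drops low-confidence characters instead of marking them.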
Remove Line System
The Remove Line System property determines whether lines are removed from an image before the image is
submitted for processing. Select True to remove any lines from the image. Select False to keep
the lines as they are.
Timeout Value (in seconds)
This property allows you to define the maximum amount of time that the Open Text OCR engine processes a
single image before it fails. By default, this property is set to 180 seconds (3 minutes). You can assign a
timeout between one second and 3,600 seconds (1 hour).
NOTE: Raising the timeout setting may increase the amount of time to process all images.
Compression
You can set the level of compression applied to PDF outputs. The higher the compression, the smaller the
output file size. The default level of compression is medium. You can select from the following compression
levels:
• None (no compression will be applied)
• Low (low level of compression is applied)
• Medium (medium level of compression is applied)
• High (highest level of compression is applied)
PDF Version
You can select the compatible PDF version for PDF output files. The following versions are supported by the
full-text OCR engine:
• PDF/A: Format for long-term archiving of electronic documents - with Level B compliance in Part 1 (1b)
• PDF 1.4: Acrobat 5.0
• PDF 1.5: Acrobat 6.0
• PDF 1.6: Acrobat 7.0
• PDF 1.7: Acrobat 8 and 9
Rejection Symbol
This property represents rejected characters in output documents. A rejected character is not recognized by
the active OCR recognition engine configuration. The default value is the tilde character (~). Only a single
character can be entered in this field. The Rejection Symbol property is available for configuration in
text-based outputs (PaperFlow, PaperVision Enterprise, and Text).
Tip: To prevent unrecognized characters from appearing in output documents, leave this field blank.
Chapter 11 Image Processing
When you use an Image Processing job step, you can configure image processing filters that run
automatically. Numerous image processing filters are available, including:
• Binary image processing filters such as dilation, erosion, halftone and hole removal, invert image, line and
  noise removal, and others.
• Color image processing filters to adjust, detect, convert, and remove color.
• Page deletion filters that let you specify criteria to determine whether pages are retained in a batch.
• Many other filters, such as cropping and redaction, that let you control the content of images.
The Image Processing job step also provides the flexibility to apply filters to an entire image or only to the
specific zones that you define.
When you configure image processing filters, you can view a side-by-side comparison of the original image
alongside the filtered image. Thumbnail previews display the document's images and you can navigate
through the document and perform basic operations including the cut/paste, copy/paste, and delete
operations. On the IP Filters grid, you can assign the page ranges to which a filter will be applied.
NOTE: Incoming color images can have maximum dimensions of 10,000 x 10,000 pixels when they are
processed through the Image Processing step. Bitonal (black and white) images can have slightly larger
dimensions. Larger images can be ingested into PaperVision Capture provided that no OCR will be
performed on the images; no image processing will be performed on the images; or, images will not be
viewed as thumbnails.
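If images are prepared outside PaperVision Capture, a pre-check against the color image limit in the first
note might look like the following Python sketch. The constant and function names are hypothetical, and the
sketch assumes the 10,000-pixel limit is inclusive.

```python
MAX_COLOR_DIMENSION = 10_000  # pixels per side, taken from the note above


def color_image_within_limit(width_px, height_px):
    """Return True if a color image fits the Image Processing step limit."""
    return width_px <= MAX_COLOR_DIMENSION and height_px <= MAX_COLOR_DIMENSION


fits = color_image_within_limit(5100, 6600)       # a letter-size page at 600 dpi
too_tall = color_image_within_limit(8500, 11000)  # a letter-size page at 1000 dpi
```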
NOTE: See "Setting Common Job Step Properties" on page 53 for information on the settings applicable
to all job steps.
Configuring an Image Processing Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click
OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. Add the Image Processing step to the job using one of the following methods.
• Select the job step that you want Image Processing to follow. On the Job Step Toolbox tab,
double-click Image Processing.
• On the Job Step Toolbox tab, drag Image Processing onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select Image Processing.
7. Double-click the Image Processing step to display the Properties tab on the left pane.
8. On the Properties tab, expand Image Processing.
9. Click the property you want to set, and then click the down arrow or the ellipsis button to select an
option. The following properties are available.
• Black and White Image File Type - This option specifies the file type for storing black and white
images. You can set this option to TIF or PNG. The default setting is TIF. TIF files are compressed
using Group 4 compression, which treats an image as a series of horizontal black strips on a white page.
PNG files are also compressed, so the file size is smaller, and the applied compression is exactly
reversible, so the image is recovered exactly.
• Color Image File Type - This option specifies the file type for storing grayscale and color images. You
can set this option to BMP, JPG, or PNG. BMP files are not compressed and can be large. These files
contain pixels and can degrade when you increase resolution. JPG images are compressed, so they
contain less data and have smaller file sizes than other image types. PNG files are also compressed, so
the file size is smaller, and the applied compression is exactly reversible, so the image is recovered
exactly.
NOTE: If you change the Black and White Image File Type or Color Image File Type property after
images are scanned or imported into the batch, the file type will change for only those images
subsequently added to the batch. For example, if you change the Black and White Image File Type
property setting from TIF to PNG after scanning or importing 10 out of 20 images in the batch, then
images 1-10 will be TIF file types, and images 11-20 will be PNG file types.
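The example in this note can be traced with a few lines of Python. This is only a sketch of the described
behavior; assign_file_types and its parameters are hypothetical names, not product settings.

```python
def assign_file_types(total_images, change_after, old_type="TIF", new_type="PNG"):
    """Return the file type of each image, in the order images were added.

    Images 1..change_after were added before the property changed and keep
    old_type; images added afterwards are stored as new_type.
    """
    return [old_type if n <= change_after else new_type
            for n in range(1, total_images + 1)]


types = assign_file_types(total_images=20, change_after=10)
print(types[:10])   # file types of images 1-10
print(types[10:])   # file types of images 11-20
```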
• Filters - This option specifies the filters you want to apply to images. See "Configuring Image
Processing Filters" on page 236 for more information.
• Prefer Bitonal - This option specifies whether the preference is to use bitonal (black and white) images
for image processing. When using only dual stream scanners, set this property to True. If you do not
prefer bitonal images for processing, set this option to False.
• Save Image - This option specifies whether to save the processed image. If you want to keep only the
original image (before filters are applied), select False. The processed images will not be added to the
batch. For example, select False when you run an Image Processing step to delete all blank pages. To
save the processed image (after the filters are applied), select True. As a result, two copies of the
image will be in the batch: the original image and the processed image.
10. On the toolbar, click Save Job to save the Image Processing step configuration.
Configuring Image Processing Filters
You can configure, preview, and test image processing filters before applying them to the job. Zooming,
rotation, and scanning operations are available, as well as image import and removal functions. You can also
draw and configure IP zones if you only want specific regions to be processed.
1. If you haven’t already done so, complete the procedure under "Configuring an Image Processing Job Step"
on page 234.
2. On the Job Definitions window, on the workspace, double-click the Image Processing job step to open
the Properties tab.
3. On the Properties tab, expand Image Processing.
4. Click Filters, and then click the ellipsis button
to open the Edit IP Filters window.
5. On the Edit IP Filters window, you can use the components described in the following table to complete
tasks.
Component - Description
Thumbnails pane - You can right-click on the Thumbnails pane to access the Cut, Copy, Paste, Delete, and
Select All commands. You can drag-and-drop thumbnails to a different location. Images viewed as
thumbnails can have maximum dimensions of 32,768 x 32,768 pixels. If there are more images loaded than
can be displayed on the Thumbnails pane, use the scroll bar on the right side of the pane to view them.
Source Image pane - The Source Image pane displays the original, unfiltered image.
Resulting Image pane - The Resulting Image pane displays the filtered image, after you test the
filter(s) applied to the image.
IP Filters grid - The IP Filters grid displays all page ranges and configured filters for each page range.
Filter Output tab - Click the Filter Output tab on the IP Filters grid to view a log of filter output. See
"Image Processing Filter Output" on page 242 for more information.
Status Bar - The status bar on the bottom of the window displays each image’s page number, page size
(in KB), and page dimensions (in mm). The page dimensions 215 x 279 mm are approximately equivalent to
8.5 x 11 inches.
Click Save IP Filters to save the image processing filter(s) configuration for
the job step.
Click Exit to close the Edit IP Filters window.
Click Configure Scanner to open the Scanner Settings dialog box where
you can specify scanner settings. See "Scanner Setup" on page 101 for
more information.
Click Start Scanning to begin the scanning process.
Click Stop Scanning to stop the scanning process.
Click Rotate Image 90° Counter-Clockwise to rotate the selected image
90 degrees counter-clockwise.
Click Rotate Image 90° Clockwise to rotate the selected image 90
degrees clockwise.
Click Remove Selected Images to remove the image(s) you selected.
Click Remove All Images to remove all loaded images. If you have defined
barcode zones prior to clearing all images, these barcode zones are
retained.
Click Import Images to access the Open dialog box where you can locate
the file you want to import.
Click Save Filtered Image to save the image to which you have applied a
filter.
Click Test Filters (Current Page) to test the selected filter on the current
page only. The page with the filter(s) applied appears on the Resulting
Image pane. Test information is saved on the Filter Output tab on the IP
Filters grid. See "Image Processing Filter Output" on page 242 for more
information.
Click Test Filters (All Pages) to test the selected filter on all pages. The
pages with the filter(s) applied appear on the Resulting Image pane. Use
the scroll bar on the right side of the Resulting Image pane to navigate
through the pages to ensure the filters are acceptable. Test information is
saved on the Filter Output tab on the IP Filters grid. See "Image
Processing Filter Output" on page 242 for more information.
Click Clear IP Filter Output to clear the information on the Filter Output
tab on the IP Filters grid. See "Image Processing Filter Output" on page 242
for more information.
Click Draw IP Zone to draw an image processing zone. See "Working with
Image Processing Zones" on page 239 for more information.
Click Remove IP Zone to remove an existing image processing zone. See
"Working with Image Processing Zones" on page 239 for more information.
Click Zoom In to zoom in on the selected image.
Click Zoom Out to zoom out on the selected image.
Click Zoom Reset to reset the selected image to the original view.
6. To import images, do one of the following:
• Click Import Images to access the Open dialog box. Locate and select the image you want to use,
and then click Open. The image you selected appears on the Source Image pane.
• Click Start Scanning to scan the image(s) you want to use. Each scanned image appears on the
Thumbnails pane.
7. On the IP Filters grid, click the Page Range column to specify the page(s) to which you want to apply
image processing filters and zones. You can do one of the following:
• From the Page Range list, you can select All, Odd, Even, or Last.
• In the Page Range column, you can type a single page number (for example, 8), multiple page numbers
separated by commas (for example, 12, 17, 25), or a page range (for example, 10-15).
NOTE: Binary filters can be applied only to bitonal (1 bit per pixel) images; color and grayscale are
ignored. Therefore, you cannot apply both color and binary filters to the same page range (same
row in the IP Filters grid).
NOTE: If desired, you can set the Page Range column after you draw image processing zones and
select image processing filters.
8. See "Working with Image Processing Zones" on page 239 if you want to define image processing zones.
9. To configure the filters for the pages you specified, click the ellipsis button next to the Filters column.
The Image Processing Filters dialog box appears.
10. From the Available Filters list, select the filter you want to use, and then click Add. Filters supported in
zones are marked with asterisks ( * ).
NOTE: See "Image Processing Filters" on page 242 for filter descriptions and configuration information.
11. To configure a filter, select it from the Selected Filters list, and then click Configure.
NOTE: The Configure button is available only if there are configuration options for the selected filter. If
there are no configuration options for the selected filter, the Configure button is unavailable.
12. Selected filters are applied in the order that they appear in the Selected Filters list. If you want to change
the order, select the filter you want to move, and then click Move Up or Move Down.
13. After you have configured all filters, click OK to return to the Edit IP Filters window.
NOTE: See the table under "Configuring Image Processing Filters" on page 236 for information about all
the functions you can perform on the Edit IP Filters window.
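The Page Range values accepted in step 7 (All, Odd, Even, Last, a single page, a comma-separated list, or a
range) can be read as a simple selection rule. The Python sketch below shows one way such values might be
resolved to page numbers. The function resolve_page_range is hypothetical, and the handling of pages beyond
the end of the document is an assumption; PaperVision Capture may behave differently.

```python
def resolve_page_range(spec, page_count):
    """Return the sorted 1-based page numbers selected by `spec`."""
    text = spec.strip().lower()
    all_pages = range(1, page_count + 1)
    if text == "all":
        return list(all_pages)
    if text == "odd":
        return [p for p in all_pages if p % 2 == 1]
    if text == "even":
        return [p for p in all_pages if p % 2 == 0]
    if text == "last":
        return [page_count] if page_count else []
    selected = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "10-15" includes both end pages.
            start, end = (int(x) for x in part.split("-", 1))
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))
    # Pages outside the document are ignored (an assumption of this sketch).
    return sorted(p for p in selected if 1 <= p <= page_count)
```

For a 30-page document, "12, 17, 25" selects three pages and "10-15" selects six consecutive pages.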
Working with Image Processing Zones
You can apply certain binary image processing filters to zones within bitonal images. For example, you may
want to apply the Binary Hole Removal filter only to the left two inches of a bitonal image, or the Binary
Invert Image filter to expose a specific area of a bitonal image. You can apply the following filters to zones
that you define on the image.
• Binary Dilation
• Binary Erosion
• Binary Halftone Removal
• Binary Hole Removal
• Binary Invert Image
• Binary Line Removal
• Binary Noise Removal
• Binary Skeleton
• Binary Smoothing
• Scaling
NOTE: See "Image Processing Filters" on page 242 for filter descriptions and configuration information.
To draw an IP zone and configure the filters
1. If you haven’t already done so, complete the procedures under "Configuring an Image Processing Job Step"
on page 234 and "Configuring Image Processing Filters" on page 236.
2. On the workspace of the Job Definitions window, double-click the Image Processing step.
3. On the Properties tab, expand Image Processing.
4. Click Filters, and then click the ellipsis button
to open the Edit IP Filters window.
5. If you need to import images, do one of the following:
• Click Import Images to access the Open dialog box. Locate and select the image you want to use,
and then click Open. The image you selected appears on the Source Image pane.
• Click Start Scanning to scan the image(s) you want to use. Each scanned image appears on the
Thumbnails pane.
6. On the Thumbnails pane, select the image you want to use. The image you selected appears on the
Source Image pane.
7. On the toolbar, click Draw IP Zone.
8. Position the cursor where you want to draw the IP zone, and then click and drag the cursor to draw a border
around the area you want to define. The dimensions of the zone appear on the IP Filters grid in the Zone
column. If desired, you can manually edit these dimensions.
9. If you want to move an IP zone, select it, and then rest the mouse on the center of the zone until the cursor
turns into a four-sided arrow. Click the zone and drag it to the desired location.
10. If you want to delete an IP zone, select it, and then click Remove IP Zone.
11. On the IP Filters grid, click the Page Range column to specify the page(s) to which you want to apply
image processing filters and zones. You can do one of the following:
• From the Page Range list, you can select All, Odd, Even, or Last.
• In the Page Range column, you can type a single page number (for example, 8), multiple page numbers
separated by commas (for example, 12, 17, 25), or a page range (for example, 10-15).
NOTE: Binary filters can be applied only to bitonal (1 bit per pixel) images; color and grayscale are
ignored. Therefore, you cannot apply both color and binary filters to the same page range (same
row in the IP Filters grid).
NOTE: If desired, you can set the Page Range column after you draw image processing zones and
select image processing filters.
12. To configure the filters for the pages you specified, click the ellipsis button next to the Filters column.
The Image Processing Filters dialog box appears.
13. From the Available Filters list, select the filter you want to use, and then click Add. Filters supported in
zones are marked with asterisks ( * ).
NOTE: See "Image Processing Filters" on page 242 for filter descriptions and configuration information.
14. To configure a filter, select it from the Selected Filters list, and then click Configure.
NOTE: The Configure button is available only if there are configuration options for the selected filter. If
there are no configuration options for the selected filter, the Configure button is unavailable.
15. Selected filters are applied in the order that they appear in the Selected Filters list. If you want to change
the order, select the filter you want to move, and then click Move Up or Move Down.
16. After you have configured all filters, click OK to return to the Edit IP Filters window.
Edit IP Filters
NOTE: See the table under "Configuring Image Processing Filters" on page 236 for information about all
the functions you can perform on the Edit IP Filters window.
Image Processing for Duplex Documents
You can apply image processing filters to duplex documents by manipulating the page range property for the
applicable pages. For example, to rotate the last duplex image, you can apply the Rotation filter with the
Page Range set to Last, and then apply another Rotation filter with the Page Range set to Last -1.
Image Processing - Duplex Documents
Image Processing Filter Output
Each time you test image processing filters, a log is generated that indicates whether images are deleted or
retained, and includes a summary of filter parameters applied to each page. To view a detailed log of all tests
performed per page, click the Filter Output tab on the IP Filters grid. The information appears similar to the
following.
Filter Output Log
To remove the contents on the Filter Output tab, on the toolbar, click Clear IP Filter Output.
Image Processing Filters
Image Processing filters improve image quality by removing unnecessary borders, lines, and noise;
enhancing text readability; and reducing file size. Additional image processing filters evaluate images, and
then keep or discard them based on your defined criteria. Color detection filters identify your specified colors
and convert the image to black and white or remove the page containing the color image. Binary filters can
only be applied to bitonal (1 bit per pixel) images; color and grayscale are ignored.
If you haven’t already done so, complete the procedures under "Configuring an Image Processing Job Step"
on page 234 and "Configuring Image Processing Filters" on page 236.
The following sections describe each image processing filter and the items you can configure for each one.
Background Dropout
The Background Dropout filter is intended to be used on color images with contrasting text or a uniform
background of the same color or similar colors. The background is a set of pixels of the same or similar color
that covers the majority of the image, contrasting with other informative pixels. Background detection is
based on the image histograms of red, green, and blue (RGB) channels. Only the margins of the image are
used for histogram analysis, assuming that margins are free from any information and clearly represent the
background of the image.
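The margin-based histogram idea described above can be sketched in Python. This is a simplified
illustration, not the algorithm the product uses: the function estimate_background, the margin parameter,
and the choice of the per-channel histogram peak as the background level are all assumptions.

```python
from collections import Counter


def estimate_background(image, margin=1):
    """Estimate the background color from the margins of an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Only pixels within `margin` pixels of an edge are sampled.
    """
    height, width = len(image), len(image[0])
    histograms = [Counter(), Counter(), Counter()]  # one per RGB channel
    for y in range(height):
        for x in range(width):
            in_margin = (y < margin or y >= height - margin or
                         x < margin or x >= width - margin)
            if in_margin:
                for channel, value in enumerate(image[y][x]):
                    histograms[channel][value] += 1
    # The most frequent level in each channel is taken as the background.
    return tuple(h.most_common(1)[0][0] for h in histograms)


# A 5 x 5 cream-colored page with one dark "text" pixel in the middle.
page = [[(250, 240, 200)] * 5 for _ in range(5)]
page[2][2] = (0, 0, 0)
background = estimate_background(page)
```

Because only the margins are sampled, the dark pixel in the center does not influence the estimate.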
To adjust Background Dropout settings
1. From the Selected Filters list, select Background Dropout, and then click Configure.
2. Use the following Background Dropout dialog box to adjust the settings.
Background Dropout
3. Click Load Sample to access the Open dialog box.
4. Locate and select the image you want to use, and then click Open. The image you selected appears in the
Image box. As you apply settings, the results appear in the Image with Dropout box.
5. You can apply one of the following options.
• Select Smooth background to smooth the background color and make it appear more uniform.
• Select Replace with color to apply the color you choose by clicking Pick Color.
6. To zoom in or out on the image, select a larger or smaller percentage from the Scaling list.
7. To adjust the strength of the background dropout, move the Sensitivity slider to the right to increase the
value and apply a more noticeable dropout, or to the left to decrease the value. Alternatively, you can
enter a value between -20 and 20 in the Sensitivity box.
Binary Dilation
The Binary Dilation filter expands the area of black objects in an image using your specified direction
(horizontal, vertical, and/or diagonal) and the number of times (passes) to apply the dilation. Using this filter
may improve image quality and the legibility of text, but can also increase file size.
To adjust Binary Dilation settings
1. From the Selected Filters list, select Binary Dilation, and then click Configure.
2. Use the following Binary Dilation dialog box to adjust the settings.
Binary Dilation
3. In the Direction area, select which direction(s) to perform the dilation.
4. In the Number of Passes box, type or select the number of times to apply the filter to the page.
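To show how the Direction and Number of Passes settings interact, the following Python sketch applies a
basic morphological dilation to a small bitonal image in which 1 represents black. It is a generic textbook
dilation written for illustration; the function dilate and its parameters are hypothetical and do not
reflect the product's internal implementation.

```python
def dilate(image, horizontal=True, vertical=True, diagonal=False, passes=1):
    """Expand black (1) regions of a bitonal image given as rows of 0/1."""
    offsets = []
    if horizontal:
        offsets += [(0, -1), (0, 1)]
    if vertical:
        offsets += [(-1, 0), (1, 0)]
    if diagonal:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    height, width = len(image), len(image[0])
    for _ in range(passes):
        result = [row[:] for row in image]
        for y in range(height):
            for x in range(width):
                if image[y][x] == 1:
                    # Blacken each neighbor in the selected directions.
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            result[ny][nx] = 1
        image = result
    return image


dot = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
widened = dilate(dot, horizontal=True, vertical=False)
```

Each additional pass grows the black area by one more pixel in every selected direction, which is why more
passes can thicken text but also increase file size.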
Before Binary Dilation
After Binary Dilation
Binary Erosion
The Binary Erosion filter trims the area of black objects using your specified direction (horizontal, vertical,
and/or diagonal) and the number of times (passes) to apply the erosion. Using this filter can reduce the file
size, but causes a loss of detail from the image.
To adjust Binary Erosion settings
1. From the Selected Filters list, select Binary Erosion, and then click Configure.
2. Use the following Binary Erosion dialog box to adjust the settings.
Binary Erosion
3. In the Direction area, select which direction(s) to perform the erosion.
4. In the Number of Passes box, type or select the number of times to apply the filter to the page.
Before Binary Erosion (Horizontal)
After Binary Erosion (Horizontal)
Binary Halftone Removal
The Binary Halftone Removal filter removes the background, such as a halftone or dither pattern, from an
image or a graphics object on the image.
After Binary Halftone Removal
Before Binary Halftone Removal
Binary Hole Removal
The Binary Hole Removal filter identifies objects that look like punched binder holes around the edges of the
image, and then deletes any instances it finds. Similar objects in other areas of the image will not be
removed.
To adjust Binary Hole Removal settings
1. From the Selected Filters list, select Binary Hole Removal, and then click Configure.
2. Use the following Binary Hole Removal dialog box to adjust the settings.
Binary Hole Removal
3. In the Minimum box, type the minimum diameter (in millimeters) that a hole must measure to be removed.
Circles that are smaller in diameter than the value you enter are not removed. You can increase this number
if the filter is removing small circles that are not actually binder holes.
4. In the Maximum box, type the maximum diameter (in millimeters) that a hole must measure to be removed.
Circles that are larger in diameter than the value you enter are not removed. You can decrease this number
if the filter is removing large circles that are not actually binder holes.
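The Minimum and Maximum settings together define a diameter window. A minimal Python sketch of that
selection follows. It assumes the candidate circles near the page edges have already been found and measured
in millimeters; the function name and the sample values are illustrative, not product defaults.

```python
def holes_to_remove(diameters_mm, minimum_mm, maximum_mm):
    """Return the diameters that fall within the configured limits.

    Circles smaller than the minimum or larger than the maximum are kept
    on the image, so they are left out of the returned list.
    """
    return [d for d in diameters_mm if minimum_mm <= d <= maximum_mm]


candidates = [2.0, 6.5, 12.0]  # measured diameters of circles near the edges
removed = holes_to_remove(candidates, minimum_mm=5.0, maximum_mm=8.0)
```

Raising the minimum protects small circles such as bullet points, and lowering the maximum protects large
circles such as logos.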
Before Binary Hole Removal
After Binary Hole Removal
Binary Invert Image
The Binary Invert Image filter reverses the polarity of the image. Black pixels become white pixels, and white
pixels become black pixels.
After Binary Invert Image
Before Binary Invert Image
Binary Line Removal
The Binary Line Removal filter removes lines from or reconstructs lines on a form-based image. Removing
lines can reduce the file size and improve OCR results.
To adjust Binary Line Removal settings
1. From the Selected Filters list, select Binary Line Removal, and then click Configure.
2. Use the following Binary Line Removal dialog box to adjust the settings.
Binary Line Removal
3. In the Mode area, you can select one of the following options.
• Select Remove Lines to take out all objects considered as lines.
• Select Reconstruct Lines to remove lines, repair overlapped graphics and text, and redraw straight lines
in place of removed lines.
• Select Reconstruct Forms to remove lines, redraw straight lines, and reconnect lines that were
previously connected. This type of line correction is commonly used for tables and forms.
4. If you want the filter to detect horizontal lines, in the Horizontal (mm) area, select Enable. You can set the
following options.
• Select Straight Line Algorithm to enable faster processing of straight lines that are longer than 100
pixels (suitable for forms and light paper). This filter evaluates the height or width of the bounding
rectangles around line-like objects to determine if the object is a line. If this setting is not used, the filter
breaks the line-like object into small segments and uses the maximum gap and curvature values to
determine whether the segments comprise a line.
• In the Min Length box, type or select the minimum detectable length (in millimeters) of horizontal lines.
Lines shorter than this length are ignored.
• In the Max Gap box, type or select the maximum white space (in millimeters) allowed between two
horizontal line-like objects for them to be considered a single line. (This setting is ignored when Straight
Line Algorithm is selected.)
• From the Curvature list, select the maximum allowable amount of deviation from a straight line for a
horizontal line-like object to be considered a line. If the calculated curvature is greater than the value you
select, then the object is not considered a line. The higher the value, the more curved an object the filter
will remove. A lower value restricts the filter to removing lines with less curvature. Choose a value that
removes unwanted lines while preserving other desirable features on your pages. You can choose from
Straight (contains a curvature value of 5), Low (contains a value of 15), Medium (contains a value of
30), or High (contains a value of 40). (This setting is ignored when Straight Line Algorithm is
selected.)
5. If you want the filter to detect vertical lines, in the Vertical (mm) area, select Enable. You can set the
following options.
• Select Straight Line Algorithm to enable faster processing of straight lines that are longer than 100
pixels (suitable for forms and light paper). This filter evaluates the height or width of the bounding
rectangles around line-like objects to determine if the object is a line. If this setting is not used, the filter
breaks the line-like object into small segments and uses the maximum gap and curvature values to
determine whether the segments comprise a line.
• In the Min Length box, type or select the minimum detectable length (in millimeters) of vertical lines.
Lines shorter than this length are ignored.
• In the Max Gap box, type or select the maximum white space (in millimeters) allowed between two
vertical line-like objects for them to be considered a single line. (This setting is ignored when Straight
Line Algorithm is selected.)
• From the Curvature list, select the maximum allowable amount of deviation from a straight line for a
vertical line-like object to be considered a line. If the calculated curvature is greater than the value you
select, then the object is not considered a line. The higher the value, the more curved an object the filter
will remove. A lower value restricts the filter to removing lines with less curvature. Choose a value that
removes unwanted lines while preserving other desirable features on your pages. You can choose from
Straight (contains a curvature value of 5), Low (contains a value of 15), Medium (contains a value of
30), or High (contains a value of 40). (This setting is ignored when Straight Line Algorithm is
selected.)
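The way Min Length, Max Gap, and Curvature work together when Straight Line Algorithm is not selected can
be illustrated with a small Python sketch. It is a conceptual model only: the segment representation, the
function names, and the comparison rules are assumptions, and only the four curvature preset values come
from this guide.

```python
# Curvature preset values as listed in this guide.
CURVATURE_PRESETS = {"Straight": 5, "Low": 15, "Medium": 30, "High": 40}


def detect_lines(segments, min_length, max_gap):
    """Merge nearby segments and return those long enough to be lines.

    `segments` is a list of (start, end) positions in millimeters along
    one row (horizontal lines) or one column (vertical lines).
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # The gap is small enough: treat both pieces as a single line.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_length]


def passes_curvature(calculated_curvature, preset):
    """Return True if the object is straight enough to count as a line."""
    return calculated_curvature <= CURVATURE_PRESETS[preset]


lines = detect_lines([(0, 10), (10.5, 30), (40, 42)], min_length=15, max_gap=1)
```

In this example the first two segments are separated by only 0.5 mm, so they merge into one 30 mm line,
while the 2 mm segment is shorter than Min Length and is ignored.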
Before Binary Line Removal
After Binary Line Removal
Binary Noise Removal
Noise can originate from carbon or dirt particles found on scanners, fax machines, or copiers. Noise removal
takes out extraneous specks from an image. If the image contains text, the Binary Noise Removal filter
may remove periods and dots from sentences and letters. To avoid removing essential parts of text
characters, make sure that the minimum separation value you assign is greater than the distance between
dots and the lower parts of letters. If you want to apply cropping and noise removal to an image, perform the
noise removal first for best results.
To adjust Binary Noise Removal settings
1. From the Selected Filters list, select Binary Noise Removal, and then click Configure.
2. Use the following Binary Noise Removal dialog box to adjust the settings.
Binary Noise Removal
3. In the Max Height box, type the maximum height (in millimeters) of objects to be removed as noise.
4. In the Max Width box, type the maximum width (in millimeters) of objects to be removed as noise.
5. In the Max Area % box, type or select the maximum percentage of the area defined by the Max Height and
Max Width values that an object can occupy to be removed as noise.
This setting is especially useful for detecting long narrow objects that may appear both vertically and
horizontally on a page, such as lines, decorative banners, and highlight areas. For example, to remove
colored banners that are either 50 mm x 10 mm or 10 mm x 50 mm, set both the Max Height and Max
Width values to 50 mm. With those two values alone, however, a 50 mm x 50 mm picture would also be
detected as noise and removed. To avoid this problem, set the Max Area % value to 20 so that only the
banner area is detected as noise, regardless of its orientation.
6. In the Min Separation box, type the minimum distance (in millimeters) that separates a noise object from
other areas on the page.
A value of zero removes all noise objects that fit within the area specified by the values you typed in the
Max Height, Max Width, and Max Area % boxes. Assigning a zero value may remove small text
elements, such as broken characters, punctuation, and the dots above letters. Assigning a value greater
than zero preserves elements that would otherwise be considered noise that occur in the vicinity of text
characters. This may improve readability and OCR accuracy.
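The banner example in step 5 can be checked with simple arithmetic. The Python sketch below treats each
object as its bounding rectangle and ignores Min Separation; the function is_noise and that simplification
are assumptions made for illustration, not a description of the filter's internals.

```python
def is_noise(height_mm, width_mm, max_height, max_width, max_area_pct):
    """Return True if an object would be classed as noise by size alone."""
    if height_mm > max_height or width_mm > max_width:
        return False
    limit_area = max_height * max_width
    object_pct = 100.0 * (height_mm * width_mm) / limit_area
    return object_pct <= max_area_pct


# Max Height = Max Width = 50 mm and Max Area % = 20, as in step 5.
horizontal_banner = is_noise(10, 50, 50, 50, 20)  # 500 of 2500 sq mm = 20%
vertical_banner = is_noise(50, 10, 50, 50, 20)    # same area, rotated
picture = is_noise(50, 50, 50, 50, 20)            # 2500 of 2500 sq mm = 100%
```

Both banner orientations occupy 20 percent of the 50 mm x 50 mm limit area and qualify as noise, while the
picture occupies 100 percent and is kept.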
Before Binary Noise Removal
After Binary Noise Removal
Binary Skeleton
Use the Binary Skeleton filter with caution, since it can significantly distort the image. This filter can reduce
the file size, and is recommended only when performing certain types of OCR.
Before Binary Skeleton
After Binary Skeleton (also Zoomed 1X)
Binary Smoothing
The Binary Smoothing filter removes bumps and spurs that appear on text characters or graphics in an
image. This filter looks for any pixel surrounded by five or six connected pixels of the opposite color, and then
inverts that center pixel based on the filter's configuration. Smoothing improves legibility and can reduce the
file size without compromising detail.
To adjust Binary Smoothing settings
1. From the Selected Filters list, select Binary Smoothing, and then click Configure.
2. Use the following Binary Smoothing dialog box to adjust the settings.
Binary Smoothing
3. In the Options area, you can select the following items.
• Select Trim First to remove black noise pixels before removing white ones. If this option is not selected,
white noise pixels are removed before black ones.
• Select Corner Black to remove black noise pixels from the corners of objects in the image.
• Select Corner White to remove white noise pixels from the corners of objects in the image.
Before Binary Smoothing
After Binary Smoothing
Black Overscan Removal
The Black Overscan Removal filter removes the black area around an image when a page is scanned using
an overscan option. This filter reduces the physical size of the scanned image and the image file size by
eliminating the black border generated by scanners with black backgrounds. To maximize results, apply the
Deskew filter with a black fill color prior to applying the Black Overscan Removal filter.
To adjust Black Overscan Removal settings
1. From the Selected Filters list, select Black Overscan Removal, and then click Configure.
2. Use the following Black Overscan Removal dialog box to adjust the settings.
Black Overscan Removal
3. In the Operation Mode area, select one of the following options.
• Select Remove to remove the black border generated by scanners with black backgrounds. The
physical size of the scanned image and the image file size are reduced.
• Select Clear to invert the overscan area of an image. The image dimensions remain unchanged.
4. In the Processing Limits (mm) area, specify the processing limits for each side of the target image to
ensure that the removed border won’t exceed defined limits from each side during processing.
• In the Top box, type the size limit in millimeters for the top border.
• In the Bottom box, type the size limit in millimeters for the bottom border.
• In the Left box, type the size limit in millimeters for the left border.
• In the Right box, type the size limit in millimeters for the right border.
5. If you want to prevent overscan removal on color-inverted images, then select Process Inverted Images.
If you want to remove black overscan areas from images that contain large black borders, and you are not
processing color-inverted images, clear this option.
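The effect of the Processing Limits values can be pictured with a minimal sketch. It is an illustration only, not the filter's implementation: the function name, the dpi parameter, and the idea that the detected border is measured in pixels are all assumptions made for this example.

```python
MM_PER_INCH = 25.4

def limit_border_removal(detected_px, limits_mm, dpi):
    """Clamp the detected black border on each side to its configured limit.

    detected_px and limits_mm are dicts keyed by "top", "bottom", "left", "right".
    Returns the number of pixels that may be removed from each side.
    (Hypothetical helper; names and units are assumptions.)
    """
    allowed = {}
    for side, border in detected_px.items():
        # Convert the millimeter limit to pixels at the scan resolution.
        limit_px = round(limits_mm[side] / MM_PER_INCH * dpi)
        allowed[side] = min(border, limit_px)
    return allowed

detected = {"top": 40, "bottom": 10, "left": 120, "right": 0}
limits = {"top": 5.0, "bottom": 5.0, "left": 5.0, "right": 5.0}
result = limit_border_removal(detected, limits, dpi=200)
```

With a 5 mm limit at 200 dpi, no more than 39 pixels are removed from any side, even where a wider black area was detected.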
Before Black Overscan Removal
After Black Overscan Removal
Color Adjustments
The Color Adjustments filter provides complex contrast, brightness, and tone adjustments to an image.
To adjust Color Adjustments settings
1. From the Selected Filters list, select Color Adjustments, and then click Configure.
2. Use the following Color Adjustments dialog box to adjust the settings.
Color Adjustments
3. Click Load Sample to access the Open dialog box.
4. Locate and select the image you want to use, and then click Open. The image you selected appears in the
Image box. As you apply settings, the results appear in the Image with Color Adjustments box.
5. To zoom in or out on the image, select a larger or smaller percentage from the Scaling list.
6. In the Mode area, you can select one of the following options.
• Select Automatic to apply an automatic adjustment algorithm (linear or non-linear) for processing the
image.
• Select Manual to apply a polynomial transformation algorithm. When you select Manual, in the Options
area you can specify the following settings.
   a. From the Preset list, you can select one of the following preset variables.
      • Same - leaves the image unchanged.
      • Invert - inverts the image.
      • Lighten - lightens the image.
      • Darken - darkens the image.
      • Contrast - increases the contrast of the image.
   b. From the Channel list, you can select one of the following channel variables.
      • Rgb - sets the transformation of an image for all three (red, green, and blue) channels.
      • Red - sets the transformation of an image for the red channel only.
      • Green - sets the transformation of an image for the green channel only.
      • Blue - sets the transformation of an image for the blue channel only.
7. In the Algorithm area, you can select one of the following options.
• Select Linear to process the image with the linear adjustment algorithm.
• Select Non-Linear to process the image with the non-linear adjustment algorithm.
8. In the Apply Mode area, you can select one of the following options.
• Select Channel by channel to apply the algorithm to the R, G, and B channels separately. Using this
setting may damage the color balance.
• Select Luminosity/Lightness to apply the algorithm to the pixel luminosity channel.
9. In the Linear Mode area, you can select one of the following options. These options apply only when you
select Linear for the algorithm.
• Select Cutoff Fraction to specify that variants of the linear algorithm are calculated using a cut-off
fraction.
• Select Percentiles to specify that variants of the linear algorithm are calculated using percentiles.
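This guide does not describe how the linear algorithm is calculated. One common reading of a percentile-based linear adjustment is a contrast stretch that maps a low and a high percentile of the pixel values to the ends of the output range. The sketch below follows that reading; the function name, the default percentiles, and the 0 to 255 range are assumptions and may not match the filter's behavior.

```python
def percentile_stretch(values, low_pct=1.0, high_pct=99.0):
    """Linearly map the low/high percentile values to 0 and 255, clipping the rest.

    values is a non-empty list of 0-255 channel (or luminosity) values.
    (Hypothetical helper; the filter's real calculation is not documented here.)
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[int(last * low_pct / 100)]
    hi = ordered[int(last * high_pct / 100)]
    if hi == lo:
        return list(values)  # flat channel: nothing to stretch
    out = []
    for v in values:
        scaled = (v - lo) * 255 / (hi - lo)
        out.append(max(0, min(255, round(scaled))))
    return out

channel = [50, 60, 70, 80, 90, 100]
stretched = percentile_stretch(channel, low_pct=0, high_pct=100)
```

Applying such a function to the R, G, and B lists separately corresponds to the Channel by channel apply mode, which is why that mode can shift the color balance; applying it to a single luminosity list avoids that.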
Color Detection and Conversion
The Color Detection and Conversion filter detects the colorfulness of an image, and then returns either a
binary or a color image based on your assigned threshold settings. Select Ignore Paper Color to change the
paper's background color to white. The filter then counts the number of white (or nearly-white) and black (or
nearly-black) pixels and excludes them from the count of pixels that are “colorful.” The colorfulness of the
image is then computed according to your selected Color Detect Type. If the resulting colorfulness value is
less than your assigned threshold, the resulting image displays as a binary (black and white) version of the
original image. If the input image is more colorful than your specified threshold value, this filter does not output
an image.
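The detection logic described above can be sketched in a few lines. This is a simplified illustration, not the filter's algorithm: the function name, the tolerance parameter, and the choice to measure the percentage over all pixels are assumptions.

```python
def classify_page(pixels, threshold_pct=5.0, tolerance=30):
    """Return "binary" or "color" for a list of (r, g, b) pixels.

    A pixel counts as colorful when its channels differ by more than `tolerance`,
    so near-white, near-black, and gray pixels never count as colorful.
    (Hypothetical helper; `tolerance` is not a setting of the real filter.)
    """
    if not pixels:
        return "binary"
    colorful = sum(1 for r, g, b in pixels if max(r, g, b) - min(r, g, b) > tolerance)
    ratio_pct = 100.0 * colorful / len(pixels)
    return "binary" if ratio_pct < threshold_pct else "color"

mostly_white = [(250, 250, 245)] * 96 + [(200, 30, 30)] * 4
with_stamp = [(250, 250, 245)] * 90 + [(200, 30, 30)] * 10
```

With a threshold of 5, the first page (4% colorful pixels) is classified as binary and the second (10%) as color.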
To adjust Color Detection and Conversion settings
1. From the Selected Filters list, select Color Detection and Conversion, and then click Configure.
2. Use the following Color Detection and Conversion dialog box to adjust the settings.
Color Detection and Conversion
3. To set the Color Threshold Percentage, move the slider to the right to increase the value, or to the left to
decrease the value. The default value is 5. A value of 5 (when used with the Ratio color detect type) causes
the filter to detect binary if the image has less than 5% color pixels, and to detect color if the image has
more than 5% color pixels.
4. In the Color Detect Type area, select one of the following options.
• Select Amount for color detection to be based on the amount of color pixels in the image.
• Select Ratio for color detection to be based on the ratio of color and black pixels in the image.
5. Select Ignore Paper Color to detect and remove a colored background before performing automatic color
detection. Otherwise, clear this option.
6. The Brightness slider sets the brightness value at which pixels are converted to white rather than black.
You can set this value from 0 to 100, where 100 specifies the brightest setting. Move the slider to the right
to increase the value, or to the left to decrease the value. The default value is 65.
7. The Contrast slider sets the contrast value. You can set this value from 0 to 100, where 100 specifies the
highest setting. Move the slider to the right to increase the value, or to the left to decrease the value. The
default value is 50.
8. In the Algorithm area, select one of the following options.
• Select Simple to use the simple algorithm that works uniformly on the image. This option works best for
images that do not have any obvious gradients, or light and dark areas. It tends to preserve graphics very
well.
• Select Advanced to use the advanced algorithm that works better than the simple one for images that
might need different adjustments made to various areas of the page. It tends to preserve text well.
9. If you selected Advanced in the previous step, in the Mode area, select one of the following options.
• Select Fast to maximize the speed of image processing. Using this option could leave more stray marks
on the image.
• Select Good to maximize the quality of image processing. Using this option is typically slower than
using the Fast mode, but usually produces the best quality image. Note that the aggressive removal of
stray marks to get a clean image can sometimes result in a small amount of image content also being
removed.
Color Dropout
The Color Dropout filter changes the color you specify in an image to white. This filter also maintains a list
of color mappings to make when it runs.
To adjust Color Dropout settings
1. From the Selected Filters list, select Color Dropout, and then click Configure.
2. Use the following Color Dropout dialog box to adjust the settings.
Color Dropout
3. Click Load Sample to access the Open dialog box.
4. Locate and select the image you want to use, and then click Open. The image you selected appears in the
Image box. As you apply settings, the results appear in the Image with Dropouts box.
5. To zoom in or out on the image, select a larger or smaller percentage from the Scaling list.
6. To change the magnitude of the dropout, you can move the Magnitude slider to the right to increase the
value, or to the left to decrease the value. Alternatively, you can enter a value between 1 and 255 in the
Magnitude box.
7. Use one of the following options to select the color you want to remove from the image.
• Click Pick Color to access the Color dialog box where you can select a color, or define a custom color
by clicking Define Custom Colors.
• On the image, click the color you want to remove.
The colors you select appear in the Color Mapping area.
NOTE: The colors you select appear with the magnitude that was set when you selected the color. If you
want to modify the magnitude for a selected color, you must delete the color from the list, change the
magnitude setting, and then select the color again.
8. Use one of the following options to remove items from the Color Mapping list.
• To remove your most recent color selection, click Undo.
• Select the color(s) you want to remove, and then click Remove.
• Click Clear All to remove all colors from the list.
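The Color Mapping list can be thought of as a set of (color, magnitude) pairs that are applied to every pixel. The sketch below is one possible interpretation, offered only as an illustration: it treats the magnitude as a per-channel tolerance around the selected color, which is an assumption this guide does not confirm, and the function name is hypothetical.

```python
WHITE = (255, 255, 255)

def apply_dropout(pixels, color_mappings):
    """Replace pixels close to any mapped color with white.

    pixels is a list of (r, g, b) tuples.
    color_mappings is a list of ((r, g, b), magnitude) pairs.
    (Assumption: magnitude is a per-channel tolerance from 1 to 255.)
    """
    def matches(pixel, target, magnitude):
        return all(abs(p - t) <= magnitude for p, t in zip(pixel, target))

    return [
        WHITE if any(matches(px, color, mag) for color, mag in color_mappings) else px
        for px in pixels
    ]

mappings = [((200, 0, 0), 40)]
page = [(220, 10, 5), (0, 0, 0), (100, 0, 0)]
cleaned = apply_dropout(page, mappings)
```

Because each pair stores its own magnitude, changing the tolerance for one color means replacing that pair, which mirrors the delete-and-reselect procedure in the note above.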
Crop
The Crop filter removes borders around the edge of an image.
To adjust Crop settings
1. From the Selected Filters list, select Crop, and then click Configure.
2. Use the following Crop dialog box to adjust the settings.
Crop
3. In the Image Margins (mm) area, specify the processing limits for each side of the target image to ensure
that the removed border won’t exceed defined limits from each side during processing.
• In the Top box, type the size (in millimeters) for the top border after cropping.
• In the Bottom box, type the size (in millimeters) for the bottom border after cropping.
• In the Left box, type the size (in millimeters) for the left border after cropping.
• In the Right box, type the size (in millimeters) for the right border after cropping.
NOTE: Positive margin values represent the white space between the edge of the image and the black pixel
closest to that edge. Negative margin values crop the specified amount from the black pixel closest to the
edge towards the center of the image.
4. If you want to specify that the top and bottom margins match and the left and right margins match, select
Force Symmetry. When this option is selected, the value for the Top margin is used for the top and bottom
margins, and the value for the Left margin is used for the left and right margins. The values for the Bottom
and Right margins are ignored.
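The way positive margins, negative margins, and Force Symmetry interact can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, margins are given in pixels rather than millimeters to keep the arithmetic simple, and the result is not clamped to the image bounds.

```python
def crop_box(content_box, margins_px, force_symmetry=False):
    """Compute the crop rectangle (left, top, right, bottom) in pixels.

    content_box is the bounding box of the black pixels, as (left, top, right, bottom).
    Positive margins leave white space outside the content; negative margins cut into it.
    (Hypothetical helper; the dialog box takes millimeters, not pixels.)
    """
    top, bottom, left, right = (margins_px[k] for k in ("top", "bottom", "left", "right"))
    if force_symmetry:
        bottom, right = top, left  # the Bottom and Right values are ignored
    c_left, c_top, c_right, c_bottom = content_box
    return (c_left - left, c_top - top, c_right + right, c_bottom + bottom)

content = (100, 50, 900, 1150)
margins = {"top": 10, "bottom": 20, "left": -5, "right": 0}
plain = crop_box(content, margins)
symmetric = crop_box(content, margins, force_symmetry=True)
```

In the symmetric case the bottom margin becomes 10 and the right margin becomes -5, regardless of the values entered for Bottom and Right.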
Deskew
The Deskew filter straightens a slanted image. Skewing can occur when the original document was fed into
the scanner, fax machine, or photocopier. This filter examines the image and determines the skew angle,
which is measured between the edge of the image and the horizontal or vertical axis. The image data is then
rotated to correct the skew angle. The filter rotates an image from -44.9 degrees to +44.9 degrees, in 0.1
degree increments.
To adjust Deskew settings
1. From the Selected Filters list, select Deskew, and then click Configure.
2. Use the following Deskew dialog box to adjust the settings.
Deskew
3. In the Mode area, you can select one of the following options.
• Select Text if pages primarily contain text with some tables and lines.
• Select Graphic if pages contain large blocks of black areas.
4. To set the color for the area at the edge of the image to be deskewed, in the Fill Color area, click Select
Color to access the Color dialog box where you can select a color, or define a custom color by clicking
Define Custom Colors. By default, this value is set to white.
5. In the Operating Mode area, select one of the following options.
• Select Detect Angle and Deskew for the filter to automatically detect the skew angle and deskew the
images.
• Select Detect Angle for the filter to only detect the skew angle.
• Select Rotate by a Fixed Angle for images to be rotated by the fixed angle you specify. In the Fixed
Angle box, type or select the number of degrees for the fixed angle you want to use.
6. In the Direction area, set the direction of the skew angle measurement. You can select one of the following
options.
• Select Horizontal if the text on the pages is horizontal.
• Select Vertical if the text on the pages is vertical.
• Select Both if the text may be a mix of both horizontal and vertical text.
7. In the Quality area, select one of the following options.
• Select Fast to maximize the speed of the deskew processing.
• Select Good to maximize the quality of the deskew processing.
Before Deskew
After Deskew (also with Binary Border Removal)
Image Fit
The Image Fit filter is intended to crop images before they are processed through the Nuance Full-Text OCR
step. The minimum and maximum width and height dimensions that you can specify are 16 x 16 to 8400 x
8400 pixels. If the image size is less than 16 x 16 pixels, white space is added to the image from the bottom
and right corners until the minimum size (16 x 16 pixels) is reached. If the image size is greater than 8400 x
8400 pixels, the image is cropped from the bottom and right corners until the maximum size is reached.
To adjust Image Fit settings
1. From the Selected Filters list, select Image Fit, and then click Configure.
2. Use the following Image Fit dialog box to adjust the settings.
Image Fit
3. In the Min Width box, type the minimum width (in pixels) for the image size.
4. In the Max Width box, type the maximum width (in pixels) for the image size.
5. In the Min Height box, type the minimum height (in pixels) for the image size.
6. In the Max Height box, type the maximum height (in pixels) for the image size.
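The sizing rule described for this filter amounts to clamping each dimension between its minimum and maximum. The following minimal sketch shows only that arithmetic; the function names are hypothetical and no pixel data is padded or cropped here.

```python
def fit_dimension(size, minimum=16, maximum=8400):
    """Return the size after padding up to `minimum` or cropping down to `maximum`."""
    return max(minimum, min(size, maximum))

def fit_image(width, height, min_w=16, max_w=8400, min_h=16, max_h=8400):
    """Return the (width, height) the image would have after Image Fit.

    Padding and cropping are applied at the bottom and right edges, so the
    top-left corner of the image stays where it was.
    """
    return fit_dimension(width, min_w, max_w), fit_dimension(height, min_h, max_h)

tiny_and_tall = fit_image(10, 9000)
letter_300dpi = fit_image(2550, 3300)
```

An image that is 10 pixels wide and 9000 pixels tall is padded to 16 pixels wide and cropped to 8400 pixels tall; an image already within the limits is unchanged.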
Page Deletion - Always
The Page Deletion - Always filter removes the entire page from the batch.
Page Deletion - Blank
The Page Deletion - Blank filter lets you configure how blank pages are detected for deletion.
To adjust Page Deletion - Blank settings
1. From the Selected Filters list, select Page Deletion - Blank, and then click Configure.
2. Use the following Page Deletion - Blank dialog box to adjust the settings.
Page Deletion - Blank
3. In the Detection Mode area, select one of the following options.
• Select Preset to determine page deletion based on one of the following options from the Preset list.
   o Dirty White - the default setting, considers pages blank when they contain some noise.
   o Pristine White - considers pages blank when they contain no noise.
   o Very Dirty White - considers pages blank when they contain a lot of noise.
   o One Line OK - considers pages blank when they contain one specified line of text.
   o Two Lines OK - considers pages blank when they contain two specified lines of text.
• Select Black Area Ratio to assign the ratio that determines when a page is blank. Move the slider to
assign the ratio. The ratio is calculated by dividing the number of black pixels by the number of all
region pixels.
4. In the Margins area, specify the margins (in millimeters using whole or decimal numbers) to exclude when
determining whether a page is blank.
5. To specify advanced settings, click the Advanced tab, and ensure that Extended Settings is selected.
You can specify the following options.
• Select Auto Despeckle to automatically exclude noise before detecting blank pages.
• In the Noise Size box, you can type or select the length (in pixels) of the dark areas of each line that will
be considered noise for despeckling purposes.
• Select Black Run Reject to reject areas of black speckles that are larger than the value specified in the
Max Black Run box.
• In the Max Black Run box, you can type or select the maximum length (in pixels) for an area of black
speckles.
• In the Max Clumps box, you can type or select the maximum number of clumps detected for a page to
be considered blank. A “clump” is a sequence of non-empty lines. If the number of clumps detected is at
least the value you specify, the page is considered not blank. You can use the Noise Size and the Min
Clump Length boxes to customize the detection of clumps.
• In the Min Clump Length box, you can type or select the minimum length (in pixels) of clumps. A
“clump” is a sequence of non-empty lines. The value you set for the Min Clump Length specifies how
many sequential lines must be non-empty to be considered a clump. If the number of non-empty lines
detected is less than the value you specify, the non-empty lines are considered noise.
• In the Line Noise Level box, you can type or select the number of black pixels, except for excluded
blobs, for a line to be considered non-empty. A line that is considered non-empty is counted in the total
number of lines that may constitute a “clump.”
• In the Max Transitions box, you can type or select the maximum threshold of transitions for a page to be
considered blank. A transition is a change from black to white pixels or from white to black pixels on one
line. If the average number of transitions in a line for the whole image is larger than the value you specify,
then the image is considered not blank.
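Two of the measurements described above, the black area ratio and the average number of transitions per line, are simple to compute. The sketch below is an illustration only; the function names and the representation of a page as rows of 0 (white) and 1 (black) values are assumptions, and despeckling and margins are left out.

```python
def black_area_ratio(rows):
    """Black pixels divided by all pixels; rows are lists of 0 (white) and 1 (black)."""
    total = sum(len(row) for row in rows)
    black = sum(sum(row) for row in rows)
    return black / total if total else 0.0

def average_transitions(rows):
    """Average number of black-to-white or white-to-black changes per line."""
    if not rows:
        return 0.0
    changes = sum(sum(1 for a, b in zip(row, row[1:]) if a != b) for row in rows)
    return changes / len(rows)

page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
ratio = black_area_ratio(page)
transitions = average_transitions(page)
```

A page would then be treated as not blank when the ratio exceeds the Black Area Ratio setting, or when the average exceeds the Max Transitions value.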
Page Deletion - Color Content
The Page Deletion - Color Content filter allows you to assign color threshold settings that specify whether
to delete color pages or non-colorful pages.
To adjust Page Deletion - Color Content settings
1. From the Selected Filters list, select Page Deletion - Color Content, and then click Configure.
2. Use the following Page Deletion - Color Content dialog box to adjust the settings.
Page Deletion - Color Content
3. In the Color Content area, specify the color range for pages to be kept. In the From box, type or select the
lowest value for the range. In the To box, type or select the highest value for the range. Pages detected
outside the range you specified are deleted.
4. In the Detection Threshold area, use the Threshold slider to set the amount of color a sample must
contain to be identified as “colorful.” The average threshold of a sample is compared to the threshold value
you set.
If you set the value to 0, then all samples are colorful because the average threshold of each sample is
probably greater than 0. If you set the value to 256, then no samples are colorful. To detect only bright
colors, try setting the value to 200. To detect very pale colors, try setting the value to 50.
5. In the Detection Threshold area, use the Sample Size slider to set the sample size. To perform color
content detection, the image is divided into areas. This option specifies the size of these areas.
If the resolution is especially low (100 dpi or lower), a value of 1 or 2 might produce better results. If the
resolution is high (600 dpi or higher), a value of 5 might be more satisfactory.
Page Deletion - Dimensions
The Page Deletion - Dimensions filter lets you specify the dimensions for pages that will remain in the
batch. Images with dimensions that fall outside your specified ranges are deleted from the batch.
To adjust Page Deletion - Dimensions settings
1. From the Selected Filters list, select Page Deletion - Dimensions, and then click Configure.
2. Use the following Page Deletion - Dimensions dialog box to adjust the settings.
Page Deletion - Dimensions
3. In the Width Range (pixels) area, specify the width range for pages to be kept. In the From box, type or
select (in pixels) the lowest value for the range. In the To box, type or select the highest value (in pixels) for
the range.
4. In the Height Range (pixels) area, specify the height range for pages to be kept. In the From box, type or
select (in pixels) the lowest value for the range. In the To box, type or select the highest value (in pixels) for
the range.
Page Deletion - File Size
The Page Deletion - File Size filter lets you specify the file size for pages that will remain in the
batch. Pages that fall outside your specified file-size range are deleted from the batch.
To adjust Page Deletion - File Size settings
1. From the Selected Filters list, select Page Deletion - File Size, and then click Configure.
2. Use the following Page Deletion - File Size dialog box to adjust the settings.
Page Deletion - File Size
3. In the Size Range (bytes, KB or MB) area, specify the file size range for pages to be kept. In the From
box, type the lowest value for the size range, followed by the unit of measure (bytes, KB, or MB). In the To
box, type the highest value for the size range, followed by the unit of measure (bytes, KB, or MB).
NOTE: If you do not enter a specific unit of measure for the file size (bytes, KB, MB) after the numeric
value, the unit defaults to bytes. Therefore, for kilobytes and megabytes, you must enter KB and MB
after the numeric values.
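The unit rule in the note can be illustrated with a short parsing sketch. It is not the product's parser: the function names are hypothetical, 1 KB is assumed to be 1024 bytes, and the range is assumed to include both end values.

```python
import re

# Assumption: binary units (1 KB = 1024 bytes). The guide does not state this.
UNITS = {"": 1, "BYTES": 1, "KB": 1024, "MB": 1024 * 1024}

def parse_size(text):
    """Convert text such as "500", "500 bytes", "10KB" or "1.5 MB" to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match or match.group(2).upper() not in UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    # A missing unit maps to the "" entry, so the value is read as bytes.
    return int(float(match.group(1)) * UNITS[match.group(2).upper()])

def keep_page(file_size, size_from, size_to):
    """Return True when file_size (in bytes) lies inside the From/To range."""
    return parse_size(size_from) <= file_size <= parse_size(size_to)
```

For example, a From value of "10" and a To value of "10KB" describe a range from 10 bytes to 10 kilobytes, because the first value has no unit.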
Redaction
The Redaction filter allows you to cover confidential or sensitive data on images. To ensure redaction areas
consistently cover the same place on every image, test images with sizes similar to those that will be used in
production. For your reference, the size (in pixels) of each imported image appears in the title bar.
To adjust Redaction settings
1. From the Selected Filters list, select Redaction, and then click Configure.
2. Use the following Redaction dialog box to adjust the settings.
Redaction
3. On the toolbar, click Import Image to access the Open dialog box.
4. Locate and select the image you want to use, and then click Open. The image you selected appears on the
left pane of the Redaction dialog box.
5. To adjust the view of the image, you can do the following.
• On the toolbar, click Best Fit to fit the entire image on the pane.
• On the toolbar, click Actual Size to view the image in its actual size.
6. To create a redaction area, place the cursor on the image, and then click and drag the cursor over the area
you want to redact. The properties for the redaction area you created appear on the right pane.
7. On the right pane, you can modify the following redaction properties by selecting them, and making
changes in the column to the right of the property.
• Under Appearance, click Color, and then click the down arrow to select the background color of the
redaction.
• Under Position, the X property shows the X coordinate of the redaction area’s upper-left corner relative
to the container’s left edge. The Y property shows the Y coordinate of the redaction area’s upper-left
corner relative to the container’s top edge.
• Under Size, the Height and Width values (in pixels) of the redaction area are shown.
8. If you want to delete a redaction area, select it, and then click Delete on the toolbar.
Rotation
The Rotation filter automatically rotates scanned images by your specified direction, fixed amount of
degrees, or detected text orientation.
To adjust Rotation settings
1. From the Selected Filters list, select Rotation, and then click Configure.
2. Use the following Rotation dialog box to adjust the settings.
Rotation
3. In the Rotation area, you can select one of the following options.
• Select None to apply no rotation to the image.
• Select Clockwise to rotate the image 90 degrees clockwise.
• Select Counter-Clockwise to rotate the image 90 degrees counterclockwise.
• Select 180 Degrees to rotate the image 180 degrees.
• Select Auto Detect to automatically detect the correct image orientation and rotate the image
accordingly. When you select this option, you can specify the following settings.
   a. Select Orientation to specify the auto detection mode of the rotation filter.
      • Select Auto to have the filter automatically detect whether images are landscape
      or portrait.
      • Select Portrait to use portrait orientation only.
      • Select Landscape to use landscape orientation only.
   b. Select Text to detect the orientation of images based on their text.
      • Select Nuance to use the Nuance Full-Text OCR engine.
      • Select Open Text to use the Open Text Full-Text OCR engine.
NOTE: If you select Auto Detect and then Text, a Capture Nuance Full-Text OCR or Capture Open Text
Full-Text OCR license is used at the time of capture. Additionally, the Mirror option is unavailable
because both full-text engines automatically detect mirrored text.
• Select Mirror to flip the page across the vertical axis so that it appears to be the mirror image of the
original.
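The fixed rotation options and the Mirror option correspond to simple rearrangements of the pixel grid. The sketch below illustrates them on an image represented as a list of rows; the function names are hypothetical, and automatic orientation detection is not covered.

```python
def rotate_clockwise(rows):
    """Rotate 90 degrees clockwise: the left column becomes the top row."""
    return [list(col) for col in zip(*rows[::-1])]

def rotate_counter_clockwise(rows):
    """Rotate 90 degrees counterclockwise: the right column becomes the top row."""
    return [list(col) for col in zip(*rows)][::-1]

def rotate_180(rows):
    """Rotate 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in rows[::-1]]

def mirror(rows):
    """Flip across the vertical axis, so that left and right swap."""
    return [row[::-1] for row in rows]

image = [
    [1, 2],
    [3, 4],
]
```

Note that a 180-degree rotation is not the same as mirroring: rotating the example grid gives [[4, 3], [2, 1]], while mirroring it gives [[2, 1], [4, 3]].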
Before Rotation
After 180-degree Rotation
Scaling
The Scaling filter resizes images while preserving the original aspect ratio.
To adjust Scaling settings
1. From the Selected Filters list, select Scaling, and then click Configure.
2. Use the following Scaling dialog box to adjust the settings.
Scaling
3. From the Mode list, select one of the following options.
• Select Scale to size to scale the image to the size you specify while preserving its original aspect ratio.
The image is scaled to either the height or the width you specify. If you specify both, the width is ignored
and the image is scaled to the specified height. If neither property is set, the image size is not changed.
This mode does not change the resolution of the image. When you select this option, in the Page Size
area, you can set the following values.
   a. From the Units list, select one of the following units of measure for the page size.
      o Pixels
      o Inches
      o Millimeters
      o Centimeters
      o Points
   b. In the Width box, type the width for the page.
   c. In the Height box, type the height for the page.
• Select Scale to resolution to change the resolution of the image by adding or removing pixels. This
preserves the measured dimensions of the image (in inches or millimeters, for example). When you
select this option, in the Resolution box, type or select the resolution value for the page. (If you do not
specify a value, the resolution is not changed.)
• Select Scale with coefficient to scale the image by the factor you specify. Pixels are added to or
removed from the image, and the resolution information is adjusted according to the scale factor. When
you select this option, in the Scale Factor box, type or select the scale factor you want to use. (If you do
not specify a scale factor, the image size is not changed.)
• Select Resolution alignment to adjust the horizontal and vertical resolutions of an image so that they
are the same. If the X and Y resolutions are not equal, the lower resolution is scaled up to match the
higher one.
• Select Change resolution to change the resolution of the image without changing the image data. The
final image has the same number of pixels as the original image, but the filter modifies the pixel density
(number of pixels per inch) so that the image appears smaller or larger than the original based on the
resolution you specify. When you select this option, in the Resolution box, type or select the resolution
value for the page.
4. If you want to enable smoothing during the scaling process, then select Smoothing. Smoothing is the
removal of bumps and spurs that appear on text characters or graphics objects in an image.
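The difference between three of the scaling modes is easiest to see in the arithmetic. The sketch below computes only the resulting pixel dimensions and resolution. It is an illustration under assumptions: the function names are hypothetical, sizes are in pixels, and no resampling is performed.

```python
def scale_to_size(width, height, target_width=None, target_height=None):
    """Scale to one target dimension, keeping the aspect ratio.

    If both targets are given, the width is ignored and the height is used.
    """
    if target_height is not None:
        factor = target_height / height
    elif target_width is not None:
        factor = target_width / width
    else:
        return width, height  # neither value set: size is not changed
    return round(width * factor), round(height * factor)

def scale_to_resolution(width, height, dpi, target_dpi):
    """Add or remove pixels so the measured size stays the same at the new resolution."""
    factor = target_dpi / dpi
    return round(width * factor), round(height * factor), target_dpi

def change_resolution(width, height, dpi, target_dpi):
    """Keep every pixel; only the stored resolution value changes."""
    return width, height, target_dpi
```

For an 8.5 x 11 inch page scanned at 300 dpi (2550 x 3300 pixels), scaling to a resolution of 150 yields 1275 x 1650 pixels that still measure 8.5 x 11 inches, whereas changing the resolution to 150 keeps all 2550 x 3300 pixels and the page measures 17 x 22 inches.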
Threshold
The Threshold filter converts a 24-bit color image to a binary or grayscale image.
To adjust Threshold settings
1. From the Selected Filters list, select Threshold, and then click Configure.
2. Use the following Threshold dialog box to adjust the settings.
Threshold
3. In the Output area, select one of the following options.
• Select Binary to convert the image from color into a black and white image.
• Select Grayscale to convert the image from color into a gray image. When you select this option, there
are no other settings to specify. Click OK.
4. The Brightness slider sets the brightness value at which pixels are converted to white rather than black.
You can set this value from 0 to 100, where 100 specifies the brightest setting. Move the slider to the right
to increase the value, or to the left to decrease the value. The default value is 65.
5. The Contrast slider sets the contrast value. You can set this value from 0 to 100, where 100 specifies the
highest setting. Move the slider to the right to increase the value, or to the left to decrease the value. The
default value is 50.
6. In the Algorithm area, select one of the following options.
• Select Simple to use the simple algorithm that works uniformly on the image. This option works best for
images that do not have any obvious gradients, or light and dark areas. It tends to preserve graphics very
well.
• Select Advanced to use the advanced algorithm that works better than the simple one for images that
might need different adjustments made to various areas of the page. It tends to preserve text well.
7. If you selected Advanced in the previous step, in the Mode area, select one of the following options.
• Select Fast to maximize the speed of image processing. Using this option could leave more stray marks
on the image.
• Select Good to maximize the quality of image processing. Using this option is typically slower than
using the Fast mode, but usually produces the best quality image. Note that the aggressive removal of
stray marks to get a clean image can sometimes result in a small amount of image content also being
removed.
8. If you want to apply dithering to the image, select Dither.
Before Threshold
After Threshold
Chapter 12 Custom Code Configuration
With PaperVision Capture’s custom code engine, you can write Visual Basic.NET or C# code that can be run
at any time during batch processing. Additionally, Digitech Systems provides a .NET Application
Programming Interface (API) that you can use for read/write access to batch metadata, documents, images,
OCR data, and index values.
Job steps within job definitions contain the custom code capabilities. Each job step can trigger custom code
events. These events differ by job step. For example, Indexing job steps can initiate the "Saving Indexes"
custom code event. So, on the Job Definitions window, you can configure the custom code that the system
will run when index values are being saved.
WARNING: Changes made to a batch via custom code that runs in a manual job step may not be
reflected in the Operator Console unless your custom code specifies the appropriate user-interface
refresh level. See public enum UIRefreshLevel under "Enumerations" on page 291 for more
information.
Digitech Systems also provides a Custom Code job step, which is not event-based. Instead, it will run any
code you specify. PaperVision Capture runs Custom Code job steps in the background as automatic
processes, so you do not see them running within the user interface in PaperVision Capture. You can also use
Custom Code job steps for validating or manipulating data and interfacing with an external application, such
as an external database or line-of-business application.
Custom Code Generators
When you configure the Custom Code step, you can select either the C# or Visual Basic programming
language and the custom code generator that runs automatically during batch processing. Custom Code
generators include all PaperVision Capture exports, the Match and Merge Wizard, and customizable scripts
that contain generic code that you can edit and compile directly on the Script Editor window. You can
configure custom code generators using dialog boxes that display only the applicable properties for your
selection. Default settings are provided for each generator within drop-down menus, editable fields, and
check boxes (indicating a default true or false setting). The Basic custom code generator provides a generic
code template, and the Export Template custom code generator provides a generic template for custom
exports that you can run automatically during batch processing.
IMPORTANT: You can use the Visual Basic programming language only with the Match and Merge Auto, Basic, and Export Template custom code generators.
To select a Custom Code Generator
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job you want to edit, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace of the Job Definitions window, double-click the Custom Code job step to display the
Properties tab on the left pane.
6. On the Properties tab, expand Custom Code Events (Step Level).
7. Click Step Executing, and then click the ellipsis button to open the Select Custom Code Generator
dialog box. Each custom code generator and corresponding description are listed.
Select Custom Code Generator
8. If you want to see only the generators that you can configure using the provided dialog boxes, rather than
editing code in the Script Editor, then clear the Advanced check box.
NOTE: To remove existing custom code, on the Properties tab, expand Custom Code Events
(Step Level). Right-click Step Executing, and then click Reset. Additionally, to prevent the Select
Scripting Language dialog box from appearing each time you configure custom code, select
Suppress this dialog when creating new custom code.
9. If the Advanced check box is selected, from the Language list, select the C# or Visual Basic
programming language. Your selected scripting language determines which generators are available for
configuration. (See "Exports" on page 306 for information about individual settings for PaperVision Capture
exports and the constant values that you can define for each one.)
• The Basic generator lets you write your own custom code directly in the Script Editor. See
"Script Editor" on page 295 for more information.
• The Match and Merge generator runs code from the Match and Merge Wizard, where you set
up connection properties for your SQL Server database. See "Match and Merge Wizard" on page
302 for information about configuring this generator.
• The Export Template generator contains additional pre-defined code that will automatically
process batches. See "Exports" on page 306 for information about configuring PaperVision
Capture exports.
10. To configure a generator, double-click it to access its corresponding properties. In the dialog boxes that
appear, default values and applicable index fields are provided for your reference, and lists and menus
contain only the options specific to your selected generator. You can manually enter file paths or browse to
the appropriate directory.
11. After you have configured the appropriate properties, click OK to save the generator. On the Properties
tab, Step Executing now appears as Enabled.
NOTE: The most recent template and programming language that you selected are retained for the next
time you create a custom code generator.
Digitech Systems' API
You can access Digitech Systems' API from the Script Editor. The API provides classes for reading/writing
documents and indexes within the current batch. For more information on the Digitech Systems API, launch
the PVCaptureBatchAPI.chm help file located in the Docs directory where PaperVision Capture is installed.
This help file provides Microsoft Developer Network (MSDN)-style documentation on our DSI.Capture.API
namespace, including code samples.
Custom code samples (as text or XML files) are in the Library\Samples directory where PaperVision
Capture is installed. You can cut and paste the code directly into the Script Editor for a Custom Code step.
The following code samples are included:
• AddPrefixValuetoBatchDocumentIndexes iterates through all documents comprising a batch and
appends prefixes to index values.
NOTE: This script is intended to be run in an automated custom code step.
• AutoCreateBatches_Part1 and AutoCreateBatches_Part2 use the PaperVision Capture Automation
Server to create and populate batches on the fly through two custom code steps (for example, polling a
directory for .TIF files, and then automatically creating batches).
NOTE: Creating and populating batches via automated Custom Code causes the Automation Server
to consume a PaperVision Capture Scan license.
• CalltoCustomAssembly demonstrates one way to call out to code in your own assembly.
• CopyIndexValues duplicates an index value from a source document to one or more subsequent
documents.
• DisplayBatchPageCount displays the total number of pages in the batch (designed to be run in the
Operator Console from a manual custom code execute event).
• ExportFullTextData copies full-text OCR data for each document stored in the batch to a specified
directory.
• ImportASCIIwithImages imports images and index information from a number of other document imaging
systems.
NOTE: You must configure constants at the beginning of the script for the operator to successfully run
the script.
• InspectBeforeAddPage examines the physical dimensions of a scanned image and inserts a document
break if the page is detected as an envelope.
• MatchAndMergeOnIndexValidate executes custom code that will look up and populate index values
when the operator enters an index value and then tabs to the next field.
• MultiPageTIFFConversion divides a multiple-page TIFF into separate images (one image per page).
• OCRFullTextPageStatistics records Open Text Full-Text OCR statistics per selected output type.
Statistics are recorded when the Open Text Full-Text OCR step processes a page and converts the page
to the selected output format(s).
• OCRIndexZoneStatistics records Open Text Zonal OCR statistics when an Open Text OCR zone
populates an index value.
• OCRMarkSenseZoneStatistics records Open Text Zonal OCR statistics when an Open Text OCR zone
inserts an auto document break page between documents.
• OpenBatchCustomCode executes custom code when the operator opens a batch in the Operator
Console.
• QCDocumentPageCounts automatically applies a QC tag to every document in the batch that contains
fewer than four or more than six pages. This script is designed to be run from within a manual job step
from the Custom Code Execute event.
• QCTaggingIndexDocAndPageCustomCode automatically tags documents containing more than “x”
pages, pages smaller than “x” kilobytes, and index fields containing specific text. For example, to
change the maximum number of pages per document to 6, change the following lines to:
if (pages.Length > 6)
if (!this.Batch.TryAddDocumentTag(docId, "Document Size", "Document contains more than 6 pages", out error))
• RecordDailyDocumentAndPageCountStatistics, when used in an automated Custom Code step
following a Capture step, totals the number of documents and pages for batches that flow through a job on
a daily basis. Results are available as custom statistics that are viewable/filterable from the Batch
Statistics screen.
• Reformat Indexes attempts to parse all non-text barcode and OCR values that are ingested into index
fields without being formatted. The script calls the "TrySetValueFormatted" method after attempting to
parse the value, and then tags the index if the value cannot be formatted.
• SendEmail runs custom code that sends an email. For example, you could modify this code to send an
email when an unclassified document is found in a batch.
• SetScanDate automatically sets a scan date index value (document creation date) into the batch for every
document. The document’s creation date is the date/time the document entered the batch. The date/time
value is stored in Coordinated Universal Time (UTC), also known as Greenwich Mean Time (GMT). For
example, 2:00 PM on April 9, 2009 in Denver, Colorado will display as "04/09/2009 20:00:00."
To change the date/time value to your local time zone instead of UTC, change the code in line 46 to:
if (!this.Batch.TrySetIndexValue(id, "ScanDate", documentCreatedDate.ToLocalTime(), true, out error))
• SubmitBatchCustomCode executes custom code when the operator submits a batch in the Operator
Console.
• ValidateIndex provides an example of how to validate an index field value.
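The UTC-to-local conversion described for the SetScanDate sample can be checked outside PaperVision Capture with standard .NET date types. The following C# sketch reproduces the Denver example. The class and method names are illustrative only, and the fixed UTC-6 offset is an assumption that matches Mountain Daylight Time on that date. In a script, DateTime.ToLocalTime() instead applies the time zone of the machine that runs the code.

```csharp
using System;
using System.Globalization;

public static class ScanDateExample
{
    // Converts a UTC timestamp to the given fixed offset from UTC.
    public static DateTimeOffset ToOffset(DateTime utc, TimeSpan offset)
    {
        DateTime asUtc = DateTime.SpecifyKind(utc, DateTimeKind.Utc);
        return new DateTimeOffset(asUtc).ToOffset(offset);
    }

    public static void Main()
    {
        // Value as stored in the batch (UTC).
        DateTime storedUtc = new DateTime(2009, 4, 9, 20, 0, 0, DateTimeKind.Utc);

        // Assumed offset: Denver observes UTC-6 in April (daylight time).
        DateTimeOffset denver = ToOffset(storedUtc, TimeSpan.FromHours(-6));
        Console.WriteLine(denver.ToString("MM/dd/yyyy HH:mm:ss", CultureInfo.InvariantCulture));
        // prints 04/09/2009 14:00:00

        // Machine-dependent: uses the time zone of the computer running the code.
        Console.WriteLine(storedUtc.ToLocalTime());
    }
}
```

Because ToLocalTime() depends on the machine's time zone settings, an automated step may produce a different local value on each automation server unless all servers share the same time zone.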
Batch Property
Within your custom code, you can access the Digitech Systems API via the Batch property. The Batch
property is of the type DSI.Capture.API.Batch and represents the primary entry point for the Digitech
Systems API.
For example, to insert a new document into a batch within your CallHandler method (C# in this case), you can
type:
this.Batch.TryInsertDocument(/*see API documentation for parameters*/)
Another approach is to call out to your own assembly and pass the instance of the Batch object to your code
(again, the instance is available as the "Batch" property inside the pre-written "Code" class). This approach
would allow you to use Visual Studio for coding. Then, at run time, you would need to ensure that your
assembly is located in the same directory as the PaperVision Capture executable files.
Custom Code Event Arguments
Each custom code event exposes an argument parameter that is specific to the given event type. Within your
code, you can access these arguments to read event-specific data and to configure settings. For example,
your code can change a property that determines the action that is triggered in the PaperVision Capture
Operator Console after the event. The event-specific arguments are listed below.
NOTE: The following classes are derived from the .NET System.Data.DataSet class and support all
DataSet properties and functions. Additionally, DataSets are mapped to index values in the Operator
Console’s Index Manager.
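Because the note above states that these classes derive from System.Data.DataSet, ordinary DataSet code can be used to inspect what an event passes to your script. The following C# sketch walks every table, row, and column of any DataSet. The table name "Indexes" and the column name "InvoiceNumber" are hypothetical stand-ins used only to make the example runnable; the actual tables and columns are not described in this guide, so consult the API help file.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class EventArgsDataSetExample
{
    // Flattens every cell of every table into "Table.Column[row]=value" strings.
    public static List<string> Describe(DataSet data)
    {
        var lines = new List<string>();
        foreach (DataTable table in data.Tables)
        {
            for (int r = 0; r < table.Rows.Count; r++)
            {
                foreach (DataColumn column in table.Columns)
                {
                    lines.Add(string.Format("{0}.{1}[{2}]={3}",
                        table.TableName, column.ColumnName, r, table.Rows[r][column]));
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical stand-in data; a real script would pass the event argument instead.
        var data = new DataSet("Sample");
        DataTable table = data.Tables.Add("Indexes");
        table.Columns.Add("InvoiceNumber", typeof(string));
        table.Rows.Add("INV-1001");

        foreach (string line in Describe(data))
        {
            Console.WriteLine(line);
        }
    }
}
```

Inside a script, the same helper could presumably be called with the event argument obtained from base.Parameter, since any class derived from DataSet can be passed where a DataSet is expected.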
Add Page Event - CCustomCodeNewImageEventArgs
The Add Page event uses the CCustomCodeNewImageEventArgs class to pass every scanned image to the
custom code. Use of this argument is illustrated in the InspectBeforeAddPage sample script:
CCustomCodeNewImageEventArgs args = base.Parameter as
CCustomCodeNewImageEventArgs;
The following properties are located within the custom code:
1. Image.Attributes (hashtable containing the following image attributes):
a. PageSide: string (indicates the side of the page as "Front" or "Back")
b. DriverName: string (indicates the name of the scanner driver)
2. PageTags: TagInfo[]
This property can be used to specify one or more page tags to be added after the page has been appended to
the batch. Tags added to a break page (based on job configuration settings to delete break pages) will be
ignored.
Barcode Detected Event - BarcodeReadEventArgs
The Barcode Detected event uses the BarcodeReadEventArgs class to pass every barcode's data (from
each barcode zone) to the custom code. This event is triggered each time a barcode is successfully detected
during scanning (multiple barcodes can be detected per page).
The following properties are located within the custom code:
1. BarcodeItem Properties
These properties contain all barcode data, including barcode value, location, size, orientation, and
barcode type.
2. PageTags: TagInfo[]
This property can be used to specify one or more page tags to be added after the page has been
appended to the batch. Tags added to a break page (based on job configuration settings to delete break
pages) will be ignored.
Custom Code Execution Event - ManualCustomCodeEventArgs
The Custom Code Execution event uses the ManualCustomCodeEventArgs class to pass the operator’s
index values to the manual custom code event. This event is triggered when the operator triggers the Execute
Custom Code operation in the Operator Console.
ManualCustomCodeEventArgs args = base.Parameter as
ManualCustomCodeEventArgs;
Index Populated Event - IndexPopulateEventArgs
The Index Populated event uses the IndexPopulateEventArgs class to pass the operator’s index values to the
custom code. This event is triggered when an index value is populated.
IndexPopulateEventArgs args = base.Parameter as IndexPopulateEventArgs;
Index Validate Event - IndexValidateEventArgs
The Index Validate event uses the IndexValidateEventArgs class to pass the operator’s index values to the
custom code. This event is triggered once the operator proceeds or tabs to the next index field in the Index
Manager.
IndexValidateEventArgs args = base.Parameter as IndexValidateEventArgs;
OCR Statistics Event - OCRFullTextPageProcessedEventArgs
The OCR Statistics custom code event uses the OCRFullTextPageProcessedEventArgs class to pass Open
Text full-text data from each page (per selected output format) to the custom code. For each output type, this
event is triggered once a page has been converted to PDF, PaperVision Enterprise, PaperFlow, or Text full-text output.
The following properties are located within the custom code:
1. DocumentId: string
2. PageId: Guid
3. PageIndex: int32
4. OCRWords: int32
The OCRWords property contains the following variables:
internal OCRCharacter[] characters = new OCRCharacter[] { };
internal Int32 line = 0;
internal System.Drawing.Point location = new System.Drawing.Point();
internal System.Drawing.Size size = new System.Drawing.Size();
The OCRCharacter variable contains the following properties:
public System.Drawing.Point Location
{
get
{
return location;
}
}
public System.Drawing.Size Size
{
get
{
return size;
}
}
public Byte Confidence
{
get
{
return confidence;
}
}
public Char Code
{
get
{
return code;
}
}
public bool Rejected
{
get
{
return rejected;
}
}
public Char[] Alternatives
{
get
{
return alternatives;
}
}
5. RecognitionTime: int32 (milliseconds)
6. AdditionalValues: Hashtable
7. ConverterName: string
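As an illustration of how the character-level properties listed above might be summarized, the following C# sketch averages Confidence over the characters that are not marked Rejected and counts the rejected ones. OcrCharacterSample is a hypothetical stand-in that mirrors three of the listed OCRCharacter properties; it is not the API class, and this guide does not state the scale or meaning of the Confidence value.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in mirroring three of the OCRCharacter properties listed above.
public sealed class OcrCharacterSample
{
    public byte Confidence { get; set; }
    public char Code { get; set; }
    public bool Rejected { get; set; }
}

public static class OcrStatisticsExample
{
    // Average Confidence of the characters that are not marked Rejected (0 if none).
    public static double AverageConfidence(OcrCharacterSample[] characters)
    {
        OcrCharacterSample[] accepted = characters.Where(c => !c.Rejected).ToArray();
        return accepted.Length == 0 ? 0.0 : accepted.Average(c => (double)c.Confidence);
    }

    // Number of characters marked Rejected.
    public static int RejectedCount(OcrCharacterSample[] characters)
    {
        return characters.Count(c => c.Rejected);
    }

    public static void Main()
    {
        var characters = new[]
        {
            new OcrCharacterSample { Code = 'A', Confidence = 90, Rejected = false },
            new OcrCharacterSample { Code = 'B', Confidence = 70, Rejected = false },
            new OcrCharacterSample { Code = '?', Confidence = 10, Rejected = true }
        };

        Console.WriteLine(AverageConfidence(characters)); // prints 80
        Console.WriteLine(RejectedCount(characters));     // prints 1
    }
}
```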
OCR Statistics Event - OCRIndexZoneProcessedEventArgs
The OCR Statistics custom code event uses the OCRIndexZoneProcessedEventArgs class to pass index
values populated by Open Text OCR zones to the custom code. This event is triggered once the contents of
an Open Text OCR zone populate an index value.
The following properties are located within the custom code:
1. DocumentId: string
2. PageId: Guid
3. PageIndex: int32
4. OCRWords: int32
The OCRWords property contains the following variables:
internal OCRCharacter[] characters = new OCRCharacter[] { };
internal Int32 line = 0;
internal System.Drawing.Point location = new System.Drawing.Point();
internal System.Drawing.Size size = new System.Drawing.Size();
The OCRCharacter variable contains the following properties:
public System.Drawing.Point Location
{
get
{
return location;
}
}
public System.Drawing.Size Size
{
get
{
return size;
}
}
public Byte Confidence
{
get
{
return confidence;
}
}
public Char Code
{
get
{
return code;
}
}
public bool Rejected
{
get
{
return rejected;
}
}
public Char[] Alternatives
{
get
{
return alternatives;
}
}
5. RecognitionTime: int32 (milliseconds)
6. AdditionalValues: Hashtable
7. FieldName: string
OCR Statistics Event - OCRMarkSenseZoneProcessedEventArgs
The OCR Statistics custom code event uses the OCRMarkSenseZoneProcessedEventArgs class to pass
auto document break zone statistics to the custom code. This event is triggered when an Open Text OCR
zone inserts an auto document break page between documents.
The following properties are located within the custom code:
1. DocumentId: string
2. PageId: Guid
3. PageIndex: int32
4. OCRWords: int32
The OCRWords property contains the following variables:
internal OCRCharacter[] characters = new OCRCharacter[] { };
internal Int32 line = 0;
internal System.Drawing.Point location = new System.Drawing.Point();
internal System.Drawing.Size size = new System.Drawing.Size();
The OCRCharacter variable contains the following properties:
public System.Drawing.Point Location
{
get
{
return location;
}
}
public System.Drawing.Size Size
{
get
{
return size;
}
}
public Byte Confidence
{
get
{
return confidence;
}
}
public Char Code
{
get
{
return code;
}
}
public bool Rejected
{
get
{
return rejected;
}
}
public Char[] Alternatives
{
get
{
return alternatives;
}
}
5. RecognitionTime: int32 (milliseconds)
6. AdditionalValues: Hashtable
Saving Indexes Event - IndexSaveEventArgs
The Saving Indexes event uses the IndexSaveEventArgs class to pass the operator’s index values to the
custom code. The Saving Indexes event is triggered as index values are saved to the batch. This class
contains the BatchNavigation enumeration property that determines which document (in the Operator
Console) opens immediately after indexes are saved.
IndexSaveEventArgs args = base.Parameter as IndexSaveEventArgs;
NOTE: By default, the Saving Indexes event proceeds to the next document.
Within your custom code, you can use the following constants to set the BatchNavigation enumeration
property:
1. None: Remains on current document
2. NextDoc: Proceeds to next document
3. PreviousDoc: Returns to previous document
4. LastDoc: Proceeds to last document in batch
5. FirstDoc: Returns to first document in batch
For example, you can configure the BatchNavigation enumeration property to remain on the current document
after index values are saved:
args.BatchNavigation = BatchNavigation.None;
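A common use of this property is to keep the operator on the current document when a required value is missing. The following C# sketch shows only the decision logic. The BatchNavigation enumeration declared here is a local stand-in that reuses the constant names listed above so that the example compiles on its own; the helper name ChooseNavigation and the idea of a "required value" are assumptions for illustration.

```csharp
using System;

// Local stand-in reusing the constant names listed above; the real type ships with the API.
public enum BatchNavigation { None, NextDoc, PreviousDoc, LastDoc, FirstDoc }

public static class SavingIndexesExample
{
    // Stay on the current document if the required value is blank; otherwise advance.
    public static BatchNavigation ChooseNavigation(string requiredValue)
    {
        return string.IsNullOrWhiteSpace(requiredValue)
            ? BatchNavigation.None
            : BatchNavigation.NextDoc;
    }

    public static void Main()
    {
        Console.WriteLine(ChooseNavigation(""));         // prints None
        Console.WriteLine(ChooseNavigation("INV-1001")); // prints NextDoc

        // In a Saving Indexes script, the result would be assigned to the event argument:
        // args.BatchNavigation = ChooseNavigation(someIndexValue);
    }
}
```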
Additional API Functions
In addition to the API Functions documented in the PVCaptureBatchAPI.chm help file, the API functions
described in this section can be used within your custom code.
Custom Code/Export Functions
protected string[] GetPageFiles(string documentID)
Returns path values for all images contained in a document (from all pages)
protected Stream GetFileStream(PVFile file)
Returns the stream for a specified PVFile
protected Stream[] GetDocumentStreams(string documentID)
Returns an array of streams for all files contained in a document (from all pages)
protected Stream[] GetDocumentStreams(string documentID, string
jobStepName, bool bitonal)
Returns streams for all files contained in a document (from all pages) based on job step name and
bitonal option
protected void CopyStreamToDisk(Stream stream, string path)
Copies content of a stream to disk
public string[] CopyFilesToDisk(string documentID, string rootPath)
Copies all files from a document (from all pages) to a folder and returns an array for all image path
values
protected void SetPersistValue(string key, string value, string
rootPath)
Persists a value for a key
protected string GetPersistValue(string key, string rootPath)
Reads persisted value for a key
protected string GetNextLockedPath(string root, Int32 maxExportSize,
bool exclusive)
Returns the next available path (path is locked before it is returned)
NOTE: If you set the EXCLUSIVE_EXPORT script constant to True, the function will
throw an exception if the last available folder is in use. If you set the EXCLUSIVE_EXPORT
script constant to True, it is strongly recommended to specify an automation
server that will process exports. The automation server can be assigned within each
export generator's Configuration > Options tab. See "Exports" on page 306 for more
information.
String GetNextLockedPath(string root, Int32 maxExportSize,
ExcludePathDelegate excludeFunction, bool exclusive)
Returns the next available path (path is locked before it is returned)
NOTE: If you set the EXCLUSIVE_EXPORT script constant to True, the function will throw
an exception if the last available folder is currently in use. The delegate is used to
determine which folders should be skipped. In addition, if you set the EXCLUSIVE_EXPORT
script constant to True, it is strongly recommended to specify an automation
server that will process exports. The automation server can be assigned within each export
generator's Configuration > Options tab. See "Exports" on page 306 for more information.
protected string GetNextLockedPath(string root, Int32 maxExportSize)
Returns the next available path (path is locked before it is returned)
NOTE: If using this custom code function in conjunction with the EXCLUSIVE_EXPORT
script constant (set to True), it is strongly recommended to specify an automation server
during export configuration. The automation server can be assigned within each export
generator's Configuration > Options tab. See "Exports" on page 306 for more information.
protected void UnlockPath(string path)
Deletes lock for a specified path
void ClearRootPath(string path)
Deletes all folders containing empty subfolders for all folders listed under ‘path’
protected void SetExportComplete(string path)
Flags folder as complete by dropping export.complete file
protected bool IsExportComplete(string path)
Checks whether export folder is flagged as complete
protected bool IsExported(string documentID)
Checks whether document was previously exported
protected bool SetExported(string documentID)
Sets the document's exported status
protected void DeleteDocument(string documentID)
Deletes document after it has been exported
protected void SetStatus(string status, Int32 percentage)
Reports the status and the percentage of custom code that has been executed
protected int GetNonExportedDocumentCount()
Returns the number of non-exported documents
Full-Text OCR Functions
protected string[] GetPageText (string filePath)
Returns text for each page
protected string[] GetOCRFiles (string documentID, string stepName,
string converterCode)
Returns Full-Text OCR files belonging to a specific converter
string[] GetOCRFiles (string documentID, string stepName, string
converterCode, string path)
Writes Full-Text OCR files belonging to a specific converter to directory ‘path’
Important! The caller is responsible for post-processing clean-up if the files are not required.
Image Processing Functions
string ConvertImages(string[] sourceFiles, string destinationFile, ConvertFileType convertFileType)
Converts one or more images to a single destination image file and returns the actual path
under which the file was saved
Int32 GetPageCount(string sourceFile)
Returns the number of pages found in a multiple-page image
string GetPageImage(string sourceFile, Int32 pageIndex, string
destinationFile, OutputFileType outputFileType)
Retrieves a specific image referenced by a specific page index in a multiple-page image
protected string[] GetPageFiles(string documentID)
Returns a path value for all images belonging to a document (from all pages)
bool IsMultipageFormat(ConvertFileType convertFileType)
Determines if the passed file type supports multiple-page format
PVBatch Helper Functions
Int32 GetBlankIndexCount()
Returns the number of blank indices
string[] GetAvailableFields()
Returns the set of fields that can be written to
string GetIndexValue(string fieldname)
Returns the field value for the specified field name
void SetIndexValue(string fieldname, string fieldValue)
Assigns a field value for a specified field name
NOTE: This function cannot be used with a detail set field; otherwise, an exception will
result. Also, when called from within an Index Validate event, this function can only be used
for the target index.
string[] GetDetailSetFields()
Returns the field names of the detail set in Match and Merge
void AssignDetailSet(DataRow row)
Assigns a detail set field in automated match and merge using a single passed DataRow
void AssignDetailSet(DataSet dataset)
Assigns detail set values from a DataSet (returned from the database) - used in match and
merge
void AssignDetailSet(DataRow row, DataSet indices)
Assigns a detail set from a passed DataRow value (manual match and merge) - detail set is
not written to the batch; instead, it is written to the indices DataSet which passed from the UI
void AssignDetailSet(DataSet dataset, DataSet indices)
Assigns detail set values to passed indices (manual match and merge)
void UpdateCurrentIndex(DataRow row)
Updates the current index value from the passed DataRow - row is retrieved from a dataset
populated by the SQL database (match and merge)
bool IsFieldDetailSet(string fieldName)
Checks whether the specified field is a detail set field
PVIndexMetadata GetIndexMetadata(string fieldName)
Returns metadata for an index
bool IsFieldEmpty(string fieldName)
Checks whether a field is empty
string GetMappedColumn(string fieldName)
Returns the mapped column to a specific field name (match and merge)
DataTable GetMapping()
Returns a mapping table between indices and table columns (match and merge)
string GetWhereClause()
Generates a WHERE clause to be used in the SQL query (match and merge)
string GetWhereClause(DataRow row)
Generates a WHERE clause to be used in the SQL query that uses the values in DataRow to
add conditions (match and merge)
string[] GetDocumentIDs()
Returns list of document id values
PVPage[] GetPages(string documentID)
Returns a list of pages for a specific document
string GetPath(PVFile file)
Returns a path for a specified file
PVIndex[] GetIndices(string documentID)
Returns a list of indices for a specific document
PVDetailSet[] GetDetailSets(string documentID)
Returns the detail set values for a specific document
PVFile GetPreferredFile(PVPage, string jobStepName, bool bitonal)
Returns the file that matches the bitonal value (otherwise, first file in array is returned)
string GetExtension(string imagePath)
Returns the extension of an image path
Enumerations
The enumerations described in this section can be used within your custom code.
public enum ConvertFileType
This enumeration is used by the ConvertImages() function and specifies the conversion types that will be
applied to one or more images.
{
/// <summary>
/// No file conversion (returns image input path and appends an extension if not passed in destinationFile variable)
/// </summary>
CVT_NO_CONVERSION,
/// <summary>
/// TIFF with Group IV and/or medium JPEG compression (single- or multi-page)
/// </summary>
CVT_TIFF_G4_MEDJPG,
/// <summary>
/// TIFF with Group IV and/or LZW compression (single- or multi-page)
/// </summary>
CVT_TIFF_G4_LZW,
/// <summary>
/// TIFF with no compression (single- or multi-page)
/// </summary>
CVT_TIFF_NONE,
/// <summary>
/// PDF with Group IV and/or medium JPEG compression (single- or multi-page)
/// </summary>
CVT_PDF_G4_MEDJPG,
/// <summary>
/// PDF with Group IV and/or LZW compression (single- or multi-page, and image-only PDFs)
/// </summary>
CVT_PDF_G4_LZW,
/// <summary>
/// JPEG with medium JPEG compression (single-page only)
/// </summary>
CVT_JPG_MEDJPG,
/// <summary>
/// GIF (single-page only)
/// </summary>
CVT_GIF,
/// <summary>
/// BMP (single-page only)
/// </summary>
CVT_BMP,
/// <summary>
/// PNG (single-page only)
/// </summary>
CVT_PNG,
/// <summary>
/// JPEG 2000
/// </summary>
CVT_JPG2000
}
public enum OutputFileType
This enumeration is used by the GetPageImage() function, and specifies the output file types when single
pages are retrieved from a multiple-page image.
{
/// <summary>
/// JPEG
/// </summary>
OFT_JPG,
/// <summary>
/// TIFF
/// </summary>
OFT_TIFF,
/// <summary>
/// Bitmap
/// </summary>
OFT_BMP
}
public enum UIRefreshLevel
This enumeration synchronizes the Operator Console’s user interface with any changes made to the batch
via custom code. Setting the UIRefreshLevel in custom code forces the user interface to refresh the selected
component specified by the enumeration value (None, Index, CurrentDocumentIndexes, etc.). If you use
either the Index Populated or Index Validate Custom Code Event to change an index value, the Operator
Console's Index Manager will remain synchronized using the UIRefreshLevel.Index value.
{
/// <summary>
/// no UI refresh required
/// </summary>
None = 0x00,
/// <summary>
/// index field needs to be refreshed (i.e., via IndexValidate or IndexPopulate event)
/// </summary>
Index = 0x01,
/// <summary>
/// all indexes for current document need to be refreshed (does not apply to Match and Merge)
/// </summary>
CurrentDocumentIndexes = 0x02,
/// <summary>
/// current page needs to be refreshed
/// </summary>
SinglePage = 0x04,
/// <summary>
/// multiple pages need to be refreshed
/// </summary>
MultiPage = 0x08
}
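The values 0x01, 0x02, 0x04, and 0x08 are distinct bits, which is the usual layout for a .NET enumeration whose members can be combined. This guide does not state whether PaperVision Capture accepts combined refresh levels, so treat the following C# sketch as an illustration of the bit arithmetic only. UIRefreshLevelSample is a local copy of the values shown above, and the [Flags] attribute and the Includes helper are assumptions.

```csharp
using System;

// Local copy of the values listed above; [Flags] is an assumption, not confirmed by this guide.
[Flags]
public enum UIRefreshLevelSample
{
    None = 0x00,
    Index = 0x01,
    CurrentDocumentIndexes = 0x02,
    SinglePage = 0x04,
    MultiPage = 0x08
}

public static class RefreshLevelExample
{
    // True when every bit of 'flag' is set in 'value' (None never counts as included).
    public static bool Includes(UIRefreshLevelSample value, UIRefreshLevelSample flag)
    {
        return flag != UIRefreshLevelSample.None && (value & flag) == flag;
    }

    public static void Main()
    {
        UIRefreshLevelSample level = UIRefreshLevelSample.Index | UIRefreshLevelSample.SinglePage;

        Console.WriteLine(Includes(level, UIRefreshLevelSample.SinglePage)); // prints True
        Console.WriteLine(Includes(level, UIRefreshLevelSample.MultiPage));  // prints False
    }
}
```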
Public Properties
The public properties listed in this section can be used within your custom code.
/// <summary>
/// Batch object
/// </summary>
public PVBatch Batch
/// <summary>
/// Parent window
/// </summary>
public Control Parent
/// <summary>
/// Control referencing the current index
/// </summary>
public Control Control
/// <summary>
/// Used to pass optional parameters
/// </summary>
public object Parameter
/// <summary>
/// Code result that returns status of custom code execution
/// </summary>
public CodeResult CodeResult
/// <summary>
/// PDF Resolution used when importing PDF files
/// </summary>
public Int32 PDFResolution
/// <summary>
/// PDF Smoothing option used when importing PDF files
/// </summary>
public PDFSmoothing PDFSmoothing
Debugging Custom Code
Custom code that you type on the Script Editor window is compiled on-the-fly by PaperVision Capture, so
there is no way to debug or step through this code at run time. However, if you write code in your own
assemblies and call out to these pre-compiled assemblies, then you can debug this code by attaching your
debugger to the appropriate capture process.
For code that is run in a manual job step (for example, code running in a "Saving Indexes" event), you
should attach your debugger to the CaptureClient.exe process.
To debug code that is executed in an automated custom code step:
1. On the machine where the code is going to be executed, stop the PaperVision Process Initiator
Windows service.
2. Set your debugger to start an external application for debugging.
3. From the directory where PaperVision Capture is installed, choose the
DSI.PVECommon.PVProcWork.exe executable and pass a command line argument of 0. When you
start this executable, it will execute any pending "Process Batch" operations (including executing custom
code steps) that have been appropriately scheduled on the Automation Service Scheduling screen (see
"Automation Service Scheduling" on page 27).
4. When you are finished debugging, restart the PaperVision Process Initiator Windows service.
WARNING: Do not attempt to debug code in a production environment. Doing so may adversely
impact system performance and have unpredictable impacts on customer data and end-user
functionality.
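The pre-compiled approach described above can be sketched as follows. This is a minimal illustration, not PaperVision Capture code: the InvoiceNumberRules class and its Normalize method are hypothetical names for logic you might keep in your own assembly. Because the class does not depend on the Script Editor, breakpoints set inside it can be hit after you attach your debugger to the appropriate capture process.

```csharp
using System;

// Hypothetical helper compiled into your own assembly (not part of PaperVision Capture).
// Code typed in the Script Editor would call InvoiceNumberRules.Normalize(...);
// the logic itself lives here, where a debugger can step through it.
public static class InvoiceNumberRules
{
    // Trims the value, removes embedded spaces, and upper-cases the result.
    public static string Normalize(string raw)
    {
        if (raw == null)
        {
            return string.Empty;
        }
        return raw.Trim().Replace(" ", string.Empty).ToUpperInvariant();
    }
}
```

Keeping the script in the Script Editor to a thin call into such a class means most of the logic can be stepped through. The assembly that contains the class must still be added as a reference in the Script Editor.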
Script Editor
The Script Editor launches with pre-written, generic code that you can edit and compile directly in the
window. The Script Editor window contains the "CallHandler" pre-written method. Although you can add
new methods or properties to the "Code" class or call out to other classes (even those defined in your own,
separately-compiled assemblies), you should not remove the "CallHandler" method since it is the entry point
for executing your custom code. If you call out to other namespaces, remember to add a reference to the
necessary assemblies. (See "References" on page 299 for more information.)
Opening the Script Editor
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job you want to edit, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace of the Job Definitions window, double-click the Custom Code job step to display the
Properties tab on the left pane.
6. On the Properties tab, expand Custom Code Events (Step Level).
7. Click Step Executing, and then click the ellipsis button to open the Select Custom Code Generator
dialog box.
8. Ensure that the Advanced check box is selected.
9. From the Language list, select the C# or Visual Basic programming language.
10. From the list of generators, select one of the following.
• Select Basic to write your own custom code directly in the Script Editor.
• Select Export Template to open a pre-defined custom code script for custom exports that you can edit
in the Script Editor.
The Script Editor appears similar to the following.
Script Editor
Importing Custom Code
The Import command lets you load an external custom code XML file into the Script Editor.
To import an external XML file
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. On the toolbar, click Import.
3. In the Open dialog box, locate, and then select the XML file you want to import.
4. Click Open.
Exporting Custom Code
The Export command lets you export custom code as an XML file.
To export custom code
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. On the toolbar, click Export.
NOTE: You cannot export code that does not compile successfully in the Script Editor.
3. In the Save As dialog box, specify in which folder and under what name you want to save the exported XML
file.
4. Click Save.
Deleting, Copying, and Moving Custom Code
You can delete, copy, and move sections of the custom code within the Script Editor or to another editor.
To delete custom code
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. In the Script Editor window, select the code you want to delete.
3. On the toolbar, click Cut.
To copy custom code
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. In the Script Editor window, select the code you want to copy.
3. On the toolbar, click Copy.
To move custom code
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. In the Script Editor window, select the code you want to move.
3. On the toolbar, click Cut if you want to remove the code, or click Copy to copy it.
4. Place your cursor at the new location for the code, and then click Paste.
Compiling Custom Code
The Compile command validates your code.
To compile your code
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. After writing your custom code in the Script Editor, on the toolbar click Compile.
If the code compiles correctly, a "Code compiled successfully" message appears.
If the code does not compile correctly, the Compile Errors pane appears at the bottom of the window,
similar to the following.
Compile Errors
The Compile Errors pane describes the error and its location.
3. Fix any errors that exist, and then compile again.
4. After the "Code compiled successfully" message appears, click OK.
References
References are used to link external assemblies, including standard .NET or custom assemblies that you
generate.
To specify references
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. On the toolbar, click References to open the References dialog box, where a default listing of
assembly files appears. You can add files to or remove files from this list.
References
3. If you want to add more assembly files, click Add to open the Add Reference dialog box.
Add Reference
4. Do one of the following.
• Select the file(s) that you want to add, and then click OK.
• Click Browse to locate the assembly file you want to use, and then click Open.
5. To remove an assembly file from the References dialog box, select it, and then click Remove.
6. When you are finished adding and removing references, click OK.
Finding Code in the Script Editor
You can quickly locate code in the Script Editor by using the Find operation.
To find code in the Script Editor
1. If the Script Editor window is not open, complete the procedure under "Opening the Script Editor" on page
296.
2. In the Find box, enter the code or character you want to find.
3. Press Enter to initiate the search. The code or character is selected in the Script Editor window.
4. You can click Find Next or Find Previous on the toolbar to navigate to instances of your
specified code or character.
Modifying Exports with the Script Editor
After you have initially configured exports with the Custom Code Generator Wizard, you can use the Script
Editor to modify export scripts. (See "Exports" on page 306 for information about configuring PaperVision
Capture exports.)
To modify exports with the Script Editor
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job you want to edit, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace of the Job Definitions window, double-click the Custom Code job step to display the
Properties tab on the left pane.
6. On the Properties tab, expand Custom Code Events (Step Level).
7. Click Step Executing, and then click the ellipsis button
to open the Select Edit Mode dialog box.
Select Edit Mode
8. Select Script Editor, and then click OK. The resulting export script appears in the Script Editor.
Modifying Export Constants
From the Script Editor, you can modify export scripts that you previously created with the Custom Code
Generator Wizard. (See "Modifying Exports with the Script Editor" on page 301 for more information.) For
example, you can change the OCR_CONVERTER_CODE constant (configured on the OCR tab) in the Script
Editor so that PDF searchable images are exported (for Nuance Full-Text OCR). After the change, the
line in the script reads:
private const string OCR_CONVERTER_CODE = "PDFImageOnText";
NOTE: For a list of converter codes, see the PVCaptureBatchAPI.chm help file’s
PVBatch.TryGetOCRFiles Method topic found within the Docs directory where PaperVision Capture is
installed.
In another scenario, you can use full-text OCR data from another job step by modifying the
OCR_JOB_STEP_NAME constant. To do so, enter the name of the step between the quotes (for example,
"Nuance Full-Text OCR" or "Open Text Full-Text OCR").
Match and Merge Wizard
The Match and Merge - Auto generator launches the Match and Merge Wizard where you configure the
connection properties, field mapping, and optional Match and Merge settings.
NOTE: Ensure that the lookup table and columns for the database have been configured and indexes
have been defined before launching the Match and Merge Wizard.
To Select the Match and Merge Generator
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job you want to edit, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace of the Job Definitions window, double-click the Custom Code job step to display the
Properties tab on the left pane.
6. On the Properties tab, expand Custom Code Events (Step Level).
7. Click Step Executing, and then click the ellipsis button to open the Select Custom Code Generator
dialog box.
Select Custom Code Generator
NOTE: To remove existing custom code, on the Properties tab, expand Custom Code Events (Step
Level). Right-click Step Executing, and then click Reset. Additionally, to prevent the Select Scripting
Language dialog box from appearing each time you configure custom code, select Suppress this dialog
when creating new custom code.
8. From the Language list, select the programming language you want to use. You can select C# or Visual
Basic.
9. Double-click Match and Merge - Auto to open the Match and Merge Wizard.
Configuring the Match and Merge Wizard
The Connection Properties area appears when you open the Match and Merge Wizard. You can configure
the database connection properties including the database server and name, user name and password, and
database lookup table.
To configure the Match and Merge Wizard
1. If the Match and Merge Wizard is not open, complete the procedure under "To Select the Match and
Merge Generator" on page 302.
Connection Properties
2. In the Server box, type the database server where the match and merge process will be performed.
3. In the Database box, type the database name where the match and merge process will be performed.
4. In the User Name box, type the user name for the database server connection.
5. In the Password box, type the password for the database server connection.
NOTE: If you leave the User Name and Password fields blank, the database connection will use the
Windows Authentication credentials. Entering a user name and password for the database will supersede
the Windows Authentication credentials.
6. If you want to use a custom connection string, select Custom Connection String, and then type the
connection information in the box below the option.
7. Click Connect to test the connection to the database. After you have connected to the database, values
appear in the Lookup Table list.
8. From the Lookup Table list, select the database table used for lookups.
9. Click Next. The Field Mapping area appears.
Field Mapping
10. The Field Mapping area lets you match the columns in the database to the field names (indexes) that you
defined. Click the Column Name list to select the database column name that will match the associated
field name. If one of the index fields should not be matched, do not map it to the Column Name. When the
operator executes the Merge Index Values command, only the mapped fields will be populated in the Index
Manager.
NOTE: Field names are synonymous with indexes that have been defined.
11. After selecting the column names, click the Match check box(es). Detail fields are indicated by shaded
Match columns and cannot be selected to match.
• In the example above, the Check Number index value, entered by the operator, will be matched with the
corresponding Check_Number column in the database.
• Once the operator executes the Merge Index Values command, the corresponding Check Date, Invoice
Date, Invoice Number, and Payee are populated in the Operator Console Index Manager.
• If the operator does not know the exact index value during hand-key indexing, the operator can insert
wildcard characters to perform a partial search against a database. For example, the operator can insert
the percent sign (%) to specify any number of unknown characters to search for in a SQL, Sybase, or
Oracle database; the operator can insert the asterisk (*) to specify any number of unknown characters to
search for within a Microsoft Access database.
12. Click Next. The Match and Merge Options area appears.
Match and Merge Options
13. The Match and Merge Options area contains additional parameters that define the match and merge
process. In the Number of Blank Fields Required box, type or select the number of fields that must be
blank for PaperVision Capture to attempt to match during the custom code execution.
• For example, suppose you set the Number of Blank Fields Required to a value of 2. If only one field is left
blank before the match and merge process is run, PaperVision Capture will not attempt a match because
fewer than two fields were blank.
• Valid values range from zero to the number of database columns that are defined. For example, if you
have five database columns defined, you can enter a value from zero to five.
14. If you select Overwrite Existing Index Information, the match and merge values will overwrite the
existing index entries already populated in the batch.
15. The Match Count Column setting applies only to integer data type columns in the database. Select the
Match Count Column check box if the match count should increment in the database by one each time a
match is encountered. If you enable this setting, choose the database column from the corresponding list.
16. Select Delete Matching Records to remove the matching record from the database once it is found during
the match and merge process.
NOTE: You can enable only the Match Count Column or the Delete Matching Records setting, but
not both.
17. For manual indexing, select Enable Detail Sets if the detail fields should be populated when the operator
enters the index fields. See "Configuring Detail Sets" on page 59 for more information.
• If you do not select Enable Detail Sets, the operator is presented with a pick list of data that meets the
index field criteria. The operator then selects the appropriate record, and the detail fields are populated
according to the selected record.
When you define a Custom Code step to run an automated Match and Merge process:
• If you select Enable Detail Sets, all detail fields are automatically populated (for example, if five rows
of data meet your criteria, five detail sets are populated).
• Conversely, if you do not select Enable Detail Sets, the detail fields populate with data from the first
row of results.
18. Click Next, which opens the last screen of the wizard.
19. Click Finish, which opens the Script Editor where you can make changes to the code if necessary.
20. Click OK.
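Two of the rules described in this procedure, the Number of Blank Fields Required check and the wildcard search, can be sketched in C# as follows. This is an illustration only, not the code that the wizard generates. The class and method names are hypothetical. The sketch assumes that "required" means "at least this many blank fields" and that wildcard comparisons ignore case; actual matching is performed by the database and may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch only; the Match and Merge Wizard generates its own code.
public static class MatchAndMergeSketch
{
    // Blank-field rule (assumption: a match is attempted when at least
    // blankFieldsRequired index values are empty).
    public static bool ShouldAttemptMatch(string[] indexValues, int blankFieldsRequired)
    {
        int blankCount = 0;
        foreach (string value in indexValues)
        {
            if (string.IsNullOrEmpty(value))
            {
                blankCount++;
            }
        }
        return blankCount >= blankFieldsRequired;
    }

    // Wildcard rule: the wildcard stands for any run of characters, including none.
    // Pass '%' for SQL, Sybase, or Oracle lookups and '*' for Microsoft Access.
    public static bool WildcardMatches(string columnValue, string operatorEntry, char wildcard)
    {
        string[] literalParts = operatorEntry.Split(wildcard);
        for (int i = 0; i < literalParts.Length; i++)
        {
            literalParts[i] = Regex.Escape(literalParts[i]);
        }
        string pattern = "^" + string.Join(".*", literalParts) + "$";
        // Case-insensitive comparison is an assumption; real behavior depends on the database.
        return Regex.IsMatch(columnValue, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

For example, with Number of Blank Fields Required set to 2, a document with only one blank index field is skipped, and an operator entry of 1002% matches a Check_Number value of 100234 but not 200100.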
Matching and Merging with Text Files
If you are using custom code to match and merge index fields with a text file, you can control how data is
handled in the lookup table. If the text file contains dates, currency, or decimal data, for example, you can
manipulate how data is formatted by creating a schema information file (Schema.ini) and placing it in the
same directory where the text file resides. If you do not define how date columns are handled, date values will
be imported in the DateTime format. You can find information on how to create Schema.ini files on the
Microsoft Developer Network (MSDN):
http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx
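As a rough illustration, a Schema.ini file for a comma-delimited lookup file might look like the following. The file name (lookup.txt) and the column names are hypothetical; the keys are standard Microsoft text driver settings, but check the Microsoft documentation for the values that suit your data.

```ini
; Schema.ini, saved in the same directory as lookup.txt (hypothetical file name)
[lookup.txt]
Format=CSVDelimited
ColNameHeader=True
DateTimeFormat=MM/dd/yyyy
CurrencySymbol=$
DecimalSymbol=.
; Column names below are examples only
Col1=Check_Number Text Width 20
Col2=Check_Date DateTime
Col3=Invoice_Amount Currency
Col4=Payee Text Width 100
```

Each section header names the text file it describes, so one Schema.ini file can describe several text files in the same directory.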
Exports
PaperVision Capture provides a user interface for export definitions within the Custom Code step. Exports
can subsequently be imported into ImageSilo or PaperVision Enterprise (ImageSilo/PVE XML), PaperFlow
(PaperFlow.xml), and other systems. If you have modified an export script in PaperVision Capture R72 or
earlier, the Exports library is located in Digitech Systems\PaperVision Capture\Library\Exports where
PaperVision Capture was installed. If you have not modified an export script in R72 or earlier, or you are
initially installing PaperVision Capture R73, the Exports library will not exist since exports are configured
directly in the user interface.
As exports are run, they are appended to the first available destination folder based on the sequence number
and maximum export size (as defined by the MAX_EXPORT_SIZE script constant). When the maximum
export size is reached, exports are appended to the next available folder. If two or more automated processes
attempt to execute the same export (in the same destination folder), the first process places an exclusive
lock on the folder. As a result, all subsequent processes will append exports to the next available folder. You
can override this behavior by specifying an automation server (in the export's Configuration > Options tab)
that will process exports.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER script constant (or you do not specify a value for the AUTOMATION_SERVER
script constant), your exported data may output to multiple folders (for example, data groups).
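The "first available destination folder" rule described above can be sketched as follows. This is a simplified illustration with hypothetical names, not the export script's actual implementation: it models only the sequence order and the maximum export size, and it ignores folder locking and any completion markers.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; folder locking and completion markers are not modeled.
public static class ExportFolderSketch
{
    // Returns the first folder, in sequence order, whose current size is below the
    // maximum export size, or null when every folder is full (a new folder is then needed).
    public static string PickFolder(IList<KeyValuePair<string, int>> foldersInSequence, int maxExportSizeMb)
    {
        foreach (KeyValuePair<string, int> folder in foldersInSequence)
        {
            if (folder.Value < maxExportSizeMb)
            {
                return folder.Key;
            }
        }
        return null;
    }
}
```

With a maximum of 600 MB and folders holding 600, 400, 600, and 100 MB in sequence, the sketch selects the second folder because it is the first one with room left.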
Export Definitions
PaperVision Capture exports contain specific definitions that you can configure. When you configure a
PaperVision Capture export from the Select Custom Code Generator dialog box, properties specific to that
export are shown, and default values that you can modify are included. Use the following procedure to open
the Select Custom Code Generator dialog box where you can select the export that you want to configure.
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job you want to edit, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. On the workspace of the Job Definitions window, double-click the Custom Code job step to display the
Properties tab on the left pane.
6. On the Properties tab, expand Custom Code Events (Step Level).
7. Click Step Executing, and then click the ellipsis button
to open the Select Custom Code Generator
dialog box. Each custom code generator and corresponding description are listed.
Select Custom Code Generator
8. If you want to see only the generators that you can configure using the provided dialog boxes, rather than
editing code in the Script Editor, then clear the Advanced check box.
NOTE: To remove existing custom code, on the Properties tab, expand Custom Code Events
(Step Level). Right-click Step Executing, and then click Reset. Additionally, to prevent the Select
Scripting Language dialog box from appearing each time you configure custom code, select
Suppress this dialog when creating new custom code.
ASCII with Images
The ASCII with Images export creates an ASCII text file containing images that can be imported into other
systems. The format of the file is completely customizable.
To configure the ASCII with Images export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click ASCII with Images. The ASCII with
Images Configuration dialog box appears.
ASCII with Images Configuration - General
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, OCR, and Options tabs. Descriptions for
constant values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports will be created once the automation service
processes the step.
• FIELD_DELIMITER: This customizable delimiter separates index values, page number/counts, and image
sizes.
• IMAGE_DELIMITER: This customizable delimiter separates images when exporting using multiple-line
indexing and converting to single-page images.
• FIELD_QUALIFIER: This constant contains the characters that surround the field values. By default,
quotation marks will appear.
• IMAGE_QUALIFER: This constant contains the characters that surround the image values. By default,
quotation marks will appear.
• REPORTED_ROOT_PATH: The path referenced in the export file originates from this location, not the
ROOT_PATH.
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes, which defaults
to a value of 600.
NOTE: If the Root Path is blank, the export is written to the directory where the application is installed (for
example, C:\Program Files\Digitech Systems\PaperVision Capture). If the Reported Root Path is blank,
the resulting export script displays a blank value for the REPORTED_ROOT_PATH.
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next
to the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To
change the order in which the indexes appear, select the index you want to move, and then click Move Up or
Move Down.
ASCII with Images Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following INDICES_TO_INCLUDE
constant.
• INDICES_TO_INCLUDE: This constant determines what index values are included in the export file. In
the resulting script, you can enter the name of the index value(s) between quotation marks, and separate
each index value with a comma. If you leave this array blank, no indices are included.
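The effect of the INDICES_TO_INCLUDE array can be sketched as follows. This is a minimal illustration under stated assumptions: the declaration style, the index names ("Check Number", "Payee"), and the SelectValues helper are hypothetical, and the generated export script may declare and use the array differently. The sketch also assumes that values are exported in the order in which the names are listed.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; the generated export script may differ.
public static class IndexFilterSketch
{
    // Assumed declaration style: index names in quotation marks, separated by commas.
    private static readonly string[] INDICES_TO_INCLUDE = { "Check Number", "Payee" };

    // Returns the values of the listed indexes, in the order the names are listed.
    // Indexes that are not listed are left out of the result.
    public static List<string> SelectValues(IDictionary<string, string> documentIndexes)
    {
        List<string> selected = new List<string>();
        foreach (string name in INDICES_TO_INCLUDE)
        {
            string value;
            if (documentIndexes.TryGetValue(name, out value))
            {
                selected.Add(value);
            }
        }
        return selected;
    }
}
```

In this sketch, a document with Check Number, Check Date, and Payee indexes contributes only its Check Number and Payee values. An empty array would contribute none.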
OCR
When you configure the properties on the OCR tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
ASCII with Images Configuration - OCR
• OCR_ENGINE: This constant specifies the OCR engine (Nuance or Open Text) that processes OCR data
for the export.
• OCR_CONVERTER_CODE: This constant specifies the OCR converter code, such as PDF, Text, XML,
etc., whose output format is used to export full-text data. When no value is defined (the default setting),
both images and associated full-text data are exported.
• OCR_JOB_STEP_NAME: This constant specifies the job step whose full-text data are used for the
export. No value is defined by default, so full-text data from the current job step are used for the export.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
ASCII with Images Configuration - Options
• PLACE_IMAGES_IN_SINGLE_DIR: If set to False, the images are placed in subdirectories at the
ROOT_PATH (maximum of 1000 images per directory). If set to True, the images are placed directly in the
ROOT_PATH folder.
• INCLUDE_PAGE_NUMBER_COUNT: This determines whether the page number or page count of the
document should be added as an additional field in the export. If set to False, when exporting in a multi-line
format and creating single-page images, this value will match the page number of the document. If set to
True, the value will match the total number of pages in the document.
• INCLUDE_IMAGE_SIZE: This constant determines whether the image file size is added as an additional
field in the export. If set to True, this value will match the image size referenced on that line of the export file
when exporting using a multi-line format and creating single-page images. If set to False, this value will
match the size of the first page in the document.
• CREATE_MULTI_PAGE_IMAGE: Used in conjunction with CONVERSION_TYPE, this constant
determines whether exported images are single-page or multi-page.
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH with the MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Since the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should contain this constant value, the following line in the Script
Editor, which is available to use in all exports, can be changed to:
lockedPath = GetNextLockedPath(root, MAX_EXPORT_SIZE, true)
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• CONVERSION_TYPE: This constant determines the type of image file created during the export. The
default value, CVT_NO_CONVERSION, does not convert images during the export. If exporting to a
format that supports both single and multi-page images, you must set the CREATE_MULTI_PAGE_IMAGE
constant to True if you want to create multi-page images; otherwise, single-page images will result.
For example, if you set this to CVT_TIFF_G4_MEDJPG, a TIFF image is created during the export. If the
source image is binary, it will create a TIFF using Group 4 compression; if the source image is color (JPG or
BMP), it will create a TIFF using Medium JPEG compression. (See "Enumerations" on page 291 for more
information.)
• TEXT_FILE_ORDER: This constant determines how the export file is formatted. You can select from the
following options.
  ◦ IndicesFollowedByListImages: This option creates a single row for each document
  with indexes listed first, followed by image files.
  ◦ ListImagesFollowedByIndices: This option creates a single row for each document
  with images listed first, followed by the index values.
  ◦ MultiLineIndicesFollowedBySingleImage: This option creates one row of index
  values for every image created during the export. If multiple image files are created for a
  single document, multiple rows of identical index values will be created, each referencing
  a different page of the document. This will be formatted with index values followed by
  images.
  ◦ MultiLineImagesFollowedByIndices: This option creates one row of index values for every
  image created during the export. If multiple image files are created for a single document,
  multiple rows of identical index values will be created, each referencing a different page of the
  document. This will be formatted with images followed by index values.
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
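The difference between the single-row and multi-line TEXT_FILE_ORDER options can be sketched as follows. This is a simplified illustration with hypothetical names and sample values. It assumes one delimiter and one qualifier for both fields and images, and it omits the optional page number and image size fields, so the exact layout of a real export file may differ.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; real export rows may contain additional fields.
public static class AsciiRowSketch
{
    // Surrounds each value with the qualifier and joins the values with the delimiter.
    private static string JoinQualified(IEnumerable<string> values, string delimiter, string qualifier)
    {
        List<string> quoted = new List<string>();
        foreach (string value in values)
        {
            quoted.Add(qualifier + value + qualifier);
        }
        return string.Join(delimiter, quoted.ToArray());
    }

    // IndicesFollowedByListImages: one row per document, index values first, then all images.
    public static string SingleRow(string[] indexValues, string[] images, string delimiter, string qualifier)
    {
        List<string> fields = new List<string>(indexValues);
        fields.AddRange(images);
        return JoinQualified(fields, delimiter, qualifier);
    }

    // MultiLineIndicesFollowedBySingleImage: one row per image, repeating the index values.
    public static List<string> MultiLineRows(string[] indexValues, string[] images, string delimiter, string qualifier)
    {
        List<string> rows = new List<string>();
        foreach (string image in images)
        {
            rows.Add(SingleRow(indexValues, new string[] { image }, delimiter, qualifier));
        }
        return rows;
    }
}
```

For a two-page document, SingleRow produces one line that lists both image files after the index values, while MultiLineRows produces two lines that repeat the same index values, each referencing a different page.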
Hyland OnBase
The Hyland OnBase export creates an ASCII text file and single-page TIFF images that can be imported into
the Hyland OnBase system. The following settings must be configured in the Hyland OnBase system prior to
importing any PaperVision Capture exports.
1. The Document Import Processor separator must be set to New Line.
2. The field delimiter must be set to None.
3. The field type must be set to Tagged Fields.
NOTE: If the PaperVision Capture job contains dates, the Hyland OnBase date format settings must
match the date field format for that job.
To configure the Hyland OnBase export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click Hyland OnBase. The Hyland OnBase
Configuration dialog box appears.
Hyland OnBase Configuration - General
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, and Options tabs. Descriptions for constant
values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
• REPORTED_ROOT_PATH: The path referenced in the export file originates from this location, not the
ROOT_PATH.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture). If the Reported Root
Path box is blank, the resulting export script displays a blank value for the REPORTED_ROOT_PATH
constant.
• FULL_PATH_TAG: This tag precedes the REPORTED_ROOT_PATH in the export file.
• DOCUMENT_TYPE: This constant indicates index mapping to the document type.
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes. The default
value is 600.
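As a rough illustration of how ROOT_PATH and REPORTED_ROOT_PATH relate, the following Python sketch (not PaperVision Capture code) rewrites a file's physical location under ROOT_PATH into the path that would be referenced in the export file. The function name, the sample paths, and the simple prefix substitution are assumptions made for illustration; the generated export script may build the reported path differently.

```python
from pathlib import PureWindowsPath


def reported_path(file_path: str, root_path: str, reported_root_path: str) -> str:
    """Swap the ROOT_PATH prefix of file_path for REPORTED_ROOT_PATH (assumed behavior)."""
    # relative_to raises ValueError if file_path is not located under root_path.
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(root_path))
    return str(PureWindowsPath(reported_root_path) / relative)


# All three paths below are hypothetical sample values.
example = reported_path(
    r"C:\Exports\OnBase\00000001\00000001.tif",  # where the image was written
    r"C:\Exports\OnBase",                        # ROOT_PATH
    r"D:\Import\OnBase",                         # REPORTED_ROOT_PATH
)
print(example)
```

In this sketch the image is still written under ROOT_PATH; only the path recorded in the export file changes.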
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next to
the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To change
the order in which the indexes appear, select the index you want to move, and then click Move Up or Move Down.
Hyland OnBase Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following INDICES_TO_INCLUDE constant.
• INDICES_TO_INCLUDE: This constant determines what index values are included in the export file. In
the resulting script, you can enter the name of the index value(s) between quotation marks, and separate
each index value with a comma. If you leave this array blank, no indices are included.
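The effect of INDICES_TO_INCLUDE described above can be sketched in Python as follows. This is an illustration only, not the generated export script: the function name and the sample index names are hypothetical, and the sketch assumes that indexes are written in the order in which they are listed.

```python
def select_indexes(document_indexes: dict, indices_to_include: list) -> list:
    """Return (name, value) pairs in the order given by indices_to_include."""
    # An empty list selects nothing, matching the Hyland OnBase behavior described above.
    return [(name, document_indexes.get(name, "")) for name in indices_to_include]


INDICES_TO_INCLUDE = ["Invoice Number", "Vendor"]  # hypothetical index names
document = {"Vendor": "Acme", "Invoice Number": "1001", "Amount": "25.00"}

selected = select_indexes(document, INDICES_TO_INCLUDE)
nothing = select_indexes(document, [])
print(selected)
print(nothing)
```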
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
Hyland OnBase Configuration - Options
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with the MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should contain this constant value, the following line in the Script
Editor, which is available to use in all exports, can be changed to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported, and is set to False by default.
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders that have not reached the maximum file size.
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server box blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this box, and values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
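The folder selection in the USE_EXPORT_COMPLETE_FILE example above (the constant set to False, four folders under the ROOT_PATH) can be sketched in Python as follows. This is a minimal illustration and not the logic of GetNextLockedPath(): the function name is hypothetical, and the sketch assumes folders are examined in order and that a folder is full once it reaches MAX_EXPORT_SIZE.

```python
MAX_EXPORT_SIZE = 600  # megabytes, the documented default


def pick_export_folder(folder_sizes_mb: dict, max_size_mb: int = MAX_EXPORT_SIZE):
    """Return the first folder still below max_size_mb, or None if every folder is full."""
    for name, size_mb in folder_sizes_mb.items():  # dicts preserve insertion order
        if size_mb < max_size_mb:
            return name
    return None  # assumption: the export would then need a new folder


folders = {"Folder_1": 600, "Folder_2": 400, "Folder_3": 600, "Folder_4": 100}
chosen = pick_export_folder(folders)
print(chosen)
```

With the sizes from the example, the sketch skips Folder_1 and selects Folder_2, which agrees with the outcome described above.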
Image Only
The Image Only export creates image files that are named after a specific index field. Any subdirectories
containing those image files are named after other index fields (optional). Single-page image file formats are
named with an "-X" at the end of the file name where "X" denotes the page number.
To configure the Image Only export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click Image Only. The Image Only
Configuration dialog box appears.
Image Only Configuration
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, OCR, and Options tabs. Descriptions for
constant values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture).
• IMAGE_DELIMITER: This constant specifies the character that will separate the image file name if
multiple index values are combined to create the image file name.
• WRITE_DUPLICATES_TO_EXCEPTION_FOLDER: If duplicate files are created in the same directory
during the export and this value is set to False, PaperVision Capture will not copy the duplicate files into the
EXCEPTION_FOLDER directory. If this value is set to True, duplicate files are placed in the
EXCEPTION_FOLDER instead.
NOTE: Files in the EXCEPTION_FOLDER directory display with "_#" appended to the file name,
where "#" is a unique incrementing number starting with "1." This appending process prevents the
exception files from being overwritten in the directory.
• EXCEPTION_FOLDER: If the WRITE_DUPLICATES_TO_EXCEPTION_FOLDER value is set to True,
and multiple images with the same file name are created in the same directory, duplicates will be placed in
this folder at the ROOT_PATH instead of overwriting the existing file of that name.
• DEFAULT_VALUE: As the export script executes, invalid characters are stripped from index fields,
possibly resulting in blank fields. By default, the resulting DEFAULT_VALUE for these blank fields is
defined as "UNKNOWN."
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes. The default
value is 600.
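The duplicate handling described for WRITE_DUPLICATES_TO_EXCEPTION_FOLDER and EXCEPTION_FOLDER above can be sketched in Python as follows. This is an illustration under stated assumptions, not the generated export script: the function name, the "Exceptions" folder name, and the sample file name are hypothetical, and the sketch assumes the "_#" suffix is placed before the file extension.

```python
import tempfile
from pathlib import Path


def destination_for(target: Path, root_path: Path, exception_folder: str,
                    write_duplicates_to_exception_folder: bool) -> Path:
    """Choose where a newly exported file should be written."""
    if not target.exists() or not write_duplicates_to_exception_folder:
        return target  # no duplicate, or the existing file is overwritten
    exception_dir = root_path / exception_folder
    exception_dir.mkdir(parents=True, exist_ok=True)
    number = 1  # the suffix starts at 1 and increments until the name is unused
    while True:
        candidate = exception_dir / f"{target.stem}_{number}{target.suffix}"
        if not candidate.exists():
            return candidate
        number += 1


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                      # stands in for ROOT_PATH
    existing = root / "INV1001.tif"       # hypothetical file already exported
    existing.write_bytes(b"")
    duplicate = destination_for(existing, root, "Exceptions", True)
    duplicate_name = duplicate.relative_to(root).as_posix()
    overwrite_name = destination_for(existing, root, "Exceptions", False).name

print(duplicate_name)
```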
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next
to the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To
change the order in which the indexes appear, select the index you want to move, and then click Move Up or
Move Down.
Image Only Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following constants.
• IMAGE_INDICES: Images created during the export are named based on the index fields mapped in the
IMAGE_INDICES field. If multiple index fields are mapped, the specified IMAGE_DELIMITER value is
used to separate the fields in the name of the file. If no fields are mapped, a standard eight-digit
incrementing file name is used.
NOTE: Image file names are pulled from a single index field configured in the IMAGE_INDICES field.
Any subdirectories are also configured similarly. Index fields should not contain characters that create
invalid file or directory names.
• FOLDER_INDICES: Images created during the export are placed in named folders based on the
FOLDER_INDICES. The first mapped field will match the first folder, the second mapped field will match
the name of the subfolder, and so on. If no fields are mapped, the images are placed directly in the
ROOT_PATH.
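The naming rules described above for IMAGE_INDICES, FOLDER_INDICES, IMAGE_DELIMITER, and DEFAULT_VALUE can be sketched in Python as follows. This is an illustration only: the function names, the sample index names, the delimiter, the set of characters treated as invalid, and the starting value of the eight-digit counter are all assumptions rather than documented behavior.

```python
import re
from pathlib import PureWindowsPath

IMAGE_DELIMITER = "_"      # hypothetical delimiter
DEFAULT_VALUE = "UNKNOWN"  # documented default for fields left blank
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')  # assumed set of invalid characters


def clean(value: str) -> str:
    """Strip invalid characters; fall back to DEFAULT_VALUE if nothing remains."""
    stripped = INVALID_CHARS.sub("", value).strip()
    return stripped or DEFAULT_VALUE


def image_path(root, indexes, image_indices, folder_indices, counter, extension=".tif"):
    """Build the output path for one image from its index values."""
    folders = [clean(indexes.get(name, "")) for name in folder_indices]
    if image_indices:
        file_name = IMAGE_DELIMITER.join(clean(indexes.get(n, "")) for n in image_indices)
    else:
        file_name = f"{counter:08d}"  # eight-digit incrementing name
    return str(PureWindowsPath(root, *folders, file_name + extension))


indexes = {"Customer": "Acme/West", "Year": "2014", "Invoice": "1001"}
named = image_path(r"C:\Exports", indexes, ["Invoice", "Year"], ["Customer"], 1)
unnamed = image_path(r"C:\Exports", indexes, [], [], 1)
print(named)
print(unnamed)
```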
OCR
When you configure the properties on the OCR tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
Image Only Configuration - OCR
• OCR_ENGINE: This constant specifies the OCR engine (Nuance or Open Text) that processes OCR data
for the export.
• OCR_CONVERTER_CODE: This constant specifies the OCR converter code, such as PDF, Text, XML,
etc., whose output format is used to export full-text data. When no value is defined (the default setting),
both images and associated full-text data are exported.
• OCR_JOB_STEP_NAME: This constant specifies the job step whose full-text data are used for the
export. No value is defined by default, so full-text data from the current job step are used for the export.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
Image Only Configuration - Options
• CREATE_MULTI_PAGE_IMAGE: Used in conjunction with CONVERSION_TYPE, this constant
determines whether exported images are single-page or multi-page.
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with the MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should contain this constant value, the following line in the Script
Editor, which is available to use in all exports, can be changed to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• CONVERSION_TYPE: This constant determines the type of image file created during the export. The
default value, CVT_NO_CONVERSION, does not convert images during the export. If exporting to a
format that supports both single and multi-page images, you must set the CREATE_MULTI_PAGE_IMAGE
constant to True if you want to create multi-page images; otherwise, single-page images will result.
For example, if you set this to CVT_TIFF_G4_MEDJPG, a TIFF image is created during the export. If the
source image is binary, it will create a TIFF using Group 4 compression; if the source image is color (JPG or
BMP), it will create a TIFF using Medium JPEG compression. (For the file types you can use for conversion
during the export process, see "Enumerations" on page 291.)
• FILE_EXTENSION: This constant determines whether the file extension or page number will be assigned
to the file type created during the export. You can choose one of the following options.
  • Regular: This option uses the original file extension (for example, .tif, .jpg, etc.).
  • PageNumberStartingZero: This option uses the page number for the file extension, starting with 0 (for
  example, .0, .1, and so on).
  • PageNumberStartingOne: This option uses the page number for the file extension, starting with 1 (for
  example, .1, .2, and so on).
  • PageNumberStartingZeroWithPadding: This option uses the page number for the file extension, starting
  with 000 (for example, .000, .001, and so on).
  • PageNumberStartingOneWithPadding: This option uses the page number for the file extension, starting
  with 001 (for example, .001, .002, and so on).
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server box blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this box, and values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
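The five FILE_EXTENSION options listed above can be summarized with the following Python sketch. It is an illustration rather than the generated export script: the function name and the zero-based page_index parameter are assumptions, and the three-digit padding is taken from the examples given for the padded options.

```python
def export_extension(option: str, original_extension: str, page_index: int) -> str:
    """Return the extension for one page; page_index is zero-based (assumption)."""
    if option == "Regular":
        return original_extension
    if option == "PageNumberStartingZero":
        return f".{page_index}"
    if option == "PageNumberStartingOne":
        return f".{page_index + 1}"
    if option == "PageNumberStartingZeroWithPadding":
        return f".{page_index:03d}"
    if option == "PageNumberStartingOneWithPadding":
        return f".{page_index + 1:03d}"
    raise ValueError(f"Unknown FILE_EXTENSION option: {option}")


# Extensions produced for the first two pages of a document under each option.
for option in ("Regular", "PageNumberStartingZero", "PageNumberStartingOne",
               "PageNumberStartingZeroWithPadding", "PageNumberStartingOneWithPadding"):
    print(option, [export_extension(option, ".tif", page) for page in range(2)])
```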
ImageSilo/PVE XML
The ImageSilo/PVE XML export creates an export that can be used to import batches into ImageSilo or
PaperVision Enterprise.
To configure the ImageSilo/PVE XML export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click ImageSilo/PVE XML. The
ImageSilo/PVE XML Configuration dialog box appears.
ImageSilo/PVE XML Configuration - General
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, OCR, Options, and FTP tabs. Descriptions for
constant values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture).
• COMPANY_NAME: This constant is the name of your company or department and has a blank default
value. The Company Name is required.
• COMPANY_ID: This constant is the ID of your company or department. The default value is set to the
identifier, "yymmddhhnnssms".
• INITIAL_DATA_GROUP_NUMBER: This constant represents the initial Data Group number used by
ImageSilo or PaperVision Enterprise. The default value is 1.
• PROJECT_NAME: This constant indicates the name of your project. The default value is set to Project
Name.
• PV_FOLDER_ROOT_PATH: This constant specifies the root path containing all folders (used in the
Folder view in ImageSilo or PaperVision Enterprise). Type the root path between the quotes (for example,
C:\\Exports\\PVEXml\\FolderRootPath\\).
• DOCUMENT_MAX_PER_DATAGROUP: This constant indicates the maximum number of documents
per data group. The default value is 1000, which is the recommended value for XML files.
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes. The default
value is 600.
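To show how INITIAL_DATA_GROUP_NUMBER and DOCUMENT_MAX_PER_DATAGROUP might interact, the following Python sketch assigns exported documents to data group numbers. It is a simplified illustration: the function name is hypothetical, and the sketch assumes that data group numbers increase by one each time the document limit is reached and ignores the separate MAX_EXPORT_SIZE limit.

```python
INITIAL_DATA_GROUP_NUMBER = 1       # documented default
DOCUMENT_MAX_PER_DATAGROUP = 1000   # documented default


def data_group_for(document_position: int) -> int:
    """Return the data group number for the Nth exported document (zero-based)."""
    return INITIAL_DATA_GROUP_NUMBER + document_position // DOCUMENT_MAX_PER_DATAGROUP


# The 1st, 1000th, 1001st, and 2501st documents of an export.
groups = [data_group_for(position) for position in (0, 999, 1000, 2500)]
print(groups)
```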
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next to
the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To change
the order in which the indexes appear, select the index you want to move, and then click Move Up or Move Down.
ImageSilo/PVE XML Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following constants.
• INDICES_TO_INCLUDE: This constant determines the index values included in the export file. To include
all indices, leave the array blank.
• PV_FOLDER_INDICES: This constant determines the index value(s) representing each folder (used in the
Folder view in ImageSilo or PaperVision Enterprise). If you leave the array blank, no index values will be
included.
OCR
When you configure the properties on the OCR tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
ImageSilo/PVE XML Configuration - OCR
• OCR_ENGINE: This constant specifies the OCR engine (Nuance or Open Text) that processes OCR data
for the export.
• OCR_CONVERTER_CODE: This constant specifies the OCR converter code, such as PDF, Text, XML,
etc., whose output format is used to export full-text data. When no value is defined (the default setting),
both images and associated full-text data are exported. If you select the PaperVision Full-Text OCR
converter, only full-text data will be exported (associated images will not be exported).
• OCR_JOB_STEP_NAME: This constant specifies the job step whose full-text data are used for the
export. No value is defined by default, so full-text data from the current job step are used for the export.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
ImageSilo/PVE XML Configuration - Options
• CREATE_MULTI_PAGE_IMAGE: Used in conjunction with CONVERSION_TYPE, this constant
determines whether exported images are single-page or multi-page.
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with the MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should contain this constant value, the following line in the Script
Editor, which is available to use in all exports, can be changed to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• CREATE_SUBMIT_FILE: Enable this option to automatically generate a DATAGRP.SUBMIT file. If you
are importing the data group into PaperVision Enterprise via a Monitored Import Path or via Data Transfer
Manager, this file is required before the import can run in ImageSilo or PaperVision Enterprise.
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• CONVERSION_TYPE: This constant determines the type of image file created during the export. The
default value, CVT_NO_CONVERSION, does not convert images during the export. If exporting to a
format that supports both single and multi-page images, you must set the CREATE_MULTI_PAGE_IMAGE
constant to True if you want to create multi-page images; otherwise, single-page images will result.
For example, if you set this to CVT_TIFF_G4_MEDJPG, a TIFF image is created during the export. If the
source image is binary, it will create a TIFF using Group 4 compression; if the source image is color (JPG or
BMP), it will create a TIFF using Medium JPEG compression. (See "Enumerations" on page 291 for more
information.)
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
• EXCLUSIVE_EXPORT: This constant determines whether to create separate folders for multiple exports
that are processed simultaneously. When set to True, the default setting, only one export will be processed
at a time in the ROOT_PATH location. If two or more exports access the same ROOT_PATH location, an
error message will appear in the Windows Event Viewer, indicating the export folder is already in use.
IMPORTANT!
• If you set the former EXCLUSIVE_EXPORT constant to True in PaperVision Capture R72 and earlier:
  • If you will regenerate an export script in R73 or later, you must specify the automation server when you
  configure the export.
  • If you will use an export script from R72 or earlier and you will not regenerate the script in R73 or later,
  you are not required to specify the automation server.
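The matching rules described above for the AUTOMATION_SERVER value (comma-separated names, wildcard characters, values that are not case-sensitive, and a blank value meaning all servers) can be sketched in Python as follows. This is an illustration only: the function name and server names are hypothetical, and the sketch assumes shell-style wildcards (* and ?), which the guide does not specify.

```python
from fnmatch import fnmatchcase


def server_may_process(automation_server: str, machine_instance: str) -> bool:
    """Decide whether a server named MACHINENAME_INSTANCE matches the configured value."""
    patterns = [p.strip() for p in automation_server.split(",") if p.strip()]
    if not patterns:
        return True  # blank value: all servers are used
    name = machine_instance.lower()  # lowercase both sides for case-insensitive matching
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


# Hypothetical server names in the MACHINENAME_INSTANCE format.
print(server_may_process("", "SCAN01_PVC"))
print(server_may_process("scan*_pvc, OCR01_PVC", "SCAN02_PVC"))
print(server_may_process("OCR01_PVC", "SCAN02_PVC"))
```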
FTP
The FTP tab contains settings that let you securely transfer data to an FTP site. You can transfer data files in
their original state, or they can be placed in a compressed package file. When you configure the properties on
the FTP tab, you can modify constant values that appear in the resulting export script. Descriptions for each
constant value follow. To make the options on the FTP tab available, you must select the Enable FTP check
box on the bottom of the tab.
ImageSilo/PVE XML Configuration - FTP
• FTP_HOST: This constant specifies the FTP host site name used for the export.
• FTP_PORT: This constant specifies the command port number that will be used to connect to the remote
FTP server. FTP communications are typically initiated on port 21.
• FTP_CONNECTION: This constant specifies the type of connection that will be created. During an active
connection, PaperVision Capture specifies the data port number that will be used, and the remote FTP
server connects to it. During a passive connection, the remote FTP server specifies the data port number
that will be used, and PaperVision Capture connects to it.
• FTP_ENCRYPTION: This export supports fully encrypted FTP communications using SSL (also known as
FTPS). The remote FTP server must also support this feature to take advantage of the export's capabilities.
You can select one of the following from the SSL Mode list.
  • Automatic indicates the server will use SSL encryption, but will attempt to automatically
  determine whether to use Implicit or Explicit SSL.
  • Implicit indicates the SSL negotiation will start immediately after the FTP connection is
  established.
  • Explicit indicates the connection will be established in plain text, and the SSL negotiation is then
  started explicitly.
  • None (no SSL encryption) indicates a standard FTP, non-encrypted session connection will be
  used.
• FTP_USERNAME: This constant specifies the user name that will be used to authenticate to the remote
FTP server.
• FTP_PASSWORD: This constant specifies the password that will be used to authenticate to the remote
FTP server. If desired, you can expose the password in the Script Editor by inserting the tilde character (~)
prefix before the password (for example, ~password).
• FTP_PATH: This constant specifies the folder name on the FTP site that stores the exported data. By
default, this field is blank, and data is written to the user's home directory as specified by the FTP server.
Other possible paths include the following:
1. / (root)
2. FolderA (subdirectory under home directory)
3. /FolderA (subfolder under root path)
• FTP_COMPARE_LAST_MODIFIED_DATE: For an operation type related to data groups or package files,
the agent will automatically record the last modified date of the file that is being processed. When the same
job is processed (and potentially the same file), the last modified date of the previous run is compared to the
current last modified date. If the file has not changed, it will not be processed again.
For data group processing, this will also allow users to perform incremental data group processing. After the
data group has been changed, any data group files (that is, images) that have a modified date/time greater
than or equal to the previous run's database (that is, DATAGRP.MDB or DATAGRP.XML) last modified
date/time will be processed.
• FTP_DELETE_SOURCE_AFTER_EXPORT: Once the data has been successfully transferred, this
constant allows the agent to delete the source data.
• FTP_ENABLE_PACKAGE: When pushing data groups or files to a remote site, you can increase transfer
speed by sending a single, large file rather than hundreds or thousands of small files. This option causes the
agent to create a compressed package file that increases transfer speeds and security (if encryption is
enabled).
• FTP_ENTITY_ID: When the export is configured to create compressed package files, the Entity ID and
Encryption values are placed into the package file to allow the remote PaperFlow system to decrypt the
data. This constant specifies the ID of the remote entity whose encryption key will be used to decrypt the
package file.
• FTP_KEY_NAME: This constant specifies the name of the encryption key used to decrypt the package
file.
• FTP_PASS_PHRASE: For compressed package files, this constant specifies a user-defined pass phrase
that is passed through a SHA-2 algorithm (Secure Hash Algorithm) to generate a 256-bit hash.
• FTP_ENABLE: This constant specifies whether FTP has been enabled for the export.
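Two of the behaviors described above, the 256-bit hash derived from FTP_PASS_PHRASE and the incremental selection performed when FTP_COMPARE_LAST_MODIFIED_DATE is used with data groups, can be sketched in Python as follows. Both functions are illustrations under stated assumptions and not the product's implementation: the function names and sample values are hypothetical, and the use of plain SHA-256 over UTF-8 text (with no salt or iteration) is an assumption, because the guide states only that a SHA-2 algorithm produces a 256-bit hash.

```python
import hashlib


def pass_phrase_hash(pass_phrase: str) -> bytes:
    """Hash a pass phrase with SHA-256, a SHA-2 variant that yields 256 bits."""
    # Assumption: UTF-8 encoding, no salt, a single hashing pass.
    return hashlib.sha256(pass_phrase.encode("utf-8")).digest()


def files_to_process(file_mtimes: dict, previous_database_mtime: float) -> list:
    """Select files modified at or after the previous run's database modified time."""
    return sorted(name for name, mtime in file_mtimes.items()
                  if mtime >= previous_database_mtime)


digest = pass_phrase_hash("example pass phrase")  # hypothetical pass phrase
print(len(digest) * 8)  # size of the hash in bits

# Hypothetical modified times, expressed as seconds since an arbitrary epoch.
changed = files_to_process({"a.tif": 100.0, "b.tif": 200.0, "c.tif": 150.0}, 150.0)
print(changed)
```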
Testing FTP Connections
After you have configured the FTP settings, click Test Connection to ensure that the connection is valid. If
you successfully connect to the site, click OK in the Success prompt.
LaserFiche
The LaserFiche export creates an ASCII text file and single-page TIFF images that can be imported into the
LaserFiche system using the LaserFiche List Import Feature.
To configure the LaserFiche export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click LaserFiche. The LaserFiche
Configuration dialog box appears.
LaserFiche Configuration - General
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, and Options tabs. Descriptions for constant
values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
• REPORTED_ROOT_PATH: The path referenced in the export file originates from this location, not the
ROOT_PATH.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture). If the Reported Root
Path box is blank, the resulting export script displays a blank value for the REPORTED_ROOT_PATH
constant.
• FOLDER_ID_FIELD_NAME: This field name specifies the index value that populates the FOLDER ID
field in the export.
• FOLDER_TITLE_FIELD_NAME: This field name specifies the index value that populates the FOLDER
TITLE field in the export.
• DOCUMENT_ID_FIELD_NAME: This field name specifies the index value that populates the
DOCUMENT ID field in the export.
• DOCUMENT_TITLE_FIELD_NAME: This field name specifies the index value that populates the
DOCUMENT TITLE field in the export.
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes. The default
value is 600.
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next to
the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To change
the order in which the indexes appear, select the index you want to move, and then click Move Up or Move Down.
LaserFiche Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following INDICES_TO_INCLUDE constant.
• INDICES_TO_INCLUDE: This constant determines what index values are included in the export file. In
the resulting script, you can enter the name of the index value(s) between quotation marks, and separate
each index value with a comma.
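The effect of INDICES_TO_INCLUDE can be pictured with a short model. The sketch below is illustrative Python, not the generated export script. The index names, the sample values, and the select_indexes helper are hypothetical, and it assumes that the order of names in the constant determines the order of values in the export.

```python
# Illustrative model only; the generated export script is not Python.
# Index names and values below are hypothetical.
INDICES_TO_INCLUDE = ["Invoice Number", "Vendor"]


def select_indexes(document_indexes, indices_to_include):
    """Return (name, value) pairs for the listed indexes, in listed order.

    Names that the document does not carry are skipped (an assumption).
    """
    return [
        (name, document_indexes[name])
        for name in indices_to_include
        if name in document_indexes
    ]


document_indexes = {
    "Vendor": "Acme",
    "Amount": "19.99",
    "Invoice Number": "A-1001",
}
selected = select_indexes(document_indexes, INDICES_TO_INCLUDE)
print(selected)
```

In this model, "Amount" is left out of the export because it is not named in the constant.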
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
LaserFiche Configuration - Options
• TEMPLATE_NAME: This specified value will populate the TEMPLATE NAME field in the export.
• EXCLUDE_FOLDER_DOCUMENT_COUNT: When set to True, an incrementing number can be
appended to the FOLDER line of the export. It will increment from 1 to 2, and so on, for each new
document. If set to False, no numbers are appended to the FOLDER line of the export.
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should use this constant value, change the following line in the Script
Editor, which is available to use in all exports, to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• CONVERSION_TYPE: This constant determines the type of image file created during the export. The
default value, CVT_NO_CONVERSION, does not convert images during the export. If exporting to a
format that supports both single and multi-page images, you must set the CREATE_MULTI_PAGE_IMAGE
constant to True if you want to create multi-page images; otherwise, single-page images will result.
For example, if you set this to CVT_TIFF_G4_MEDJPG, a TIFF image is created during the export. If the
source image is binary, it will create a TIFF using Group 4 compression; if the source image is color (JPG or
BMP), it will create a TIFF using Medium JPEG compression. (See "Enumerations" on page 291 for more
information.)
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and the values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
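The four-folder example given for USE_EXPORT_COMPLETE_FILE can be traced with a small model. This is an illustrative Python sketch, not the logic of GetNextLockedPath(). The pick_export_folder helper is hypothetical, and it assumes folders are examined in name order and that the first folder below the maximum size is chosen.

```python
# Illustrative model only; not the product's GetNextLockedPath() logic.
MAX_EXPORT_SIZE = 600  # megabytes, the documented default


def pick_export_folder(folder_sizes_mb, max_size_mb):
    """Return the first folder still below the maximum size, or None.

    Assumes folders are examined in the order given (hypothetical).
    """
    for name, size_mb in folder_sizes_mb:
        if size_mb < max_size_mb:
            return name
    return None


folders = [
    ("Folder_1", 600),
    ("Folder_2", 400),
    ("Folder_3", 600),
    ("Folder_4", 100),
]
chosen = pick_export_folder(folders, MAX_EXPORT_SIZE)
print(chosen)  # Folder_2
```

Under these assumptions the model agrees with the example: Folder_1 is full, so Folder_2 receives the appended data.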
OTG Record Out
The OTG Record Out export creates a valid OTG Record-Out file and its associated images. This can be
imported into the OTG Application Extender system using the OTG RDS.
NOTE: Ensure that date formats for the PaperVision Capture job correspond with date formats
configured in OTG and that all appropriate index values have been defined.
To configure the OTG Record Out export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click OTG Record Out. The OTG Record Out
Configuration dialog box appears.
OTG Record Out Configuration - General
Default values that you can modify are provided for your reference, and the available options are specific to
the generator you selected. In addition, you can browse to the appropriate directories instead of manually
entering file paths.
3. Assign the appropriate properties on the General, Indexes, and Options tabs. Descriptions for constant
values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
• REPORTED_ROOT_PATH: The path referenced in the export file originates from this location, not the
ROOT_PATH.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture). If the Reported Root
Path box is blank, the resulting export script displays a blank value for the REPORTED_ROOT_PATH
constant.
• DELIMITER: This constant specifies the character that will delimit index values in the export file.
• MAX_EXPORT_SIZE: This constant indicates the maximum export file size in megabytes. The default
value is 600.
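The relationship between ROOT_PATH and REPORTED_ROOT_PATH can be sketched as a path substitution. The Python below is an illustrative model, not the generated export script. The reported_path helper, the drive letters, and the 0001 subfolder are hypothetical, and the sketch assumes the reported path is formed by replacing the ROOT_PATH prefix with REPORTED_ROOT_PATH.

```python
# Illustrative model only; paths and the helper name are hypothetical.
from pathlib import PureWindowsPath


def reported_path(written_file, root_path, reported_root_path):
    """Swap the ROOT_PATH prefix of a written file for REPORTED_ROOT_PATH."""
    relative = PureWindowsPath(written_file).relative_to(PureWindowsPath(root_path))
    return str(PureWindowsPath(reported_root_path) / relative)


example = reported_path(
    r"D:\Exports\0001\RECORD.TXT",  # where the automation service wrote the file
    r"D:\Exports",                  # ROOT_PATH
    r"E:\Imports",                  # REPORTED_ROOT_PATH
)
print(example)  # E:\Imports\0001\RECORD.TXT
```

A substitution like this is useful when the system that imports the export sees the files under a different drive or share than the automation service that wrote them.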
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next
to the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To
change the order in which the indexes appear, select the index you want to move, and then click Move Up or
Move Down.
OTG Record Out Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following INDICES_TO_INCLUDE
constant.
• INDICES_TO_INCLUDE: This constant determines what index values are included in the export file. In
the resulting script, you can enter the name of the index value(s) between quotation marks, and separate
each index value with a comma.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
OTG Record Out Configuration - Options
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• CREATE_RECORD_FILE_ONLY: If set to True, a RECORD.TXT file will be created, but no images will
be created during the export.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should use this constant value, change the following line in the Script
Editor, which is available to use in all exports, to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and the values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
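The matching rules described for AUTOMATION_SERVER (comma-separated names, wildcards, case-insensitive values, blank meaning all servers) can be modeled as follows. This is an illustrative Python sketch, not the product's implementation. The server names and the server_may_process helper are hypothetical, and it assumes the wildcards behave like the familiar * and ? patterns.

```python
# Illustrative model only; the product's wildcard syntax is assumed to be
# glob-style (* and ?), which the guide does not spell out.
from fnmatch import fnmatchcase


def server_may_process(server_name, automation_server):
    """Return True if server_name matches the AUTOMATION_SERVER value."""
    patterns = [p.strip() for p in automation_server.split(",") if p.strip()]
    if not patterns:
        return True  # a blank value means all servers process the exports
    name = server_name.lower()  # values are not case-sensitive
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


print(server_may_process("SCAN01_PVC", ""))                        # True
print(server_may_process("SCAN01_PVC", "scan*_pvc"))               # True
print(server_may_process("SCAN01_PVC", "OCR01_PVC, scan0?_PVC"))   # True
print(server_may_process("SCAN01_PVC", "OCR*"))                    # False
```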
PaperFlow
The PaperFlow export can be used to import batches into PaperFlow, OCRFlow, or QCFlow.
To configure the PaperFlow export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click PaperFlow. The PaperFlow dialog box
appears.
PaperFlow Configuration - General
Default values that you can modify are provided for your reference, and the available options are
specific to the generator you selected. In addition, you can browse to the appropriate directories
instead of manually entering file paths.
3. Assign the appropriate properties on the General, Indexes, OCR, Options, and FTP tabs. Descriptions for
constant values that appear in the resulting export script follow.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• ROOT_PATH: This is the location where the exports are created after the automation service processes
the step.
NOTE: If the Root Path box is blank, the export is written to the directory where the application is
installed (for example, C:\Program Files\Digitech Systems\PaperVision Capture).
• DEPT_ID: This value is uniquely assigned to each client for which the export is generated. The default
value is 0001.
• DEPT_NAME: This value is uniquely assigned to each client or department and is a required field. The
default value is blank.
• PROJECT_NAME: This value is uniquely assigned to each client or department. The default value is
Project One.
• INITIAL_CD_NUMBER: This value can be used to export to a CD. The default value is 1.
If you change this value after you have already performed a PaperFlow export, the new value will not be
reflected in exported data groups unless you remove the "//" comment codes. The "Reset CD Number?"
code should appear as follows in the export script:
if (!PVUtilities.TrySetCustomCounter(DEPT_ID + "_" + PROJECT_NAME,
    INITIAL_CD_NUMBER, out error))
    throw (new Exception("Unable to reset custom counter: " + error.Message));
After you remove the comment codes, you must run the export to reset the counter. The next data group
that is created will reflect your new INITIAL_CD_NUMBER value. Lastly, to ensure that new data groups
increment properly from the new INITIAL_CD_NUMBER, you must insert the "//" comment codes once
again:
//if (!PVUtilities.TrySetCustomCounter(DEPT_ID + "_" + PROJECT_NAME,
//    INITIAL_CD_NUMBER, out error))
//    throw (new Exception("Unable to reset custom counter: " + error.Message));
NOTE: You must export to a directory that does not contain existing data groups. Otherwise, the
system will attempt to append to data groups whose maximum size has not been reached, and the
new INITIAL_CD_NUMBER value may be ignored or other unexpected results may occur.
• MAX_DATAGROUP_SIZE: This indicates the maximum size (in MB) that a data group can reach before a
new data group begins. The default value is 600, the standard CD size.
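The reason the INITIAL_CD_NUMBER reset code must be commented out again after one run can be seen in a toy model. The Python below is illustrative only. It does not reproduce PVUtilities.TrySetCustomCounter, and the start_data_group helper and the in-memory counters dictionary are hypothetical. It assumes the counter is keyed by DEPT_ID + "_" + PROJECT_NAME, as in the script line, and advances by one for each new data group.

```python
# Toy model only; the real counter is managed by the product, not by a dict.
counters = {}


def start_data_group(key, initial_cd_number, reset_enabled):
    """Return the number the next data group would receive in this model."""
    if reset_enabled:
        # Models the uncommented "Reset CD Number?" code.
        counters[key] = initial_cd_number
    number = counters.setdefault(key, initial_cd_number)
    counters[key] = number + 1
    return number


key = "0001" + "_" + "Project One"  # DEPT_ID + "_" + PROJECT_NAME defaults
first = start_data_group(key, 1, reset_enabled=False)        # 1
second = start_data_group(key, 1, reset_enabled=False)       # 2
after_reset = start_data_group(key, 50, reset_enabled=True)  # 50
resumed = start_data_group(key, 50, reset_enabled=False)     # 51
stuck = start_data_group(key, 50, reset_enabled=True)        # 50 again
print(first, second, after_reset, resumed, stuck)
```

In this model, leaving the reset enabled returns the counter to the initial value every time, which is why the comment codes are restored once the new value has taken effect.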
Indexes
On the Indexes tab, you can specify the indexes that will appear in the export by selecting the check box next
to the index. To include all of the indexes, click Select All. To remove all selections, click Deselect All. To
change the order in which the indexes appear, select the index you want to move, and then click Move Up or
Move Down.
PaperFlow Configuration - Indexes
To edit the indexes in the resulting export script, you can modify the following INDICES_TO_INCLUDE
constant.
• INDICES_TO_INCLUDE: This constant determines what index values are included in the export file. In
the resulting script, you can enter the name of the index value(s) between quotation marks, and separate
each index value with a comma.
OCR
When you configure the properties on the OCR tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
PaperFlow Configuration - OCR
• OCR_JOB_STEP_NAME: This constant specifies the job step whose full-text data are used for the
export. No value is defined by default, so full-text data from the current job step are used for the export.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
PaperFlow Configuration - Options
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• USE_DATAGROUP_NUMBER_IN_EXPORT_FOLDER: When set to True, the parent export directory
will be organized by data group name instead of export number.
• INCLUDE_DATAGROUP_IN_FOLDER: When set to True, a folder named "DATAGRP" is created under
the directory in which the export data is copied (for example, <root>\<export#>\DATAGRP\<export data>).
When set to False (the default setting), the "DATAGRP" folder is not created.
• USE_EXPORT_COMPLETE_FILE: This constant, set to True by default, generates an "export.complete"
file once an export has reached its maximum file size, so data will no longer be appended to the export.
When set to False, the "export.complete" file is not generated, so data may be appended to export folders
that have not reached their maximum size. For example, suppose you set this constant to False and the
following four folders are available under the ROOT_PATH, with MAX_EXPORT_SIZE defined as 600 MB:
1. Folder_1: 600 MB
2. Folder_2: 400 MB
3. Folder_3: 600 MB
4. Folder_4: 100 MB
Because the maximum export size has been reached in Folder_1, Folder_2 will be used as the
export folder, and the "export.complete" file will not be generated.
TIP: By default, the lockedPath (working directory) for any export is returned by calling
GetNextLockedPath(). If an export should use this constant value, change the following line in the Script
Editor, which is available to use in all exports, to: lockedPath =
GetNextLockedPath(root, MAX_EXPORT_SIZE, true).
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• SUPPORT_MULTIPLE_PROJECTS: When set to True, multiple Department IDs will be exported to the
same folder, creating a single MDB file. When set to False (the default setting), one Department ID will be
exported to a single folder.
• DISABLE_APPENDING: This constant is set to False by default. When set to True, exported images will
not be appended to export folders whose maximum file sizes have not been reached.
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and the values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
• EXCLUSIVE_EXPORT: This constant determines whether to create separate folders for multiple exports
that are processed simultaneously. When set to True, the default setting, only one export will be processed
at a time in the ROOT_PATH location. If two or more exports access the same ROOT_PATH location, an
error message will appear in the Windows Event Viewer, indicating the export folder is already in use.
IMPORTANT!
If you set the former EXCLUSIVE_EXPORT constant to True in PaperVision Capture R72 and earlier:
• If you will regenerate an export script in R73 or later, you must specify the automation server when you
configure the export.
• If you will use an export script from R72 or earlier and you will not regenerate the script in R73 or later, it is
not required to specify the automation server.
FTP
The FTP tab contains settings that let you securely transfer data to an FTP site. You can transfer data files in
their original state, or they can be placed in a compressed package file. When you configure the properties on
the FTP tab, you can modify constant values that appear in the resulting export script. Descriptions for each
constant value follow. To make the options on the FTP tab available, you must select the Enable FTP check
box on the bottom of the tab.
PaperFlow Configuration - FTP
• FTP_HOST: This constant specifies the FTP host site name used for the export.
• FTP_PORT: This constant specifies the command port number that will be used to connect to the remote
FTP server. FTP communications are typically initiated on port 21.
• FTP_CONNECTION: This constant specifies the type of connection that will be created. During an active
connection, PaperVision Capture specifies the data port number that will be used, and the remote FTP
server connects to that port. During a passive connection, the remote FTP server specifies the data port
number that will be used, and PaperVision Capture connects to that port.
• FTP_ENCRYPTION: This export supports fully encrypted FTP communications using SSL (also known as
FTPS). The remote FTP server must also support this feature to take advantage of the export's capabilities.
You can select one of the following from the SSL Mode list.
    • Automatic indicates the server will use SSL encryption, but will attempt to automatically
    determine whether to use Implicit or Explicit SSL.
    • Implicit indicates the SSL negotiation will start immediately after the FTP connection is
    established.
    • Explicit indicates the connection will be established in plain text, and the SSL negotiation
    will then be started explicitly.
    • None (no SSL encryption) indicates a standard, non-encrypted FTP session will be
    used.
• FTP_USERNAME: This constant specifies the user name that will be used to authenticate to the remote
FTP server.
• FTP_PASSWORD: This constant specifies the password that will be used to authenticate to the remote
FTP server. If desired, you can expose the password in the Script Editor by inserting the tilde character (~)
prefix before the password (for example, ~password).
• FTP_PATH: This constant specifies the folder name on the FTP site that stores the exported data. By
default, this field is blank, and data is written to the user's home directory as specified by the FTP server.
Other possible paths include the following:
1. / (root)
2. FolderA (subdirectory under home directory)
3. /FolderA (subfolder under root path)
• FTP_COMPARE_LAST_MODIFIED_DATE: For an operation type related to data groups or package files,
the agent will automatically record the last modified date of the file that is being processed. When the same
job is processed (and potentially the same file), the last modified date of the previous run is compared to the
current, last modified date. If the file has not changed, it will not be processed again.
For data group processing, this will also allow users to perform incremental data group processing. After the
data group has been changed, any data group files (that is, images) that have a modified date/time greater
than or equal to the previous run's database (that is, DATAGRP.MDB or DATAGRP.XML) last modified
date/time will be processed.
• FTP_DELETE_SOURCE_AFTER_EXPORT: Once the data has been successfully transferred, this
constant allows the agent to delete the source data.
• FTP_ENABLE_PACKAGE: When pushing data groups or files to a remote site, you can increase transfer
speed by sending a single, large file rather than hundreds or thousands of small files. This option causes the
agent to create a compressed package file that increases transfer speeds and security (if encryption is
enabled).
• FTP_ENTITY_ID: When the export is configured to create compressed package files, the Entity ID and
Encryption values are placed into the package file to allow the remote PaperFlow system to decrypt the
data. This constant specifies the ID of the remote entity whose encryption key will be used to decrypt the
package file.
• FTP_KEY_NAME: This constant specifies the name of the encryption key used to decrypt the package
file.
• FTP_PASS_PHRASE: For compressed package files, this constant specifies a user-defined pass phrase
that is passed through a SHA-2 algorithm (Secure Hash Algorithm) to generate a 256-bit hash.
• FTP_ENABLE: This constant specifies whether FTP has been enabled for the export.
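The pass-phrase hashing described for FTP_PASS_PHRASE can be illustrated with Python's standard hashlib module. This is a sketch under assumptions, not the product's key derivation: it assumes the SHA-2 variant is SHA-256 (because the result is 256 bits) and that the pass phrase is encoded as UTF-8. The pass_phrase_to_hash helper and the sample pass phrase are hypothetical.

```python
# Illustrative sketch only; the encoding and exact SHA-2 variant used by
# the product are assumptions.
import hashlib


def pass_phrase_to_hash(pass_phrase):
    """Hash a pass phrase with SHA-256 and return the 32-byte digest."""
    return hashlib.sha256(pass_phrase.encode("utf-8")).digest()


digest = pass_phrase_to_hash("example pass phrase")
print(len(digest) * 8)  # 256
```

The same pass phrase always yields the same hash, which is what lets the remote system derive a matching value from the shared pass phrase.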
Testing FTP Connections
After you have configured the FTP settings, click Test Connection to ensure that the connection is valid. If you
successfully connected to the site, click OK in the Success prompt.
SharePoint
The SharePoint export creates a file that can be used to import PaperVision Capture data into a Microsoft®
SharePoint® site.
NOTE: Only Microsoft SharePoint 2007 (on Windows Server 2003 or 2008) or Microsoft SharePoint 2010
(on Windows Server 2008) are supported for this export.
To configure the SharePoint export
1. If the Select Custom Code Generator dialog box is not open, complete the procedure under "Export
Definitions" on page 307.
2. In the Select Custom Code Generator dialog box, double-click SharePoint. The SharePoint
Configuration dialog box appears.
SharePoint Configuration - General
3. You must configure all properties on the General tab. Descriptions for each property follow this procedure.
4. Proceed to the Indexes tab. If you entered valid SharePoint data, you can map PaperVision Capture index
field names to SharePoint columns.
NOTE: An error message will inform you when you have entered invalid SharePoint data.
5. If applicable, map the appropriate index field names to SharePoint columns. See "Indexes" on page 346 for
more information.
6. Proceed to the OCR and Options tabs to modify the appropriate properties. See "OCR" on page 348 and
"Options" on page 348 for information about each property.
General
When you configure the properties on the General tab, the following constant values appear in the resulting
export script.
• SHAREPOINT_BASE_URL: This constant specifies the Microsoft SharePoint host site name and port
used for the export.
• SHAREPOINT_USERNAME: This constant specifies the Microsoft SharePoint user name.
• SHAREPOINT_PASSWORD: This constant specifies the Microsoft SharePoint user's password. By
default, the SharePoint password is encrypted in the Script Editor. If desired, you can expose the
password in the Script Editor by inserting the tilde character (~) prefix before the password, for example,
~password.
• SHAREPOINT_DOMAIN: This constant specifies the Microsoft SharePoint domain name.
NOTE: If you select the Use Authenticated User option, the database connection will use Windows
Authentication credentials. Entering a user name and password for the database will supersede the
Windows Authentication credentials.
• If you select the SharePoint Online option, it enables your hosted SharePoint solution.
• SHAREPOINT_LIBRARY: This constant specifies the Microsoft SharePoint library.
• CONTENT_TYPE: If applicable, select the SharePoint content type. If content types have been created in
the SharePoint library, they will appear in this list. See "SharePoint Content Types" on page 350 for more
information.
• ROOT_PATH: This is the location on your SharePoint Server where the folders will be created once the
automation service processes the step. If you do not specify a value for the ROOT_PATH property, no
folders will be created on the SharePoint Server.
• LOCAL_TEMP_FOLDER: This constant specifies the local folder path where the Microsoft SharePoint
export is temporarily stored on your local machine before it is moved to the Microsoft SharePoint site.
Indexes
On the Indexes tab, you can map PaperVision Capture index field names to SharePoint column names.
PaperVision Capture index field names appear in the left column. From the SharePoint Column list, select
the column name that maps to the PaperVision Capture index field name. To automatically map a
PaperVision Capture index field to a similarly-named Microsoft SharePoint column, click Auto Map. To edit
the indexes in the resulting export script, you can modify the INDICES_TO_INCLUDE constant, which is
described below.
NOTE: Some PaperVision Capture index field types may not be supported in Microsoft SharePoint.
Therefore, some index fields may not be mapped to SharePoint columns in the export.
Alternatively, if a SharePoint column does not exist, you can create a new column that will be mapped to the
corresponding index field. To do this, select <Create New> from the SharePoint Column list.
SharePoint Configuration - Indexes
• INDICES_TO_INCLUDE: This constant determines the index values mapped from PaperVision Capture
to Microsoft SharePoint columns. These columns must already be defined in your Microsoft SharePoint list.
To provide a mapping between fields, the following format is required:
<Capture Field>:<SharePoint>
Example 1: "Field1", "Field2", "Field3", etc.
NOTE: This format can be used when the same field names exist in both PaperVision Capture
and your Microsoft SharePoint site.
Example 2: "Field1:Field1", "Field2:Field2", etc.
NOTE: This constant is optional, so when an empty array is assigned to INDICES_TO_INCLUDE,
Microsoft SharePoint's metadata is not populated.
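The two entry formats for the SharePoint INDICES_TO_INCLUDE constant can be modeled with a small parser. The Python below is an illustrative sketch, not the export script's own parsing. The parse_index_mappings helper and the column name "Column B" are hypothetical, and it assumes a bare field name maps to a SharePoint column of the same name.

```python
# Illustrative model only; helper and sample names are hypothetical.
def parse_index_mappings(indices_to_include):
    """Map each Capture field to a SharePoint column.

    "Field:Column" maps explicitly; a bare "Field" maps to the same name.
    """
    mappings = {}
    for entry in indices_to_include:
        capture_field, separator, sharepoint_column = entry.partition(":")
        mappings[capture_field] = sharepoint_column if separator else capture_field
    return mappings


mapped = parse_index_mappings(["Field1", "Field2:Column B"])
print(mapped)  # {'Field1': 'Field1', 'Field2': 'Column B'}
print(parse_index_mappings([]))  # {} (no metadata would be populated)
```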
OCR
When you configure the properties on the OCR tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
SharePoint Configuration - OCR
• OCR_ENGINE: This constant specifies the OCR engine (Nuance or OpenText) that processes OCR data
for the export.
• OCR_CONVERTER_CODE: This constant specifies the OCR converter code, such as PDF, Text, XML,
etc., whose output format is used to export full-text data. When no value is defined (the default setting),
both images and associated full-text data are exported. If you select the PaperVision Full-Text OCR
converter, only full-text data will be exported (associated images will not be exported).
• OCR_JOB_STEP_NAME: This constant specifies the job step whose full-text data are used for the
export. No value is defined by default, so full-text data from the current job step are used for the export.
Options
When you configure properties on the Options tab, you can modify constant values that appear in the
resulting export script. Descriptions for each constant value follow.
SharePoint Configuration - Options
• IMG_SRC_PREFER_BITONAL_IMAGES: This constant is applicable to dual-stream scanners and
determines whether to export bitonal or color images. When set to True, which is the default setting, bitonal
images are exported.
• DELETE_DOCUMENT_AFTER_EXPORT: This constant specifies whether documents are deleted after
they have been exported (set to False by default).
• CONVERSION_TYPE: This constant determines the type of image file created during the export. The
default value, CVT_NO_CONVERSION, does not convert images during the export. If exporting to a
format that supports both single and multi-page images, you must set the CREATE_MULTI_PAGE_IMAGE
constant to True if you want to create multi-page images; otherwise, single-page images will result.
For example, if you set this to CVT_TIFF_G4_MEDJPG, a TIFF image is created during the export. If the
source image is binary, it will create a TIFF using Group 4 compression; if the source image is color (JPG or
BMP), it will create a TIFF using Medium JPEG compression. (See "Enumerations" on page 291 for more
information.)
• IMG_SRC_JOB_STEP_NAME: This constant determines the job step from which images are used for the
export. The default selection, <None>, uses the most recent image prior to exporting. To use images from
another job step, select the name of the step from the Image Source list.
• AUTOMATION_SERVER: If you specify an automation server (in the MACHINENAME_INSTANCE
format), your specified server will process exports one at a time in the ROOT_PATH location. When one or
more automation servers are specified, separate folders may be created for multiple exports that are
processed simultaneously.
If you leave the Automation Server field blank during export configuration, all servers will be used to
process the exports. If you are using multiple automation servers, separate each server name with a
comma. You can enter wildcard characters in this field, and the values that you enter are not case-sensitive.
NOTE: If you are using multiple automation services and you specify multiple values for the
AUTOMATION_SERVER constant (or, if using multiple automation services and you do not specify a
value for the AUTOMATION_SERVER constant), your exported data may output to multiple folders
(for example, data groups).
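The matching rules for the AUTOMATION_SERVER value (comma-separated names, wildcards, no case sensitivity, blank meaning all servers) can be illustrated with a minimal Python sketch. This is not PaperVision Capture code: the function name, the sample server names, and the use of * and ? as the wildcard characters are assumptions made for illustration only.

```python
from fnmatch import fnmatchcase


def server_may_process(automation_server: str, machine_instance: str) -> bool:
    """Hypothetical helper: does this server match the constant's value?"""
    # Split the comma-separated value and ignore empty entries.
    patterns = [p.strip() for p in automation_server.split(",") if p.strip()]
    if not patterns:
        # A blank value means every automation server may process exports.
        return True
    # Lowercase both sides so the comparison is not case-sensitive.
    name = machine_instance.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


# Sample values in the MACHINENAME_INSTANCE format (names are made up).
print(server_may_process("SCANSRV01_1, scansrv02_*", "ScanSrv02_3"))
print(server_may_process("SCANSRV01_1", "SCANSRV02_1"))
```

The first call matches through the wildcard entry; the second does not match any listed name.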
SharePoint Content Types
When exporting documents to a SharePoint site, you can optionally link documents to content types. Content
types contain limited subsets of index fields in a SharePoint library. For example, a Financial Documents
SharePoint library can contain three content types including Purchase Orders, Invoices, and Expense Reports.
Each content type can be associated with a specific subset of index fields. Document content types, the
default selection, include all index fields in the library. Content types are independent of file types, so one
content type can be applied to multiple file types, such as Microsoft Word documents, Excel spreadsheets, and
PowerPoint presentations.
For example, Purchase Orders, Invoices, and Expense Reports content types in a Financial Documents library
can be associated with the following PaperVision Capture index fields:
[Table: the Purchase Orders, Invoices, and Expense Reports content types mapped to the Check Number, Check Date, Company Name, PO Number, PO Date, Invoice Number, Invoice Date, and Amount index fields]
Information on SharePoint 2007 and 2010 content types, respectively, can be found on the following sites:
http://technet.microsoft.com/en-us/library/cc262735(office.12).aspx
http://technet.microsoft.com/en-us/library/cc262735.aspx
Job Configuration
The following instructions describe how to configure a job that will process a PaperFlow export that can be
used to import batches into PaperFlow, OCRFlow, or QCFlow. The following job contains a Capture,
Indexing, and a Custom Code step with the export that handles index and detail fields.
Configuring a Job to Process a PaperFlow Export
1. After inserting a Capture, Indexing, and Custom Code job step, respectively, onto the workspace of the
Job Definitions window, double-click the Indexing step to display the Properties tab on the left pane.
2. On the Properties tab, expand Indexes.
3. Click Indexes, and then click the ellipsis button to open the Index Configuration dialog box similar to
the following.
Index Configuration
4. In the Index Configuration dialog box, click Add. The Add Index dialog box appears.
5. Select New Index, and then type Check Number in the Field Name box.
6. Click OK.
7. Repeat steps 4 through 6 for the following index fields:
• Check Date
• Check Amount
• Payee
8. Next, add a detail set that will contain three detail fields. In the Index Configuration dialog box, click Add.
Index Configuration
9. In the Add Index dialog box, select Job Detail Set, and then click OK.
10. On the right pane of the Index Configuration dialog box, click Detail Set, and then click the ellipsis button
to open the Detail Set Configuration dialog box.
Detail Set Configuration
11. In the Detail Set Configuration dialog box, click Add.
12. In the Add Index dialog box, select New Index, and then type Invoice Number in the Field Name box.
13. Click OK.
14. Repeat steps 11 through 13 for the following detail fields:
• Invoice Date
• Invoice Amount
15. In the Detail Set Configuration dialog box, click OK.
16. In the Index Configuration dialog box, click OK.
NOTE: After you have configured the Indexing step, you must configure a Custom Code step to
create the PaperFlow export. Because detail fields are defined at the job level, indexes and detail fields
must be configured in the Indexing step; otherwise, detail fields will NOT be included when the export
runs.
17. On the workspace of the Job Definitions window, double-click the Custom Code step to display the
Properties tab on the left pane.
18. On the Properties tab, expand Custom Code Events [Step Level].
19. Click Step Executing, and then click the ellipsis button to open the Select Custom Code Generator
dialog box.
Select Custom Code Generator
20. From the Language list, select the C# programming language.
21. Select the PaperFlow custom code generator, and then click OK. The PaperFlow Configuration dialog
box appears.
PaperFlow Configuration
22. On the General tab, configure all required fields. If applicable, proceed to the
Indexes, OCR, Options, and FTP tabs to configure the remaining properties. For more information about
specific properties, see "PaperFlow" on page 337.
23. In the PaperFlow Configuration dialog box, click OK. The script automatically compiles, and the constant
values that you defined will appear in the Script Editor within "quotation marks".
NOTE: Do not remove the quotation marks from the resulting export script.
24. On the Job Definitions window, assign the appropriate users to the Capture and Indexing steps.
25. On the toolbar, click Activate Job.
26. On the toolbar, click Check In Job to check the job into the server and make it available for use in the
Operator Console. The operator can then create and submit batches in the PaperVision Capture Operator
Console, and then the PaperFlow export will automatically process the batch.
Chapter 13 Quality Control (QC)
The Automated QC job step provides automated quality control operations on batches, documents, pages,
and indexes without requiring user intervention in the PaperVision Capture Operator Console. The Manual QC
job step allows operators to manually tag batches, documents, pages, and index fields for further review in
the PaperVision Capture Operator Console. The Allow Manual QC property in the Capture and Indexing job
steps enables operators to tag batches, documents, pages, and indexes for further review while scanning or
indexing. Enabling this property within a Capture or Indexing step consumes a Capture QC Manual license in
addition to the Capture Scan or Capture Index license.
QC batch statistics provide totals for tagged index values, pages, and documents per batch. Batch statistics
also provide the total number of tags and record how many of each tag type were applied. Additionally, the
total amount of time the operator spent in the QC step is recorded. Automated statistics are recorded by
the PaperVision Capture Automation Server when the Automated QC step is executed.
See "Automated Quality Control (QC)" on page 355 and "Manual Quality Control (QC)" on page 362 for more
information.
Automated Quality Control (QC)
PaperVision Capture’s Automated Quality Control (QC) job step provides automated functionality for quality
control operations on batches, documents, images, and indexes, eliminating the need for user input in the
Operator Console. The Automated QC step can greatly enhance QC accuracy and productivity for your
batches and jobs. When an Automated QC step is used in a job, a Capture QC Auto license is consumed
upon image capture (in the Capture step). In addition, automated QC statistics are recorded by the
PaperVision Capture Automation Server when the Automated QC step is executed.
To view the Automated QC properties
1. In Job Definitions, select the Automated QC step.
2. Expand the properties grid, and then expand the Automated QC and General nodes. See "Setting
Common Job Step Properties" on page 53 for information about General properties for the Automated QC
job step.
Adding Pass and Fail Links
When you configure a Manual or Automated QC step, you can define pass and fail links from each QC step.
Pass and fail links define the action taken after an operator completes a Manual QC step in the Operator
Console or when the Automated QC step finishes executing all automated tasks. If one or more QC tags
were added to a batch, document, image, or index, then that batch fails the QC step and proceeds to the fail
step upon batch submission. If no QC tags were added to the batch, document, image, or index, then the batch
passes the QC step and proceeds to the pass step.
NOTE: It is not required to define a pass or fail link from a QC step. When using pass and fail links,
however, the job can only contain a single end step.
For example, in a job containing a Capture, Image Processing, Manual QC, and an Indexing step,
respectively, you can add a fail link from a Manual QC step that connects to a preceding Capture step if an
operator tags an image to be re-scanned. Then, you can add a pass link to a subsequent Indexing step if an
operator does not tag any images in the batch.
Pass and Fail Links to/from a Manual QC Step
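The pass/fail routing rule can be summarized in a minimal Python sketch. It is an illustration only, not PaperVision Capture code: the function name and parameters are hypothetical, and returning None for an undefined link is an assumption (the guide states only that the links are optional).

```python
from typing import Optional


def next_step_after_qc(tag_count: int,
                       pass_step: Optional[str],
                       fail_step: Optional[str]) -> Optional[str]:
    """Hypothetical helper: choose the step a batch proceeds to after QC."""
    # One or more QC tags anywhere in the batch means the QC step failed.
    if tag_count > 0:
        return fail_step
    # No tags at all means the QC step passed.
    return pass_step


# The example job above: fail goes back to Capture, pass goes on to Indexing.
print(next_step_after_qc(2, pass_step="Indexing", fail_step="Capture"))
print(next_step_after_qc(0, pass_step="Indexing", fail_step="Capture"))
```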
To add a pass link from a QC step
1. Select the appropriate Manual (or Automated) QC step.
2. While pressing the Ctrl key, select the job step to which the batch proceeds if the QC step passes.
3. Click the Add Pass Link icon.
To remove a pass link from a QC step
1. Select the appropriate Manual (or Automated) QC step.
2. While pressing the Ctrl key, select the job step to which the QC pass link is connected.
3. Click the Remove Pass Link icon.
To add a fail link from a QC step
1. Select the appropriate Manual (or Automated) QC step.
2. While pressing the Ctrl key, select the job step to which the batch proceeds if the QC step fails.
3. Click the Add Fail Link icon.
To remove a fail link from a QC step
1. Select the appropriate Manual (or Automated) QC step.
2. While pressing the Ctrl key, select the job step to which the QC fail link is connected.
3. Click the Remove Fail Link icon.
NOTE: QC fail links are not required prior to activating and checking in the job.
Automated QC - Order of Operations
When the Automated QC step executes, the following operations are performed in the following order on each
page, document, index, and batch.
1. For each page within a document, the Automated QC step performs the following automated operations:
a. Invalid Image Path: Ensures a valid image path can be located.
b. Invalid Image: Ensures the image can be opened successfully.
c. Image Dimensions: Verifies that image dimensions fall within the specified parameters (in pixels).
d. Image File Size: Verifies that image file size falls within specified parameters (in kilobytes).
2. The Document Page Count operation verifies that the document page count falls within the specified
parameters.
The following automated operations are performed on each index field (in order):
3. The Check for Indexing Errors operation locates indexing errors resulting from the following configured
properties (in order):
• Index Type
• Index Verification Regular Expression
• Verification Search Strings
• Predefined Values
4. The Check Numeric Sequence operation determines the minimum and maximum numeric values (only for
numeric index types) within a batch. This operation then iterates through all documents to ensure that every
value between the minimum and maximum exists within that batch. Any missing ranges are written out to
batch-level tags.
5. Lastly, the Batch Document Count operation verifies the batch document count falls within specified
parameters.
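The Check Numeric Sequence operation can be pictured with a minimal Python sketch that finds the gaps between the smallest and largest values in a batch. This illustrates the idea only and is not the product's implementation: the function name, the tuple representation of a missing range, and the sample values are assumptions.

```python
from typing import Iterable, List, Tuple


def missing_ranges(index_values: Iterable[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: list the gaps between the minimum and maximum."""
    # Sort the distinct values so neighbors can be compared pairwise.
    present = sorted(set(index_values))
    gaps = []
    for low, high in zip(present, present[1:]):
        # A difference greater than 1 means at least one value is missing.
        if high - low > 1:
            gaps.append((low + 1, high - 1))
    return gaps


# Made-up numeric index values from the documents in one batch.
print(missing_ranges([1001, 1002, 1005, 1006, 1010]))
# prints [(1003, 1004), (1007, 1009)]
```

Each tuple is an inclusive range of absent values, which corresponds to what the guide describes as missing ranges written to batch-level tags.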
Automated Batch and Document QC Operations
You can configure the Automated QC job step to execute specific automated operations on each batch and
document. For example, you can configure the Automated QC step to ensure that each batch contains a
minimum of one document and a maximum of five documents. You can also configure the Automated QC
step to ensure that no more than ten pages comprise each document.
Batch Document Count
1. Click the ellipsis button next to the Batch Document Count field. The Batch Document Count dialog
box appears.
2. To enforce a minimum document count, select the Minimum check box, and then enter the value.
3. To enforce a maximum document count, select the Maximum check box, and then enter the value.
4. Click OK.
Document Page Count
You can configure the Automated QC step to ensure that each document contains a minimum and/or
maximum number of pages. If a document’s page count falls outside of a specified range, it is tagged for
review in the Operator Console.
To configure the minimum and maximum document page count
1. Click the ellipsis button next to the Document Page Count field. The Document Page Count dialog box
appears.
2. To enforce a minimum page count, select the Minimum check box, and then enter the value.
3. To enforce a maximum page count, select the Maximum check box, and then enter the value.
4. Click OK.
NOTE: As a final verification, the Automated QC step ensures the document page count falls within
range, since pages may have been removed as a result of automated image operations. If the document
page count falls outside this range, the document is tagged for review.
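The Batch Document Count and Document Page Count checks both reduce to testing a count against an optional minimum and an optional maximum. The Python sketch below illustrates that test; the function name is hypothetical, and treating both limits as inclusive is an assumption that the guide does not state.

```python
from typing import Optional


def count_in_range(count: int,
                   minimum: Optional[int] = None,
                   maximum: Optional[int] = None) -> bool:
    """Hypothetical helper: True when count satisfies the enabled limits."""
    # A limit of None stands for a cleared Minimum or Maximum check box.
    if minimum is not None and count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


# Batch limited to 1 through 5 documents; documents limited to 10 pages.
print(count_in_range(3, minimum=1, maximum=5))
print(count_in_range(11, maximum=10))
```

In this sketch, a False result corresponds to the case in which the batch or document would be tagged for review.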
Automated Image QC Operations
In addition to the batch and document automated operations, you can also configure the Automated QC job
step to execute automated operations on each image. The following operations can be performed on each
image within a document, and the image can be either deleted or tagged for review in the Operator Console.
Image Dimensions
The Image Dimensions operation ensures that each image falls within a specified height and/or width (in
pixels). If an image’s dimensions do not fall within range, it can be deleted or tagged for review in the Operator
Console. To calculate the approximate dimensions of an image in pixels, multiply the original size of the
image (in inches) by the resolution of the scanned image. For example, an 8.5 x 11 inch page that is scanned
at 200 DPI would be approximately 1700 pixels wide x 2200 pixels high.
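The pixel calculation above can be written as a short Python sketch. The function name is hypothetical, and rounding to the nearest whole pixel is an assumption.

```python
from typing import Tuple


def approximate_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Hypothetical helper: page size in inches times resolution in DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# The example from the text: an 8.5 x 11 inch page scanned at 200 DPI.
print(approximate_pixels(8.5, 11, 200))  # prints (1700, 2200)
```

The same arithmetic can help you choose minimum and maximum values, in pixels, for the page sizes and resolutions that you expect to scan.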
To configure the image dimensions for the Automated QC step
1. Click the ellipsis button next to the Image Dimensions field. The Image Dimensions dialog box appears.
2. In the Action box, select the action (Tag or Delete) to be taken if the image dimensions fall outside the
specified range.
3. To specify a minimum and maximum width, select the appropriate check boxes, and then enter the values in
pixels.
4. To specify a minimum and maximum height, select the appropriate check boxes, and then enter the values in
pixels.
5. Click OK.
Image File Size
The Image File Size operation ensures that each image’s file size falls within your specified parameters (in kilobytes). If
an image does not fall within range, it can be deleted or tagged for review in the Operator Console.
To configure the image file size range for the Automated QC step
1. Click the ellipsis button next to the Image File Size field. The Image File Size dialog box appears.
2. In the Action box, select the action (Tag or Delete) to be taken if the image file size falls outside your
specified range.
3. To specify a minimum file size, select the check box, and then enter the value in kilobytes.
4. To specify a maximum file size, select the check box, and then enter the value in kilobytes.
5. Click OK.
Invalid Image
The Invalid Image operation verifies that each image can be opened successfully. To enable this operation,
select the action (Delete Page or Tag Page) to be executed if the image cannot be opened in PaperVision
Capture.
Invalid Image Path
The Invalid Image Path operation ensures that each image path can be located. To enable this operation,
select the action (Delete Page or Tag Page) to be executed if the image path cannot be found.
Prefer Bitonal
If you use only dual-stream scanners, set this property to True.
Automated Indexing Operations
You can add new indexes and configure automated operations for each. General QC properties specific to the
Automated QC step are described below.
To configure automated indexing operations in the Automated QC step:
1. In the Properties grid under the Automated QC node, click the ellipsis button next to the Indexes field.
The Index Configuration dialog box appears.
Indexing Configuration (General QC – Step Level)
2. Click Add, and then enter a name for the new index. Configure the applicable General (Job Level),
General (Step Level), and Predefined Index Values (Job Level) properties.
• For information on the general job level properties, see the Index Configuration - Job Level topic.
• For information on the general step level properties, see the Index Configuration - Step Level topic.
• For information on configuring predefined index values, see the Predefined Index Values topic.
3. Expand the General QC (Step Level) node.
4. Select one or more automated QC operations that will be performed on each index field:
• Check for Indexing Errors checks for indexing errors in each index field. If an indexing error is found
(e.g., blank field, invalid character or number, etc.), the index field is tagged for review. Select True to
enable this operation.
• Check Numeric Sequence checks for the minimum and maximum numeric index values within the
batch (applicable to numeric index field types). The process then iterates through all documents to
ensure all index values (within that range) exist within the batch. Missing index values are
written out to batch-level tags. Select True to enable this operation.
• QC Index Formatting automatically inserts or removes leading or trailing characters to create index
values of a specific length. Additionally, this operation can automatically execute a search for an index
value and replace it with specific characters.
QC Index Formatting
To remove a certain number of characters from an index value, select Remove Characters. To remove
characters at the beginning or end of an index value, select Leading Characters or Trailing Characters,
respectively. In either scenario, enter the number of characters to remove from the index value.
NOTE: You can remove both leading and trailing characters during the QC Index Formatting operation.
To insert a certain number of characters at the beginning of an index value, select Insert Characters. To
insert characters at the end of the index value, select the Trailing Characters check box. In either scenario,
enter the number of characters the resulting index value should contain in the Length field, and then enter the
replacement character in the Character field.
The search operation automatically searches for any portion of the index value containing the specified text.
For example, searching for “Test” in index values “123Test,” “Test123,” and “123Test123” will replace the word
“Test” with your specified replacement text. Optionally, you can select whether the Search and Replace
operation is case-sensitive (by default, this operation is case-insensitive).
When the Search For field is left blank, blank index fields will be replaced with your Replace With text.
When the Replace With field is left blank, any occurrences of the Search For text will be removed from the
index field. If you specify the Search For text as an asterisk (*), all values (indexed or blank) will be
substituted with your replacement text.
To ensure leading or trailing characters appear correctly in the resulting index value, enter a sample index
value in the Input field; the result appears in the Result field.
• Reformat Index Values automatically re-formats specific index values (dates, currency, etc.) and
performs index masking.
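The QC Index Formatting rules described above (removing characters, inserting characters up to a target length, and search and replace) can be approximated with the following Python sketch. It illustrates the described behavior only and is not the product's implementation: all function and parameter names are hypothetical, and the handling of edge cases, such as removing more characters than the value contains, is an assumption.

```python
import re


def remove_characters(value: str, leading: int = 0, trailing: int = 0) -> str:
    """Drop the given number of leading and/or trailing characters."""
    end = len(value) - trailing
    # max() keeps the slice empty instead of inverted for over-long removals.
    return value[leading:max(end, leading)]


def insert_characters(value: str, length: int, character: str,
                      trailing: bool = False) -> str:
    """Pad with a single character until the value reaches the given length."""
    if trailing:
        return value.ljust(length, character)
    return value.rjust(length, character)


def search_and_replace(value: str, search_for: str, replace_with: str,
                       case_sensitive: bool = False) -> str:
    """Apply the Search For / Replace With rules to one index value."""
    if search_for == "*":
        # An asterisk substitutes every value, indexed or blank.
        return replace_with
    if search_for == "":
        # A blank Search For only affects blank index fields.
        return replace_with if value == "" else value
    flags = 0 if case_sensitive else re.IGNORECASE
    # A function replacement keeps backslashes in replace_with literal.
    return re.sub(re.escape(search_for), lambda _m: replace_with, value,
                  flags=flags)


print(remove_characters("INV-00042", leading=4))
print(insert_characters("42", 6, "0"))
print(search_and_replace("123Test123", "test", "X"))
```

A blank Replace With removes the matched text, so `search_and_replace("123Test", "Test", "")` leaves only the digits.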
Manual Quality Control (QC)
The Manual QC step enables an operator to manually tag batches, documents, pages, and index fields for
further review in the Operator Console. A second operator can then repair, re-scan, re-index, etc., in
subsequent steps that you configure. Optionally, you can define pass and fail links from the Automated or
Manual QC step to a previous or subsequent step. A batch with no remaining QC tags results in a passed QC
step, and a batch containing one or more tags results in a failed QC step.
The “Allow Manual QC” property in the manual Capture and Indexing steps allows operators to tag batches,
documents, pages, and indexes for further review while they scan or hand-key index. If you enable this
property within a Capture or Indexing step, a Capture QC Manual license is also required (in addition to the
Capture Scan or Capture Index license).
QC batch statistics provide totals for tagged index values, pages, and documents per batch. Batch statistics
also provide the total number of tags and record how many of each tag type were applied. Additionally, the
total amount of time the operator spent in the QC step is recorded. See "Batch Statistics" on page 415
for descriptions of each statistic.
Adding Custom QC Tags
You can define custom QC tags that will be available for selection when operators inspect batches,
documents, pages, and index fields in the Operator Console. The following predefined tags are available in
the Manual QC step (also in a Capture or Indexing step with the “Allow Manual QC” property enabled).
• Document Count: Indicates that the document count falls outside the specified range
• Index Sequence: Indicates that one or more numeric index values fall outside the specified minimum and
maximum values
• Document Page Count: Indicates that a document page count falls outside the specified range
• Document Re-Scan: Indicates that a document needs to be scanned once again
• Index Error: Indicates that an indexing error exists
• Re-Index: Indicates that a specific index field needs to be indexed once again
• Bad Image: Indicates that an image cannot be opened
• Bad Image Path: Indicates that an image cannot be located
• Image Dimensions: Indicates that an image falls outside the specified height and width parameters
• Image File Size: Indicates that an image size falls outside the specified range
• Page Re-Scan: Indicates that the page needs to be scanned once again
To add custom QC tags to the job
1. In the job's General Properties grid, click the ellipsis button next to the Custom QC Tags row. The
Custom QC Tags dialog box appears.
Custom QC Tags
NOTE: Predefined Tags are provided for informational purposes. All predefined tags are available
for selection when operators add QC tags in the Manual QC step.
2. You can add new custom QC tags and view the predefined tags available in each category. Custom QC
tags that you define will be available for selection when operators tag batches, documents, images, and
indexes in the Manual QC step. In the Custom QC Tags section, click the Add icon.
3. Enter the name of the custom QC tag.
4. To remove a custom tag, highlight one or more tags, and then click the Remove icon.
5. Click OK.
Custom Code Events (Step Level) Properties
You can configure custom code that operators can execute in the PaperVision Capture Operator Console.
Click the ellipsis button next to the appropriate event to select the scripting language and to configure the
custom code. Some events contain code-handling arguments that you can modify; these arguments define
what actions are triggered after an operator executes the custom code (see the Custom Code Configuration
topic's section on Digitech Systems' API for more information).
Batch Opened
Batch Opened executes custom code when the operator opens a batch in the Operator Console. The
following sample is a custom code event handler that can be inserted into the code to display a message box,
allowing the user to cancel the open batch operation:
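// Cast the event parameter to the batch-opening arguments.
CCustomCodeBatchOpeningEventArgs eventArgs =
    (CCustomCodeBatchOpeningEventArgs)Parameter;
// Ask the operator to confirm (MessageBox is in System.Windows.Forms).
if (MessageBox.Show("Open Batch?", "Capture", MessageBoxButtons.OKCancel,
        MessageBoxIcon.Question) == DialogResult.Cancel)
{
    // Cancel the open batch operation.
    eventArgs.CancelOpen = true;
}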
NOTE: The Batch Opened event will not execute if you have enabled the Max Documents per Batch
property and the user completes the Submit and Create New Batch operation.
Batch Submitted
Batch Submitted executes custom code when the operator submits a batch in the Operator Console. The
following sample is a custom code event handler that can be inserted into the code to display a message box,
allowing the operator to cancel the submit batch operation:
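// Cast the event parameter to the batch-submitting arguments.
CCustomCodeBatchSubmittingEventArgs eventArgs =
    (CCustomCodeBatchSubmittingEventArgs)Parameter;
// Ask the operator to confirm (MessageBox is in System.Windows.Forms).
if (MessageBox.Show("Submit Batch?", "Capture", MessageBoxButtons.OKCancel,
        MessageBoxIcon.Question) == DialogResult.Cancel)
{
    // Cancel the submit batch operation.
    eventArgs.CancelSubmit = true;
}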
Custom Code Execution
Custom Code Execution executes when the operator clicks the Execute Custom Code button in the
PaperVision Capture Operator Console.
To prevent the programming language prompt from appearing each time you configure custom
code events, right-click the ellipsis button, and select Custom Code Options. Select either the C# or
Visual Basic programming language to use by default, and then choose the option to suppress the dialog
when creating new custom code.
Index
You can configure index values in the Manual QC step. For information on the available Indexing settings,
see the Indexing Configuration topic.
NOTE: The Allow Hand-Key Indexing property is not available in the Manual QC step. Operators
assigned to the Manual QC step can review index values in the read-only Index Manager so they can
apply QC index tags as necessary (without consuming a Capture Index license that is required to edit
indexes).
QC Auto Play
The QC Auto Play setting is available only in the Manual QC step or in manual steps with the Allow Manual
QC property enabled, which requires a Capture QC Manual license. First, you can determine how long (in
milliseconds) each image appears on screen for operators to perform inspections on batches, documents,
pages, and indexes in the Operator Console. Additionally, you can determine whether to skip batches or
documents during auto play. You can further refine batch and document skipping by entering a specific or
random number of documents or pages to skip during auto play.
To configure QC Auto Play settings
1. Click the ellipsis button to the right of the Manual QC Auto Play field.
2. The Delay (ms) property determines how long each image or group of images remains on screen at a time
in the Manual QC step. Enter the length of time in milliseconds.
3. The Skip Mode determines whether auto play skips batches or documents:
• If you select the Batch skip mode, then you can define how pages are skipped. For page skipping, you
can require that operators inspect all pages (None), skip a specific number of pages (Number, such as
1, 5, 10, etc.), or skip a random number of pages (Random).
• If you select the Document skip mode, you can define how documents and pages are skipped in the
next steps.
4. If you select document skipping, choose one of the following options:
• None: Operators inspect all documents.
• Number: A specific number of documents is skipped (such as 1, 5, 10, etc.).
• Random: A random number of documents is skipped.
5. If you select page skipping, choose one of the following options:
• None: Operators inspect all pages.
• Number: A specific number of pages is skipped (such as 1, 5, 10, etc.).
• Random: A random number of pages is skipped.
When you select the Random option, auto play skips an arbitrary number of pages or documents (between
zero and your assigned number). For example, if you enter “10,” then three pages/documents may be skipped
during the first auto play; nine pages/documents during the second auto play; ten pages/documents during
the third auto play; etc.
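The three skip options can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name and the string values for the options are hypothetical, and the sketch assumes that Random draws a whole number from zero through the entered number, inclusive, as the example above suggests.

```python
import random


def items_to_skip(option: str, number: int = 0, rng=random) -> int:
    """Hypothetical helper: how many pages or documents auto play skips."""
    if option == "None":
        # Nothing is skipped, so operators inspect every item.
        return 0
    if option == "Number":
        # Always skip the specific number that was entered.
        return number
    if option == "Random":
        # Skip an arbitrary count from 0 through number, inclusive.
        return rng.randint(0, number)
    raise ValueError(f"unknown skip option: {option}")


# With Random and an entered value of 10, each call may differ.
print([items_to_skip("Random", 10) for _ in range(3)])
```

Passing a seeded `random.Random` instance as `rng` makes the sequence repeatable, which can be useful when you experiment with the sketch.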
Operator Permissions
You can assign specific permissions that allow operators to perform operations on documents and pages. In
addition, you can determine whether operators can view the Browse Batch window in the Operator Console.
The Import Images operation (set to False by default) is the only operation that requires an additional Capture
Scan license (in addition to the Capture QC Manual license). The remaining permissions do not require an
additional license and are enabled by default to give operators flexibility in manipulating documents
and pages when performing manual QC operations in the Operator Console.
Add Documents - When set to True, the operator can append a blank document to the end of the batch.
Allow Browse Batch - When set to True, the operator can view the Browse Batch window.
Copy Documents - When set to True, the operator can copy all pages and append the new document after
the selected document.
Copy/Move Pages - When set to True, the operator can copy/paste and cut/paste consecutive or nonconsecutive pages in one document or across multiple documents. The operator can also drag and drop
pages from one location to another in the Thumbnails window or multiple-display view.
Delete Documents - When set to True, the operator can delete a document and its associated images.
Delete Pages - When set to True, the operator can delete one or more pages within one document or
across multiple documents.
Extract and Copy Pages - When set to True, the operator can extract a region of an image and copy it to the
next page of the document.
Import Images - When set to True, the operator can import images into a document.
NOTE: When you enable this property, the Manual QC step also consumes a Capture Scan license (in
addition to the Capture QC Manual license).
Insert Document Breaks - When set to True, the operator can insert a document break within a document.
NOTE: When you enable this property, the Manual QC step also consumes a Capture Scan license (in
addition to the Capture QC Manual license).
Invert and Save Pages - When set to True, the operator can invert the polarity of one or more pages and then
save the pages.
Remove Document Breaks - When set to True, the operator can remove an existing document break within
a document.
Re-Save Pages - When set to True, the operator can save a page that has been rotated or whose polarity
has been inverted.
Rotate and Save Pages - When set to True, the operator can rotate one or more pages and then save the
pages.
Shuffle Documents to Duplex - When set to True, the operator can shuffle documents to duplex.
Chapter 14 Batch Splitting
A Batch Splitting step lets you divide a batch into two or more operations. The batch is split
when the conditions you define are met. The batch can be split into another step or another job.
For example, you can set up conditions to split a batch to another step when a certain invoice number or
invoice series is detected. If the invoice numbers starting with 1001 need to be processed differently than
those starting with 6232, you can specify that the batch is split when invoices starting with 6232 occur.
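The invoice example above can be pictured as a simple partition of the batch. The sketch below is illustrative only: the function name split_by_invoice_prefix and the sample invoice numbers are hypothetical, and PaperVision Capture performs this routing through configured conditions, not through code.

```python
def split_by_invoice_prefix(invoice_numbers, prefix):
    """Partition invoice numbers into (split, remaining) by a leading prefix."""
    split, remaining = [], []
    for number in invoice_numbers:
        # Invoices that start with the prefix are routed to the target.
        (split if number.startswith(prefix) else remaining).append(number)
    return split, remaining


if __name__ == "__main__":
    batch = ["10010001", "62320001", "10010002", "62320417"]
    to_target, stays = split_by_invoice_prefix(batch, "6232")
    print("Split to target:", to_target)
    print("Remain in batch:", stays)
```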
NOTE: When a job that contains a Batch Splitting step is exported, any configured Target Jobs or Steps
are removed, and must be reconfigured when the job is subsequently imported back into the Job
Definitions window. (See "Exporting Jobs" on page 63 for more information.)
Configure Batch Splitting
When you use a batch-splitting step, a separate step or job can process specific documents. When certain
conditions are met, the batch continues processing instead of stopping.
To specify a target job for a batch-splitting step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. Add the Batch Splitting step to the job using one of the following methods.
• Select the job step that you want batch splitting to follow. On the Job Step Toolbox tab, double-click Batch Splitting.
• On the Job Step Toolbox tab, drag Batch Splitting onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select Batch Splitting.
7. Double-click the Batch Splitting step to display the Properties tab on the left pane.
8. On the Properties tab, expand Batch Splitting, and then click Target Jobs.
9. Click the ellipsis button to open the Target Jobs dialog box. This dialog box displays all of the
batch-splitting jobs to which you have access.
NOTE: When creating the initial batch splitting step, no jobs appear in the Target Jobs dialog box.
10. To set up or edit configuration items for the target job, do one of the following:
• If this is an initial setup, on the toolbar, click Add Batch Split. See "To configure the target job for a batch-splitting step" on page 370.
• If you want to edit an existing job, select the job, and then click Edit Batch Split. See "To configure the target job for a batch-splitting step" on page 370.
11. To perform other functions for a target job, select the job, and then click a button on the toolbar. A
description for each button follows.
Button - Function
Add Batch Split - Opens the Target Job Configuration dialog box where you can add a new target job and step.
Edit Batch Split - Opens the Target Job Configuration dialog box with the configuration items for the selected target job displayed.
[icon] - Deletes the selected job configuration.
[icon] - Moves the selected job configuration up in priority.
[icon] - Moves the selected job configuration down in priority.
Toggle Test Mode - Toggles test mode on and off.
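The priority order of job configurations matters most when combined with the Stop Subsequent Splitting If Condition Met option that is set in the Target Job Configuration dialog box. The sketch below shows one plausible reading of that interaction. It assumes configurations are evaluated from highest to lowest priority and that a document may match more than one target job; the guide does not spell out either detail. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TargetJobConfig:
    """Hypothetical stand-in for one entry in the Target Jobs dialog box."""
    target_job: str
    condition: Callable[[Dict[str, str]], bool]
    stop_if_met: bool = False  # models "Stop Subsequent Splitting If Condition Met"


def matching_targets(document: Dict[str, str],
                     configs: List[TargetJobConfig]) -> List[str]:
    """Return the target jobs whose condition the document meets."""
    matched = []
    for config in configs:  # list order stands in for priority, highest first
        if config.condition(document):
            matched.append(config.target_job)
            if config.stop_if_met:
                break  # remaining, lower-priority configurations are ignored
    return matched


if __name__ == "__main__":
    configs = [
        TargetJobConfig("Disputes", lambda d: d.get("Status") == "Disputed",
                        stop_if_met=True),
        TargetJobConfig("Large Invoices",
                        lambda d: float(d.get("Amount", "0")) > 1000),
    ]
    print(matching_targets({"Status": "Disputed", "Amount": "5000"}, configs))
```

Moving a configuration up or down in priority corresponds to reordering the list, which changes the result only when an earlier entry has the stop option enabled.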
To configure the target job for a batch-splitting step
1. Complete the procedure under "To specify a target job for a batch-splitting step" on page 369 to access the
Target Job Configuration dialog box.
NOTE: When creating the initial job configuration condition, no conditions appear in the Target Job
Configuration dialog box.
2. From the Target Job list, select the job that will process the documents separated by the batch-splitting step.
3. From the Target Step list, select the step within the target job that will process the documents separated by the batch-splitting step.
4. Click the Index Mapping arrow to specify index mapping values.
Index Mapping
5. Under the Source column, select the index value you want to map.
6. Under the corresponding Target column, select the target index value to which you want the source index
value mapped.
7. Repeat the previous two steps for each index you want to map, and then click OK.
NOTE: It is not necessary to match every Source index value to a Target index value.
8. (Optional) Select the Stop Subsequent Splitting If Condition Met checkbox to stop the batch-splitting
process when this condition is met. This option applies when more than one job is configured in the
batch-splitting conditions: when this condition is met, the remaining conditions configured to split the
batch to other jobs are ignored.
9. Click Add New Condition.
10. In the New Condition dialog box, select one of the following condition sources:
• Capture Index - Select this option to initiate the batch split on index fields configured in the capture job.
• Document Classification - Select this option to initiate the batch split on the document classification. (With Forms Magic, document classification is based on the Forms Magic form.)
• QC Index Tag - Select this option to initiate a batch split on an index tagged for Quality Control.
• QC Document Tag - Select this option to initiate a batch split on a document tagged for Quality Control.
• Forms Magic Metadata Field - Select this option to initiate the batch split using data from Forms Magic.
• Forms Magic Content Type - Select this option to initiate the batch split on a type of content. The Forms Magic content type can be an invoice, a statement, etc.
• Invoice Approval - Select this option to initiate a batch split on the status of an invoice.
11. To configure the condition source you selected, go to "Condition Sources" on page 373.
12. To perform other functions in the Target Job Configuration dialog box, you can use the components
described in the following table.
Target Job Configuration Components
Component - Description
Target Job - Select the target job for the batch split. The list contains all your active jobs.
Target Step - Select the step of the target job to which the documents separated by the batch split will be sent.
Index Mapping - Match the indexes of the source job to the indexes of the target job. The Source job is the job that contains the batch splitting step being configured. The Target job is the job to which the documents separated by the batch split are being sent.
Stop Subsequent Splitting If Condition Met - This setting stops the batch splitting process when this condition is met. The other subsequent conditions will be ignored.
Add New Condition - Opens the New Condition dialog box where you can define conditions.
[icon] - Opens the Edit Condition dialog box for the selected condition so you can edit it.
[icon] - Deletes the selected condition.
[icon] - Select an operator for the condition:
• AND - This condition and the previous condition must be met before the batch split occurs.
• OR - This condition or the next condition must be met before the batch split occurs.
• XOR - The batch split will occur if “condition A” is true or “condition B” is true but not when both conditions are true. If both conditions are true, the batch split will not occur.
[icon] - Sets the batch split to occur when the defined condition is not present.
[icon] - Moves the condition into a new group.
[icon] - Moves the condition down to an existing child group.
[icon] - Moves the condition up one level.
[icon] - Moves a condition up in order among siblings.
[icon] - Moves a condition down in order among siblings.
Toggle Test Mode - Toggles the test mode on and off.
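The AND, OR, and XOR operators and the negation setting described in the table can be modeled with ordinary Boolean logic. The following is a minimal sketch under one assumption that the guide does not confirm: sibling conditions are combined strictly from first to last, with no operator precedence. The function names combine and evaluate are hypothetical.

```python
def combine(left, operator, right):
    """Combine two condition results with one of the table's operators."""
    if operator == "AND":
        return left and right
    if operator == "OR":
        return left or right
    if operator == "XOR":
        # True when exactly one side is true; false when both are true.
        return left != right
    raise ValueError(f"unknown operator: {operator}")


def evaluate(first, rest):
    """Fold a list of (operator, result) pairs onto the first result.

    Assumption: conditions are combined in listed order, left to right.
    """
    result = first
    for operator, value in rest:
        result = combine(result, operator, value)
    return result


if __name__ == "__main__":
    # The negation setting corresponds to applying "not" to a result.
    tag_present = False
    print(evaluate(not tag_present, [("AND", True), ("XOR", True)]))
```

Grouping conditions, as the group-related buttons allow, corresponds to evaluating a child group first and passing its single result to the parent.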
Condition Sources
Condition sources let you specify the type of content to which the condition will apply. The following procedures
describe how to configure each condition source. If you haven’t already done so, complete the procedures under
"To specify a target job for a batch-splitting step" on page 369 and "To configure the target job for a batch-splitting
step" on page 370.
You select which condition source to use in the following New Condition Please Choose Condition Source
dialog box.
New Condition Please Choose Condition Source
Capture Index
The Capture Index option splits the batch when the index you specify is found in the document.
1. In the New Condition Please Choose Condition Source dialog box, select Capture Index, and then
click Next.
2. In the Please Choose Capture Index dialog box, click the list, and then select the index that will initiate
the batch split.
3. Click Next.
4. To continue, go to "Comparison Types" on page 376.
Document Classification
The Document Classification option splits the batch when the document class you specify is found in the
batch.
1. In the New Condition Please Choose Condition Source dialog box, select Document Classification,
and then click Next.
2. To continue, go to "Comparison Types" on page 376.
QC Index Tag
The QC Index Tag option splits the batch when the quality control index tag you specify is found in the
document.
1. In the New Condition Please Choose Condition Source dialog box, select QC Index Tag, and then
click Next.
2. In the Please Choose Capture Index dialog box, click the list, and then select the index that will initiate
the batch split.
3. Click Next.
4. In the Please Choose Comparison Type dialog box, select one of the following options.
• Tag Exists - Initiates the batch split on the QC Index tag. After you select this option, click Finish. You have completed the setup of this condition source.
• String Comparison - Initiates the batch split on an index field containing a specified string.
• Numeric Comparison - Initiates the batch split on an index field containing a specified numeric value.
• Date/Time Comparison - Initiates the batch split on an index field containing a specified date/time value.
• Regular Expression - Initiates the batch split using a regular expression.
5. To continue, go to "Comparison Types" on page 376.
QC Document Tag
The QC Document Tag option splits the batch when the quality control document tag you specify is found in
the batch.
1. In the New Condition Please Choose Condition Source dialog box, select QC Document Tag, and
then click Next.
2. In the Please Choose Comparison Type dialog box, select one of the following options.
• Tag Exists - Initiates the batch split on the QC document tag. After you select this option, click Finish. You have completed the setup of this condition source.
• String Comparison - Initiates the batch split on an index field containing a specified string.
• Numeric Comparison - Initiates the batch split on an index field containing a specified numeric value.
• Date/Time Comparison - Initiates the batch split on an index field containing a specified date/time value.
• Regular Expression - Initiates the batch split using a regular expression.
3. To continue, go to "Comparison Types" on page 376.
Forms Magic Metadata Field
The Forms Magic Metadata Field option splits the batch when the index you specify is found on the Forms
Magic form.
1. In the New Condition Please Choose Condition Source dialog box, select Forms Magic Metadata
Field, and then click Next. The Please Specify FM Metadata Field dialog box appears.
2. If you have already established a connection to the Forms Magic database, continue to step 10 in this
procedure. Otherwise, next to the FM Database Connection box, click the ellipsis button to open the
Forms Magic Database dialog box.
3. In the Server box, type the name of the server where the Forms Magic database resides.
4. In the Port box, type the port number to access the server.
5. In the Database box, type the name of the Forms Magic database.
6. In the Username box, type the name of the user to access the Forms Magic database.
7. In the Password box, type the password for the user.
8. Click Test Connection to ensure that the information you provided is valid. If the database connection is
successful, a confirmation message appears. If the connection is not successful, a symbol appears
next to the data you must fix.
9. After the database connection is established, click OK.
10. In the Please Specify FM Metadata Field dialog box, click the Metadata Field Name list, and then select
an index value that appears on the Forms Magic form.
11. Click Next.
12. To continue, go to "Comparison Types" on page 376.
Forms Magic Content Type
The Forms Magic Content Type option splits the batch when a specified content type is detected from a
batch received from Forms Magic.
1. In the New Condition Please Choose Condition Source dialog box, select Forms Magic Content
Type, and then click Next.
2. To continue, go to "Comparison Types" on page 376.
Invoice Approval
The Invoice Approval option splits the batch when the invoice status you specify is found in the batch.
1. In the New Condition Please Choose Condition Source dialog box, select Invoice Approval, and then
click Next.
2. From the Invoice Approval Status list, select the invoice status that will initiate the batch split, and then
click Finish.
Comparison Types
Comparison types let you specify the criteria used to evaluate each condition source. The procedures in this
section describe how to configure each comparison type. If you haven’t already done so, complete the previous
procedures under:
• "To specify a target job for a batch-splitting step" on page 369.
• "To configure the target job for a batch-splitting step" on page 370.
• "Condition Sources" on page 373.
You select which comparison type to use in the following New Condition Please Choose Comparison Type
dialog box.
New Condition Please Choose Comparison Type
String Comparison
1. In the New Condition Please Choose Comparison Type dialog box, select String Comparison, and
then click Next.
2. In the Please Specify Comparison dialog box, under the condition source you previously selected, click
the down arrow to access the following list.
String Comparison Values
3. From the list, select one of the following values.
• = (equal to) - This value specifies that the condition source displayed above the list must match exactly the value typed in the box below the list.
• CONTAINS - This value specifies that the condition source displayed above the list must contain the value typed in the box below the list.
• IN - This value specifies that the condition source displayed above the list must be contained by the value typed in the box below the list.
Usage Example: Suppose that you want to split a batch so that it is routed to the following
departments: accounting, marketing, and support. You can accomplish this in two ways.
You could create three separate conditions with the comparison type set to = (equal
to), and then type the specific department for each one. Alternatively, you could create one
condition with the comparison type set to IN, and then type accounting, marketing, support
in the box below the list.
4. In the box below the list, type the value that will initiate the batch split.
5. If you want to match the capitalization of the value you typed, then select Case Sensitive.
6. Click Finish to save the condition. The condition appears in the Target Job Configuration dialog box.
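The three string comparison values can be sketched as follows. This is an illustration, not the product's implementation. It reads IN literally as "the index value is contained by the typed text" (a substring test), and it assumes comparisons ignore case unless Case Sensitive is selected. PaperVision Capture may instead treat the typed text as a comma-separated list. The function name string_condition is hypothetical.

```python
def string_condition(source, comparison, typed, case_sensitive=False):
    """Evaluate a string comparison between an index value and typed text.

    source: the value of the condition source (for example, an index field).
    typed:  the value typed in the box below the comparison list.
    """
    if not case_sensitive:
        source, typed = source.lower(), typed.lower()
    if comparison == "=":
        return source == typed
    if comparison == "CONTAINS":
        return typed in source   # the source contains the typed value
    if comparison == "IN":
        return source in typed   # the source is contained by the typed value
    raise ValueError(f"unknown comparison: {comparison}")


if __name__ == "__main__":
    departments = "accounting, marketing, support"
    for value in ("marketing", "legal"):
        print(value, string_condition(value, "IN", departments))
```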
Numeric Comparison
1. In the New Condition Please Choose Comparison Type dialog box, select Numeric Comparison, and
then click Next.
2. In the Please Specify Comparison dialog box, under the condition source you previously selected, click
the down arrow to access the following list.
Numeric Comparison Values
3. From the list, select one of the following values.
• = (equal to) - This value specifies that the condition source displayed above the list must match exactly the value typed in the box below the list.
• > (greater than) - This value specifies that the condition source displayed above the list must be greater than the value typed in the box below the list.
• ≥ (greater than or equal to) - This value specifies that the condition source displayed above the list must be greater than or equal to the value typed in the box below the list.
• < (less than) - This value specifies that the condition source displayed above the list must be less than the value typed in the box below the list.
• ≤ (less than or equal to) - This value specifies that the condition source displayed above the list must be less than or equal to the value typed in the box below the list.
4. In the box below the list, type the value that will initiate the batch split.
5. Click Finish to save the condition. The condition appears in the Target Job Configuration dialog box.
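The five numeric comparison values map directly onto standard comparison operators. The sketch below assumes index values arrive as text and are converted to decimal numbers before comparison; the function name numeric_condition and the ASCII spellings ">=" and "<=" are illustrative choices, not product identifiers.

```python
import operator
from decimal import Decimal

# Comparison value -> operator applied as (condition source, typed value).
NUMERIC_COMPARISONS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def numeric_condition(source, comparison, typed):
    """Compare an index value with the typed value as decimal numbers."""
    compare = NUMERIC_COMPARISONS[comparison]
    return compare(Decimal(source), Decimal(typed))


if __name__ == "__main__":
    # Split invoices whose amount exceeds 1000.
    print(numeric_condition("1000.50", ">", "1000"))
```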
Date/Time Comparison
1. In the New Condition Please Choose Comparison Type dialog box, select Date/Time Comparison.
2. Click Next to open the Please Specify Comparison dialog box.
New Condition Please Specify Comparison (Date/Time)
3. Under the condition source you previously selected, click the down arrow to access the following list.
Numeric Comparison Values
4. From the list, select one of the following values.
• = (equal to) - This value specifies that the condition source displayed above the list must match exactly the time and/or date values you specify below the list.
• > (greater than) - This value specifies that the condition source displayed above the list must be later than the time and/or date values you specify below the list.
• ≥ (greater than or equal to) - This value specifies that the condition source displayed above the list must be later than or equal to the time and/or date values you specify below the list.
• < (less than) - This value specifies that the condition source displayed above the list must be earlier than the time and/or date values you specify below the list.
• ≤ (less than or equal to) - This value specifies that the condition source displayed above the list must be earlier than or equal to the time and/or date values you specify below the list.
5. Select one of the following items that will initiate the batch split.
• Time - If you select this option, type or select the time in the corresponding box.
• Date - If you select this option, type or select the date in the corresponding box.
• Date/Time - If you select this option, type or select the date and time in the corresponding boxes.
6. Click Finish to save the condition. The condition appears in the Target Job Configuration dialog box.
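The Time, Date, and Date/Time options determine which part of the value takes part in the comparison. The sketch below illustrates that idea with Python's standard datetime type. It assumes that the Time option ignores the date part and that the Date option ignores the time part, which the guide implies but does not state. The function name datetime_condition is hypothetical.

```python
import operator
from datetime import datetime

COMPARISONS = {
    "=": operator.eq,
    ">": operator.gt,    # later than
    ">=": operator.ge,   # later than or equal to
    "<": operator.lt,    # earlier than
    "<=": operator.le,   # earlier than or equal to
}


def datetime_condition(source, comparison, typed, mode="Date/Time"):
    """Compare two datetime values using only the part selected by mode."""
    if mode == "Time":
        left, right = source.time(), typed.time()
    elif mode == "Date":
        left, right = source.date(), typed.date()
    elif mode == "Date/Time":
        left, right = source, typed
    else:
        raise ValueError(f"unknown mode: {mode}")
    return COMPARISONS[comparison](left, right)


if __name__ == "__main__":
    received = datetime(2014, 11, 3, 9, 30)
    cutoff = datetime(2014, 11, 1, 17, 0)
    for mode in ("Date/Time", "Date", "Time"):
        print(mode, datetime_condition(received, ">", cutoff, mode))
```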
Regular Expression
A regular expression is a pattern of text that consists of ordinary characters (for example, letters a through z) and
special characters, known as metacharacters. The pattern describes one or more strings to match when searching
text. You can find examples of regular expressions and a listing of metacharacters and their behavior in the context
of regular expressions at the following link.
http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.90).aspx
Before you start this procedure, prepare the regular expression you want to use to split the batch.
1. In the New Condition Please Choose Comparison Type dialog box, select Regular Expression, and
then click Next.
2. In the Regular Expression box, type the regular expression that will initiate the batch split.
3. In the Test box, type a test value applicable to the regular expression you entered. If your regular
expression is working correctly, a check mark appears next to the Test box. Otherwise, a different
symbol appears.
4. Click Finish to save the condition. The condition appears in the Target Job Configuration dialog box.
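When preparing a regular expression, it can help to try it against sample values before typing it into the dialog box. The sketch below does this with Python's re module. The linked reference describes .NET regular expressions, whose syntax differs from Python's in some advanced features, so treat this as a rough check only. The pattern and sample values are hypothetical, and whether PaperVision Capture requires a full or partial match is not stated in this guide; the sketch accepts a match anywhere in the value.

```python
import re


def regex_condition(pattern, value):
    """Return True when the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    # Hypothetical pattern: invoice numbers made of digits that start with 6232.
    pattern = r"^6232\d*$"
    for sample in ("62320417", "10010001", "6232-A"):
        print(sample, regex_condition(pattern, sample))
```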
Test Batch Splitting Configurations
This section contains procedures for testing the configured conditions for batch splitting. You can perform testing
on the Target Jobs and Target Job Configuration dialog boxes.
Testing Conditions from the Target Jobs Dialog Box
The Target Jobs dialog box serves as a dashboard where you can see a listing of target jobs set up for batch
splitting. Use the following procedure to test a listed job.
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job for which you want to test batch splitting conditions, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. Double-click the Batch Splitting step to display the Properties tab on the left pane.
6. On the Properties tab, expand Batch Splitting, and then click Target Jobs.
7. Click the ellipsis button to open the Target Jobs dialog box. This dialog box displays all of the
batch-splitting jobs to which you have access.
Target Jobs
8. On the toolbar, click Toggle Test Mode. Test fields appear on the right pane.
Target Jobs Test Mode
9. In the Field Value boxes, you can type data to test the criteria of the condition. When you type a value
that satisfies the condition, a check mark appears in the Pass column. Otherwise, a different symbol
appears in the Pass column.
10. Click Toggle Test Mode to exit test mode.
11. If you had a condition that did not pass validation, select it, and then click Edit Batch Split.
12. In the Target Job Configuration dialog box, check the condition to ensure it was configured correctly.
Testing Conditions from the Target Job Configuration Dialog Box
In the Target Job Configuration dialog box, you can test existing conditions, or you can test new
conditions as you create them.
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Select the job for which you want to test batch splitting conditions, and then click Edit Job.
4. If necessary, click Check Out Job so you can edit it.
5. Double-click the Batch Splitting step to display the Properties tab on the left pane.
6. On the Properties tab, expand Batch Splitting, and then click Target Jobs.
7. Click the ellipsis button to open the Target Jobs dialog box. This dialog box displays all of the
batch-splitting jobs to which you have access.
8. Select the job you want to edit, and then click Edit Batch Split.
9. In the Target Job Configuration dialog box, click Toggle Test Mode. Test fields appear on the
right pane.
Target Job Configuration Test Mode
10. In the Field Value boxes, you can type data to test the criteria of the condition. When you type a value
that satisfies the condition, a check mark appears next to the condition on the left pane. Otherwise, a
different symbol appears.
11. Click Toggle Test Mode to exit test mode.
Chapter 15 Forms Magic Processing
PaperVision Capture can process data from Forms Magic when you add a Forms Magic Processing step to a job.
This step lets you configure a connection to a Forms Magic database so that the job can access and process
Forms Magic data.
NOTE: Once the Metadata Vector Document (MVD) has been created, the document cannot be
reprocessed to create or update the existing MVD (for example, changing the language).
Configuring a Forms Magic Processing Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. Add the FM Processing step to the job using one of the following methods.
• Select the job step that you want FM Processing to follow. On the Job Step Toolbox tab, double-click FM Processing.
• On the Job Step Toolbox tab, drag FM Processing onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select FM Processing.
7. Double-click the FM Processing step to display the Properties tab on the left pane.
8. On the Properties tab, expand FM Processing.
9. The following properties are available.
• Break Composite Documents - This option is available only when the Process Composite Documents option is set to True. Set this option to True if you want PaperVision Capture to break composite documents based on the Forms Magic detection of different document types. Set this option to False if you want to leave composite documents as they are.
• First N Pages - This option limits the number of pages to be processed from Forms Magic. For example, if you want to process only the first five pages of a document coming from Forms Magic, then type 5 in this field. This prevents processing pages that you don’t need.
• FM Database - This option configures the connection to the Forms Magic database that contains the documents to be processed. See "Configuring a Connection to the Forms Magic Database" on page 384 to establish a connection to the Forms Magic database.
• Languages - This option includes characters from other languages. Click the ellipsis button to open the Select Languages and/or Other Countries dialog box. To include a language, double-click it in the Available list, or select it and click the right arrow. To remove a language from the Select list, double-click it, or select it and click the left arrow. Click OK to save your selections, or Cancel to return to the Properties tab.
• Process All Documents - Set this option to True if you want PaperVision Capture to process all Forms Magic documents and overwrite any existing classifications. Setting this option to False will force Forms Magic to skip documents that have already been processed.
• Process Composite Documents - Composite documents are documents that contain two or more documents to be processed. For example, if a document contains more than one invoice, it is a composite document that is made up of multiple embedded documents. Set this option to True if you want PaperVision Capture to process documents as composite documents. (This option must be set to True for the Break Composite Documents option to be available.) Set this option to False if you want PaperVision Capture to treat each document as a single document.
• Unclassified Composite Document Handling - This option is available only when Break Composite Documents and Process Composite Documents are set to True. This setting determines how composite documents that contain unclassified embedded documents are handled. Select Leave as Single Document to keep the original composite document intact and tag it as an exception. Select Separate Into Classified Documents to break the original composite document into a batch of classified documents as identified by Forms Magic, and unclassified documents comprised of the unclassified pages from the original composite document. The unclassified documents are QC tagged. (See "Options for Processing a Composite Document" on page 385 for more information.)
10. On the toolbar, click Save Job to save the FM Processing step configuration.
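The composite-document properties depend on one another: Break Composite Documents is available only when Process Composite Documents is True, and Unclassified Composite Document Handling is available only when both are True. The sketch below models those dependencies as a simple consistency check. The class name, field names, and default values are hypothetical; the guide does not state the product's defaults.

```python
from dataclasses import dataclass
from typing import List, Optional

HANDLING_OPTIONS = (
    "Leave as Single Document",
    "Separate Into Classified Documents",
)


@dataclass
class FMProcessingProperties:
    """Hypothetical model of the composite-document properties."""
    process_composite_documents: bool = False   # default is an assumption
    break_composite_documents: bool = False     # default is an assumption
    unclassified_handling: Optional[str] = None

    def problems(self) -> List[str]:
        """List every dependency between properties that is violated."""
        issues = []
        if self.break_composite_documents and not self.process_composite_documents:
            issues.append("Break Composite Documents requires "
                          "Process Composite Documents set to True")
        if self.unclassified_handling is not None:
            if self.unclassified_handling not in HANDLING_OPTIONS:
                issues.append("Unknown Unclassified Composite Document "
                              "Handling value")
            if not (self.process_composite_documents
                    and self.break_composite_documents):
                issues.append("Unclassified Composite Document Handling "
                              "requires both composite options set to True")
        return issues


if __name__ == "__main__":
    props = FMProcessingProperties(True, False, "Leave as Single Document")
    print(props.problems())
```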
Configuring a Connection to the Forms Magic Database
1. If you haven’t already done so, complete the procedure under "Configuring a Forms Magic Processing Job
Step" on page 383.
2. On the Job Definitions window, on the workspace, double-click the FM Processing job step to open the
Properties tab.
3. On the Properties tab, expand FM Processing.
4. Click FM Database, and then click the ellipsis button to open the Forms Magic Database dialog box.
5. In the Server box, type the name of the server where the FM database to which you want to connect is
located.
6. In the Port box, type the port number used to connect to the FM database.
7. In the Database box, type the name of the FM database.
8. In the Username box, type the name for a read-only database user. Do not use a read/write database user.
9. In the Password box, type the password for the specified user.
10. Click Test Connection to ensure that the FM database connection is valid. If the database connection is
successful, a confirmation message appears. If the connection is not successful, a symbol appears
next to the data you must fix.
11. After the database connection is established, click OK.
12. On the toolbar, click Save Job to save the FM Processing step configuration.
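Before clicking Test Connection, it can be useful to confirm that all five values are filled in and that the port is a valid number. The sketch below is a simple pre-check written for illustration. It does not contact any database, and the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FMDatabaseSettings:
    """Hypothetical container for the Forms Magic Database dialog box values."""
    server: str
    port: str
    database: str
    username: str
    password: str

    def fields_to_fix(self) -> List[str]:
        """Return the names of fields that are empty or clearly invalid."""
        problems = [
            name
            for name in ("server", "database", "username", "password")
            if not getattr(self, name).strip()
        ]
        # TCP ports are whole numbers from 1 to 65535.
        if not (self.port.isdecimal() and 1 <= int(self.port) <= 65535):
            problems.append("port")
        return problems


if __name__ == "__main__":
    settings = FMDatabaseSettings("fm-server", "", "FormsMagic", "reader", "secret")
    print(settings.fields_to_fix())
```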
Options for Processing a Composite Document
When Forms Magic processes composite documents, there might be cases where it is unable to classify all
or a portion of the document. When this occurs, you can use the Unclassified Composite Document
Handling property to specify how PaperVision Capture handles the unclassified documents. (See the
procedure under "Configuring a Forms Magic Processing Job Step" on page 383 for information about setting
properties for Forms Magic processing.) Based on your settings, PaperVision Capture will either keep the
original composite document intact and tag it as an exception, or break the original composite document into
a batch comprised of documents classified by Forms Magic and unclassified documents comprised of the
unclassified pages from the original composite documents. The unclassified documents are QC tagged.
Here is an example of what happens when you set the Unclassified Composite Document Handling
property to Separate Into Classified Documents. The original batch contains a single composite document
that has six pages, and Forms Magic processing has classified them as follows:
• page 1 is a one-page invoice
• pages 2-3 are unclassified
• pages 4-5 are a two-page invoice
• page 6 is unclassified
The resulting batch will look as follows:
• Document 1 is comprised of the original page 1 (the one-page invoice).
• Document 2 is comprised of the original pages 2-3 (unclassified) and is QC tagged.
• Document 3 is comprised of the original pages 4-5 (the two-page invoice).
• Document 4 is comprised of the original page 6 (unclassified) and is QC tagged.
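The separation in this example can be pictured as grouping consecutive pages that share the same classification. The following Python sketch is an illustration only, not the PaperVision Capture implementation. The function name, the dictionary keys, and the form identifiers ("invoice-A", "invoice-B") are hypothetical, and it assumes each page carries either an identifier for the form instance that Forms Magic assigned to it or no identifier when the page is unclassified.

```python
from itertools import groupby


def separate_composite(pages):
    """Split a composite document into documents by classification.

    pages is a list of (page_number, form_id) tuples. form_id is a
    hypothetical identifier for the form instance assigned to the page,
    or None when the page was not classified.
    """
    documents = []
    # groupby collects runs of consecutive pages that share a form_id.
    for form_id, run in groupby(pages, key=lambda page: page[1]):
        documents.append({
            "pages": [number for number, _ in run],
            "form": form_id,
            # Documents built from unclassified pages are QC tagged.
            "qc_tagged": form_id is None,
        })
    return documents


# The six-page composite document from the example above.
pages = [(1, "invoice-A"), (2, None), (3, None),
         (4, "invoice-B"), (5, "invoice-B"), (6, None)]

for number, document in enumerate(separate_composite(pages), start=1):
    print(number, document["pages"], document["qc_tagged"])
```

Running the sketch yields four documents (pages 1, 2-3, 4-5, and 6), with the second and fourth marked as QC tagged, which matches the resulting batch described above.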
Chapter 16 Forms Magic Index Mapping
PaperVision Capture can map index files set up in Forms Magic and process them. To use this feature, you
must first set up a connection to the Forms Magic database and add a Forms Magic Processing step to the
job. (See "Configuring a Forms Magic Processing Job Step" on page 383 to complete these tasks.)
Configuring a Forms Magic Index Mapping Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. Add the FM Index Mapping step to the job using one of the following methods.
• Select the job step that you want FM Index Mapping to follow. On the Job Step Toolbox tab, double-click
FM Index Mapping.
• On the Job Step Toolbox tab, drag FM Index Mapping onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select FM Index Mapping.
7. Double-click the FM Index Mapping step to display the Properties tab on the left pane.
8. The following properties are available.
• Apply Formatting - Set this option to True if you want PaperVision Capture to apply formatting to the
contents of mapped Forms Magic fields upon import. For example, a currency value would
automatically be formatted as such. If PaperVision Capture cannot parse an imported value as the
specified field type, no formatting occurs. For example, if a currency field contained the value “John
Smith” the field would be left as is. Set this option to False if you do not want PaperVision Capture to
automatically apply formatting to imported fields.
• Detail Set Mapping - This option maps detail sets from Forms Magic to PaperVision Capture. See
"Mapping Forms Magic Detail Sets and Fields" on page 387 to configure this feature.
• Field Mapping - This option lets you map index values set up in Forms Magic to corresponding values
in PaperVision Capture. See "Mapping Forms Magic Detail Sets and Fields" on page 387 to configure
this feature.
NOTE: All FM Index Mapping job step properties are removed when the step is copied to a new job.
9. When you are finished setting properties, click Save Job.
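The behavior of the Apply Formatting property for a currency field can be sketched as "format the value if it parses, otherwise leave it untouched." The Python example below is a minimal illustration under assumptions: the function name is hypothetical, and the "$1,234.50" output style is chosen for the example because this guide does not specify the exact format that PaperVision Capture applies.

```python
from decimal import Decimal, InvalidOperation


def apply_currency_format(raw):
    """Return raw formatted as currency, or raw unchanged if it cannot be parsed."""
    # Strip common currency decorations before parsing (an assumption).
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        amount = Decimal(cleaned)
    except InvalidOperation:
        return raw  # not a number: leave the field as is
    if not amount.is_finite():
        return raw  # values such as "NaN" are not usable amounts
    return f"${amount:,.2f}"


print(apply_currency_format("1234.5"))
print(apply_currency_format("John Smith"))
```

The first call returns a formatted amount, and the second returns "John Smith" unchanged, mirroring the example given for the Apply Formatting property.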
Mapping Forms Magic Detail Sets and Fields
1. If you haven’t already done so, complete the procedure under "Configuring a Forms Magic Index Mapping
Job Step" on page 386.
2. On the Job Definitions window, on the workspace, double-click the FM Index Mapping job step to open
the Properties tab.
3. On the Properties tab, expand FM Index Mapping.
4. Click Detail Set Mapping or Field Mapping, and then click the ellipsis button.
5. In the Forms Magic Database dialog box, specify the following items:
• Server - Type the name of the server that hosts the FM database you want to connect to.
• Port - Type the port number used to connect to the FM database.
• Database - Type the name of the FM database.
• Username - Type the user name for a read-only database user. Do not use a read/write database user.
• Password - Type the password for the specified user.
6. To ensure that the FM database connection is valid, click Test Connection. If the database connection is
successful, a confirmation message appears. If the connection is not successful, a symbol appears next to
the data you must fix.
7. After the database connection is established, click OK.
8. Click Add.
9. (Optional) Under Filter FM Fields, you can choose the following options:
• by Content Type - Select this check box to filter Forms Magic fields by content type. After you select
this option, you can select a value from the associated list.
• by Form - Select this check box to filter Forms Magic fields by form. After you select this option, you
can select a value from the associated list.
10. (Optional) If you have Forms Magic fields and Capture indexes that have identical names, you can click
Auto Match to automatically map these values. (For example, a Forms Magic Quantity field would
automatically be mapped to a Capture Quantity index.) You must manually map any values that do not
have identical names.
11. In the FM Fields list, select the Forms Magic field that you want to map.
12. In the Capture Indexes list, select the PaperVision Capture index to which you want to map the field you
selected in the previous step.
13. In the corresponding Minimum Confidence (0-100) box, type or select a value between zero and 100. This
value defines the minimum level of confidence that the incoming value was read correctly by the OCR
engine. A confidence level equal to or above this value causes the value to be automatically imported. If the
confidence level is less than this value, then the action you specify in the next step is taken.
14. Click the corresponding No Confidence Action list, and then select one of the following actions to be
taken if the specified minimum confidence value is not met.
• Do Nothing - This option causes the index value to not be imported.
• Set Blank - This option sets the index field to blank.
• Copy Value and Tag Index - This option copies the value and tags the index for QC or other purposes.
• Copy Value and Tag Document - This option copies the value and tags the document for QC or other
purposes.
• Copy Value and Tag both Index and Document - This option copies the value and tags the index and
document for QC or other purposes.
• Set Blank and Clear FM Confidence - This option sets the index field to blank and removes the Forms
Magic confidence level from the field. The confidence level determines what field appears when a user
clicks Next FM Field on the Index Manager to move through index fields in the PaperVision Capture
Operator Console. Removing the confidence level allows operators to skip the review of any fields that
weren’t properly read during the Forms Magic OCR process. This option is used only when the Forms
Magic QC property is enabled in an indexing step.
15. If you are mapping multiple values, click Apply, and then repeat the previous steps until you have mapped
all values.
16. When you are finished mapping values, click Done. The values you mapped appear in a list.
17. (Optional) After a mapping value appears in the list, you can select it, and then click the up and down
arrows to move it, or click the delete icon to delete it.
18. (Optional) You can click FM Database Connection to configure and test the connection to the Forms
Magic database.
19. Click OK, and then click Save Job.
NOTE: If you plan to perform a match and merge using detail fields that contain Forms Magic data, be
aware that after the match and merge process is complete, all Forms Magic detail set data is removed.
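The interaction between the Minimum Confidence value and the No Confidence Action options in the procedure above can be summarized in a short Python sketch. This is a model of the documented behavior, not product code: the function name, the dictionary keys, and the treatment of Do Nothing as "keep whatever the index already held" are assumptions made for illustration.

```python
from enum import Enum


class NoConfidenceAction(Enum):
    DO_NOTHING = "Do Nothing"
    SET_BLANK = "Set Blank"
    COPY_TAG_INDEX = "Copy Value and Tag Index"
    COPY_TAG_DOCUMENT = "Copy Value and Tag Document"
    COPY_TAG_BOTH = "Copy Value and Tag both Index and Document"
    SET_BLANK_CLEAR_CONFIDENCE = "Set Blank and Clear FM Confidence"


def import_mapped_value(current, incoming, confidence, minimum, action):
    """Return the state of one index after a mapped Forms Magic value is processed."""
    state = {"value": current, "fm_confidence": confidence,
             "tag_index": False, "tag_document": False}
    # A confidence equal to or above the minimum imports the value automatically.
    if confidence >= minimum:
        state["value"] = incoming
        return state
    act = NoConfidenceAction
    if action is act.DO_NOTHING:
        return state  # value is not imported (assumed: index keeps its prior value)
    if action in (act.SET_BLANK, act.SET_BLANK_CLEAR_CONFIDENCE):
        state["value"] = ""
        if action is act.SET_BLANK_CLEAR_CONFIDENCE:
            state["fm_confidence"] = None  # field is skipped by Next FM Field review
        return state
    # The remaining actions copy the value and tag the index, the document, or both.
    state["value"] = incoming
    state["tag_index"] = action in (act.COPY_TAG_INDEX, act.COPY_TAG_BOTH)
    state["tag_document"] = action in (act.COPY_TAG_DOCUMENT, act.COPY_TAG_BOTH)
    return state


result = import_mapped_value("", "1,250.00", confidence=62, minimum=80,
                             action=NoConfidenceAction.COPY_TAG_INDEX)
print(result)
```

In this usage example the confidence (62) is below the minimum (80), so the value is copied and only the index is tagged.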
Chapter 17 AP Processing
The AP Processing step is a manual step that you assign to a PaperVision Capture user. Using an
AP Processing step lets you match a single purchase order with a single invoice, match multiple purchase
orders with a single invoice, and approve or reject invoices. Purchase order information can be retrieved
from an external data source.
This content describes how to configure the AP Processing properties that you can use when you add an
AP Processing job step. See "Chapter 4 Job Creation and Configuration" on page 46 for information about
general job setup and the properties that apply to all job steps.
Configuring an AP Processing Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. If necessary, add the AP Processing job step to the workspace using one of the following methods.
• Select the job step that you want AP Processing to follow. On the Job Step Toolbox tab, double-click
AP Processing.
• On the Job Step Toolbox tab, drag AP Processing onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select AP Processing.
7. Double-click the AP Processing step to display the Properties tab on the left pane.
8. On the Properties tab, expand AP Processing.
9. You can set the following properties.
• Allow Modify Index - Set this property to True to allow users assigned to this job step to edit indexes
in the PaperVision Capture Operator Console (which will then also require that a PaperVision Capture
Index License is available for this step). Set this property to False to disallow the editing of indexes on
this job step.
• Multi PO - This property lets you specify the detail set and data sources for invoices that reference
multiple purchase orders. See "Setting Properties for Multiple PO Processing" on page 390 for more
information. Use this option if your purchase order field is a detail field.
• Rejection Reasons - This property lets you define the rejection reasons that users can select from in
the PaperVision Capture Operator Console. See "Defining Rejection Reasons" on page 390 for more
information.
• Single PO - This property lets you specify the PaperVision Capture index and data sources for invoices
that reference single purchase orders. See "Setting Properties for Single PO Processing" on page 390
for more information. Use this option if your purchase order field is an index field.
Setting Properties for Multiple PO Processing
1. Complete the procedure under "Configuring an AP Processing Job Step" on page 389.
2. On the Properties tab, click Multi PO.
3. Click the ellipsis button to access the Pick Detail Field dialog box.
4. From the list of detail fields, select the field containing the purchase order information, and then click Next.
5. Configure the external data source. (If you need help, see "Configuring an External Data Source" on page
391 for more information.)
6. Click Finish to save the external data source configurations.
Defining Rejection Reasons
1. Complete the procedure under "Configuring an AP Processing Job Step" on page 389.
2. On the Properties tab, click Rejection Reasons.
3. Click the ellipsis button to access the Specify Rejection Reasons dialog box.
4. Click Add, and then type the reason that the invoice is rejected.
5. Repeat the previous step to add more reasons.
6. (Optional) After a reason appears in the list, you can select it, and then click the up and down arrows to
move it, or click the delete icon to delete it.
7. When the list of reasons looks the way you want it to appear in the PaperVision Capture Operator Console,
click OK.
Setting Properties for Single PO Processing
1. Complete the procedure under "Configuring an AP Processing Job Step" on page 389.
2. On the Properties tab, click Single PO.
3. Click the ellipsis button to access the Pick Capture Index dialog box.
4. From the list of indexes, select the PO field, and then click Next.
5. Configure the external data source. (If you need help, see "Configuring an External Data Source" on page
391 for more information.)
6. Click Finish to save the external data source configurations.
Configuring an External Data Source
In the AP Processing step you can specify an external data source for PO field data and PO line items. The
following procedure will walk you through configuring Microsoft® SQL Server and Microsoft® Access for external
data sources.
To configure an external data source
1. Complete the procedure for setting the properties for Single or Multiple PO Processing until you get to the
Select Datasource Provider dialog box.
2. Do one of the following:
For a Microsoft SQL Server database, select Microsoft® SQL Server, and then click Next. In the Connect
To Server dialog box, specify the following values, and then click Next.
• Server - Type the name of the server where the database to which you want to connect is located.
• Authentication - Select the type of authentication required for the database user. SQL Server
Authentication requires a user name and password. Windows Authentication uses the login credentials
of the current user. No user name or password is required.
• Username - Type the user name for the database user who has access to the database.
• Password - Type the password for the database user.
• Database - Select the database to be used.
or
For a Microsoft Access database, select Microsoft® Access, and then click Next. In the Specify
Database File Path dialog box, specify the following values, and then click Next.
• Database Path - Click the browse button to locate the database, and then click Open.
• Password - If required, type your password.
• Datasource Provider - If you have more than one version of Microsoft Office Access Database Engine
Provider, select the version you want to use.
NOTE: If this is the first time you have set up Microsoft Access as a data source, you must download the
Microsoft Office Access Database Engine. Click the link (Click here to download Access provider) to
begin the download. After it is complete, click Cancel and then start this procedure over.
3. In the Select Tables and Associations dialog box, click the Tables tab, and select the table(s) from which
you want to retrieve invoice and PO data.
4. Click the Associations tab, and then click Add to define a table association. Specify the following
values, and then click Add.
• Table 1 - Select one of the tables for which you want to create an association.
• Column - Select the column name for which you want to create an association.
• Table 2 - Select the table for which you want to create an association with Table 1. You must select two
different tables.
• Column - Select the column name for which you want to create an association with the column you
specified from Table 1.
• Join Type - Select the type of join you want to use to combine the rows from the tables.
5. If you want to add additional associations, repeat the previous step.
6. (Optional) After an association appears in the list, you can select an entry, and then click the up and down
arrows to move it, the edit icon to edit it, or the delete icon to delete it.
7. When you are done adding associations, click Next.
8. In the Select Columns dialog box, click Add, specify the following values to add columns to return
in the query, and then click Add.
• Table - Select the table you want to use.
• Column - Select the column you want to use in the matching process.
• Field Name - Type the name you want to appear in the AP Processing window.
9. If you want to add more columns, repeat the previous step.
10. (Optional) After a column appears in the list, you can select it, and then click the up and down arrows to
move it, or click the delete icon to delete it.
11. When you are finished adding and arranging the columns, click Next.
12. In the Configure the Query Where Clause dialog box, you can create a where clause to filter the
purchase order fields. For example, you may wish to create a filter to return specific purchase order
numbers, or purchase orders that fall within a specified range. To create your query, click Add Condition,
and then specify the following values.
• Table - Select the table you want to use.
• Column - Select the column that contains the data you want to filter.
• Parameter - Select the parameter you want to use. Static values are unavailable for the AP Processing
step.
• Condition Operator - Select the condition operator you want to use to create your query.
• Group Condition - Select this check box to group this condition with the preceding condition.
13. Click Apply to save the condition. If you want to add additional conditions, repeat the previous step.
14. You can edit conditions by clicking on the column, parameter, or operator values in each condition.
15. You can click the arrow to the upper-right of the condition to add a sub condition or delete a condition.
16. When you are finished configuring the where clause, click Next.
17. In the Test Query dialog box, if you want to test your query, type a parameter in the Value box, and then
click Execute.
18. Click Next. The External Data Source (PO Fields) Configured dialog box appears.
19. Click Next, and then repeat this procedure to configure the external data source for retrieving PO line items.
20. When you get to the External Data Source (PO Line Items) Configured dialog box, click Finish.
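It can help to picture how the choices in this procedure (tables, associations, columns, and where clause) might combine into a single query. The Python sketch below is a rough illustration only. It uses an in-memory SQLite database as a stand-in for Microsoft SQL Server or Microsoft Access, the table and column names (po_header, po_line, and so on) are hypothetical, and the SQL that PaperVision Capture actually generates is not documented here and may differ.

```python
import sqlite3

# In-memory stand-in for the external database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE po_header (po_number TEXT PRIMARY KEY, vendor TEXT);
    CREATE TABLE po_line (po_number TEXT, item TEXT, quantity INTEGER);
    INSERT INTO po_header VALUES ('PO-1001', 'Acme'), ('PO-1002', 'Globex');
    INSERT INTO po_line VALUES ('PO-1001', 'Widget', 5),
                               ('PO-1001', 'Bolt', 20),
                               ('PO-1002', 'Gear', 3);
""")

query = """
    SELECT h.po_number AS "PO Number",  -- Select Columns: Table, Column, Field Name
           h.vendor    AS "Vendor",
           l.item      AS "Item",
           l.quantity  AS "Quantity"
    FROM po_header AS h                 -- Tables tab
    INNER JOIN po_line AS l             -- Associations tab: Join Type
        ON h.po_number = l.po_number    -- Table 1 Column = Table 2 Column
    WHERE h.po_number = ?               -- Where clause: Column, Operator, Parameter
    ORDER BY l.item
"""

# The parameter value plays the role of the Value box in the Test Query dialog box.
rows = conn.execute(query, ("PO-1001",)).fetchall()
for row in rows:
    print(row)
```

The where clause uses a parameter rather than a fixed value, which is consistent with the statement above that static values are unavailable for the AP Processing step.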
Chapter 18 Business Rules
The Business Rules job step is an automated job step that provides predefined business rules. These
business rules perform complex tasks for which there is a common business need, such as ensuring that
invoice totals and date ranges are correct, verifying that specified field values are populated, and
performing various comparison, matching, merging, and validation operations on indexes. You can customize
each business rule to meet your specific business needs. Business rules are grouped into the following
categories based on the functions they perform:
• AP (Accounts Payable)
• Capture Detail Set
• Capture Index
• Forms Magic
This content describes how to configure the various business rules that you can use in a job when you add a
Business Rules job step. See "Chapter 4 Job Creation and Configuration" on page 46 for information about
general job setup and the properties that apply to all job steps.
Configuring a Business Rules Job Step
1. After you have logged in to the PaperVision Capture Administration Console, expand Entities, and
then expand Entity Name.
2. Click Capture Jobs. A listing of jobs appears on the right pane.
3. Do one of the following:
• To edit an existing job, select it, and then click Edit Job.
• To add a new job, click Create New Job. In the Name box, type a name for the job, and then click OK.
4. If necessary, click Check Out Job so you can edit it.
5. On the Job Definitions window, click the Job Step Toolbox tab.
6. If necessary, add the Business Rules job step to the workspace using one of the following methods.
• Select the job step that you want Business Rules to follow. On the Job Step Toolbox tab, double-click
Business Rules.
• On the Job Step Toolbox tab, drag Business Rules onto the workspace.
• On the workspace, right-click, point to Insert Job Step, and then select Business Rules.
7. Double-click the Business Rules step to display the Properties tab on the left pane.
8. On the Properties tab, expand Business Rules.
9. Click the Business Rules property, and then click the ellipsis button to access the Specify Business
Rules dialog box.
10. Do one of the following:
• To edit an existing business rule, select it, and then click Edit Business Rule. If you need more
information, go to the configuration procedure for the business rule you selected.
• To add a business rule, click Add Business Rule to open the Select Business Rule dialog box.
Select Business Rule Dialog Box
11. Go to the configuration procedure for the business rule category that contains the business rule you want to
add.
Configuring AP (Accounts Payable) Business Rules
The following AP (Accounts Payable) business rules are available:
• Invoice Total Check - This business rule checks the invoice total against the sum of all line items, plus
other charges like shipping and taxes. By default, a tag is automatically applied to documents that do not
pass the check so they can be reviewed.
• Line Item Sub-Total Check - This business rule checks each line item (detail fields) total against the
calculated amount to verify that the invoice is correct. For example, you can use this rule to verify that
the Quantity x the Unit Price + Tax - Discount = the Line Total amount.
• PO Lookup Verification - This business rule checks each line item value against an external data
source and tags a document when a match is not found.
The following procedures will walk you through configuring the AP (Accounts Payable) business rules.
To configure the Invoice Total Check business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click AP (Accounts Payable).
4. In the Business Rule list, click Invoice Total Check, and then click Next.
5. In the Pick Capture Detail Field dialog box, select the detail field that contains the total for each line item,
and then click Next.
6. In the Pick Capture Index dialog box, select the index that contains the total amount for the invoice, and
then click Next.
7. (Optional) If you want to include other items that affect the invoice total, such as discounts, shipping
charges, and tax, perform the following sub-procedure. Otherwise, click Next.
   1. In the Pick Other Invoice Fields dialog box, click Add.
   2. In the Capture Index column, select the index that contains the value you want added to or
   subtracted from the invoice total.
   3. In the Operator column, do one of the following:
      • Select + to add the index value to the invoice total.
      • Select - to subtract the index value from the invoice total.
   4. Repeat the two previous steps to include other index values. You can select an entry, and then click
   the up and down arrows to move it, or click the delete icon to delete it.
   5. When you are finished adding indexes, click Next.
8. In the Bad Invoice Total Action dialog box, you can specify whether documents with totals that do not
equal the sum of the line items are tagged. To tag documents, select the Tag Document check box; if you
don’t want documents tagged, clear this check box.
9. The Tag Name box defines what appears on the tag. You can use the default text or type a new entry.
10. Click Finish to save the business rule.
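The arithmetic behind the Invoice Total Check can be sketched in a few lines of Python. This is an illustration under assumptions, not the rule's actual implementation: the function name is hypothetical, and it assumes that the other invoice fields are added to or subtracted from the sum of the line item totals before that sum is compared with the invoice total. Decimal is used instead of float so that currency amounts compare exactly.

```python
from decimal import Decimal


def invoice_total_matches(line_totals, invoice_total, other_fields=()):
    """Check an invoice total against its line items and other charges.

    other_fields is a sequence of (operator, value) pairs, where operator
    is "+" or "-" as chosen in the Operator column.
    """
    expected = sum((Decimal(value) for value in line_totals), Decimal("0"))
    for operator, value in other_fields:
        if operator == "+":
            expected += Decimal(value)
        elif operator == "-":
            expected -= Decimal(value)
        else:
            raise ValueError(f"unsupported operator: {operator!r}")
    return expected == Decimal(invoice_total)


# Two line items, plus shipping, minus a discount: 100.00 + 50.00 + 10.00 - 5.00
ok = invoice_total_matches(["100.00", "50.00"], "155.00",
                           other_fields=[("+", "10.00"), ("-", "5.00")])
print("tag document" if not ok else "invoice total is correct")
```

A document whose total fails this comparison would be the kind of document that the Tag Document option marks for review.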
To configure the Line Item Sub-Total Check business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click AP (Accounts Payable).
4. In the Business Rule list, click Line Item Sub-Total Check, and then click Next.
5. In the Pick Line Item Fields dialog box, specify the following values.
• Quantity - Select the index value that contains the quantity for the line item. This value is required.
• Unit Price - Select the index value that contains the unit price for the line item. This value is required.
• Discount - Select the index value that contains the discount for the line item. You can leave this value
blank if it is not applicable.
• Tax - Select the index value that contains the tax for the line item. You can leave this value blank if it is
not applicable.
• Line Total - Select the index value that contains the total for the line. This value is required.
6. Click Next.
7. In the Bad Line Item Action dialog box, you can specify what action occurs when documents with
incorrect line item totals are found. You can select the following options.
• Tag Document - Select this check box if you want to tag documents with incorrect line item totals. If
you don’t want documents tagged, clear this check box. The Tag Name box defines what appears on the
tag. You can use the default text or type a new entry.
• Clear Line Item Total - Select this check box to clear the index value for the line item total.
• Clear Quantity - Select this check box to clear the index value for the quantity.
• Clear Unit Price - Select this check box to clear the index value for the unit price.
• Clear Discount - Select this check box to clear the index value for the discount.
• Clear Tax - Select this check box to clear the index value for the tax.
8. Click Finish to save the business rule.
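The Line Item Sub-Total Check formula (Quantity x Unit Price + Tax - Discount = Line Total) can be expressed as the following Python sketch. The function name and the sample amounts are hypothetical, and the sketch assumes that blank Tax and Discount fields are treated as zero, which this guide implies but does not state.

```python
from decimal import Decimal


def line_total_is_correct(quantity, unit_price, line_total, tax="0", discount="0"):
    """Return True when Quantity x Unit Price + Tax - Discount equals Line Total."""
    expected = (Decimal(quantity) * Decimal(unit_price)
                + Decimal(tax) - Decimal(discount))
    return expected == Decimal(line_total)


# 3 x 19.99 = 59.97, plus 4.50 tax, minus 3.00 discount = 61.47
print(line_total_is_correct("3", "19.99", "61.47", tax="4.50", discount="3.00"))
```

A line item for which this function returns False corresponds to a line that the Bad Line Item Action settings would tag or clear.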
Configuring Capture Detail Set Business Rule
The Capture Detail Set business rule category contains the Complete Group business rule. This business
rule verifies that each specified detail field in a group is populated. You can configure this rule to skip a group
if one of the specified fields contains one of a set of pre-defined values.
To configure the Complete Group business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Detail Set.
4. In the Business Rule list, click Complete Group, and then click Next.
5. From the Available list, select the detail fields that you want to check for empty or missing values. If
the selected detail field has a value, this business rule is disregarded. To select multiple detail fields,
hold down the Ctrl key while you select them. To select all detail fields, select the Select All check box.
6. Click the right arrow to move the detail fields you selected to the Members list.
7. (Optional) To remove detail fields from the Members list, select them, and then click the left arrow. To
remove all detail fields, select the Select All check box, and then click the left arrow.
8. When you are finished selecting detail fields, click Next.
9. (Optional) If you want to specify a detail field value that, when found, causes the business rule to skip
that detail set or line item, perform the following sub-procedure. Otherwise, click Next.
   1. In the Pick Capture Detail Field dialog box, select Enable Exception Field.
   2. Click Add, and then type the exception value in the box that appears.
   3. Repeat the previous step if you want to add multiple exception values.
   4. (Optional) You can select an exception, and then click the up and down arrows to move it, or click
   the delete icon to delete it.
   5. In the Exception Field list, select the detail field to which you want to apply the exception(s) you
   created, and then click Next.
NOTE: You can define exceptions for only one detail field. Exceptions are applied to the detail field that is
selected when you click Next.
10. In the Incomplete Group Action dialog box, you can specify whether documents that have missing or
empty detail fields are tagged. To tag documents, select the Tag Document check box; if you don’t want
documents tagged, clear this check box.
11. The Tag Text box defines what appears on the tag. You can use the default text or type a new entry.
12. Click Finish to save the business rule.
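The logic of the Complete Group rule, including the optional exception field, can be sketched in Python as follows. This is a simplified model, not the product's implementation: the function names, the representation of each line item as a dictionary, and the sample field names (Item, Quantity, Price) are all assumptions made for the example.

```python
def is_blank(value):
    """Treat a missing value or whitespace-only text as empty."""
    return value is None or not str(value).strip()


def find_incomplete_groups(detail_rows, members, exception_field=None,
                           exception_values=()):
    """Return the positions of line items that have an empty or missing member field."""
    flagged = []
    for position, row in enumerate(detail_rows):
        # Skip the line item when the exception field holds an exception value.
        if exception_field is not None and row.get(exception_field) in exception_values:
            continue
        if any(is_blank(row.get(field)) for field in members):
            flagged.append(position)
    return flagged


rows = [
    {"Item": "Widget", "Quantity": "2", "Price": "9.99"},
    {"Item": "Freight", "Quantity": "", "Price": "15.00"},
    {"Item": "Bolt", "Quantity": "10", "Price": ""},
    {"Item": "Nut", "Quantity": "4"},
]
incomplete = find_incomplete_groups(rows, ["Quantity", "Price"],
                                    exception_field="Item",
                                    exception_values={"Freight"})
print(incomplete)
```

In this example the Freight line is skipped because of the exception value, while the Bolt and Nut lines are reported as incomplete. A document with any reported line would be a candidate for the Tag Document option.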
Configuring Capture Index Business Rules
The following Capture Index business rules are available:
• Date Range - This business rule checks that date ranges defined by two index fields are correct. For
example, with a date range using a “from” and “to” format, such as from 01/01/2014 to 01/01/2015,
this business rule verifies that the value in the “to” field falls on or after the value in the “from” field.
• Match Field Value (Another Capture Index) - This business rule matches the PaperVision Capture
index values you specify against one another.
• Match Field Value (External Source) - This business rule matches the PaperVision Capture index
values you specify against index values from an external source that you specify.
• Merge Like Documents - This business rule merges pages from multiple documents with the same
index values into a single document.
• Missing Field Value - This business rule checks for missing index value(s).
• NPI (National Provider Identifier) Check - This business rule verifies that the National Provider
Identifier (NPI) contained in an index field is valid. The NPI is a unique 10-digit identification number
issued to health care providers in the United States by the Centers for Medicare and Medicaid Services.
The NPI has replaced the Unique Provider Identification Number (UPIN) as the required identifier for
Medicare services, and is used by other payers, including commercial healthcare insurers.
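This guide does not describe how the NPI Check rule validates a number. As background, the publicly documented NPI format uses the Luhn check-digit algorithm, calculated over the 10-digit NPI with the prefix 80840 added in front. The Python sketch below shows that general technique. The function name is hypothetical, and whether PaperVision Capture performs exactly this check is an assumption.

```python
def is_valid_npi(npi):
    """Validate a 10-digit NPI using the Luhn algorithm with the 80840 prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(character) for character in "80840" + npi]
    total = 0
    # Walk from the rightmost digit (the check digit) and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))
print(is_valid_npi("1234567890"))
```

The first sample number passes the check and the second does not, because its final digit does not match the calculated check digit.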
The following procedures will walk you through configuring the Capture Index business rules.
To configure the Date Range business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Index.
4. In the Business Rule list, click Date Range, and then click Next.
5. In the Pick the From field list, select the index that contains the earliest value for a date range, and then
click Next.
6. In the Pick the To field list, select the index that contains the latest value for a date range, and then click
Next.
7. In the Date Range Actions dialog box, you can specify the tagging options to apply when a date range is
incorrect, that is, the “from” value for the date range occurs later than the “to” value. You can specify the
following options. For each option, type the text that you want to appear on the tag in the Tag Text box. To
remove a tagging option, clear the check box.
• Tag ‘From’ Index - Select this check box to place a tag on the index that contains the “from” value for
the date range.
• Tag ‘To’ Index - Select this check box to place a tag on the index that contains the “to” value for the date
range.
• Tag Document - Select this check box to place a tag on the document that contains an incorrect date
range.
8. Click Finish to save the business rule.
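The comparison that the Date Range rule performs can be sketched in Python as shown below. The function name is hypothetical, and the MM/DD/YYYY date format is an assumption taken from the example dates in this chapter. The formats that PaperVision Capture accepts in date indexes may differ.

```python
from datetime import datetime


def date_range_is_valid(from_value, to_value, date_format="%m/%d/%Y"):
    """Return True when the "to" date falls on or after the "from" date."""
    start = datetime.strptime(from_value, date_format).date()
    end = datetime.strptime(to_value, date_format).date()
    return end >= start


print(date_range_is_valid("01/01/2014", "01/01/2015"))
print(date_range_is_valid("01/01/2015", "01/01/2014"))
```

The second call returns False because the "from" value occurs later than the "to" value, which is the condition that triggers the tagging options in the Date Range Actions dialog box.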
To configure the Match Field Value (Another Capture Index) business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Index.
4. In the Business Rule list, click Match Field Value (Another Capture Index), and then click Next.
5. In the Pick Capture Index dialog box, select the index to check against another PaperVision Capture
index, and then click Next.
6. Select the index to check against the first index you selected in the previous step, and then click Next.
7. In the No Match Action dialog box, you can specify the action to occur when the specified index values do
not match. You can specify the following tagging options. For each option, type the text that you want to
appear on the tag in the Tag Text box. To remove a tagging option, clear the check box.
• Tag Index - Select this check box to place a tag on the first index you selected.
• Tag Document - Select this check box to place a tag on the document that contains the index values
that do not match.
8. If you want to populate the first index you selected with a different value, select the Populate Index check
box. You can select one of the following options.
• Constant - Select this option to populate the index with the value you type in the box.
• Other Index - Select this option to populate the index with the value contained in the index you select
from the list.
• External Source - Select this option to populate the index with a value you specify from an external
source. If you select this value, click Next, and then configure the data source. (See "Specifying an
External Data Source Provider for Business Rules" on page 405 if you need help.)
• Match Value - Select this option to populate the first index with the value from the second index you
selected.
9. Click Finish to save the business rule.
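The following Python sketch illustrates how the no-match actions and the Populate Index options described above fit together. It is a hedged illustration rather than the product's implementation: the function name, the dictionary of index values, the tuple encoding of the populate option, and the exact (case-sensitive) comparison are assumptions, and the External Source option is omitted.

```python
def match_field_value(indexes, first, second,
                      tag_index=None, tag_document=None, populate=None):
    """Compare two index values and apply the configured no-match actions.

    populate is a hypothetical encoding of the Populate Index option:
    None, ("constant", value), ("other_index", name), or ("match_value",).
    Returns the tags that were applied; indexes is updated in place.
    """
    tags = {}
    if indexes.get(first) == indexes.get(second):
        return tags  # values match, so no action is taken
    if tag_index:
        tags[first] = tag_index
    if tag_document:
        tags["<document>"] = tag_document
    if populate:
        kind = populate[0]
        if kind == "constant":
            indexes[first] = populate[1]
        elif kind == "other_index":
            indexes[first] = indexes[populate[1]]
        elif kind == "match_value":
            indexes[first] = indexes[second]
    return tags
```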
To configure the Match Field Value (External Source) business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Index.
4. In the Business Rule list, click Match Field Value (External Source), and then click Next.
5. In the Pick Capture Index dialog box, select the PaperVision Capture index to match against an external
source, and then click Next.
6. Configure the data source. (See "Specifying an External Data Source Provider for Business Rules" on page
405 if you need help, and then go to the next step in this procedure.)
7. In the No Match Action dialog box, you can specify the action to occur when the specified index values do
not match. You can specify the following tagging options. For each option, type the text that you want to
appear on the tag in the Tag Text box. To remove a tagging option, clear the check box.
• Tag Index - Select this check box to place a tag on the first index you selected.
• Tag Document - Select this check box to place a tag on the document that contains the index values
that do not match.
8. If you want to populate the first index you selected with a different value, select the Populate Index check
box. You can select one of the following options.
• Constant - Select this option to populate the index with the value you type in the box.
• Other Index - Select this option to populate the index with the value contained in the index you select
from the list.
• External Source - Select this option to populate the index with a value you specify from an external
source. If you select this value, click Next, and then configure the data source. (See "Specifying an
External Data Source Provider for Business Rules" on page 405 if you need help.)
• Match Value - Select this option to populate the first index with the value from the second index you
selected.
9. Click Finish to save the business rule.
To configure the Missing Field Value business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Index.
4. In the Business Rule list, click Missing Field Value, and then click Next.
5. In the Pick Capture Index dialog box, select the index to check if it is empty or missing, and then click
Next.
6. In the Missing Index Action dialog box, you can specify the action to occur when the specified index is
missing or empty. You can specify the following tagging options. For each option, type the text that you
want to appear on the tag in the Tag Text box. To remove a tagging option, clear the check box.
• Tag Index - Select this check box to place a tag on the index that is missing or empty.
• Tag Document - Select this check box to place a tag on the document that contains the missing or
empty index.
7. If you want to populate the index, select the Populate Index check box. You can choose to populate the
index with the following options.
• Constant - Select this option to populate the index with the value you type in the box.
• Other Index - Select this option to populate the index with the value contained in the index you select
from the list.
• External Source - Select this option to populate the index with a value you specify from an external
source. If you select this value, click Next, and then configure the data source. (See "Specifying an
External Data Source Provider for Business Rules" on page 405 if you need help.)
8. Click Finish to save the business rule.
To configure the NPI (National Provider Identifier) Check business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Capture Index.
4. In the Business Rule list, click NPI (National Provider Identifier) Check, and then click Next.
5. In the Pick Capture Index dialog box, select the index that contains the National Provider Identifier (NPI)
value that you want to validate using the Luhn formula, and then click Next.
6. In the No Match Action dialog box, you can specify the action to occur when the NPI is not valid. You can
specify the following tagging options. For each option, type the text that you want to appear on the tag in the
Tag Text box. To remove a tagging option, clear the check box.
• Tag Index - Select this check box to place a tag on the index that contains the invalid NPI.
• Tag Document - Select this check box to place a tag on the document that contains the invalid NPI.
7. If you want to populate the index that contains the invalid NPI, select the Populate Index check box. You
can choose to populate the index with the following options.
• Constant - Select this option to populate the index with the value you type in the box.
• Other Index - Select this option to populate the index with the value contained in the index you select
from the list.
• External Source - Select this option to populate the index with a value you specify from an external
source. If you select this value, click Next, and then configure the data source. (See "Specifying an
External Data Source Provider for Business Rules" on page 405 if you need help.)
8. Click Finish to save the business rule.
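For reference, a Luhn check of an NPI can be sketched as follows. This guide states only that the rule validates the NPI with the Luhn formula; the sketch assumes the commonly published NPI convention of prefixing the 10-digit number with 80840 before running the check, and the function name `is_valid_npi` is hypothetical. Whether PaperVision Capture performs the check in exactly this way is an assumption.

```python
def is_valid_npi(npi):
    """Return True if a 10-digit NPI string passes the Luhn check.

    Assumes the standard NPI convention: the prefix 80840 is placed in
    front of the 10 digits before the Luhn formula is applied.
    """
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, `is_valid_npi("1234567893")` returns `True`, and changing the final digit to any other value makes the check fail.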
Configuring Forms Magic Business Rules
NOTE: You can use the Forms Magic business rules only on documents that have been classified in
PaperVision Forms Magic. It is recommended that you configure and place a Batch Splitting job step
before either of the Forms Magic business rules to ensure that any unclassified documents are split from
the batch prior to reaching a Forms Magic business rule.
The following Forms Magic business rules are available:
• Merge Field Sets - This business rule merges multiple Forms Magic field sets into a single Forms Magic
field set that is used to map fields in PaperVision Capture. You must configure at least two Forms Magic
field sets to use this business rule. This business rule merges field sets from multiple Forms Magic
content types into a single field set. You can then use this merged field set in PaperVision Capture to
perform other work.
• Parse Document Text and Populate FM Fields - This business rule lets you search a document or
form for specific terms that precede and follow a Forms Magic field. This business rule is useful if you
want to locate and use data elements in unstructured documents to populate Forms Magic fields, that is,
the data element you need does not always appear in the same location. In this scenario, you could
specify unique text that precedes and follows the data element you want to capture. When the business
rule finds the unique text, you can configure it to populate the Forms Magic field with the data element
that falls between the specified search criteria.
The following procedures will walk you through configuring the Forms Magic business rules.
To configure the Merge Field Sets business rule
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Forms Magic.
4. In the Business Rule list, click Merge Field Sets, and then click Next.
5. In the Pick FM Databases dialog box, click Add, and then specify the following items:
• Server - Type the name of the server where the FM database to which you want to connect is located.
• Port - Type the port number used to connect to the FM database.
• Database - Type the name of the FM database.
• Username - Type the user name for the read-only database user. Do not use the read/write database
user.
• Password - Type the password for the specified user.
6. To ensure that the FM database connection is valid, click Test Connection. If the database connection is
successful, a confirmation message appears. If the connection is not successful, a symbol appears
next to the data you must fix. After the database connection is established, click OK.
7. Repeat the two previous steps to specify additional FM database connections.
8. (Optional) After an entry appears in the Forms Magic Database Connections list, you can select an entry,
and then click the up and down arrows to move it, the edit icon to edit it, or the delete icon to delete it.
9. Click Finish to save the business rule.
To configure the Parse Document Text and Populate FM Fields business rule
Note: Once the Metadata Vector Document (MVD) has been created, the document cannot be
reprocessed to create or update the existing MVD (for example, changing the language).
1. Complete the procedure under "Configuring a Business Rules Job Step" on page 394 to open the Select
Business Rule dialog box.
2. Ensure that the Group By Category check box is selected.
3. In the Category list, click Forms Magic.
4. In the Business Rule list, click Parse Document Text and Populate FM Fields, and then click Next.
5. In the Full Text Parsing dialog box, click Configure FM Database, and then specify the following items:
• Server - Type the name of the server where the FM database to which you want to connect is located.
• Port - Type the port number used to connect to the FM database.
• Database - Type the name of the FM database.
• Username - Type the user name for a read-only database user. Do not use a read/write database user.
• Password - Type the password for the specified user.
6. To ensure that the FM database connection is valid, click Test Connection. If the database connection is
successful, a confirmation message appears. If the connection is not successful, a symbol appears
next to the data you must fix.
7. After the database connection is established, click OK.
8. (Optional) To include characters from other languages, click Languages, and then double-click the
language(s) you want to include from the Available list, or select the languages and click the right
arrow. To remove languages from the Select list, double-click the language, or select it and click the left
arrow. Click OK to save your selections, or Cancel to return to the Full Text Parsing dialog box.
9. To select a file to use for testing this business rule, click the icon next to the Test Document box. You
can select an image file (for example, .tif, .jpg, .png) or a .pdf file.
10. In the Open dialog box, select the file you want to use, and then click Open. The Full-Text OCR engine will
proceed to parse the document.
11. (Optional) By default, all non-alphanumeric characters are stripped from documents. If you want to use any
non-alphanumeric characters, you must type them in the Allowed Characters box. Do not separate
multiple characters with a space. For example, if you want to capture an entire valid email address, you
must type a period ( . ) and an at sign ( @ ) without any spaces in the Allowed Characters box.
12. (Optional) By default, a single space is used as the delimiter to separate words. If you want to use
something other than a single space, type the value in the Delimiter box.
13. Click View Words to display the words that were parsed by the OCR engine. This data helps you identify
the words that appear before and after the data element you want to capture. Click an area outside of the
View Words box to close it.
14. To specify the words for which you want to search, click Add, and then specify the following items:
• by Content Type - Select this check box to filter Forms Magic fields by content type. After you select
this option, you can select a value from the associated list.
• by Form - Select this check box to filter Forms Magic fields by form. After you select this option, you
can select a value from the associated list.
• Search Words - Type the words for which you want to search. You cannot leave this box blank.
• Action - Select the action to take when the words you specified are found.
• FM Field - This option is available only if you selected Populate FM Field for the Action option. Select
the FM field that you want populated.
EXAMPLE: To locate a specific data element, you must search for unique text that precedes and follows
the data element you want to capture. For example, you want to use an email address to populate the
Customer Code field from the following text: Email Address: john.doe@DSI.com of customer. To
accomplish this, you must create two entries. For the first entry, type Email Address in the Search
Words box, and then set the Action to None. For the second entry, type of customer in the Search
Words box, set the Action to Populate FM Field, and then set FM Field to the appropriate field. Your
entries would appear similar to the following:
Search Words: Email Address, Action: None
Search Words: of customer, Action: Populate FM Field, FM Field: Customer Code
These entries tell the business rule to find text that begins with Email Address and ends with of
customer, and then use the content that falls between those words to populate the Customer Code field.
You can add as many entries as you need to populate FM fields. After a piece of text has been located,
any additional search criteria you add are searched for starting from the point where the last set of
criteria ended.
15. Repeat the previous step until you have defined all search criteria and actions.
16. (Optional) After an entry appears in the list, you can select it, and then click the up and down arrows to
move it, the edit icon to edit it, or the delete icon to delete it.
17. After you have defined all search criteria and actions, you can click Test to verify that the business rule is
working correctly.
NOTE: Each time you click Test, your Forms Magic license is decremented by the number of pages
in the test document.
18. Click Finish to save the business rule.
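The Email Address example in the procedure above can be sketched in Python to show how the Allowed Characters, Delimiter, and Search Words settings interact. This is a simplified illustration under stated assumptions, not the Full-Text OCR engine or the rule's actual matching logic: the function names are hypothetical, matching is assumed to be exact and case-sensitive, and only one pair of search entries is handled.

```python
def clean_words(text, allowed_chars="", delimiter=" "):
    """Split text on the delimiter and strip non-alphanumeric characters,
    keeping any characters listed in allowed_chars."""
    words = []
    for raw in text.split(delimiter):
        word = "".join(ch for ch in raw if ch.isalnum() or ch in allowed_chars)
        if word:
            words.append(word)
    return words

def find_phrase(words, phrase, start):
    """Return the position of phrase in words at or after start, or -1."""
    n = len(phrase)
    for i in range(start, len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return -1

def capture_between(text, before, after, allowed_chars="", delimiter=" "):
    """Return the words that fall between the 'before' and 'after' phrases."""
    words = clean_words(text, allowed_chars, delimiter)
    before_words = clean_words(before, allowed_chars, delimiter)
    after_words = clean_words(after, allowed_chars, delimiter)
    start = find_phrase(words, before_words, 0)
    if start < 0:
        return None
    value_start = start + len(before_words)
    # Search forward from the point where the first phrase ended.
    end = find_phrase(words, after_words, value_start)
    if end < 0:
        return None
    return delimiter.join(words[value_start:end])

text = "Email Address: john.doe@DSI.com of customer"
print(capture_between(text, "Email Address", "of customer", allowed_chars=".@"))
```

With `.@` in the allowed characters, the captured value is the full email address. With the default of no allowed characters, the period and the at sign are stripped from the captured value.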
Specifying an External Data Source Provider for Business Rules
There are several business rules that let you specify an external data source. The following procedure will
walk you through configuring Microsoft® SQL Server and Microsoft® Access for external data sources.
To configure an external data source
1. Complete the procedure for configuring the business rule until you get to the Select Datasource Provider
dialog box.
2. Do one of the following:
For a Microsoft SQL database, select Microsoft® SQL, and then click Next. In the Connect To Server
dialog box, specify the following values, and then click Next.
• Server - Type the name of the server where the database to which you want to connect is
located.
• Authentication - Select the type of authentication required for the database user.
SQL Server Authentication requires a user name and password. Windows Authentication
uses the login credentials of the current user. No user name or password is required.
• Username - Type the user name for the database user who has access to the database.
• Password - Type the password for the database user.
• Database - Select the database to be used.
or
For a Microsoft Access database, select Microsoft® Access, and then click Next. In the Specify
Database File Path dialog box, specify the following values, and then click Next.
• Database Path - Click the icon to locate the database, and then click Open.
• Password - If required, type your password.
• Datasource Provider - If you have more than one version of Microsoft Office Access
Database Engine Provider, select the version you want to use.
NOTE: If this is the first time you have set up Microsoft Access as a data source, you must
download the Microsoft Office Access Database Engine. Click the link (Click here to
download Access provider) to begin the download. After it is complete, click Cancel,
and then start this procedure over.
3. In the Select Tables and Associations dialog box, click the Tables tab, and then select the table(s) from
which you want to retrieve data.
4. Click the Associations tab, and then click Add to define a table association. Specify the following
values, and then click Add.
• Table 1 - Select one of the tables for which you want to create an association.
• Column - Select the column name for which you want to create an association.
• Table 2 - Select the table for which you want to create an association with Table 1. You must select two
different tables.
• Column - Select the column name for which you want to create an association with the column you
specified from Table 1.
• Join Type - Select the type of join you want to use to combine the rows from the tables.
5. If you want to add additional associations, repeat the previous step.
6. (Optional) After an association appears in the list, you can select an entry, and then click the up and down
arrows to move it, the edit icon to edit it, or the delete icon to delete it.
7. When you are done adding associations, click Next.
8. In the Select Columns dialog box, click Add, specify the following values to add columns to return
in the query, and then click Add.
• Table - Select the table you want to use.
• Column - Select the column that you want to use in the matching process.
• Field Name - Type the name you want to use for the field.
9. If you want to add more columns, repeat the previous step.
10. (Optional) After a column appears in the list, you can select it, and then click the up and down arrows to
move it, or the delete icon to delete it.
11. When you are finished adding and arranging the columns, click Next.
12. In the Configure the Query Where Clause dialog box, you can create a where clause to filter fields. To
create your query, click Add Condition, and then specify the following values.
• Table - Select the table you want to use.
• Column - Select the column that contains the data you want to filter.
• Parameter - Select the parameter you want to use, or type a value in the Static Value box.
• Condition Operator - Select the condition operator you want to use to create your query.
• Group Condition - Select this check box to group this condition with the preceding condition.
13. Click Apply to save the condition. If you want to add additional conditions, repeat the previous step.
14. You can edit conditions by clicking on the column, parameter, or operator values in each condition.
15. You can click the arrow to the upper-right of the condition to add a sub condition or delete a condition.
16. When you are finished configuring the where clause, click Next.
17. In the Test Query dialog box, if you want to test your query, type a parameter in the Value box, and then
click Execute.
18. Click Next, and then continue with the procedure for the business rule you are configuring.
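The selections made in this procedure (tables, an association, returned columns, and a where clause with a parameter) correspond roughly to a SQL query with a join. The sketch below shows the general shape of such a query. It uses Python's built-in sqlite3 module as a stand-in for Microsoft SQL Server or Access, and the table names, column names, and sample data are hypothetical; the SQL that PaperVision Capture actually generates is not documented here.

```python
import sqlite3

# In-memory database standing in for the external data source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerCode TEXT);
CREATE TABLE Invoices (InvoiceNo TEXT, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'ACME'), (2, 'GLOBEX');
INSERT INTO Invoices VALUES ('INV-100', 1), ('INV-200', 2);
""")

# Tables tab: Invoices and Customers.
# Association: Invoices.CustomerID joined to Customers.CustomerID (inner join).
# Select Columns: Customers.CustomerCode returned under the field name CustomerCode.
# Where clause: Invoices.InvoiceNo equals a parameter supplied at run time.
QUERY = """
SELECT c.CustomerCode AS CustomerCode
FROM Invoices AS i
INNER JOIN Customers AS c ON i.CustomerID = c.CustomerID
WHERE i.InvoiceNo = ?
"""

def lookup(invoice_no):
    """Return the customer code for an invoice number, or None if not found."""
    row = conn.execute(QUERY, (invoice_no,)).fetchone()
    return row[0] if row else None
```

Typing a value in the Value box and clicking Execute in the Test Query dialog box is comparable to calling `lookup` with that value.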
Chapter 19 Capture Batches
In PaperVision Capture, a batch is a collection of documents and their associated index name-value pairs and
statistics that are moved as a logical unit of work through a job. In the Administration Console, you can
manage an entity's batches by assigning batch ownership and other properties. To manage capture batches,
open Entity > Company > Capture Batches, where you can access the Batch Management and Batch
Statistics screens.
File Menu
The File menu contains the Change Password and Exit commands.
• File > Change Password allows you to change your current password for the PaperVision Capture
Administration Console.
• File > Exit prompts you to save your current changes before logging you out of the system.
Help Menu
Help > Help Topics opens the Online Help file. Help > User's Manual opens a PDF of the PaperVision
Capture Administration Guide. Help > About PaperVision Capture Administration Console displays a
splash screen with the copyright and version information for your version of PaperVision Capture.
Batch Management
The Batch Management screen automatically tracks batches created in the PaperVision Capture Operator
Console and displays user and job data specific to each batch. If a batch is not owned, you can edit the Batch
Name, Batch Description, Date/Time, Administrative Priority, Job Step, Scheduled Destruction, and Retain
Statistics fields. If a batch is owned or awaiting automated processing, you can change its status to "Not
Owned" so you can edit these fields. Additionally, you can filter the batch list so you can quickly locate
batches that match your specified criteria. Batches that are not owned can be assigned to another job step
within the same job and can be scheduled for destruction according to your specified time and date.
Move the pointer over a row to view a tool-tip summary of the batch. You can also right-click on the
batch and select the appropriate operation from the context menu.
Batch Management Grid
Viewing the Properties of a Batch
To view the properties of a batch
1. Highlight the batch in the list.
2. Click the Properties icon. The Batch Properties dialog box appears.
Batch Properties
3. To view a summary of each batch property, highlight the property in the grid, and a summary of the property
appears at the bottom left. Read-only fields appear with gray text; editable fields appear with black text.
• Batch ID: Unique identifier of the batch in the database
• Internal Name: Unique name assigned and used by the system to store batch-related files and
metadata
• Name: Batch name assigned by the user (255 characters maximum)
• Description: Description assigned by the user (255 characters maximum)
• Date/Time: Date and time assigned by the user
NOTE: For information on date/time formats, see Index Types and Formats.
• Status: Current status of the batch, which is one of Owned, Not Owned, In Transmission, or Automated
Processing
  • Owned: A user has assumed ownership of the batch in the Operator Console.
  • Not Owned: A user has not assumed ownership of the batch in the Operator Console.
  • In Transmission: The batch is moving from the temporary local batch repository to the master
  batch repository.
  • Automated Processing: The PaperVision Capture Automation Service is currently processing the
  batch.
• Created: Date and time the batch was created
• Last Update: Most recent date and time that the batch record was updated in the database
• Administrative Priority: Priority (ranging from 1 to 999,999) assigned by the administrator for the batch (the
higher the number, the higher the priority)
• Batch Path: The path in the master batch repository where the batch files reside
• Job: Name of the job to which the batch is assigned
• Job Description: Description of the job to which the batch is assigned
• Step: Name of the job step in which the batch is currently processing or waiting
NOTE: To transition a batch to the end of the job (and skip all remaining steps), select the last
blank line from this drop-down list. As a result, no further processing of the batch will occur.
• Step Start: Date and time when the batch entered the job step
• Owned Date/Time: Date and time ownership of the batch was last taken
• Owned By User: User who currently owns the batch
• Owned By Workstation: Workstation where the batch is currently owned
• Deleted: Indicates whether the batch has been deleted
• Scheduled Destruction: Date and time when the batch will be destroyed
NOTE: If the Batch Destruction Offset for the job step is scheduled for a different date/time than
the Scheduled Destruction assigned for the batch, the last date/time overrides the other.
• Retain Statistics: Indicates whether to retain the batch statistics upon batch deletion
• Size (Bytes): Indicates the total batch size in bytes, kilobytes, megabytes, or gigabytes
• Document Count: Number of total documents contained in the batch
• Page Count: Number of total pages contained in the batch
• Image Count: Number of total images contained in the batch
4. Click OK when you are finished viewing and/or changing the properties.
Viewing the Batch History
You can view operations performed on a batch by viewing the batch's history.
To view a batch's history
1. Highlight the batch in the grid.
2. Click the History icon.
Batch History
3. The history displays the entry's description, date, user, and workstation information for each created and
loaded batch and created and deleted document. To sort each column in ascending or descending order,
click the column header.
4. Click Close.
Filtering the Batch List
The Filter command allows you to search for Capture batches according to your specified criteria.
To filter the list of batches
1. Click the Filter icon. The Batch Filter dialog box appears.
Batch Filter
2. Enter the filter criteria to use in the search. Some criteria descriptions are found in the Properties section.
Additional criteria include:
• User Date: Date range entered by the user
• Created Date: Date range in which the batches were created
• Owned by User: Includes active and inactive users
• Query Type: AND returns only batches that match all of the specified criteria; OR returns batches that
match any of the specified criteria
• Maximum Record Count: Maximum number of batch records to display per page of search results
• Show Destroyed: If selected, includes destroyed batches in the search results
• Scheduled Destruction: Date/time that the batches will be destroyed
To remove all the filter criteria, click the Clear All button.
3. Click OK to initiate the search, and the Batch Management grid refreshes with your search results.
NOTE: Your most recent Batch Filter settings are retained the next time you open the Batch
Management screen.
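The difference between the AND and OR query types can be illustrated with a short Python sketch. The function, the dictionary representation of a batch, and the field names are hypothetical, and treating Maximum Record Count as a simple cap on the returned list is a simplification of the per-page behavior described above.

```python
def filter_batches(batches, criteria, query_type="AND", max_records=None):
    """Return the batches that match the criteria.

    AND keeps a batch only if every criterion matches;
    OR keeps a batch if at least one criterion matches.
    """
    combine = all if query_type == "AND" else any
    matches = [
        batch for batch in batches
        if combine(batch.get(field) == value for field, value in criteria.items())
    ]
    return matches if max_records is None else matches[:max_records]

batches = [
    {"name": "B1", "job": "Invoices", "status": "Not Owned"},
    {"name": "B2", "job": "Invoices", "status": "Owned"},
    {"name": "B3", "job": "Claims", "status": "Not Owned"},
]
criteria = {"job": "Invoices", "status": "Not Owned"}
```

With these sample values, an AND query returns only B1, while an OR query returns all three batches because each one matches at least one criterion.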
Setting the Destruction Date
You can assign the batch destruction date and whether to retain batch statistics for one or more batches.
Only batches marked as "Not Owned" that have not been previously deleted can be scheduled for destruction.
Setting the batch destruction date does not directly delete a batch; rather, the PaperVision Capture
Automation Service deletes the batch. When a batch is deleted, the image files are removed from disk, but
the batch’s database record (and potentially the statistics) remain in the database. However, you can filter
deleted batches so they do not appear in the Batch Management grid.
To set the destruction date
1. Highlight one or more batches in the grid.
2. Click the Set Destruction Date icon. The Batch Destruction dialog box appears.
Batch Destruction
3. From the Scheduled Destruction drop-down list, select the date and time, which default to the current
date and time.
4. Or, enter the date.
5. Select Retain Statistics to keep the batch statistics in the database after batch destruction.
6. Click OK.
Changing the Status to 'Not Owned'
You can change the status of one or more owned batches to the "Not Owned" status.
NOTE: If you change the batch status to "Not Owned" while an operator is working on a batch, the
operator's changes will be lost.
To change the batch status
1. Highlight the batch in the grid.
2. Click the Change Status to 'Not Owned' icon.
3. Click Yes to update the selected batches.
4. Click OK to confirm the update.
Changing the Job Step
You can assign one or more batches to a different step within the same job. Multiple batches may only be
moved to another job step if (1) all of the selected batches are "Not Owned" and (2) all of the selected batches
are associated with the same job.
To change the job step
1. Highlight one or more batches in the grid.
2. Click the Change Job Step icon. The Batch Job Step dialog box appears.
Batch Job Step
3. Select from the Target Step drop-down list.
NOTE: You can transition a batch to the end of the job (and skip all remaining steps) by selecting the
last blank line from this drop-down list. As a result, no further processing of the batch will occur.
4. Click OK.
WARNING: Manually moving a batch to another job step may result in a loss of batch images
and/or index data and should be used only as a last resort. Before proceeding, you may want to consult
with Digitech Systems' Technical Support.
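The two conditions for moving multiple batches to another job step can be expressed as a simple check. The sketch below is illustrative only; the function name and the dictionary representation of a batch are assumptions, not part of the Administration Console.

```python
def can_change_job_step(batches):
    """Return True if the selected batches may be moved to another job step.

    Condition 1: every selected batch is "Not Owned".
    Condition 2: every selected batch is associated with the same job.
    """
    if not batches:
        return False
    all_not_owned = all(batch["status"] == "Not Owned" for batch in batches)
    same_job = len({batch["job"] for batch in batches}) == 1
    return all_not_owned and same_job
```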
Changing the Batch Path
You can change one or multiple batch paths (for unowned batches) simultaneously.
NOTE: This operation does not physically move batches; rather, the pointer in the database to the
batch’s location is updated.
To change the batch path
1. Highlight one or more batches in the grid.
2. Click the Change Batch Path icon. The Batch Path dialog box appears.
Batch Path
3. Enter the new Batch Path or browse to the new location.
4. Click OK.
Exporting Batch Metadata
You can export the metadata for one or more batches to an XML file. The Export command does not export
documents, images, or associated index values.
To export a batch's metadata
1. Highlight the batch in the list.
2. Click the Export icon.
3. Enter the File Name of the XML file in the Save As dialog box.
4. Click Save.
Batch Statistics
Batch statistics are updated as operators submit batches in the PaperVision Capture Operator Console and
as batches are processed by the PaperVision Capture Automation Service. You can view each set of
statistics per job, job step, operator, or batch. Totals for all jobs, job steps, operators, and batches are also
included for your reference. Additionally, you can print a representation of the statistics you have expanded in
the tree. To view the Batch Statistics screen, select Entities > Company > Capture Batches > Batch
Statistics.
Batch Statistics
The table below summarizes each statistic and displays the value for each STATISTICTYPE column in the
PVCAP_BATCHSTATISTIC database table.
Batch Statistic
Characters Saved
Database Statistic Type
PVCAP_CharactersSaved
This is the total number of characters the operator has entered upon
saving index values. This statistic only applies to the manual Capture
and Indexing steps.
Characters Saved (Automated Match and Merge)
PVCAP_CharactersSaved_AutoMM
This value is the total number of characters populated (upon index values
being saved) only via Match and Merge.
Characters Saved (Excluding Match and Merge)
PVCAP_CharactersSaved_NoMM
This is the total number of characters the operator has entered upon
saving index values. The value excludes characters populated via Match
and Merge.
Document Count
PVCAP_DocumentCount
This is the total number of documents contained in all batches.
Documents Deleted
PVCAP_DocumentsDeleted
This statistic is the total number of documents deleted in a manual
step.
Documents Marked
PVCAP_DocumentsMarked
This value increments each time the operator completes any of the
following:
• Copy Document
• Insert Document Break
• Mark New Document
NOTE: This value also increments each time a new document is
marked through the Automated Barcode job step, but does not
increment when a new document is marked through Custom Code
execution.
Documents OCRed - Full Text (Success)
PVCAP_DocumentsOCRedFullTextSuccess
This statistic provides a count of documents that have been
successfully OCRed (full-text).
Image Count
PVCAP_ImageCount
This is the total number of images contained in all batches.
Index Verification Errors
PVCAP_IndexVerificationErrors
This number increments each time an error is found during the index
verification process.
Indexed Documents
PVCAP_IndexedDocuments
This statistic is the total number of documents indexed in a manual
step.
Indexed Documents (Match and Merge)
PVCAP_IndexedDocumentsMM
This statistic is the count of documents for which one or more index
values have been successfully populated via match and merge in a
manual step.
Indices Barcoded (Failed)
PVCAP_IndicesBarcodedFailed
This value increments each time a barcode does not successfully
populate an index field.
NOTE: This statistic does not include the number of auto document
breaks inserted with each barcode.
Indices Barcoded (Success)
PVCAP_IndicesBarcodedSuccess
This value increments each time a barcode successfully populates an
index field.
NOTE: This statistic does not include the number of auto document
breaks inserted with each barcode.
Indices OCRed (Failed)
PVCAP_IndicesOCRedFailed
This value increments each time the OCR engine does not
successfully populate an index field.
Indices OCRed (Success)
PVCAP_IndicesOCRedSuccess
This value increments each time the OCR engine successfully
populates an index field.
Indices Saved
PVCAP_IndicesSaved
This value is the total number of populated indices saved by the
operator. This statistic only applies to the manual Capture and
Indexing steps.
NOTE: This statistic does not include blank index fields.
Indices Saved (Automated Match and Merge)
PVCAP_IndicesSaved_AutoMM
This value is the total number of populated indices saved and
increments only when indices are populated via Match and Merge.
Indices Saved (Excluding Match and Merge)
PVCAP_IndicesSaved_NoMM
This value is the total number of populated indices saved by the
operator. The value excludes indices populated via Match and Merge.
Nuance OCR Characters
PVCAP_OCREngineCharacters
This is the total number of characters detected by the OCR engine.
Nuance OCR Decomposition Time
PVCAP_OCREngineDecompositionTime
This is the total amount of time the OCR engine spent on the image's
page-layout decomposition (i.e., auto-zoning).
Nuance OCR Full Recognition Time
PVCAP_OCREngineFullRecognitionTime
This is the total amount of time the OCR engine spent on processing
the image, including the time spent processing the image through all
recognition modules and in the checking subsystem. Additionally,
this statistic includes the time spent to recognize the zones (writing
recognition result to recognition data file).
Nuance OCR Rejected Characters
PVCAP_OCREngineCharactersRejected
This is the total number of characters the OCR engine failed to
recognize.
Nuance OCR Suspect Words
PVCAP_OCREngineWordsSuspect
This is the total number of suspect words that the OCR engine found
in the image. Suspect words must contain at least one character that
was not recognized during OCR processing.
Nuance OCR Words
PVCAP_OCREngineWords
This is the total number of words detected by the OCR engine.
Page Count
PVCAP_PageCount
This is the total number of pages contained in all batches.
Pages Barcoded
PVCAP_PagesBarcoded
This statistic displays the count of pages from which one or more
barcodes are read in manual and automated steps.
Pages Barcoded as Document Breaks
PVCAP_PagesBarcodedDocumentBreaks
This statistic displays the count of pages barcoded as document
break sheets in manual and automated steps.
Pages Barcoded for Indices
PVCAP_PagesBarcodedIndices
This statistic displays the count of pages barcoded to populate one or
more indices in manual and automated steps.
Pages Captured
PVCAP_PagesCaptured
This is the total number of pages captured per job, step, and operator.
The counter increments each time the operator imports a batch,
imports an image, scans an image into the batch, or extracts and
copies a region.
NOTE: This statistic counts only pages that are added to the batch.
It does not count pages that the operator re-scans
(performs the Re-Scan Pages command).
Pages OCRed - Full Text (Success)
PVCAP_PagesOCRedFullTextSuccess
This statistic provides a count of pages that have been successfully
OCRed (full-text).
Pages Re-scanned
PVCAP_PagesRescanned
This is the total number of pages the operator re-scans (performs the
Re-Scan Pages command).
Pages Scanned
PVCAP_PagesScanned
This statistic tracks the total number of pages scanned. The counter
increments each time a page is scanned, regardless of whether the
page is added to the batch.
NOTE: Some scanned pages are not added to the batch because of blank
page deletion or because they are break pages that are deleted.
Step Start-Stop Duration
PVCAP_StepStartStop
This is the total amount of time that the operator worked on a job step
in the PaperVision Capture Operator Console.
Step Take-Submit Duration
PVCAP_StepTakeSubmit
This is the total amount of time that elapsed from when the operator
assumed ownership of the batch until the operator submitted the
batch.
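As an illustration of how the statistic types above might be totaled outside the Administration Console, the following minimal Python sketch sums values per statistic type. It assumes each record has already been read into a (statistic type, value) pair; that record shape and the sample numbers are hypothetical, since only the STATISTICTYPE column of the PVCAP_BATCHSTATISTIC table is described in this guide.

```python
from collections import defaultdict


def total_by_type(rows):
    """Sum the values of (statistic_type, value) pairs per statistic type."""
    totals = defaultdict(int)
    for statistic_type, value in rows:
        totals[statistic_type] += value
    return dict(totals)


# Hypothetical sample records; real values come from your own system.
rows = [
    ("PVCAP_PagesScanned", 120),
    ("PVCAP_PagesCaptured", 115),
    ("PVCAP_PagesScanned", 30),
    ("PVCAP_PagesRescanned", 4),
]

totals = total_by_type(rows)
for statistic_type, total in sorted(totals.items()):
    print(statistic_type, total)
```

The statistic type names are taken from the table; everything else in the sketch is an assumption made for illustration.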
QC Batch Statistics
QC batch statistics are recorded for Manual and Automated QC steps. The automated statistics are recorded
by the PaperVision Capture Automation Server when the Automated QC step is executed.
QC Batch Statistic
Database Statistic Type
Tags Added - Batch Document Count
PVCAP_QCTAG-BatchDocumentCountTags
This value is the total number of batch document count
tags added to the batch.
Tags Removed - Batch Document Count
PVCAP_QCTAG-BatchDocumentCountTagsRemoved
This value is the total number of batch document count
tags removed from the batch.
Tags Added – Batch Index Sequence
PVCAP_QCTAG-BatchIndexSequenceTags
This value is the total number of batch index sequence
tags added to the batch.
Tags Removed – Batch Index Sequence
PVCAP_QCTAG-BatchIndexSequenceTagsRemoved
This value is the total number of batch index sequence
tags removed from the batch.
Tags Added – Document Page Count
PVCAP_QCTAG-DocumentPageCountTags
This value is the total number of document page count
tags added to the batch.
Tags Removed – Document Page Count
PVCAP_QCTAG-DocumentPageCountTagsRemoved
This value is the total number of document page count
tags removed from the batch.
Tags Added – Document Re-Scan
PVCAP_QCTAG-DocumentRescanTags
This value is the total number of document re-scan tags
added to the batch.
Tags Removed – Document Re-Scan
PVCAP_QCTAG-DocumentRescanTagsRemoved
This value is the total number of document re-scan tags
removed from the batch.
Tags Added - Documents
PVCAP_QCTAG-DocumentsTagged
This value is the total number of document tags added to
the batch.
Tags Removed - Documents
PVCAP_QCTAG-DocumentTagsRemoved
This value is the total number of document tags removed
from the batch.
Tags Added – Index Errors
PVCAP_QCTAG-IndexErrorTags
This value is the total number of index error tags added to
the batch.
Tags Removed – Index Errors
PVCAP_QCTAG-IndexErrorTagsRemoved
This value is the total number of index error tags removed
from the batch.
Tags Added – Index Re-Index
PVCAP_QCTAG-IndexReindexTags
This value is the total number of index (re-index) tags
added to the batch.
Tags Removed – Index Re-Index
PVCAP_QCTAG-IndexReindexTagsRemoved
This value is the total number of index (re-index) tags
removed from the batch.
Tags Added – Index Values
PVCAP_QCTAG-IndexValuesTagged
This value is the total number of index value tags added to
the batch.
Tags Removed – Index Values
PVCAP_QCTAG-IndexValueTagsRemoved
This value is the total number of index value tags
removed from the batch.
Tags Added – Page Bad Image Path
PVCAP_QCTAG-PageBadImagePathTags
This value is the total number of page (bad image path)
tags added to the batch.
Tags Removed – Page Bad Image Path
PVCAP_QCTAG-PageBadImagePathTagsRemoved
This value is the total number of page (bad image path)
tags removed from the batch.
Tags Added – Page Image Bad
PVCAP_QCTAG-PageImageBadTags
This value is the total number of page (image bad) tags
added to the batch.
Tags Removed – Page Image Bad
PVCAP_QCTAG-PageImageBadTagsRemoved
This value is the total number of page (image bad) tags
removed from the batch.
Tags Added – Page Image Dimensions
PVCAP_QCTAG-PageImageDimensionsTags
This value is the total number of page (image dimensions)
tags added to the batch.
Tags Removed – Page Image Dimensions
PVCAP_QCTAG-PageImageDimensionsTagsRemoved
This value is the total number of page (image
dimensions) tags removed from the batch.
Tags Added – Page Image File Size
PVCAP_QCTAG-PageImageFileSizeTags
This value is the total number of page (image file size)
tags added to the batch.
Tags Removed – Page Image File Size
PVCAP_QCTAG-PageImageFileSizeTagsRemoved
This value is the total number of page (image file size)
tags removed from the batch.
Tags Added – Page Re-Scan
PVCAP_QCTAG-PageRescanTags
This value is the total number of page re-scan tags added
to the batch.
Tags Removed – Page Re-Scan
PVCAP_QCTAG-PageRescanTagsRemoved
This value is the total number of page re-scan tags
removed from the batch.
Tags Added – Pages
PVCAP_QCTAG-PagesTagged
This value is the total number of page tags added to the
batch.
Tags Removed – Pages
PVCAP_QCTAG-PageTagsRemoved
This value is the total number of page tags removed from
the batch.
Tags Added – Total
PVCAP_QCTAG-TotalTags
This value is the total number of QC tags added to the
batch.
Tags Removed – Total
PVCAP_QCTAG-TotalTagsRemoved
This value is the total number of QC tags removed from
the batch.
Printing Batch Statistics
You can print a representation of the statistics you have expanded in the Batch Statistics tree.
To print batch statistics
1. Click the Print icon.
2. Select the printing parameters, and click OK.
Filtering Batch Statistics
The Filter command allows you to search for statistics according to your specified criteria.
To filter the list of batch statistics
1. Click the Filter icon. The Statistic Filter dialog box appears.
2. Enter the applicable filter criteria to use in the search:
• Batch ID: Unique identifier of the batch in the database
• Statistic: Select the statistic to search for
• Batch Created: Date range in which the batches were created
• Job: Job name to which the batch is assigned
• Step: Name of the job step in which the batch is currently processing or waiting
• Step Start: Date and time when the batch entered its current job step
NOTE: This is a batch-level filter, so for any batches that fulfill this criterion, all unfiltered statistics
for those batches will be displayed.
• Operator: Includes active and inactive users; also includes the PaperVision Capture Automation
Service
• Include Deleted Batch Document, Page, and Image Counts: Includes deleted documents, pages,
and images in the batch count statistics
• Query Type: AND includes every specified criterion in the search; OR includes any of the specified
criteria in the search
To remove all the filter criteria, click the Clear All button.
3. Click OK to initiate the search. The Batch Statistics grid refreshes with your search results.
NOTE: Your most recent Statistic Filter settings are retained the next time you open the Batch
Statistics screen.
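The difference between the AND and OR query types can be sketched in a few lines of Python. This is only an illustration of the matching logic described for the Query Type option, not the product's implementation; the batch records, field names, and values are hypothetical.

```python
def matches(batch, criteria, query_type="AND"):
    """Return True if the batch satisfies the criteria under the query type."""
    checks = [batch.get(field) == wanted for field, wanted in criteria.items()]
    if query_type == "AND":
        return all(checks)   # every specified criterion must match
    if query_type == "OR":
        return any(checks)   # at least one specified criterion must match
    raise ValueError("query_type must be 'AND' or 'OR'")


# Hypothetical batches and filter criteria, for illustration only.
batches = [
    {"batch_id": 1, "job": "Invoices", "step": "Indexing"},
    {"batch_id": 2, "job": "Invoices", "step": "QC"},
    {"batch_id": 3, "job": "Claims", "step": "QC"},
]
criteria = {"job": "Invoices", "step": "QC"}

and_ids = [b["batch_id"] for b in batches if matches(b, criteria, "AND")]
or_ids = [b["batch_id"] for b in batches if matches(b, criteria, "OR")]
print(and_ids)
print(or_ids)
```

With these sample records, AND returns only the batch that matches both the job and the step, while OR returns every batch that matches either one.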
Exporting Batch Statistics
You can export all batch statistics to an XML file.
To export all batch statistics
1. Click the Export icon.
2. Enter the File Name of the XML file in the Save As dialog box.
3. Click Save.
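Once the statistics are exported, the XML file can be inspected with any XML-aware tool. The Python sketch below counts how often each element name occurs, which is a quick way to learn the shape of an unfamiliar export. The schema of the exported file is not described in this guide, so the sample XML and its element names are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text):
    """Return how many times each element tag occurs in the XML text."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Hypothetical stand-in for an exported file; real exports will differ.
sample = "<Statistics><Statistic/><Statistic/></Statistics>"

tag_counts = count_tags(sample)
for tag, count in tag_counts.most_common():
    print(tag, count)
```

To try this on a real export, read the saved file's contents into a string and pass it to count_tags.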
Appendix A Additional Help Resources
At Digitech Systems, we provide multiple resources to help find answers to your questions.
Technical Support
Direct: 402.484.7777
Toll-free: 877.374.3569
Email: support@digitechsystems.com
Help on the Web
MyDSI is an interactive tool for all Digitech Systems customers. Log in to MyDSI to download product
updates, license purchased software, view support contract renewals, and check the status of your software
support cases and requests.
User Forums
Join the Digitech Systems user forums to exchange answers and ideas with other users in our moderated
community.
Knowledge Base
Search our extensive database for articles on all Digitech Systems products. This feature is accessible to
everyone.
Contacting Digitech Systems
At Digitech Systems, we provide software and services that give companies the ability to retrieve any
document, anywhere, anytime.
We strive to provide you the quality service and support you deserve!
Legendary Customer Support
If you have questions while using our products, please contact our customer support service. Our customer
support is legendary. We continually enjoy a 98% satisfaction rating from our resellers and end users, and
more than 85% of issues are resolved on the same day.
Direct: 402.484.7777
Toll-free: 877.374.3569
E-mail: support@digitechsystems.com
Web site: www.digitechsystems.com
Office hours: Monday through Friday, 8 a.m. to 6 p.m. Central Time
If you are an ImageSilo reseller and have account-related questions or issues, contact our ImageSilo
administrators at SiloAdmin@digitechsystems.com.
Log in to ImageSilo at: https://login.imagesilo.com
Corporate Headquarters
Digitech Systems is headquartered in Greenwood Village, Colorado.
Address:
Digitech Systems
8400 E. Crescent Parkway, Suite 500
Greenwood Village, CO 80111 USA
Telephone: 866.374.3569
Outside the U.S.: +1 303.493.6900
Fax: 866.245.3569 or 303.493.6979
Web Site: www.digitechsystems.com
Office hours: Monday through Friday, 8 a.m. to 5 p.m. Mountain Time
Additional Resources
Contact our Sales team at sales@digitechsystems.com to discuss the latest products and services from
Digitech Systems with one of our dedicated Client Development Managers (CDMs).
Community Support
We enjoy a strong customer community at Digitech Systems. Turn to any one of our additional services for
support:
MyDSI
MyDSI is an interactive tool that provides access to product downloads, account information, support cases,
enhancement requests, and maintenance contract status. In addition, use it to view company
announcements and technical articles. If you are a Digitech Systems reseller, download the sales and
marketing tools and use them in your business.
Access the MyDSI web site at http://mydsi.digitechsystems.com/Login.asp. You must create an
account to use the site.
To create an account, send an e-mail message to webaccounts@digitechsystems.com with the following
information:
Contact name
E-mail address
Company name
Address
Phone number
If applicable, provide the name of your Digitech Systems reseller.
Digitech Systems Knowledge Base
Search the Digitech Systems Knowledge Base of technical support articles at:
http://kb.digitechsystems.com
The Knowledge Base is available to all current Digitech Systems customers and resellers.
The Forums at Digitech Systems, Inc.
Log in to The Forums at Digitech Systems at http://forums.digitechsystems.com to network with other
resellers and users of our products. To join The Forums, you must register online for an account.
PaperVision Certification Testing
Only PaperVision Enterprise certified resellers can sell PaperVision Enterprise. To take the certification test,
register for the test or attend a Digitech Systems-hosted technical training class. Contact your Client
Development Manager (CDM) for more information.
The PaperVision Enterprise certification test is hosted at http://mydsi.digitechsystems.com/CertLogin.ASP.
You must have a certification code to log in.
Customer Feedback
Do you have ideas for enhancements or improvements? The product features and enhancements at Digitech
Systems are driven by you, our customers.
Suggest new features and improvements
If you have an idea for a future product feature or enhancement, please e-mail your idea and your contact
information to our development team at dev@digitechsystems.com.
Appendix B Nuance OCR Spelling Languages
The following Nuance OCR spelling languages are supported in PaperVision Capture:
Supported Nuance OCR Spelling Languages
Afrikaans - spoken in South Africa
Albanian
Automatic language selection for spell-checking only
Aymara - spoken in Bolivia and Peru
Basque
Byelorussian (Cyrillic) - includes the characters of the English language; other spellings are Belarusian and White
Russian
Bemba - alternate names are Chibemba, Ichibemba, Wemba, Chiwemba; spoken in Zambia and Democratic
Republic of Congo
Blackfoot - alternate names are Blackfeet, Siksika, and Pikanii; spoken in Canada and USA
Portuguese (Brazilian)
Breton
Bugotu - spoken in Solomon Islands
Bulgarian (Cyrillic) - includes the characters of the English language
Catalan
Chamorro - spoken in Guam and Northern Mariana Islands
Chechen
Chuana or Tswana - spoken in Botswana and South Africa
Corsican
Croatian
Crow - spoken in USA
Danish
Dutch
English
Eskimo
Esperanto
Estonian
Faroese
Fijian
French
Frisian - macrolanguage of three Frisian languages in Germany
Friulian - spoken in Italy
Galician (alternate names Gallegan and Gallego) - spoken in Spain and Portugal
Ganda or Luganda - spoken in Uganda
German
Gaelic Irish
Gaelic Scottish
Greek - includes the characters of the English language
Guarani (macrolanguage of the Chiripa and some Guarani languages) - spoken in Paraguay, Argentina, Bolivia,
and Brazil
Hani (alternate names are Hanhi, Haw and Hani Proper) - spoken in China, Laos, and Vietnam
Hawaiian
Hungarian
Icelandic
Ido - constructed language
Finnish
Indonesian
Interlingua - constructed language
Italian
Kabardian (alternate name is Beslenei) - spoken in Russia and Turkey
Kashubian - spoken in Poland
Kawa (alternate names are Wa, Va, Vo, Wa Pwo, and Wakut) - spoken in China
Kikuyu - spoken in Kenya
Kongo (macrolanguage of Laari and Kongo languages) - spoken in the Democratic Republic of the Congo,
Angola, and Congo
Kpelle (macrolanguage of Kpelle languages) - spoken in Liberia and Guinea
Kurdish (if written in the Latin alphabet) - macrolanguage of the Kurdish languages
Latvian
Lithuanian
Latin
Luba (alternate names are Luba-Lulua, Luba-Kasai, Tshiluba, Luva, and Western Luba) - spoken in the
Democratic Republic of the Congo
Luxembourgian (alternate names are Luxembourgeois and Letzburgish) - spoken in Luxembourg
Macedonian (Cyrillic) - includes the characters of the English language
Maltese
Maori - spoken in New Zealand
Mayan
Miao (macrolanguage of Hmong languages and alternate name is Hmong) - spoken in China, Laos, Thailand,
Myanmar, and Viet Nam
Minankabaw
Malagasy (macrolanguage of Malagasy languages) - spoken in Madagascar
Malinke (alternate names are Western Maninkakan, Malinka, and Maninga) - spoken in Senegal, Gambia, and Mali
Malay
Mohawk - spoken in Canada and USA
Moldavian (Cyrillic) - includes the characters of the English language
Nahuatl
No language selection (for spell checking only) - this value can be used to specify that the checking module will
not use the Language dictionary
Norwegian
Nyanja (alternate names are Chichewa and Chinyanja) - spoken in Malawi, Mozambique, Zambia, and Zimbabwe
Occidental - constructed language
Ojibway (macrolanguage of Ojibwa, Chippewa and Ottawa languages and alternate names are Ojibwa and
Ojibwe) - spoken in Canada and USA
Papiamento - spoken in Netherlands Antilles, Aruba
Pidgin English (alternate names are Tok Pisin, Neomelanesian, and New Guinean Pidgin English) - spoken in
Papua New Guinea
Polish
Portuguese
Provencal (alternate name is Occitan) - spoken in France, Italy, and Monaco
Quechua (macrolanguage of the Quechua languages) - spoken in Peru
Rhaetic (alternate names are Romansch and Rhaeto-Romance) - spoken in Switzerland
Romanian
Romany - spoken all over Europe
Ruanda (alternate names are Kinyarwanda and Rwanda) - spoken in Rwanda, the Democratic Republic of
Congo, and Uganda
Rundi - spoken in Burundi and Uganda
Russian (Cyrillic) - includes the characters of the English language
Samoan - spoken in Samoa and American Samoa
Sardinian - macrolanguage of the Sardinian languages
Shona - spoken in Zimbabwe, Botswana, and Zambia
Sioux (alternate name is Dakota) - spoken in USA and Canada
Slovak
Slovenian
Sami - combination of the Sami language family
Lule Sami
Northern Sami
Southern Sami
Somali
Sotho, Suto, or Sesuto language selection - spoken in Lesotho and South Africa
Spanish
Serbian (Cyrillic)
Serbian (Latin)
Sundanese (alternate names are Sunda and Priangan) - spoken in Java and Bali in Indonesia
Swahili (macrolanguage of the Swahili languages) - spoken in the Democratic Republic of the Congo, Tanzania,
Kenya, and Somalia
Swedish
Swazi (alternate names are Swati, Siswati, and Tekela) - spoken in Swaziland, Lesotho, Mozambique, and
South Africa
Tagalog - spoken in Philippines
Tahitian
Tinpo
Tongan (alternate names are Tonga, Siska and Nyasa) - spoken in Malawi
Tun (alternate names are Tunia and Tunya) - spoken in Chad
Turkish
Ukrainian (Cyrillic) - includes the characters of the English language
Visayan consists of Cebuano, Hiligaynon, and Samaran or Waray-waray languages - spoken in the Philippines
Welsh
Wend or Sorbian
Wolof - spoken in Senegal and Mauritania
Xhosa - spoken in South Africa and Lesotho
Zapotec (macrolanguage of the Zapotec languages) - spoken in Mexico
Zulu - spoken in South Africa, Lesotho, Malawi, Mozambique, and Swaziland
Appendix C Modifying the Process Batch Operation
By default, an Automation Service that is scheduled to perform the Process Batch operation will execute
every function associated with this operation, such as custom code, image processing, and OCR. These
functions are listed in the DSI.PVECommon.PVProcWork.exe.config file under the
batchConfiguration/batchProcessors element. You can, however, configure an Automation Service to
perform a subset of these functions. For example, full-text OCR can be resource-intensive and time-consuming, so you could dedicate an Automation Service to full-text OCR to ensure that the throughput of
your non-full-text OCR batches is not adversely affected.
To configure one or more Automation Services to process full-text OCR:
1. Install one or more new Automation Services on dedicated machines with sufficient resources to perform
the full-text OCR.
2. In the DSI.PVECommon.PVProcWork.exe.config file for each of the new services, modify the batch
configuration section such that all batch processing functions except Nuance Full-Text OCR are excluded:
<batchConfiguration isLocal="true">
  <batchProcessors>
    <add jobStepType="AutomatedOCRFullText"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRFullTextManager"/>
  </batchProcessors>
  <excludedBatchProcessors>
    <add jobStepType="CustomCode"
         assembly="DSI.Capture.ScriptingLibrary.dll"
         batchProcessorClass="DSI.Capture.ScriptingLibrary.BatchProcessor"/>
    <add jobStepType="AutomatedBarcode"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.BarcodeManager"/>
    <add jobStepType="ImageProcessing"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.ImgProcessingManager"/>
    <add jobStepType="AutomatedOCR"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRManager"/>
  </excludedBatchProcessors>
</batchConfiguration>
3. For any Automation Services that should not be executing full-text OCR (i.e., the existing services),
change the DSI.PVECommon.PVProcWork.exe.config file such that only full-text OCR is excluded:
<batchConfiguration isLocal="true">
  <batchProcessors>
    <add jobStepType="CustomCode"
         assembly="DSI.Capture.ScriptingLibrary.dll"
         batchProcessorClass="DSI.Capture.ScriptingLibrary.BatchProcessor"/>
    <add jobStepType="AutomatedBarcode"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.BarcodeManager"/>
    <add jobStepType="ImageProcessing"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.ImgProcessingManager"/>
    <add jobStepType="AutomatedOCR"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRManager"/>
  </batchProcessors>
  <excludedBatchProcessors>
    <add jobStepType="AutomatedOCRFullText"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRFullTextManager"/>
  </excludedBatchProcessors>
</batchConfiguration>
4. In the Administration Console, schedule the new Automation Services to perform the Process Batch
operation.
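After editing a configuration file, it can be useful to confirm which job step types a service will process and which it excludes. The Python sketch below parses a batchConfiguration fragment and lists both groups. It works on a standalone fragment copied from the full-text OCR example in this appendix; locating the batchConfiguration element inside the complete DSI.PVECommon.PVProcWork.exe.config file is left out, and the helper name step_types is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the dedicated full-text OCR service example.
CONFIG = """
<batchConfiguration isLocal="true">
  <batchProcessors>
    <add jobStepType="AutomatedOCRFullText"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRFullTextManager"/>
  </batchProcessors>
  <excludedBatchProcessors>
    <add jobStepType="CustomCode"
         assembly="DSI.Capture.ScriptingLibrary.dll"
         batchProcessorClass="DSI.Capture.ScriptingLibrary.BatchProcessor"/>
    <add jobStepType="AutomatedBarcode"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.BarcodeManager"/>
    <add jobStepType="ImageProcessing"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.ImgProcessingManager"/>
    <add jobStepType="AutomatedOCR"
         assembly="DSI.Capture.Business.dll"
         batchProcessorClass="DSI.Capture.Business.OCRManager"/>
  </excludedBatchProcessors>
</batchConfiguration>
"""


def step_types(root, section):
    """List the jobStepType of every <add> element under the named section."""
    return [add.get("jobStepType") for add in root.findall(f"{section}/add")]


root = ET.fromstring(CONFIG.strip())
enabled = step_types(root, "batchProcessors")
excluded = step_types(root, "excludedBatchProcessors")
overlap = set(enabled) & set(excluded)  # should be empty in a consistent file

print("Processes:", enabled)
print("Excludes:", excluded)
print("Listed in both sections:", sorted(overlap))
```

A parse error here also catches malformed XML, such as a missing space between attributes, before the service is restarted.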
Appendix D Maximum Image Sizes
This appendix outlines the approximate limits on the sizes of images imported into PaperVision Capture and processed through
the Nuance and Open Text Full-Text OCR, Zonal OCR, and Image Processing steps. The Thumbnails
windows, found in both the Administration and Operator Consoles, can handle substantially larger images.
Additionally, images only stored in memory or simply ingested by PaperVision Capture (therefore not viewed
in Thumbnails windows or processed through the Nuance or Open Text Full-Text OCR, Zonal OCR, or Image
Processing steps), can also be significantly larger in size.
DISCLAIMER – PLEASE READ: These dimensions are provided only as estimates to identify size
limits in importing, viewing, and processing images in PaperVision Capture. Variations in technical
environments may cause maximum image sizes to fluctuate across systems.
Maximum Image Sizes (in Pixels)
Stored Images
10,000 x 10,000*
* These dimensions can be greater in bitonal images
Thumbnails
32,768 x 32,768
Image Processing
10,000 x 10,000*
* These dimensions can be greater in bitonal images
Nuance Full-Text OCR and Zonal OCR
8,400 x 8,400
Open Text Full-Text OCR and Zonal OCR
32,000 x 24,000*
* The maximum supported image dimensions that can be processed through the Open Text engine
vary with resolution. For example, the maximum supported image dimensions at 300 dpi are
approximately 106 inches x 80 inches. Images that are processed through the Open Text OCR engine
must contain matching horizontal and vertical resolutions.
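The 300 dpi example for the Open Text engine follows from dividing the pixel limit by the resolution. The short Python sketch below reproduces that arithmetic and adds a simple pre-check of an image's pixel dimensions against the limit. The function names are hypothetical, the check ignores page orientation, and the limit itself is only the estimate given in this appendix.

```python
# Approximate Open Text limit from this appendix, in pixels (width, height).
OPEN_TEXT_MAX_PIXELS = (32000, 24000)


def max_inches(max_pixels, dpi):
    """Convert a pixel limit to inches at the given resolution."""
    width, height = max_pixels
    return (width / dpi, height / dpi)


def fits(image_pixels, max_pixels):
    """Return True if the image is within the limit in both dimensions."""
    return image_pixels[0] <= max_pixels[0] and image_pixels[1] <= max_pixels[1]


width_in, height_in = max_inches(OPEN_TEXT_MAX_PIXELS, 300)
print(f"{width_in:.1f} x {height_in:.1f} inches at 300 dpi")

# A letter-size page scanned at 300 dpi is 2550 x 3300 pixels.
print(fits((2550, 3300), OPEN_TEXT_MAX_PIXELS))
```

At 300 dpi the limit works out to roughly 106.7 x 80 inches, in line with the approximate figure quoted in the table note.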
Appendix E Terminal Services Configuration
The PaperVision Capture Operator Console can be configured to support a Terminal Services environment,
enabling multiple operators to remotely log into a single workstation to complete tasks. This appendix
describes how to configure PaperVision Capture so multiple users can log into a single installation of the
Operator Console.
In a terminal services configuration, the first operator who logs into the Operator Console and creates or
opens a batch consumes one or more concurrent licenses, depending on the batch's job configuration.
Subsequent operators who log into that same installation of the Operator Console also consume concurrent
licenses. If no remaining concurrent licenses are available, the operator will not be able to log into the
Operator Console. For more information on concurrent licensing, see the section on Licensing in Chapter 2 Global Administration.
To configure the PaperVision Capture Operator Console to support a Terminal Services
environment
1. Open the C:\Documents and Settings\All Users\Application Data\Digitech Systems directory (or
other directory as specified during the installation of PaperVision Capture).
2. Open the ClientSettings.xml file.
3. Change the value of the following variable from "false" to "true":
<ALLOWMULTIPLEOPERATORCONSOLES>true</ALLOWMULTIPLEOPERATORCONSOLES>
4. Save the file.
WARNING! Improperly modifying the contents of a PaperVision Capture configuration file may adversely
impact system performance and the overall functionality of PaperVision Capture.
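For administrators who prefer to script the change, the edit above can be sketched in Python. This is a minimal illustration that operates on an XML string rather than on the live ClientSettings.xml file. The overall layout of ClientSettings.xml is not shown in this guide, so the SETTINGS root element in the sample is a hypothetical placeholder; only the ALLOWMULTIPLEOPERATORCONSOLES element name comes from the procedure. Back up the real file before changing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ClientSettings.xml; the real layout may differ.
SAMPLE = (
    "<SETTINGS>"
    "<ALLOWMULTIPLEOPERATORCONSOLES>false</ALLOWMULTIPLEOPERATORCONSOLES>"
    "</SETTINGS>"
)


def allow_multiple_consoles(xml_text):
    """Return the XML text with ALLOWMULTIPLEOPERATORCONSOLES set to true."""
    root = ET.fromstring(xml_text)
    element = root.find(".//ALLOWMULTIPLEOPERATORCONSOLES")
    if element is None:
        raise ValueError("ALLOWMULTIPLEOPERATORCONSOLES element not found")
    element.text = "true"
    return ET.tostring(root, encoding="unicode")


updated = allow_multiple_consoles(SAMPLE)
print(updated)
```

Raising an error when the element is missing avoids silently writing back an unchanged or unexpected file.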
Appendix F Open Text Countries and Languages
The table in this appendix displays the supported Open Text countries, languages, country groups, language
groups, and character sets available in PaperVision Capture. If you narrow the search for specific languages
or countries, the Open Text OCR engine will process more rapidly during OCR recognition.
Each language, country, language group, country group, and character set is compatible with specific code
pages. When you select from the Country/Language property, you can only select combinations of countries,
languages, etc. within the same code page or code page group (i.e., Latin). For example, a valid Latin
combination can be Poland, Hungary, and Germany. A valid Cyrillic combination can be Bulgaria and Russia.
A valid Greek combination can be Greek and OCR.
1. Cyrillic: Code page 1251
2. Greek: Code page 1253
3. Latin: Code pages 1250, 1252, 1254 and 1257 (i.e. Central Europe, Western Europe, Turkey, Baltic)
4. Azerbaijan
Note: Code page 0 (OCR) can be added to any combination above.
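The compatibility rule above can be expressed as a small check: every selection must fall in the same code page group, and code page 0 (OCR) is ignored because it can join any combination. The Python sketch below is an illustration of that rule only, not the logic used by the Country/Language property. The function name is hypothetical, and the lookup table holds just the entries needed for the examples in this appendix.

```python
# Code page groups as listed above.
CODE_PAGE_GROUPS = {
    1251: "Cyrillic",
    1253: "Greek",
    1250: "Latin",
    1252: "Latin",
    1254: "Latin",
    1257: "Latin",
}

# Subset of the countries and languages table, enough for the examples.
CODE_PAGES = {
    "Poland": 1250,
    "Hungary": 1250,
    "Germany": 1252,
    "Bulgaria": 1251,
    "Russia": 1251,
    "Greek": 1253,
    "OCR": 0,
}


def is_valid_combination(selections):
    """Return True if all selections share one code page group."""
    groups = set()
    for name in selections:
        code_page = CODE_PAGES[name]
        if code_page == 0:
            continue  # OCR (code page 0) can be added to any combination
        groups.add(CODE_PAGE_GROUPS[code_page])
    return len(groups) <= 1


print(is_valid_combination(["Poland", "Hungary", "Germany"]))
print(is_valid_combination(["Bulgaria", "Russia"]))
print(is_valid_combination(["Greek", "OCR"]))
print(is_valid_combination(["Russia", "Germany"]))
```

The first three calls correspond to the valid Latin, Cyrillic, and Greek examples in the text; the last mixes Cyrillic and Latin and is rejected.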
Supported Open Text Countries and Languages
Code Page
Australia
1252
Austria
1252
Azerbaijan
1254
Baltic
1257
Belgium
1252
Brazil
1252
Bulgaria
1251
Canada
1252
Central America
1252
Central Europe
1250
Croatia
1250
Cyrillic
1251
Czech
1250
Denmark
1252
Estonia
1257
Finland
1252
PaperVision® Capture Administration Guide
436
Appendix F Open Text Countries and Languages
Supported Open Text Countries and Languages
Code Page
France
1252
Germany
1252
Great Britain
1252
Greece
1253
Hungary
1250
Ireland
1252
Italy
1252
Liechtenstein
1252
Lithuania
1257
Luxembourg
1252
Netherlands
1252
New Zealand
1252
Norway
1252
Poland
1250
Portugal
1252
Romania
1250
Russia
1251
Scandinavia
1252
Slovakia
1250
Slovenia
1250
South Africa
1252
South America
1252
South America Spanish
1252
Spain
1252
Sweden
1252
Switzerland
1252
Turkey
1254
USA
1252
Western Europe
1252
OCR
0
Afrikaans
1252
PaperVision® Capture Administration Guide
437
Appendix F Open Text Countries and Languages
Supported Open Text Countries and Languages
Code Page
Albanian                                       1250
Azerbaijani Latin                              1254
Basque                                         1252
Bosnian Latin                                  1250
Bulgarian                                      1251
Catalan                                        1252
Croatian                                       1250
Czech Language                                 1250
Danish                                         1252
Dutch                                          1252
English                                        1252
Estonian                                       1257
Faroese                                        1252
Finnish                                        1252
French                                         1252
Frisian                                        1252
German                                         1252
Greek                                          1253
Guarani                                        1252
Hani                                           1252
Hungarian                                      1250
Icelandic                                      1252
Indonesian                                     1252
Irish                                          1252
Italian                                        1252
Kirundi                                        1252
Latin                                          1252
Latvian                                        1257
Lithuanian                                     1257
Luxembourgish                                  1252
Malay                                          1252
Norwegian                                      1252
Polish                                         1250
Portuguese                                     1252
Quechua                                        1252
Rhaeto-Romanic                                 1252
Romanian                                       1250
Russian                                        1251
Rwanda                                         1252
Serbian Latin                                  1250
Shona                                          1252
Slovak                                         1250
Slovenian                                      1250
Somali                                         1252
Sorbian                                        1250
Spanish                                        1252
Swahili                                        1252
Swedish                                        1252
Turkish                                        1254
Wolof                                          1252
Xhosa                                          1252
Zulu                                           1252
Appendix G Digitech Logging Utility
For all Digitech Systems software, logging settings are stored in the application’s configuration file. If the
executable for your application were named DigitechApp.exe, then the name of the corresponding
configuration file would be DigitechApp.exe.config and it would be located in the same directory as the
executable. Configuration files are XML files that you can open and edit using any text editor, including
Windows Notepad.
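As a minimal sketch of the naming rule just described, the following derives the configuration file path from an executable path. The function name config_path_for and the C:\dsi\bin directory are illustrative assumptions; only the ".config" suffix and the same-directory rule come from the text above.

```python
from pathlib import PureWindowsPath

def config_path_for(executable):
    """Return the path of the configuration file that sits next to an executable."""
    exe = PureWindowsPath(executable)
    # The configuration file keeps the full executable name and adds ".config".
    return exe.with_name(exe.name + ".config")

# Hypothetical install location, used only for illustration.
print(config_path_for(r"C:\dsi\bin\DigitechApp.exe"))
```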
For your convenience, the Digitech Logging Utility is provided so that you can easily change some logging
aspects without having to manually edit the XML configuration files. You can use the logging utility to modify
configuration files for any Digitech Systems Inc. product. The Digitech Logging Utility lets you specify:
• which configuration file to modify,
• the level of detail for the logs,
• where the log information is sent and/or stored,
• and the appearance of the content.
Configuring the Digitech Logging Utility
1. Click Start, and then click All Programs.
2. Click the Digitech Systems folder, and then click PaperVision Logging Utility. The Digitech Logging
Utility dialog box appears.
Digitech Logging Utility
3. Next to the Configuration File box, click the ellipsis button (...) to open the Select Config File dialog box.
4. Click the Look in list to locate the directory where the configuration files are stored (typically, Program
Files\Digitech Systems\PaperVision Capture).
5. Click the configuration file for which you want to specify logging properties.
6. Click the Trace Level list, and then select one of the following options:
   • Error - This option specifies that recoverable errors are logged.
   • Warning - This option specifies that non-critical problems are logged.
   • Information - This option specifies that informational messages are logged.
   • Verbose - This option specifies that debugging trace information is logged.
   • All - This option specifies that all possible logging information is included.
NOTE: If you select Verbose or All, you are requesting the maximum amount of output. These settings can
generate a significant amount of output that requires extra processing, which could slow application
performance.
7. Go to "Configuring Listeners" on page 441 to specify where you want the log information to be sent and/or
stored.
Configuring Listeners
After you complete the procedure under "Configuring the Digitech Logging Utility" on page 440, you must specify
where you want the log information to be sent and/or stored. You do this in the Listeners area in the Digitech
Logging Utility dialog box.
Listeners Area
The following procedures describe how to configure each type of listener.
Configuring the Windows Event Log Listener
Select Windows Event Log to send the log output to the Windows Event Viewer. To access the Event Viewer,
type eventvwr at the command prompt, and then press Enter.
1. If you haven’t already done so, complete the procedure under "Configuring the Digitech Logging Utility" on
page 440.
2. In the Listeners area, select Windows Event Log, and then click [button icon].
3. In the Source Name box, type the name that you want to appear in the Source column of the Event
Viewer, and then click OK.
4. If you are done configuring listeners, click OK. Otherwise, go to the instructions for the next listener you
want to configure.
Configuring the Email Listener
Select Email to send the log output to an email address.
1. If you haven’t already done so, complete the procedure under "Configuring the Digitech Logging Utility" on
page 440.
2. In the Listeners area, select Email, and then click [button icon]. The Email Settings dialog box appears.
3. In the SMTP Server box, type the IP address or server name for the SMTP server.
4. In the SMTP Port box, type the applicable port number.
5. In the To Address box, type the email address to which the log information will go.
6. In the From Address box, type the email address from which the log information will be sent.
7. In the Subject Line Starter (Optional) box, type the subject you want to appear in the email notifications.
8. Click OK.
9. If you are done configuring listeners, click OK. Otherwise, go to the instructions for the next listener you
want to configure.
Configuring the File Listener
Select File to send the log output to a file. So that you can easily locate the log file, you can specify the path
and file name. The path you specify can contain any environment variable, which is resolved when the log is
output. (To see a list of all the environment variables configured on your machine, go to a command prompt,
type set, and then press Enter.)
If you specify a directory with the file name, and that directory does not exist, the logging utility will create it. For
example, if you type C:\temp\logs\dsi.log for the file name, the logging utility will create the C:\temp\logs
directory if it does not already exist. When using the File option, log entries will continue to accumulate in the
same file, and this can cause the file to become very large. If you do not want to manually maintain this file,
consider using the Rolling File option described under "Configuring the Rolling File Listener" on page 443.
1. If you haven’t already done so, complete the procedure under "Configuring the Digitech Logging Utility" on
page 440.
2. In the Listeners area, select File, and then click [button icon]. The File Settings dialog box appears.
3. In the File Name box, type the path and/or file name for the log. If you do not include a path, the file you
specify will be written to the Digitech installation directory.
4. (Optional) In the Header box, type a header that you want to appear at the beginning of each log entry in the
file.
5. (Optional) In the Footer box, type a footer that you want to appear at the end of each log entry in the file.
6. Click OK.
7. If you are done configuring listeners, click OK. Otherwise, go to the instructions for the next listener you
want to configure.
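The path behavior described for the File listener (environment variables resolved at output time, missing directories created) can be sketched as follows. This is an illustration under stated assumptions, not the logging utility's implementation: the %NAME% variable syntax, the function name resolve_log_path, and the LOGROOT variable are all hypothetical, and a temporary directory stands in for a real log location.

```python
import os
import re
import tempfile
from pathlib import Path

def resolve_log_path(file_name, env=None):
    """Expand %NAME% variables in file_name and create the parent directory if needed."""
    env = os.environ if env is None else env
    # Replace each %NAME% with its value; unknown variables are left untouched.
    expanded = re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1), m.group(0)), file_name)
    path = Path(expanded)
    # Mirror the documented behavior: create the directory if it does not exist.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# LOGROOT is a made-up variable pointing at a scratch directory for this demo.
base = tempfile.mkdtemp()
log_path = resolve_log_path("%LOGROOT%/logs/dsi.log", env={"LOGROOT": base})
print(log_path)
```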
Usage Example:
If your entries looked like this,
File Settings
then the log entry would appear similar to the following.
---------------------Start PaperVision Capture Client Entry---------------------
Timestamp: 4/7/2014 5:22:02 PM
Message: Accepting connections...
Category: Operational
EventId: 100
Severity: Information
Machine: BOBR-NOTEBOOK
OS Version: Microsoft Windows NT 6.1.7601 Service Pack 1
Application Domain: CaptureClient.exe
Process Id: 10932
Process Name: C:\dsi\bin\CaptureClient.exe
Win32 Thread Id: 5332
Thread Name:
---------------------End PaperVision Capture Client Entry------------------------
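If you need to inspect such entries programmatically, a minimal sketch like the following reads the "Name: value" lines of one entry into a dictionary. It assumes the header and footer are dashed lines as in this sample (they are whatever you typed in the Header and Footer boxes), and the function name parse_entry is hypothetical.

```python
def parse_entry(text):
    """Parse the 'Name: value' lines of one log entry into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and the dashed header/footer
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

sample = """\
---------------------Start PaperVision Capture Client Entry---------------------
Timestamp: 4/7/2014 5:22:02 PM
Message: Accepting connections...
Severity: Information
Process Name: C:\\dsi\\bin\\CaptureClient.exe
Thread Name:
---------------------End PaperVision Capture Client Entry------------------------
"""
entry = parse_entry(sample)
print(entry["Severity"], "|", entry["Message"])
```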
Configuring the Rolling File Listener
Select Rolling File to send the log output to a file while preventing that file from becoming too large: a new
file is created based on the current date and time and the time interval that you specify.
By default, the contents of a rolling file are abbreviated in comparison to the contents of a regular file.
So that you can easily locate the log file, you can specify the path and file name. The path you specify can contain
any environment variable. The variable is resolved when the log is output. (To see a list of all the environment
variables configured on your machine, go to a command prompt, type set, and then press Enter.) Additionally, if
you specify a directory with the file name, and that directory does not exist, the logging utility will create it. For
example, if you type C:\temp\logs\dsi.log for the file name, the logging utility will create the C:\temp\logs
directory if it does not already exist.
The Rolling File option also supports the use of macros that are expanded when the file is written. Supported
macros include {machineName}, {processID}, {timeStamp}, and {webSiteID}.
1. If you haven’t already done so, complete the procedure under "Configuring the Digitech Logging Utility" on
page 440.
2. In the Listeners area, select Rolling File, and then click [button icon]. The Rolling File Settings dialog box
appears.
3. In the File Name box, type the path and/or file name for the log. If you do not include a path, the file you
specify will be written to the Digitech installation directory. You can also include macros in this box.
NOTE: To specify a time interval for the creation of a new file, you must include the {timeStamp} macro
in the File Name box.
4. In the Time Stamp Format box, specify the time interval for the creation of a new file. Use the
yyyyMMddHH format, where yyyy equals the year, MM equals the month, dd equals the numerical day,
and HH equals the hour in a 24-hour format. You control the interval by using only the portion of the format
you want. For example, to create a new rolling log file on a daily basis, you would type yyyyMMdd. If you
wanted a new rolling log file to be created on an hourly basis, you would type yyyyMMddHH.
5. Click OK.
6. If you are done configuring listeners, click OK. Otherwise, go to the instructions for the next listener you
want to configure.
Usage Example:
If your entries looked like this,
File Settings
the output file would be CaptureClientApplication_BOBR-NOTEBOOK_20140407.log.
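To make the macro and time stamp rules concrete, the sketch below expands a file name template the way the procedure describes. It is not the utility's own code: the function name expand_file_name is hypothetical, the template CaptureClientApplication_{machineName}_{timeStamp}.log is an assumption about what was typed in the File Name box, and {webSiteID} is left unexpanded because this guide does not say what it resolves to.

```python
from datetime import datetime

# Map the format letters described in step 4 to Python strftime directives.
FORMAT_PARTS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H")]

def expand_file_name(template, stamp_format, now, machine_name, process_id):
    """Expand {machineName}, {processID}, and {timeStamp} in a file name template."""
    strftime_format = stamp_format
    for token, directive in FORMAT_PARTS:
        strftime_format = strftime_format.replace(token, directive)
    values = {
        "{machineName}": machine_name,
        "{processID}": str(process_id),
        "{timeStamp}": now.strftime(strftime_format),
    }
    for macro, value in values.items():
        template = template.replace(macro, value)
    return template

# Assumed template and values, chosen to match the usage example above.
name = expand_file_name(
    "CaptureClientApplication_{machineName}_{timeStamp}.log",
    "yyyyMMdd",
    datetime(2014, 4, 7, 17, 31, 23),
    "BOBR-NOTEBOOK",
    10932,
)
print(name)
```

Because the time stamp is part of the file name, a daily format (yyyyMMdd) yields one file per day, while yyyyMMddHH changes the name, and therefore the file, every hour.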
The contents of a rolling file are abbreviated, similar to the following sample.
2014-04-07 17:31:23 Accepting connections...
2014-04-07 17:31:23 New incoming connection: 127.0.0.1
2014-04-07 17:31:23 Inside TryCheckSystemVersion_7_062413_
2014-04-07 17:31:23 Login user successfully with user name U and entityId 1
2014-04-07 17:31:23 Command from 127.0.0.1 processed successfully
2014-04-07 17:31:23 Client 127.0.0.1 disconnected (gracefully)
2014-04-07 17:31:23 Enabling manual .NET garbage collection every 300 seconds
2014-04-07 17:31:23 New incoming connection: 127.0.0.1
2014-04-07 17:31:23 Command from 127.0.0.1 processed successfully
2014-04-07 17:31:23 Client 127.0.0.1 disconnected (gracefully)
2014-04-07 17:31:23 New incoming connection: 127.0.0.1
2014-04-07 17:31:23 Command from 127.0.0.1 processed successfully
2014-04-07 17:31:23 Client 127.0.0.1 disconnected (gracefully)
2014-04-07 17:31:25 New incoming connection: 127.0.0.1
2014-04-07 17:31:26 Saved User ID 2 preferences
2014-04-07 17:31:26 Command from 127.0.0.1 processed successfully
2014-04-07 17:31:26 Client 127.0.0.1 disconnected (gracefully)
2014-04-07 17:31:26 New incoming connection: 127.0.0.1
2014-04-07 17:31:26 Command from 127.0.0.1 processed successfully
2014-04-07 17:31:26 Client 127.0.0.1 disconnected (gracefully)
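Because each line of the abbreviated format starts with a fixed-width date and time, it is straightforward to process. The following minimal sketch, which assumes the layout shown in the sample above and uses a hypothetical function name parse_line, splits a line into its time stamp and message and counts incoming connections.

```python
from datetime import datetime

def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS message' into (datetime, message)."""
    # The time stamp occupies the first 19 characters, followed by one space.
    stamp, message = line[:19], line[20:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message

# A few lines copied from the sample above.
lines = [
    "2014-04-07 17:31:23 Accepting connections...",
    "2014-04-07 17:31:23 New incoming connection: 127.0.0.1",
    "2014-04-07 17:31:25 New incoming connection: 127.0.0.1",
    "2014-04-07 17:31:26 Saved User ID 2 preferences",
]
parsed = [parse_line(line) for line in lines]
connections = sum(1 for _, msg in parsed if msg.startswith("New incoming connection"))
print(connections)
```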