Application Development Using IBM Datacap Taskmaster Capture

IBM Datacap Taskmaster Capture
Version 8.0.1
Application Development using IBM
Datacap Taskmaster Capture
SC19-3251-00
Note
Before using this information and the product it supports, read the information in “Notices” on page 1.
This edition applies to version 8, release 0, modification 1 of IBM Datacap Taskmaster (product number 5725-C15)
and to all subsequent releases and modifications until otherwise indicated in new editions.
© Copyright IBM Corporation 1994, 2011
© Datacap Inc. 1994, 2011
Contents
Introduction ... 1
About this guide ... 1
Required hardware and software ... 1
Prerequisite knowledge ... 1
Business Requirements and Application Architecture ... 3
Developing the business requirements ... 4
General Taskmaster application architecture ... 5
TravelDocs: Developing the business requirements ... 6
Examining the document and page types ... 6
Required document structure ... 12
Fields for each page type ... 13
Permitted field values ... 14
Business validation rules ... 14
Data export format ... 15
Introducing Datacap Studio ... 17
Quick tour of the user interface ... 18
Starting Taskmaster Server ... 18
Opening a sample Taskmaster application ... 18
Panel organization within the Datacap Studio window ... 19
What's on the Rulemanager tab ... 20
What's on the Zones tab ... 21
What's on the Test tab ... 22
TravelDocs: Starting the TravelDocs application ... 23
Creating the application framework ... 23
Connecting to the application ... 23
The Document Hierarchy ... 25
Understanding the document hierarchy ... 26
Document structure ... 26
Generating structured content from unstructured documents ... 27
Mapping the document hierarchy to the runtime batch hierarchy ... 28
Handling different versions of each page type ... 29
TravelDocs: Creating the document hierarchy ... 30
Examining the default document hierarchy ... 31
Creating new document types ... 32
Creating new page types ... 33
Specifying the structure of documents and pages within the batch ... 34
Creating data fields ... 35
Specifying the structure of fields within each page ... 37
© COPYRIGHT IBM CORPORATION 1994, 2011 (REVISION 06172011)
Sharing field definitions across the document hierarchy ... 38
The Taskmaster Workflow ... 39
Understanding the Taskmaster workflow ... 40
Workflows, jobs, and tasks ... 40
Task profiles and rulesets ... 41
Rulesets, rules, and actions ... 42
Managing the workflow from Taskmaster Client ... 43
Running batches through the workflow ... 43
Monitoring the job queue ... 44
What's in the Taskmaster Administrator window ... 45
Configuring shortcuts ... 46
Configuring jobs and tasks ... 47
Mapping a task to a task profile ... 48
Document Input ... 49
Entering electronic documents (“virtual scanning”) ... 50
Document conversion ... 50
Scanning hardcopy documents ... 51
Local scanning ... 51
Remote scanning ... 51
TravelDocs: Creating a batch using VScan ... 52
Copying the sample documents into the application's “images” folder ... 52
Modifying the VScan ruleset ... 53
Running VScan to generate a batch ... 53
Examining the files in the runtime batch folder ... 54
TravelDocs: Setting up a local scanner (optional) ... 55
Creating a new Batch Pilot project for the scan task ... 55
Creating a new module ... 56
Creating the scan task ... 56
Configuring the scanner settings ... 57
Creating a shortcut for the new scan task ... 58
Running the scan task ... 59
Page Identification ... 61
Page identification methods ... 62
Fingerprint matching ... 62
Structure-based page identification ... 64
Text matching ... 64
Manual page identification ... 64
Image Enhancement ... 65
Goal of image enhancement ... 65
When to perform image enhancement ... 65
TravelDocs: Creating the initial fingerprint library ... 66
Changing the fingerprint creation method ... 66
Creating fingerprints for known page types ... 67
TravelDocs: Enhancing the sample fingerprint images ... 68
Determining appropriate image processing settings ... 68
Using the new image processing settings to enhance the fingerprint images ... 70
TravelDocs: Running a batch through the workflow ... 71
Processing a batch ... 71
Examining the runtime batch folder ... 72
Checking the confidence levels on the runtime pages ... 72
Rule Execution ... 73
Associating rules with objects ... 74
Example 1 ... 74
Example 2 ... 75
Order of rule execution ... 76
Example 1 ... 78
Example 2 ... 79
Summary of order of rule execution ... 80
TravelDocs: Stepping a batch through the PageID task profile ... 81
Document Assembly ... 83
Creating structured documents ... 84
Creating documents based on the document hierarchy ... 84
Creating the page data files ... 86
Checking document integrity ... 87
Handling document integrity problems ... 89
TravelDocs: Creating documents and setting up page files ... 91
Running a batch through the workflow ... 91
Examining the runtime batch folder ... 92
Reviewing the page data files ... 92
TravelDocs: Handling document integrity issues ... 93
Configuring branching ... 93
Running a batch with document integrity problems ... 94
Data Recognition ... 97
Recognizing page data ... 98
Using fingerprints to identify recognition zones ... 98
Storing the recognition zone information ... 99
Reading data from the page ... 100
Handling checkbox options ... 101
Overview of checkbox recognition methods ... 101
Establishing parent fields ... 102
Setting the required variables on the parent field ... 102
Using the OCR/A checkbox recognition method ... 103
Using the pixel threshold evaluation method ... 104
TravelDocs: Specifying recognition zones ... 107
Creating the text zones on the Rental_Agreement page ... 107
Creating the OMR zones on the rental agreement page ... 108
Creating the zones for the other page types ... 109
TravelDocs: Assigning the default rules to the document hierarchy ... 111
Assigning the default page level rules to new pages ... 111
Assigning the default field level rules to new fields ... 112
Updating the Recognize Page rule ... 113
Running a batch through the workflow ... 114
TravelDocs: Updating the application to handle checkbox options ... 115
Setting the required variables on the Options and Insurance fields ... 115
Specifying the checkmark type ... 116
Creating a rule to recognize the OMR fields ... 117
Adding the “Recognize OMR Fields” rule to the document hierarchy ... 117
Running a batch through the workflow ... 118
TravelDocs: Using pixel threshold checkbox recognition (optional) ... 119
Updating the Recognize OMR Fields rule to use RecogOMRThreshold ... 119
Determining appropriate threshold and background settings ... 120
Data Validation ... 123
Validating data ... 124
Checking that data formats are valid (currency, dates, field length, etc.) ... 125
Validating calculated fields ... 126
Displaying validation failures to an operator ... 127
Using external data sources during validation ... 129
Handling validation errors ... 129
TravelDocs: Updating the application to perform validation ... 130
Validating the currency fields ... 130
Validating the flight cost ... 132
Using a lookup database to validate the car type ... 134
Creating a dictionary of valid car types ... 136
Running a batch through the workflow ... 137
Examining the page and field status values ... 138
Creating recognition zones for the remaining fingerprints ... 141
Running a batch through the workflow ... 141
Interpreting the page and field status codes in the TravelDocs application ... 142
Data Verification ... 143
Verifying data ... 144
Options for data verification ... 144
Understanding confidence levels and setting the page status ... 147
Overriding validation failures ... 149
TravelDocs: Verifying the batch ... 150
Setting the Car Type field to non-overrideable ... 150
Using Batch Pilot for verification ... 151
Using DotEdit for verification ... 161
Using Taskmaster Web for verification ... 163
Data Export ... 167
Exporting data ... 168
Exporting to a text file ... 168
Exporting to a database ... 169
Exporting to an XML file ... 169
Exporting to a document management system ... 169
TravelDocs: Exporting data to a database ... 170
Creating the export database ... 170
Configuring the export database in the Taskmaster Application Manager ... 170
Creating the ExportDB ruleset ... 171
Adding the ruleset to the Export task profile ... 172
Attaching the Export Rental Agreement Data rule to the rental agreement page ... 172
Running a batch through the workflow ... 173
TravelDocs: Exporting data to an XML file ... 174
Creating the ExportXML ruleset ... 174
Adding the ruleset to the Export task profile ... 176
Attaching the Export XML rules to the document hierarchy ... 176
Running a batch through the workflow ... 177
Application Debugging ... 179
Taskmaster log files ... 180
Enabling logging for Batch Pilot tasks ... 180
Enabling logging for Taskmaster Web tasks ... 182
Rulerunner Service (RRS) log files ... 183
Task log files ... 185
Debugging your application from the Datacap Studio Test tab ... 186
Using breakpoints ... 186
Single-stepping through your code ... 189
Examining log files from the Test tab ... 190
Handling Line Item Grids ........................................................................................................................ 191
Defining the document hierarchy for line item grids ................................................................................................... 192
Creating rules to recognize line items ............................................................................................................................. 193
Using text matching to locate fields ................................................................................................................................ 195
Removing non-line items from the page data file ......................................................................................................... 196
Exporting data from a line item grid ............................................................................................................................... 197
TravelDocs: Adding new pages containing line item grids ......................................................................................... 198
Updating the document hierarchy ............................................................................................................................. 198
Attaching the existing page rules to the new pages.................................................................................................. 200
Creating the page fingerprints ..................................................................................................................................... 201
Defining the recognition zones ................................................................................................................................... 201
TravelDocs: Recognizing line item grid data ................................................................................................................. 204
Creating the recognition rules for the line items ...................................................................................................... 204
Creating the recognition rule for the grid total ......................................................................................................... 205
Attaching the rules to the document hierarchy ........................................................................................................ 206
Running a batch through the workflow ..................................................................................................................... 207
Creating rules to remove the non-line items ............................................................................................................. 208
TravelDocs: Validating line item grid data ..................................................................................................................... 209
Validating the line item totals ...................................................................................................................................... 209
Validating the grid total ................................................................................................................................................ 211
Running a batch through the workflow ..................................................................................................................... 212
TravelDocs: Creating verification panels for the line item grid pages ....................................................................... 214
Using Batch Pilot for verification ............................................................................................................................... 214
Using DotEdit for verification .................................................................................................................................... 218
TravelDocs: Exporting line item grid data to a database ............................................................................................. 219
Exporting to a database ................................................................................................................................................ 219
Using Smart Parameters ........................................................................................................................... 223
General structure of a smart parameter .......................................................................................................................... 224
Using special variables to access application configuration settings .......................................................................... 226
Determining the correct key name ............................................................................................................................. 227
Storing passwords, connection strings, and other parameters in the .app file .................................................... 228
Referencing passwords, connection strings, and other parameters from your actions ...................................... 230
Accessing the runtime hierarchy ...................................................................................................................................... 231
Examples of using special variables to access the runtime hierarchy ................................................................... 232
Summary of special variables for accessing the runtime hierarchy ....................................................................... 233
Using navigation elements to access the runtime hierarchy ................................................................................... 234
Using other special variables ............................................................................................................................................ 235
Accessing job and task information............................................................................................................................ 235
Accessing other information........................................................................................................................................ 235
TravelDocs: Exporting line item grid data to an XML file ......................................................................................... 236
Exporting to an XML file ............................................................................................................................................ 236
Text Matching........................................................................................................................................... 241
Identifying pages using text matching............................................................................................................................. 242
Locating data using text matching ................................................................................................................................... 243
Locating labels................................................................................................................................................................ 243
Locating the field data .................................................................................................................................................. 245
Updating the runtime data file with the recognized text ......................................................................................... 246
Limitations of using text matching for data recognition ......................................................................................... 246
TravelDocs: Updating the application to use text matching ....................................................................................... 247
Identifying unrecognized pages using text matching ............................................................................................... 247
Recognizing data using text matching ........................................................................................................................ 248
Attaching the rules to the document hierarchy ........................................................................................................ 250
Running a batch through the workflow ..................................................................................................................... 251
Pattern Matching ...................................................................................................................................... 253
About pattern matching .................................................................................................................................................... 254
Considerations for using pattern matching ............................................................................................................... 255
Auto registration when using the FindFingerprint action....................................................................................... 256
Setting up anchor objects .................................................................................................................................................. 257
Setting the required confidence level for pattern matching .................................................................................... 258
Using geometric pattern matching .................................................................................................................................. 259
How the PatternMatch_Identify action works ......................................................................................................... 259
Using multiple anchors ................................................................................................................................................. 260
Using pat_RegisterZones to adjust the positions of individual fields ................................................................... 261
Using text-based pattern matching .................................................................................................................................. 262
How the pat_RecogMatch_Id action works ............................................................................................................. 263
Determining the runtime field positions using the anchor offsets ........................................................................ 264
Adjusting the positions of individual fields based on multiple anchors ............................................................... 264
TravelDocs: Using geometric pattern matching to identify pages ............................................................................. 265
Setting up the pattern match anchor objects ............................................................................................................ 265
Updating the PageID rule to use pattern matching ................................................................................................. 266
Running a batch through the workflow ..................................................................................................................... 267
Reviewing the runtime batch files ............................................................................................................................... 268
Workflow Automation, Routing, and Automatic Fingerprint Generation................................................ 269
Using Quattro to automate background tasks ............................................................................................................... 270
About Rulerunner Quattro .......................................................................................................................................... 270
Configuring Quattro ..................................................................................................................................................... 271
Running Quattro............................................................................................................................................................ 271
Quattro logging .............................................................................................................................................................. 271
Using branching and splitting to route documents....................................................................................................... 272
Branching versus splitting ............................................................................................................................................ 272
Raising condition flags .................................................................................................................................................. 273
Defining a condition and the associated action ........................................................................................................ 275
Creating jobs to handle special conditions ................................................................................................................ 276
Generating fingerprints automatically ............................................................................................................................. 281
TravelDocs: Automating background processing using Quattro ............................................................................... 282
Defining the background tasks in the Taskmaster Application Manager ............................................................ 282
Setting up the background tasks in the Quattro Manager ...................................................................................... 283
Enabling Quattro logging............................................................................................................................................. 284
Setting up the Job Monitor .......................................................................................................................................... 284
Running a batch through the workflow ..................................................................................................................... 284
Examining the Quattro log .......................................................................................................................................... 285
Disabling Quattro logging ............................................................................................................................................ 286
TravelDocs: Implementing routing to handle document integrity failures............................................................... 287
Moving document creation and integrity checking into the PageID task profile ............................................... 287
Creating the new CreateDocs task and module........................................................................................................ 288
Configuring Quattro to run CreateDocs ................................................................................................................... 289
Running a batch through the workflow ..................................................................................................................... 290
TravelDocs: Adding routing to enable manual page identification............................................................................ 292
Adding a function for manual page identification ................................................................................................... 292
Updating the Recognize Page ruleset ......................................................................................................................... 293
Adding the conditional branch to the PageID task ................................................................................................. 294
Creating the ManualPageID project file .................................................................................................................... 295
Creating the ManualPageID job and task .................................................................................................................. 296
Configuring branching and creating a shortcut ........................................................................................................ 297
Configuring the Routing ruleset to handle manually identified pages .................................................................. 298
Running a batch through the workflow ..................................................................................................................... 299
Recognizing the data on the unidentified page ......................................................................................................... 301
TravelDocs: Generating fingerprints automatically ...................................................................................................... 303
Creating the AutoFingerprint ruleset ......................................................................................................................... 303
Assigning the rule to each page type .......................................................................................................................... 304
Adding the ruleset to the Verify task profile ............................................................................................................. 304
Enabling logging for Taskmaster Web....................................................................................................................... 305
Running a batch through the workflow ..................................................................................................................... 306
Reviewing the RRS log file ........................................................................................................................................... 308
TravelDocs: Splitting a document from the main batch.............................................................................................. 309
Updating the Routing ruleset to split the batch........................................................................................................ 309
Assigning the Batch Splitting rule to the batch's “Close” element ....................................................................... 310
Routing the split document to a supervisor .............................................................................................................. 311
Running a batch through the workflow ..................................................................................................................... 313
Taskmaster Web and Remote Scanning ................................................................................................... 315
Moving the workflow to Taskmaster Web..................................................................................................................... 316
Remote scanning ................................................................................................................................................................ 317
Using the remote scanning client (scancl.aspx) ........................................................................................................ 318
Configuring the remote scanning client ..................................................................................................................... 319
Implementing a start panel........................................................................................................................................... 320
Using the remote scanning/page ID client (scanid.aspx) ....................................................................................... 322
Remote virtual scanning .................................................................................................................................................... 323
Verification using the Prelayout web client.................................................................................................................... 324
Using the batch tree view to restructure the batch .................................................................................................. 325
Configuring the Prelayout client ................................................................................................................................. 325
Configuring additional Prelayout settings .................................................................................................................. 327
Creating and using custom pages ................................................................................................................................ 328
Verification, page identification, and registration using AIndex ................................................................................ 332
Using the batch tree view to restructure the batch .................................................................................................. 333
Configuring the AIndex client ..................................................................................................................................... 333
Multi-pass verification................................................................................................................................................... 334
Manual page identification and registration .............................................................................................................. 338
Verification using the AVerify web client ...................................................................................................................... 340
Creating and using custom (“static”) panels ............................................................................................................. 341
Verification using the ImgEnter web client ................................................................................................................... 344
Manual page identification and batch restructuring using ProtoId ............................................................................ 345
Configuring ProtoId...................................................................................................................................................... 346
Web-based administration and job monitoring ............................................................................................................. 348
Application administration........................................................................................................................................... 348
Job monitoring ............................................................................................................................................................... 349
TravelDocs: Scanning from Taskmaster Web ............................................................................................................... 350
Creating a remote scan task ......................................................................................................................................... 350
Configuring the remote scanning client ..................................................................................................................... 352
Configuring the Upload task........................................................................................................................................ 352
Scanning and uploading a batch .................................................................................................................................. 353
Creating the Web Job CreateDocs task ..................................................................................................................... 354
Configuring Quattro to run web jobs ........................................................................................................................ 355
Modifying the Verify/FixUp shortcut ....................................................................................................................... 356
Opening the batch for verification ............................................................................................................................. 356
TravelDocs: Using AIndex for manual page identification and registration ............................................................ 357
Making a copy of the application ................................................................................................................................ 357
Updating the application .............................................................................................................................................. 358
Updating ManualPageID.icp ....................................................................................................................................... 362
Creating the ManualIDValidate rule........................................................................................................................... 364
Running a batch through the workflow ..................................................................................................................... 365
Testing the ManualIDValidate rule ............................................................................................................................ 367
Fingerprint Management .......................................................................................................................... 369
Review of basic fingerprint functionality........................................................................................................................ 370
Creating fingerprint files ............................................................................................................................................... 371
Adding fingerprints to the fingerprint library ........................................................................................................... 371
Defining field zones ...................................................................................................................................................... 372
About the fingerprint database ........................................................................................................................................ 373
Using fingerprint XML files ............................................................................................................................................. 374
About FPXML ............................................................................................................................................................... 374
Enabling FPXML .......................................................................................................................................................... 375
Exporting existing position information from the document hierarchy .............................................................. 376
TravelDocs: Updating auto fingerprinting to use FPXML ......................................................................................... 377
Updating the AutoFingerprint ruleset ........................................................................................................................ 377
Updating the Recognize Page rule .............................................................................................................................. 378
Preparations for running a batch through the workflow ........................................................................................ 379
Running a batch through the workflow ..................................................................................................................... 379
Moving Your Application into Production ............................................................................................... 381
Taskmaster's multi-machine architecture ....................................................................................................................... 382
Locating applications and their components on the network..................................................................................... 383
TravelDocs: Migrating the Taskmaster databases to SQL Server .............................................................................. 384
Creating the SQL scripts .............................................................................................................................................. 384
Creating the SQL databases ......................................................................................................................................... 385
Enabling remote access to the SQL Server databases ............................................................................................. 386
Modifying the application configuration file to use the SQL databases ............................................................... 387
Running a batch through the workflow ..................................................................................................................... 388
TravelDocs: Moving the application to a production server ...................................................................................... 389
Setting up the production server ................................................................................................................................. 389
Copying the application to the production server.................................................................................................... 389
Configuring the application on the production server ............................................................................................ 391
Updating the fingerprint database with the server path .......................................................................................... 393
TravelDocs: Running the application in a multi-machine environment ................................................................... 394
Configuring the Quattro/Taskmaster Web server ................................................................................................... 394
Configuring the Taskmaster Web client .................................................................................................................... 395
Configuring a remote VScan task ............................................................................................................................... 396
Running the remote VScan task from the web client .............................................................................................. 397
Uploading the files to the server
Starting Quattro and completing the batch
TravelDocs: Role-based batch processing
Creating the Operator and Supervisor accounts
Preparing to run a new batch through the workflow
Logging on as the operator and creating a batch
Logging on as the supervisor
Understanding the Taskmaster Object Model and Execution Environment
Overview of main steps in executing a task profile
Setting up the initial execution environment
Transferring control to the Rulerunner service
Structure of collection.xml
Structure of a ruleset file
Going through the document hierarchy
Accessing runtime objects from an action
TravelDocs: Executing a task profile using ProfileRunner
About ProfileRunner
Opening the ProfileRunner project
Preparing a batch using VScan
Building and running the project
Setting a breakpoint in the Execute method
Examining the initial state of the runtime objects
Populating the runtime DCO object
Populating the PilotProps object and the State.PilotProps property
Executing the task profile
TravelDocs: Using a custom action to examine runtime objects
Obtaining the Custom Action Library template
Creating a custom action library
Adding the custom action to the TravelDocs application
Executing the action within the Visual Studio debugger
Viewing page and field objects
Smart Parameter Special Variable Reference
Special variables for accessing the application configuration file
@APPPATH(<key_path>)
@APPVAR(<key_path>)
Special variables for accessing the runtime hierarchy
@BATCHID
@ID
@STATUS
@VALUE
@VAR(<variable_name>)
@P\<field_name>[.<variable_name>]
@F\<field_name>[.<variable_name>]
@B.<variable_name>
@D.<variable_name>
@P.<variable_name>
@F.<variable_name>
Special variables for accessing job and task information
@JOBID
@JOBNAME
@OPERATOR
@STATION
@TASKID
@TASKNAME
Miscellaneous special variables
@CHR(<ascii_value>)
@DATE(<format>)
@DCO(<property_name>)
@DICT_VALUE(<field>)
@DICT_WORD(<field>)
@DICT_VINDEX(<csv_string>)
@DICT_WINDEX(<csv_string>)
@EMPTY
@PATH(<key>)
@PILOT(<property_name>)
@PROJECTDIR
@PROCESSDIR
@STRING(<string_value>)
@TIME(<format>)
@TYPE
Standard Variable Reference
Variables used on all object types
MAX_TYPES
MESSAGE
MIN_TYPES
rules
STATUS
TYPE
Batch variables
LAST_RR_PROFILE
Document variables
DD
Page variables
Confidence
DATAFILE
Fingerprint Created
Image_Offset
IMAGEFILE
PatternConfidence
PD
ScanSrcPath
TEMPLATE IMAGE
TemplateID
Field variables
Datatype
DensityString
DICT
Index
label
Lookup
LookupEx
MaxLength
METRIC
MultiLine
MultiPunch
PatternMatch
PictureString
Position
ReadOnly
RecogStatus
RecogType
ReqConf
SELECT
ShowChar
Sticky
Text
Zone_Offset
Action Library Summaries
Autodoc
Barcode_P
Barcode_X
Cco2cco
ColorToBW
Convert
Excel
Images
Outlook
Pdf
Tiff
Word
Dcclip
DCImageFix
DCO
dcpdf
Documentum
Email
Equalize
Ewsmail
Export
ExportDB
ExportXML
FileIO
FileNetP8
FingerprintMaintenance
FPXML
Grayscale
icr_c
ImageConvert
ImageFix
Imail
Imprint
Intellocate
IOverlay
LiveLink
Locate
Lookup
Nenu
Logging actions
Batch processing actions
Query setup actions
Reporting actions
Application setup actions
ocr_a
OCR_s
ocr_sr
OpenTextFaxServer
PatternMatch
Picture
Recog_Shared
rrunner
SPExport
Split
TifMerge
TM524
Validations
Vote
Vscan
Zones
INTRODUCTION
ABOUT THIS GUIDE
This book guides you through the steps involved in developing a simple Taskmaster application. It combines
concepts with practice – we'll describe general techniques used in the various stages of the Taskmaster
workflow, and then provide hands-on instructions so you can implement some of those techniques using
Taskmaster's application development tools. If you follow the chapters in sequence, by the time you reach the
end of chapter 12 you'll have built a complete Taskmaster application that:
• Inputs a collection of page image files
• Identifies the individual pages
• Creates a set of structured documents
• Captures the data from specific fields on each page
• Runs rules to ensure the validity of the data
• Displays pages to an operator for verification
• Exports the captured data to a database or other repository
Subsequent chapters of the guide explore some more advanced topics.
In the practice sections, you’ll create an application called “TravelDocs” that processes travel documents – car
rental agreements, hotel receipts, and flight tickets. To follow the instructions and build the application, you’ll
need to download the sample image files we’ve provided. These include several samples of each page type
from different vendors, so you’ll see how to capture and verify data in a way that’s independent of the
location of the data on the page. You can download the sample images from the same location you
downloaded this guide.
REQUIRED HARDWARE AND SOFTWARE
If you want to follow the steps in the hands-on practice sections, you’ll need the following:
• A computer with a complete installation of IBM Datacap Taskmaster Capture 8.0 or higher
• The sample image files, as described above
• Optional: A scanner with an ISIS® driver (for the section on setting up a local scanner) or a TWAIN
driver (for the section on remote scanning) – you can complete all the other sections without a scanner
If Taskmaster Capture is not already installed, please refer to the IBM Datacap Taskmaster Capture Installation and
Configuration Guide for instructions.
PREREQUISITE KNOWLEDGE
Familiarity with the following is helpful but not required:
• Basic structured and object-oriented programming concepts
• XML
Chapter 1
BUSINESS REQUIREMENTS AND APPLICATION ARCHITECTURE
The first step in developing any Taskmaster application is to define the business requirements. This includes:
• Identifying the types of documents the application will process
• Identifying the page types associated with each document type
• Deciding what data you want to capture from each page
• Specifying the business rules that determine whether the captured data is valid or not
• Determining how to handle documents that are structurally invalid, pages that aren’t recognized, data that
doesn’t meet the business rules, or characters that aren’t recognized with high confidence
• Deciding how you want to export or release the data at the end of the workflow
This chapter looks at how to go about developing the business requirements for a Taskmaster application. It
also looks at the general Taskmaster application architecture so you can begin mapping the business
requirements to the application model. By the end of the chapter, we’ll have established the business
requirements for the sample TravelDocs application we’ll be developing throughout this guide.
DEVELOPING THE BUSINESS REQUIREMENTS
Taskmaster applications vary enormously in their scale and complexity, but they all seek to capture data from
unstructured documents. The documents can be printed pages or electronic images – the key point is that
data on the page must be first located and then interpreted with maximum accuracy.
Vendor: Airline #3
Outbound From: ORD Chicago
Outbound To: BOS Boston
Outbound Date: OCT 26, 2010
Return From: BOS Boston
Return To: ORD Chicago
Return Date: OCT 29, 2010
Airfare: 233.00 USD
Taxes: 21.40 USD
Total Cost: 254.40 USD
Before starting implementation, it’s important to define the business requirements through collaboration with
the various stakeholders. Initially this involves examining the documents you want to process, determining
which fields you need to capture, and deciding what to do with the data once you’ve captured it.
If you’re processing a variety of document types, you’ll need to decide if the documents will be pre-sorted or
processed as a mixed batch. If they’re pre-sorted, you may be able to simplify implementation by processing
each type independently – either with a separate application or a separate workflow for each type. However, if
they’re to be processed as mixed batches, you’ll need a more sophisticated system of page identification and
document assembly.
Although the goal is to create a fully automated system, there are inevitably points at which manual
intervention is required. The business requirements must specify how to determine if the information is
accurate and what you’ll do if there’s a problem. Only after you’ve defined the business requirements can you
begin designing the application.
Since this is an introductory guide, we’re not going to provide a detailed methodology for determining
business requirements. Instead, we’ll take a look at the general Taskmaster application architecture and then
jump right in to examine the documents we’ll be processing as we develop the TravelDocs application. This
sample application is designed to demonstrate basic techniques that implement the main steps in the
Taskmaster application workflow.
GENERAL TASKMASTER APPLICATION ARCHITECTURE
Although each Taskmaster application is different, most include the seven basic steps shown below.
[Figure: Scanned pages and electronic documents go into the Taskmaster application, which produces
structured business data.]
1. Page input: Scan a batch of hardcopy pages or import electronic documents into your
application. The output from this stage is a “batch” of individual TIF image files. Each
page is initially assigned the page type “Other.”
2. Page identification: Perform image enhancement to improve the image quality. Then determine each
page type, automatically or by displaying it to an operator for manual identification
if necessary. The goal is to identify the page type – not a specific variant of the type
(for example, an airline ticket – not a ticket from a specific airline).
3. Document assembly: Organize the individual page files into single- or multi-page documents according to
predefined document definitions (for example, a job application form may have two
required pages plus an optional multi-page attachment). Run document integrity
checking to ensure each document meets the rules for that document type.
4. Data recognition: For each page, locate the data fields defined for that page type (for example, an
airline ticket may have a passenger name, a departure airport, etc.). Then use one of
Taskmaster’s recognition engines to obtain the character data for each field. The
recognition engine indicates the degree of confidence for each character.
5. Data validation: Check the validity of specific fields. For example, you can check that dates are valid,
numeric fields do not contain non-numeric characters, totals add up correctly, etc.
You can also perform lookups to make sure a state abbreviation is valid, a purchase
order number matches an item in a purchase order database, etc.
6. Data verification: Display low confidence data and fields that failed validation to an operator for
verification, correction, and exception handling. When the operator submits the
batch, the application runs the validation rules again to ensure all data meets the
validation criteria.
7. Data export: Export or release the data to a text file, an XML file, a database, a document
management system, or the next stage in a business workflow.
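The hand-off from recognition and validation to verification can be pictured with a short Python sketch. It is an illustration only, not Taskmaster code: the RecognizedField class, the 0.80 threshold, and the confidence scale are all assumptions made for the example, since the guide does not define them at this point.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold on a 0.0-1.0 scale; not a Taskmaster setting.
MIN_CONFIDENCE = 0.80


@dataclass
class RecognizedField:
    """Illustrative stand-in for a field returned by a recognition engine."""
    name: str
    value: str
    char_confidence: List[float]      # one score per recognized character
    passed_validation: bool = True


def needs_verification(field: RecognizedField,
                       threshold: float = MIN_CONFIDENCE) -> bool:
    """True if any character is low confidence or the field failed validation."""
    low_confidence = any(score < threshold for score in field.char_confidence)
    return low_confidence or not field.passed_validation


def fields_for_operator(fields: List[RecognizedField]) -> List[str]:
    """Names of the fields that would be shown to an operator."""
    return [f.name for f in fields if needs_verification(f)]
```

In this sketch a field reaches the operator for either of the two reasons named in the verification step, so a confidently recognized value that fails a business rule is still reviewed.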
TRAVELDOCS: DEVELOPING THE BUSINESS REQUIREMENTS
Throughout this guide, we’ll be developing an application to process travel documents. The purpose is to
demonstrate how to implement some of the general techniques described for each of the basic steps in the
application workflow (page input, page identification, etc.).
Before we begin developing the application, we’ll look at the documents and pages the application will need
to process, identify the fields we want to capture, and determine the other business requirements.
EXAMINING THE DOCUMENT AND PAGE TYPES
The documents we’ll use in TravelDocs are simplified versions of typical travel-related documents that might
be submitted with an employee expense report – car rental receipts, hotel receipts, and air tickets. The
document types and page types are summarized in the table below.
Document type    Page types
Car Rental       Rental Agreement, Optional Insurance
Hotel            Room Receipt, Meals, Other Charges
Flight           Air Ticket
Let’s look at each document type in turn. We’ll be looking at the sample images included in the download (see
“About this guide” on page 1 for download information).
CAR RENTAL
The car rental documents have one required page and one optional page. Initially, the application will support
documents from three car rental companies: Car Rental #1, Car Rental #2, and Car Rental #3.
The three sample rental agreement pages are shown below (Car1.tif, Car3.tif, and Car5.tif in the sample image
download). We’ve highlighted the fields that include the data we want to extract. These fields are common to
all pages, although the position of each field is different for each page.
Vendor: Car Rental #1
Pickup Date: Mon, Oct 4, 2010
Pickup Location: New York (JFK)
Return Date: Fri, Oct 8, 2010
Return Location: New York (JFK)
Car Type: Full size
GPS  Child Seat  Fuel Service 
Total Cost: $582.77
Vendor: Car Rental #2
Pickup Date: Sun, Aug 1, 2010
Pickup Location: Los Angeles (LAX)
Return Date: Fri, Aug 6, 2010
Return Location: Los Angeles (LAX)
Car Type: Luxury
GPS  Child Seat  Fuel Service 
Total Cost: $503.39
Vendor: Car Rental #3
Pickup Date: Sun, Oct 24, 2010
Pickup Location: Chicago (ORD)
Return Date: Fri, Oct 29, 2010
Return Location: Chicago (ORD)
Car Type: Compact
GPS  Child Seat  Fuel Service 
Total Cost: $535.18
The three sample optional insurance pages are shown below (Car2.tif, Car4.tif, and Car6.tif in the sample
image download). Again, we’ve highlighted the fields for which we want to extract the data.
Vendor: Car Rental #1
CDW: 

PAI:

PEP:

ELP:

Total Cost: $104.95
Vendor: Car Rental #2
CDW: 

PAI:

PEP:

ELP:

Total Cost: $0.00
Vendor: Car Rental #3
CDW: 

PAI:

PEP:

ELP:

Total Cost: $137.94
As with the rental agreement pages, the fields are common to all pages, but the position of each field is
different for each variant.
HOTEL
The hotel documents have one required page and two optional pages. Initially, the application will support
documents from three hotel chains: Hotel #1, Hotel #2, and Hotel #3.
The three sample room receipts are shown below (Hotel1.tif, Hotel2.tif, and Hotel3.tif in the sample image
download). As with the car rental pages, these fields are common to all pages, although the positions of the
fields are different for each page.
Vendor: Hotel #1
Arrival Date: Sept 24, 2010
Departure Date: Sept 26, 2010
Total Cost: $215.33
Vendor: Hotel #2
Arrival Date: Oct 14, 2010
Departure Date: Oct 16, 2010
Total Cost: $282.51
Vendor: Hotel #3
Arrival Date: Sun, Oct 24, 2010
Departure Date: Tues, Oct 26, 2010
Total Cost: $256.83
We’ve included only one sample of each of the optional hotel pages (Hotel4.tif and Hotel5.tif). These are
shown in the table below.
Vendor: Hotel #3
Item
Date: 10-24-10
Description: Dinner
Cost: $48.81
Item
Date: 10-25-10
Description: Breakfast
Cost: $12.28
Item
Date: 10-25-10
Description: Dinner
Cost: $46.41
Item
Date: 10-26-10
Description: Breakfast
Cost: $12.28
Total Cost: $119.78
Vendor: Hotel #3
Item
Date: 10-24-10
Description: Internet
Cost: $5.95
Item
Date: 10-25-10
Description: Laundry
Cost: $14.00
Item
Date: 10-25-10
Description: Internet
Cost: $5.95
Item
Date: 10-26-10
Description: Parking
Cost: $52.35
Total Cost: $78.25
FLIGHT
Flight documents have one required page and no optional pages. Initially, the application will support
documents from three airlines: Airline #1, Airline #2, and Airline #3.
The three sample air ticket pages are shown below (Flight1.tif, Flight2.tif, and Flight3.tif in the sample image
download). As with the other pages, these fields are common to all pages, although the positions of the fields
are different for each page.
Vendor: Airline #1
Outbound From: New York/Newark (EWR)
Outbound To: San Francisco (SFO)
Outbound Date: 24JUL10
Return From: San Francisco (SFO)
Return To: New York/Newark (EWR)
Return Date: 28JUL10
Airfare: 760.27
Taxes: 64.56
Total Cost: 824.83
Vendor: Airline #2
Outbound From: Chicago (ORD)
Outbound To: Atlanta (ATL)
Outbound Date: MON OCT 25,2010
Return From: Atlanta (ATL)
Return To: Chicago (ORD)
Return Date: WED OCT 27, 2010
Airfare: $385.27
Taxes: $44.76
Total Cost: $430.03
Vendor: Airline #3
Outbound From: ORD Chicago
Outbound To: BOS Boston
Outbound Date: OCT 26, 2010
Return From: BOS Boston
Return To: ORD Chicago
Return Date: OCT 29, 2010
Airfare: 233.00 USD
Taxes: 21.40 USD
Total Cost: 254.40 USD
REQUIRED DOCUMENT STRUCTURE
As we looked at each travel document, we identified which pages are required and which pages are optional.
For example, in car rental documents:
• The rental agreement page is required
• The insurance page is optional
For documents that support multiple pages, there may also be requirements that define the page order and
required number of pages of each type. The structure of each travel document type is summarized in the table
below.
Document / page type    Number                     Required?   Order
Car Rental              Any number per batch       No          Any position within batch
  Rental Agreement      One per document           Yes         Must be first in document
  Optional Insurance    One per document           No          Must be second in document
Hotel                   Any number per batch       No          Any position within batch
  Room_Receipt          One per document           Yes         Must be first in document
  Meals                 Any number per document    No          May not be first in document
  Other_Charges         Any number per document    No          May not be first in document
Flight                  Any number per batch       No          Any position within batch
  Air_Ticket            One per document           Yes         Must be first in document
This structural information is an important element of the design requirements that we’ll use later when we
implement the application’s document hierarchy. When we go on to implement the document assembly stage
of the workflow, we’ll use this information to determine if the pages in the batch meet the required structure
or if operator intervention is required to restructure the batch.
The assumption for the sample application’s business model is that we’ll be inputting batches of mixed
documents – in other words, a batch may include any number of car rental documents, flight documents, and
hotel documents – but that the pages within each document will be consecutive and in the correct order. If
the batch meets these requirements, the application should assemble the documents automatically; however, if
there are orphan pages or pages that otherwise break the rules for document integrity, operator intervention
will be required.
[Figure: two sample batches of nine pages each, one that meets the structure rules and one that does not.]
Batch OK: Rental Agreement, Optional Insurance, Air Ticket, Room Receipt, Room Receipt, Meals,
Rental Agreement, Optional Insurance, Air Ticket
Batch Not OK: Optional Insurance, Room Receipt, Room Receipt, Air Ticket, Meals, Rental Agreement,
Optional Insurance, Optional Insurance, Air Ticket
Problems in the second batch:
1. Orphaned optional insurance page (should follow a rental agreement page)
2. Orphaned meals page (should follow a room receipt page)
3. Two optional insurance pages in car rental document
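The structure rules in this section can be expressed as a small integrity check. The Python sketch below is only a way to reason about the requirements, not how Taskmaster implements document assembly; the DOCUMENT_RULES table and the assemble function are names invented for this example, and the page lists mirror the two sample batches described in this section.

```python
# For each page type that starts a document: the document type it starts and
# the page types allowed to follow it (None means "any number").
DOCUMENT_RULES = {
    "Rental Agreement": {"doc": "Car Rental", "followers": {"Optional Insurance": 1}},
    "Room Receipt": {"doc": "Hotel", "followers": {"Meals": None, "Other Charges": None}},
    "Air Ticket": {"doc": "Flight", "followers": {}},
}


def assemble(pages):
    """Group a flat list of page types into documents.

    Returns (documents, problems); any problem means operator intervention.
    """
    documents, problems = [], []
    current = None  # the document currently being built
    for position, page in enumerate(pages, start=1):
        if page in DOCUMENT_RULES:
            # A required first page always starts a new document.
            current = {"type": DOCUMENT_RULES[page]["doc"], "pages": [page]}
            documents.append(current)
            continue
        followers = DOCUMENT_RULES[current["pages"][0]]["followers"] if current else {}
        if page not in followers:
            problems.append(f"page {position}: orphaned {page}")
            continue
        limit = followers[page]
        if limit is not None and current["pages"].count(page) >= limit:
            problems.append(
                f"page {position}: too many {page} pages in {current['type']} document")
            continue
        current["pages"].append(page)
    return documents, problems


batch_ok = ["Rental Agreement", "Optional Insurance", "Air Ticket", "Room Receipt",
            "Room Receipt", "Meals", "Rental Agreement", "Optional Insurance", "Air Ticket"]
batch_not_ok = ["Optional Insurance", "Room Receipt", "Room Receipt", "Air Ticket",
                "Meals", "Rental Agreement", "Optional Insurance", "Optional Insurance",
                "Air Ticket"]
```

Running assemble on the first list yields six documents and no problems; the second list produces the three problems listed for the failing batch.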
FIELDS FOR EACH PAGE TYPE
When we looked at each sample page, we identified the fields of interest. We noted that each variant of a
given page type includes all of these fields, but the position of each field is different for each variant. The
table below summarizes the fields we’ll capture for each page type.
Car Rental
  Rental Agreement: Vendor, Pickup_Date, Pickup_Location, Return_Date, Return_Location, Car_Type,
    Options (Nav_System, Child_Seat, Fuel_Service), Total_Cost
  Optional Insurance: Vendor, Collision Damage Waiver (CDW_Option), Personal Accident Insurance (PAI_Option),
    Personal Effects Protection (PEP_Option), Extended Liability Protection (ELP_Option), Total_Cost
Hotel
  Room Receipt: Vendor, Arrival_Date, Departure_Date, Total_Cost
  Meals: [Item] (Date, Description, Cost), Total_Cost
  Other Charges: [Item] (Date, Category, Cost), Total_Cost
Flight
  Air Ticket: Vendor, Outbound_From, Outbound_To, Outbound_Date, Return_From, Return_To, Return_Date,
    Airfare, Taxes, Total_Cost
There are a couple of things worth mentioning in the table above. First, the two car rental pages both include
checkbox options. There’s a requirement in Taskmaster that each checkbox option be the child of a parent
container field. On the Rental Agreement page we’ve specified the three options as children of the same
parent field; on the Optional Insurance page we’ve given each option its own parent. The implementation is a
little different depending on which method you use, so we’ve done one of each so we can demonstrate both
techniques when we do the implementation. The choice is more an implementation decision than a business
decision, although it does affect the format of the export data.
Secondly, the optional hotel pages include repeating line items, each with the same structure. We don’t know
ahead of time how many items might be on a given page. Taskmaster includes functionality for handling line
item grids that we’ll introduce later in this guide (see “Handling Line Item Grids” on page 191).
PERMITTED FIELD VALUES
In addition to specifying the fields, the business requirements may specify permitted formats and values for
each field. We‟ll do this for the rental agreement page only.
Page: Rental Agreement
Field              Permitted values
Vendor             Any text
Pickup_Date        Any valid date format
Pickup_Location    Any text
Return_Date        Any valid date format
Return_Location    Any text
Car_Type           Compact, Standard, Full size, SUV, or Other
Options            Checkbox fields – selected or not selected
Total_Cost         Any currency format ($999.99, 999.99, and 999.99 USD are valid)
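To make “any valid date format” concrete, here is a minimal Python sketch that tries a handful of formats taken from the sample pages. It is not how Taskmaster validates dates; the DATE_FORMATS list and parse_date function are assumptions made for illustration, and the month and weekday matching assumes an English locale.

```python
from datetime import datetime

# Formats observed on the sample pages; a real application would need more.
DATE_FORMATS = [
    "%a, %b %d, %Y",   # Mon, Oct 4, 2010
    "%b %d, %Y",       # OCT 26, 2010
    "%d%b%y",          # 24JUL10
    "%m-%d-%y",        # 10-24-10
]


def parse_date(text):
    """Return a date if the text matches one of the known formats, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

Non-standard abbreviations such as “Sept” or “Tues”, which also appear on the samples, are not matched by these format codes, so a sketch like this would need a normalization step before it could accept them.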
BUSINESS VALIDATION RULES
So far, we’ve defined the structure of each document type and the fields we want to capture from each page.
Next, we’ll define how we want to validate the captured data to determine if it meets the business
requirements.
For the sample application, we’ll keep things simple and will only validate a few of the fields. We’ve chosen
these fields specifically so we can demonstrate a few generic but commonly used techniques when we
implement the data validation stage of the application workflow.
Validation rule: Is the field value in a valid currency format? Specifically, is the field
numeric with a 2-digit decimal portion?
  Rental Agreement: Total Cost
  Optional Insurance: Total Cost
  Room Receipt: Total Cost
  Meals: Total Cost
  Other Charges: Total Cost
  Air Ticket: Air Fare, Taxes, Total Cost
Validation rule: Is the field value one of the following permitted values: Compact,
Standard, Full size, SUV, or Other?
  Rental Agreement: Car Type
Validation rule: Does the value of the Air Fare field plus the value of the Taxes field
equal the value of the Total Cost field?
  Air Ticket: Air Fare, Taxes, Total Cost
Note that a validation failure does not necessarily mean the original page contains invalid data – it could mean
the recognition engine failed to recognize one or more characters correctly. Whatever the reason for the error,
the application developer can set the page status to make sure the page is displayed to an operator for
verification. There are many other things we could test, but this provides an example of the kinds of
validation tests you can perform.
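The three business validation rules can be sketched in a few lines of Python. This is an illustration of the logic only, not Taskmaster rule code: the function names are hypothetical, and stripping a leading “$” or trailing “USD” before the numeric check is an assumption about how the permitted currency formats would be normalized.

```python
import re
from decimal import Decimal

PERMITTED_CAR_TYPES = {"Compact", "Standard", "Full size", "SUV", "Other"}
TWO_DECIMALS = re.compile(r"^\d+\.\d{2}$")   # numeric with a 2-digit decimal portion


def strip_currency_marks(value):
    """Drop a leading '$' or a trailing 'USD' (assumed normalization step)."""
    value = value.strip()
    if value.startswith("$"):
        value = value[1:]
    if value.endswith("USD"):
        value = value[:-3]
    return value.strip()


def is_valid_currency(value):
    """Rule 1: the value is numeric with exactly two decimal digits."""
    return bool(TWO_DECIMALS.match(strip_currency_marks(value)))


def is_permitted_car_type(value):
    """Rule 2: the value is one of the permitted car types."""
    return value in PERMITTED_CAR_TYPES


def total_matches(air_fare, taxes, total_cost):
    """Rule 3: Air Fare + Taxes equals Total Cost (exact decimal arithmetic)."""
    values = (air_fare, taxes, total_cost)
    if not all(is_valid_currency(v) for v in values):
        return False
    fare, tax, total = (Decimal(strip_currency_marks(v)) for v in values)
    return fare + tax == total
```

Decimal is used rather than float so that sums such as 760.27 + 64.56 compare exactly with 824.83. A misread character (for example “8Z4.83”) fails the currency rule, which is the kind of failure that would send the page to an operator.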
DATA EXPORT FORMAT
The last stage in developing the business requirements for the TravelDocs application is to specify how to
format the captured data for export. Taskmaster can export data to a text file, an XML file, a database, a
document management system, or the input stage of another business application.
For TravelDocs, we’ll specify that data should be exported to a Microsoft® Access® database and also saved
in XML format. To simplify the implementation, we’ll export only the rental agreement page data initially:
• For the database export, the application should export the data from each rental agreement page as a
single record.
• For the XML export, all rental agreement pages in the same batch should be written to a single XML file.
<?xml version='1.0' ?>
<Rental_Agreements>
  <TM000001>
    <Pickup_Date>Tues, Dec 7, 2010</Pickup_Date>
    <Pickup_Location>Boston (BOS)</Pickup_Location>
    <Return_Date>Fri, Dec 10, 2010</Return_Date>
    <Return_Location>Boston (BOS)</Return_Location>
    <Car_Type>Compact</Car_Type>
    <Options>Fuel Service</Options>
    <Total_Cost>$345.70</Total_Cost>
  </TM000001>
  <TM000003>
    <Pickup_Date>Mon, Dec 6, 2010</Pickup_Date>
    <Pickup_Location>San Francisco (SFO)</Pickup_Location>
    <Return_Date>Fri, Dec 10, 2010</Return_Date>
    <Return_Location>San Francisco (SFO)</Return_Location>
    <Car_Type>SUV</Car_Type>
    <Options>Child Seat</Options>
    <Total_Cost>$489.31</Total_Cost>
  </TM000003>
  <TM000004>
    <Pickup_Date>Mon, Dec 13, 2010</Pickup_Date>
    <Pickup_Location>Newark (EWR)</Pickup_Location>
    <Return_Date>Thur, Dec 16, 2010</Return_Date>
    <Return_Location>Newark (EWR)</Return_Location>
    <Car_Type>Other</Car_Type>
    <Options>Navigation System Child Seat Fuel Service</Options>
    <Total_Cost>$387.40</Total_Cost>
  </TM000004>
</Rental_Agreements>
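A file with this shape can be produced with a few lines of Python. The sketch below uses the standard xml.etree.ElementTree module and is not Taskmaster’s export mechanism; build_export, FIELD_ORDER, and the idea of keying the input by element name (such as TM000001) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Field order as it appears in the sample XML file.
FIELD_ORDER = ["Pickup_Date", "Pickup_Location", "Return_Date",
               "Return_Location", "Car_Type", "Options", "Total_Cost"]


def build_export(pages):
    """Build the XML text for one batch.

    pages maps each page element name (for example 'TM000001') to a dict of
    field values; missing fields are written as empty elements.
    """
    root = ET.Element("Rental_Agreements")
    for page_id, fields in pages.items():
        page = ET.SubElement(root, page_id)
        for name in FIELD_ORDER:
            ET.SubElement(page, name).text = fields.get(name, "")
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

All pages passed in one call end up under a single root element, which matches the requirement that every rental agreement page in a batch be written to the same file.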
Later in the guide, we’ll export some of the line item grid data as well.
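The database requirement (one record per rental agreement page) can be pictured the same way. This Python sketch uses the standard sqlite3 module purely as a stand-in for the Microsoft Access database named in the requirements; the table name, the Page_ID column, and the export_pages function are all hypothetical.

```python
import sqlite3

COLUMNS = ["Pickup_Date", "Pickup_Location", "Return_Date",
           "Return_Location", "Car_Type", "Options", "Total_Cost"]


def export_pages(connection, pages):
    """Insert one row per rental agreement page; return the number of rows."""
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS Rental_Agreement "
        f"(Page_ID TEXT PRIMARY KEY, {column_list})")
    rows = [(page_id, *(fields.get(c, "") for c in COLUMNS))
            for page_id, fields in pages.items()]
    connection.executemany(
        f"INSERT INTO Rental_Agreement VALUES (?, {placeholders})", rows)
    connection.commit()
    return len(rows)
```

Calling export_pages with sqlite3.connect(":memory:") and a dict of page values is enough to try it out without creating a file.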
Chapter 2
INTRODUCING DATACAP STUDIO
Datacap Studio is Taskmaster’s application development environment. It provides the tools you need to
develop and test your application. It also includes an Application Wizard that gives you a head start on
application development by generating a basic application framework, complete with the supporting folder
structure and control files.
This chapter looks briefly at each of the main tabs within the Datacap Studio window. Later in the chapter,
we’ll use the Datacap Studio Application Wizard to start the TravelDocs application that we’ll be developing
throughout this guide.
QUICK TOUR OF THE USER INTERFACE
You’ll be using Datacap Studio extensively throughout this guide to develop the sample TravelDocs
application. In this section, we’ll take a quick tour of the Datacap Studio interface by opening one of the
pre-built sample applications that’s installed with Taskmaster.
STARTING TASKMASTER SERVER
Datacap applications run under the control of Taskmaster Server, which runs in the background as a
Windows service. Depending on how your system is configured, Taskmaster Server may start automatically or
you may need to start it manually. If the server isn’t running, you won’t be able to log in to Taskmaster.
To start Taskmaster Server:
1. Click Start > All Programs > Datacap > Taskmaster Server > Taskmaster Server Manager.
2. In the Taskmaster Server Manager window, check to see if the status is “Running.” If it isn’t, click Start.
3. Confirm the status is “Running” and then click Close. The server is now running in the background.
OPENING A SAMPLE TASKMASTER APPLICATION
When you’ve confirmed that Taskmaster Server is running, you can start Datacap Studio and open any of the
sample applications.
To open one of the sample applications in Datacap Studio:
1. Click Start > All Programs > Datacap > Datacap Studio > Datacap Studio.
2. In the Select Application window, select one of the existing sample applications (for example, 1040EZ)
and click Next.
3. In the Taskmaster Login window, make sure the NT authentication checkbox is not checked.
4. Enter User ID: admin, Password: admin, Station ID: 1 and click Finish.
PANEL ORGANIZATION WITHIN THE DATACAP STUDIO WINDOW
The Datacap Studio window has three main tabs:
• Rulemanager: This is the primary application development area.
• Zones: This is where you create page fingerprints and set up recognition zones.
• Test: This provides integrated execution and debugging tools for testing your application.
Each main tab has several tabbed panes. The default layout for the Rulemanager tab is shown below.
You can customize the workplace by reorganizing the panes, removing panes, and adding panes.
To move a pane:
Use the mouse to drag the pane’s tab from its current location. You’ll see a set of
insertion points located around the window. You can move the pane to the left, right,
top, or bottom of another pane, or to the left, right, top, or bottom of the window. The
center option lets you combine tabs. As you move the pointer over an insertion point,
you’ll see a shaded area indicating the corresponding location. Drop the pane on an
insertion point to move the pane.
To remove a pane:
Right-click the pane’s tab and choose Close.
To add a pane:
Right-click any tab and choose Show tabs. Then choose from the available panes. Once
the new pane is placed, you can move it as described above.
WHAT’S ON THE RULEMANAGER TAB
The Rulemanager tab includes the following panes:
Panel
Description
Document hierarchy
Defines the structure of the documents you are processing and how each element
within the structure is processed (see “The Document Hierarchy” on page 25).
Rulesets
Defines the rules, functions, and actions that make up each ruleset (see “Rulesets,
rules, and actions” on page 42).
Task profiles
Defines the rulesets that are run by each task profile (see “Task profiles and
rulesets” on page 41).
Actions library
Provides access to the complete library of pre-built actions. To get help on an
action, select the action and click the Information button.
Properties
Displays the properties for the selected document hierarchy or ruleset object. If
the corresponding pane is locked for editing, you can also modify existing
properties, including specifying action parameters.
WHAT’S ON THE ZONES TAB
The Zones tab includes the following panes:
Panel
Description
Fingerprints
Displays the application’s fingerprint library and lets you add fingerprints for new
page types (see “Fingerprint matching” on page 62).
Document hierarchy
Defines the structure of the documents you are processing and how each element
within the structure is processed (see “The Document Hierarchy” on page 25).
Properties
Displays the properties for the selected document hierarchy object. If the
document hierarchy is locked for editing, you can also modify existing properties.
The Properties panel also lets you specify recognition options for the selected
object. Taskmaster supports multiple recognition engines. Tabs for ICR/C, BAR/P,
and OCR/S are displayed by default. You can access other tabs by right-clicking
within the Properties panel and choosing Show tabs.
Image View
Displays the selected fingerprint image and any recognition zones. This is also
where you draw new recognition zones (see “Using fingerprints to identify
recognition zones” on page 98). If you created the fingerprints using full page
recognition, the Text tab at the bottom lets you view the recognition results.
WHAT’S ON THE TEST TAB
The Test tab includes the following panes:
Panel
Description
Workflow
Displays the job types and task profiles as defined in the Taskmaster Administrator
window (see “Managing the workflow from Taskmaster ” on page 43). This is also
where you can run a batch through the workflow.
Runtime batch hierarchy
When a batch is running, this panel displays the runtime batch hierarchy, including
any data values. If you select a page object, the page is displayed in the Image
panel.
Document hierarchy
Displays the structure of the documents you are processing and how each element
within the structure is processed.
Rulesets
Displays the rules, functions, and actions that make up each ruleset. As you step
through the workflow you can see the current execution point.
Image/Text
Displays the selected page in the runtime batch hierarchy.
Batch Pilot data
Displays batch level information for the batch that is running.
Properties
Displays the properties for the selected document hierarchy or ruleset object (read
only).
Breakpoints/Runtime state/Call stack
This functionality is beyond the scope of this guide.
Some of the functionality on the Test tab is beyond the scope of this guide; however, the guide does cover
running batches from the Test tab, use of breakpoints, and single-stepping through the workflow.
TRAVELDOCS: STARTING THE TRAVELDOCS APPLICATION
CREATING THE APPLICATION FRAMEWORK
1. If the Datacap Studio window is not already open, click Start > All Programs > Datacap > Datacap
Studio > Datacap Studio. Then, in the Select Application dialog, click Close.
2. In the Datacap Studio window, click the Rulemanager tab, and then click the Datacap Application
Wizard button at the top right.
3. In the Datacap Application Wizard dialog, click Next.
4. Select Create a new RRS application and click Next.
5. In the Application Name field, type TravelDocs.
6. Leave the Datacap Folder and Destination fields set to C:\Datacap.
7. Click Next through the remaining dialog boxes and then click Finish to create the new application.
Note: Although you can specify dictionaries, fingerprints, and sample images from the wizard, we’ll do
this later from within Datacap Studio.
8. When the Summary dialog is displayed, click Close.
CONNECTING TO THE APPLICATION
Datacap applications run under the control of Taskmaster Server. When you create a new
application using the Application Wizard, the application is registered with the local Taskmaster Server.
Before you can work with the application in Datacap Studio, you must connect to the application.
To connect to the application from Datacap Studio:
1. In the Datacap Studio window, click the Connection Wizard button at the top right.
2. Select the TravelDocs application and click Next.
3. Log in using User ID: admin, Password: admin, and Station: 1.
4. Click Finish. Datacap Studio displays the new project.
Chapter 3
THE DOCUMENT HIERARCHY
The document hierarchy (also referred to as the “setup DCO”) defines:
• The structure of the documents you are processing
• How Taskmaster processes each element within the structure
This chapter focuses on the first of these items – how the structural elements (documents, pages, and fields)
are defined within the document hierarchy. We’ll cover how Taskmaster processes each element later in this
guide (see “Rule Execution” on page 73).
Also in this chapter, we’ll begin developing the TravelDocs application. The TravelDocs application lets you
put into practice some of the concepts covered in each chapter, so here we’ll implement the document
hierarchy as specified by the business requirements.
UNDERSTANDING THE DOCUMENT HIERARCHY
DOCUMENT STRUCTURE
The document hierarchy describes the structure of the documents your application is designed to handle. The
levels within the hierarchy are batch, document, page, and field.
Batch
    Document
        Page
            Field
    Document
        Page
            Field
            Field
        Page
            Field
            Field
        Page
            Field
            Field
            Field
At the top of the document hierarchy is the batch, which refers to all pages of all document types. Beneath
the batch level, the document hierarchy defines:
• The document types your application can process. You may have only one type, or you may have
  multiple types.
  Example: The TravelDocs application processes car rental documents, hotel expense documents, and
  flight documents.
• The page types within each document type. Each document may have only one page type, or it may have
  multiple types.
  Example: The car rental document includes the rental agreement page and the optional insurance page,
  while the flight document has only an air ticket page.
• The number and order of pages within each document type. Pages can be required or optional.
  Example: A car rental document has at most two pages. The rental agreement page is required and must
  come first; the insurance coverage page is optional.
• The data fields within each page type. Data fields too can be required or optional.
  Example: The hotel document’s “Other Charges” page has fields for expense category, number of items,
  unit cost, and total cost.
GENERATING STRUCTURED CONTENT FROM UNSTRUCTURED DOCUMENTS
In a typical Taskmaster application, documents start as a batch of unidentified image files – one image per
page. A single batch may contain a mix of document types, and each document may contain a different
number of pages of different types. There is nothing within the page image that identifies the page type or any
of the data on the page. In other words, the page images do not contain any structured content.
Before Taskmaster can begin to extract data it must identify the individual page types. There are several ways
to do this, but the most common technique is called fingerprint matching (described later in this section).
Taskmaster then maps pages to documents and fields to pages, using the information in the document
hierarchy. After identifying the fields and their locations within each page, Taskmaster can extract the
data and store it in a structured format, known as the runtime batch hierarchy.
[Figure: Processing a TravelDocs batch. The first steps (input batch, identify pages, create documents) turn
pages 1 to 6 into Document 1 (Car_Rental), Document 2 (Hotel), and Document 3 (Flight). The next steps (locate
fields, recognize values) fill in the runtime batch hierarchy. For example, page 5 (Other_Charges) of the Hotel
document yields Category = Laundry, Num_Items = 4, Unit_Cost = $3.50, and Total_Cost = $14.00.]
MAPPING THE DOCUMENT HIERARCHY TO THE RUNTIME BATCH HIERARCHY
The document hierarchy describes the general structure of the documents your application supports in terms
of document types, page types, and fields. A runtime batch, on the other hand, describes specific documents
containing specific pages and specific data. Using object-oriented terminology:
• The document hierarchy defines the document, page, and field classes.
• The runtime batch describes a set of objects built from those classes. Each object has a set of variables
  derived from the parent class, and each variable has a value.
While the document hierarchy describes a single, generalized version of each document and page type, a
runtime batch can have any number of documents and pages. In the TravelDocs example below, the
document hierarchy defines three document types: Car_Rental, Hotel, and Flight. The runtime
batch includes two car rental documents, two hotel documents, and two flight documents. Each
runtime document has one or more pages, and each page has the number of fields defined in the document
hierarchy for that page type. You can see the field values for one instance of the “Other Charges” page.
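The class/object analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not how Taskmaster stores its data: the PageType and RuntimePage names are invented for the sketch, while the page type, field names, page IDs, and values come from the TravelDocs example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageType:
    """A page definition in the document hierarchy (the "class")."""
    name: str
    field_names: tuple


@dataclass
class RuntimePage:
    """One page in a runtime batch (an "object" built from a PageType)."""
    page_id: str
    page_type: PageType
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every runtime page gets one variable per field defined for its type.
        for name in self.page_type.field_names:
            self.values.setdefault(name, None)


# One generalized definition in the document hierarchy...
other_charges = PageType(
    "Other_Charges", ("Category", "Num_Items", "Unit_Cost", "Total_Cost")
)

# ...and two specific pages in the runtime batch built from it.
page_5 = RuntimePage(
    "TM000005",
    other_charges,
    {"Category": "Laundry", "Num_Items": "4",
     "Unit_Cost": "$3.50", "Total_Cost": "$14.00"},
)
page_10 = RuntimePage("TM000010", other_charges)  # values not yet recognized
```

Both runtime pages share the same definition, but each carries its own values, which is the relationship between the document hierarchy and the runtime batch described above.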
Document Hierarchy (DCO)

TravelDocs
    Other
    Car_Rental
        Rental_Agreement
        Optional_Insurance
    Hotel
        Room_Receipt
        Meals
        Other_Charges
            Category
            Num_Items
            Unit_Cost
            Total_Cost
    Flight

Runtime Batch Hierarchy

Batch 20100100.001 (TravelDocs)
    Document 20100100.001.01 (Car_Rental)
        Page TM000001 (Rental_Agreement)
        Page TM000002 (Optional_Insurance)
    Document 20100100.001.02 (Hotel)
        Page TM000003 (Room_Receipt)
        Page TM000004 (Meals)
        Page TM000005 (Other_Charges)
            Category = Laundry
            Num_Items = 4
            Unit_Cost = $3.50
            Total_Cost = $14.00
    Document 20100100.001.03 (Flight)
        Page TM000006 (Air_Ticket)
    Document 20100100.001.04 (Car_Rental)
        Page TM000007 (Rental_Agreement)
    Document 20100100.001.05 (Flight)
        Page TM000008 (Air_Ticket)
    Document 20100100.001.06 (Hotel)
        Page TM000009 (Room_Receipt)
        Page TM000010 (Other_Charges)
HANDLING DIFFERENT VERSIONS OF EACH PAGE TYPE
The runtime batch hierarchy shown on the previous page includes two car rental documents, two hotel
documents, and two flight documents. The car rental documents may be from different car rental companies;
the hotel documents may be from different hotel chains; and the flight documents may be from different
airlines.
Although individual pages within a runtime batch may be of the same type (for example, our runtime batch
has two pages of type Rental_Agreement), the pages may look very different. Structurally, however, they
contain the same data – at least in terms of the data that interests us.
[Figure: Sample page variants of the same page type, with callouts for the fields Pickup_Date,
Pickup_Location, Return_Date, Return_Location, and Total_Cost.]
How do you know where the data you need is located if it’s not always in the same position on the page? The
answer is you create a “fingerprint” for each variant and store the field location information for each variant
in the document hierarchy. We’ll cover this later.
TRAVELDOCS: CREATING THE DOCUMENT HIERARCHY
In this section we’ll create the document hierarchy for the TravelDocs application we’re developing. The goal
here is to create generalized definitions (classes) for the document types, page types, and fields the application
supports. This will enable our Taskmaster application to convert a collection of unstructured page images into
a structured runtime batch hierarchy containing the relevant business data.
[Figure: Converting page images into a runtime batch hierarchy in five steps: 1. Input batch, 2. Identify pages,
3. Create docs, 4. Locate fields, 5. Recognize values. Pages 1 to 6 become Document 1 (Car_Rental), Document 2
(Hotel), and Document 3 (Flight). Page 5 (Other_Charges) yields Category = Laundry, Num_Items = 4,
Unit_Cost = $3.50, and Total_Cost = $14.00.]
EXAMINING THE DEFAULT DOCUMENT HIERARCHY
The Datacap Studio Application Wizard creates a default document hierarchy you can use as a starting point.
The default hierarchy includes:
• A batch node (with the same name as the application)
• A page type “Other” (this is the default type Taskmaster assigns to all pages prior to page identification)
• A default document type called “Document”
• A default page type called “Page”
• One default field called “Field” associated with the page type “Page”
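[Figure: The default document hierarchy in Datacap Studio, with callouts for the application, the batch node,
the page type “Other”, the default document type, the default page type, and the default field type.]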
Within the document hierarchy the “Open” and “Close” nodes define the rules that are assigned to each
element within the hierarchy. For example, the “Open” node beneath the page type “Other” defines the rules
and actions that Taskmaster executes when it begins processing a page of type “Other.” Rule execution is
covered in detail later in this guide (see “Rule Execution” on page 73).
CREATING NEW DOCUMENT TYPES
The business requirements specification for the TravelDocs application defines three document types:
• Car rental
• Hotel
• Flight
We’ll begin by adding these document types to the hierarchy.
1. In the Document Hierarchy pane, click the Lock DCO for editing button (the padlock button) to lock
the document hierarchy for editing.
Note: The terms “DCO” and “document hierarchy” are used interchangeably.
2. Expand the tree so you can see the default document and page types.
3. Select the Document node and then click it once to edit the name. Change the name from
Document to Car_Rental and press Enter.
Note: You can’t include spaces in any of the document hierarchy node names.
4. Right-click the TravelDocs batch node and choose Add multiple > Documents. Then type 2 in
the box and press Enter.
5. Rename the new documents from Document1 and Document2 to Flight and Hotel. The document
hierarchy should look like the one below.
6. Click the Save button.
CREATING NEW PAGE TYPES
The business requirements specification defines the following page types for each document type:

Document type   Page types
Car_Rental      Rental_Agreement, Optional_Insurance
Hotel           Room_Receipt, Meals, Other_Charges
Flight          Air_Ticket

To simplify the application slightly, we’ll skip the Meals and Other_Charges pages for now.
1. Make sure the document hierarchy is still locked for editing (you should see a closed padlock on a
yellow button). If it isn’t, click the button to lock the document hierarchy.
2. Beneath the Car_Rental document node, select the default Page node and change the name from Page
to Rental_Agreement.
3. Right-click on the Car_Rental document node and choose Add > Page. Then change the name of the
page from Page1 to Optional_Insurance.
4. Right-click on the Flight document node and choose Add > Page. Then expand the Flight node and
change the name of the page from Page1 to Air_Ticket.
5. Right-click on the Hotel document node and choose Add > Page. Then expand the Hotel node and
change the name of the page from Page1 to Room_Receipt. The document hierarchy should look like
the one below.
6. Click the Save button.
SPECIFYING THE STRUCTURE OF DOCUMENTS AND PAGES WITHIN THE BATCH
The business requirements specify the following rules for the structure of each document type:

Document / Page          Number                  Required?   Order
Car Rental               Any number per batch    No          Any position within batch
  Rental Agreement       One per document        Yes         Must be first in document
  Optional Insurance     One per document        No          May not be first in document
Flight                   Any number per batch    No          Any position within batch
  Air_Ticket             One per document        Yes         Must be first in document
Hotel                    Any number per batch    No          Any position within batch
  Room_Receipt           One per document        Yes         Must be first in document
Within the document hierarchy, the following variables define the structure of the batch:
• Max: Maximum number of objects of this type for each parent object. 0 means no maximum; 1 means
  Taskmaster creates a new document each time it encounters a page of this type; etc.
• Min: Minimum number of objects of this type for each parent object. 0 means no minimum; 1 means
  there must be at least one; etc.
• Order: Position of this object relative to other child objects of the same parent. 0 means any position.
Using these variables we can define the structure of the batch as shown below.

Document / Page          Max   Min   Order
Car Rental               0     0     0
  Rental Agreement       1     1     1
  Optional Insurance     1     0     2
Flight                   0     0     0
  Air_Ticket             1     1     1
Hotel                    0     0     0
  Room_Receipt           1     1     1

To specify the structure of documents and pages within the batch:
1. Make sure the document hierarchy is still locked for editing (you should see a closed padlock on a
yellow button). If it isn’t, click the button to lock the document hierarchy.
2. Right click on the Car_Rental document node and choose Manage variables.
3. Make sure the Max, Min, and Order values are as specified in the table above (the Car_Rental document
is 0, 0, 0). Then click Done.
4. Right click on the Rental_Agreement page node and choose Manage variables. Then enter the Max,
Min, and Order values as specified in the table above (for example, the Rental_Agreement page is 1, 1, 1)
and click Done.
5. Repeat for each of the remaining document and page types shown in the table. Then click Save.
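To see how the Max, Min, and Order values constrain a document, here is a minimal Python sketch that checks an ordered list of page types against the Car_Rental values from the table above. The check_structure function is invented for this illustration and is not part of Taskmaster; in particular, the way it interprets Order (children with a nonzero Order must appear in ascending Order) is an assumption.

```python
from collections import Counter

# (Max, Min, Order) for each page type of the Car_Rental document,
# copied from the table above.
CAR_RENTAL_PAGES = {
    "Rental_Agreement": (1, 1, 1),
    "Optional_Insurance": (1, 0, 2),
}


def check_structure(child_types, rules):
    """Return a list of problems found in an ordered list of child types."""
    problems = []
    counts = Counter(child_types)

    for name in counts:
        if name not in rules:
            problems.append(f"{name}: not defined for this parent")

    for name, (max_count, min_count, _order) in rules.items():
        if counts[name] < min_count:
            problems.append(f"{name}: fewer than Min={min_count}")
        if max_count and counts[name] > max_count:  # Max=0 means no maximum
            problems.append(f"{name}: more than Max={max_count}")

    # Assumed reading of Order: children with a nonzero Order must appear
    # in ascending Order; children with Order=0 may appear anywhere.
    ordered = [rules[n][2] for n in child_types if n in rules and rules[n][2]]
    if ordered != sorted(ordered):
        problems.append("children are out of Order")
    return problems
```

For example, check_structure(["Optional_Insurance"], CAR_RENTAL_PAGES) reports that the required Rental_Agreement page is missing, while a document consisting of a Rental_Agreement page followed by an Optional_Insurance page produces no problems.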
CREATING DATA FIELDS
The business requirements specification defines the following fields for each page type:

Rental_Agreement      Optional_Insurance    Air_Ticket        Room_Receipt
Vendor *              Vendor *              Vendor *          Vendor *
Pickup_Date           CDW                   Outbound_From     Arrival_Date
Pickup_Location         CDW_Option          Outbound_To       Departure_Date
Return_Date           PAI                   Outbound_Date     Total_Cost
Return_Location         PAI_Option          Return_From
Car_Type              PEP                   Return_To
Options                 PEP_Option          Return_Date
  Nav_System          ELP                   Airfare
  Child_Seat            ELP_Option          Taxes
  Fuel_Service        Total_Cost *          Total_Cost
Total_Cost

To simplify the application slightly, we’ll skip the fields marked * for now.
1. Make sure the document hierarchy is still locked for editing.
2. Expand the Rental_Agreement page, select the default Field node, and change the name from Field to
Pickup_Date.
3. Right click on the Rental_Agreement page and choose Add multiple > Fields. Then type 6 in the box
and press Enter.
4. Rename the new fields Pickup_Location, Return_Date, Return_Location, Car_Type, Options, and
Total_Cost.
5. Right-click on the Options field and choose Add multiple > Fields. Type 3 in the box and press Enter.
6. Expand the Options field and rename the new fields Nav_System, Child_Seat, and Fuel_Service.
7. Click the Save button.
8. Use the same procedure to add the fields to the Optional_Insurance page. This page has four fields, each
of which has one sub-field. Then click Save.
Note: The Rental_Agreement, Room_Receipt, and Air_Ticket pages all have a field called “Total_Cost.”
When you add this field to the Room_Receipt and Air_Ticket pages, Datacap Studio displays a
message asking if you want to reference the existing object. Click Yes. You’ll see the same message
when you add the Return_Date field to the Air_Ticket page. Click Yes again. For an explanation,
see “Sharing field definitions across the document hierarchy” on page 38.
9. Repeat the steps above for the Air_Ticket and Room_Receipt pages to add the fields that we are not
skipping (see the table):
• The Air_Ticket page has nine fields
• The Room_Receipt page has three fields
Click Save after each page. The finished hierarchy should look like the one shown on the next page.
10. Click the Save button.
COMPLETE DOCUMENT HIERARCHY FOR THE TRAVELDOCS APPLICATION
[Figure: The complete document hierarchy in three panels: Car_Rental document, Flight document, and Hotel
document.]
SPECIFYING THE STRUCTURE OF FIELDS WITHIN EACH PAGE
In the same way you specified the structure of documents and pages within the batch, you can specify the
structure of fields within each page type. The default values for fields are Max=0, Min=0, and Order=0.
The business requirements for the TravelDocs application specify that there must be one and only one
instance of each field on each page. The order of the fields within the page is not important. This means each
field should be set up with Max=1, Min=1, and Order=0.
Note: Specifying the values for all of the fields in the TravelDocs application takes a while and is not actually
necessary for the purposes of this tutorial (since all the sample pages comply). You can skip this step if
you want. If you do skip this step, make sure you click the Save button and then click the Unlock
DCO (padlock) button when done.
To specify the structure of fields within each page:
1. Right-click each field node in turn and choose Manage variables. Set Max=1, Min=1, and Order=0 and
click Done.
2. When you have done this for all fields in the document hierarchy, including the sub-fields, click the Save
button and then click the Unlock DCO (padlock) button.
SHARING FIELD DEFINITIONS ACROSS THE DOCUMENT HIERARCHY
The document, page, and field information you specify in the document hierarchy pane is stored in the file
C:\Datacap\dco_<application_name>\<application_name>.xml. This file is referred to as the document hierarchy or
the setup DCO.
The document hierarchy for the TravelDocs application is C:\Datacap\dco_TravelDocs\TravelDocs.xml.
This file defines the structure of the batch, and the structure of each document type, page type, and field
within the batch. Although the batch structure is hierarchical, the structure of the file is flat, and so the name
of each document, page, and field object must be unique.
Document (Batch) Hierarchy    Setup DCO File
TravelDocs                    <B type="TravelDocs">...</B>
  Car_Rental                  <D type="Car_Rental">...</D>
  Flight                      <D type="Flight">...</D>
  Hotel                       <D type="Hotel">...</D>
    Room_Receipt              <P type="Room_Receipt">...</P>
      Arrival_Date            <F type="Arrival_Date">...</F>
      Departure_Date          <F type="Departure_Date">...</F>
      Total_Cost              <F type="Total_Cost">...</F>
Within the document hierarchy file, the object definition specifies the child objects referenced by each parent
object. For example, the “Room_Receipt” definition specifies that a room receipt page has three child fields:
<P type="Room_Receipt">
    <V n="rules"></V>
    <F type="Arrival_Date" pos="0" min="0" max="0"/>
    <F type="Departure_Date" pos="0" min="0" max="0"/>
    <F type="Total_Cost" pos="0" min="0" max="0"/>
</P>
This structure allows multiple parent objects to reference the same child object.
In the TravelDocs application, the “Rental_Agreement,” “Air_Ticket,” and “Room_Receipt” pages all have a
“Total_Cost” field. The first time you add the “Total_Cost” field to a page, Taskmaster adds its definition
to the document hierarchy. Later, when you add the field to the other page types, Taskmaster asks if
you want to reference the existing field definition.
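The flat layout makes it easy to see which pages share a field definition. The following Python sketch parses a cut-down, hypothetical setup DCO with the standard library. The P, F, and V elements and their attributes follow the fragments shown above; the enclosing Setup element, the Air_Ticket entry, and the two helper functions are assumptions made so the sample parses as a single document, and the real file may be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down setup DCO. Page definitions reference fields by
# type; each field is defined once at the top level of the flat file.
SAMPLE_DCO = """
<Setup>
  <P type="Room_Receipt">
    <V n="rules"></V>
    <F type="Arrival_Date" pos="0" min="0" max="0"/>
    <F type="Departure_Date" pos="0" min="0" max="0"/>
    <F type="Total_Cost" pos="0" min="0" max="0"/>
  </P>
  <P type="Air_Ticket">
    <V n="rules"></V>
    <F type="Airfare" pos="0" min="0" max="0"/>
    <F type="Total_Cost" pos="0" min="0" max="0"/>
  </P>
  <F type="Arrival_Date"></F>
  <F type="Departure_Date"></F>
  <F type="Airfare"></F>
  <F type="Total_Cost"></F>
</Setup>
"""

root = ET.fromstring(SAMPLE_DCO.strip())


def defined_fields(root):
    """Return the field definitions: <F> elements directly under the root."""
    return [f.get("type") for f in root.findall("F")]


def pages_referencing(root, field_type):
    """Return the page types whose definition references the given field."""
    return [
        p.get("type")
        for p in root.findall("P")
        if any(f.get("type") == field_type for f in p.findall("F"))
    ]
```

In this sample, Total_Cost is defined once but referenced by both page types, which is the situation Datacap Studio asks about when you add an existing field name to a second page.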
Chapter 4
THE TASKMASTER WORKFLOW
During the data capture process, documents go through a workflow consisting of several discrete tasks: page
identification, character recognition, field validation, verification, etc. Some tasks require operator
intervention, whereas others run automatically.
This chapter examines how Taskmaster’s queuing mechanism moves batches of documents through the
workflow and how tasks are implemented programmatically in terms of rulesets, rules, and actions.
UNDERSTANDING THE TASKMASTER WORKFLOW
WORKFLOWS, JOBS, AND TASKS
A workflow consists of a series of tasks and defines a way to process documents. Although Taskmaster
applications can include multiple workflows, we’ll be focusing on single workflow applications throughout
this guide. The standard workflow generated by the Application Wizard includes three job types:
• Main Job: This is the standard workflow for processing documents from Taskmaster Client (the “thick”
  client). It takes a batch of documents through each of the processing steps identified earlier (input
  documents, identify pages, etc.) and is the workflow we’ll focus on initially.
• Fixup Job: This job is used only when there are document integrity problems and displays the batch to
  an operator for corrective action (see “Handling document integrity problems” on page 89).
• Web Job: This job is like the Main Job except that it defines the workflow for jobs initiated from the
  Taskmaster Web client. It supports remote scanning and lets users upload new batches to the server.
A job consists of one or more tasks. To process a batch of documents, you must run the batch through each
task in the selected job. Some tasks (for example, Export) run without operator intervention, whereas others
(for example, Verify) require an operator.
The tasks in the workflow are determined by the job type you select. You can see the tasks associated with
each job type by looking in the Workflow pane on the Datacap Studio Test tab. The workflow for “Main
Job” includes five tasks: VScan, PageID, Rulerunner, Verify, and Export. Each task is linked to a task profile
(see “Mapping a task to a task profile” on page 48). Descriptions of each task profile are provided below.
Task Profile   Description
VScan          A “virtual scanning” profile that gets pages into your application by copying image files
               from a specified location.
PageID         Identifies the incoming pages by comparing them to known page types using fingerprint
               matching. Depending on the identification method used, this profile may perform full page
               OCR. It may also perform image cleanup.
Rulerunner     Organizes pages into documents, locates the fields defined for that page type, and performs
               OCR to recognize the field data (or obtains the data from the full page OCR results). Also
               runs validation rules to ensure the data is valid.
Verify         Runs during the verification stage, when pages are displayed to an operator to ensure
               recognition was accurate and to handle any validation errors.
Export         Exports the structured document data to an output file, a document management system, a
               database, or an external business process (can also include the original image).
In addition to the task profiles that run as part of the “Main Job” workflow, there are two other important
task profiles the Application Wizard generates: FingerprintAdd and ImageFix.
Task Profile     Description
FingerprintAdd   Generates the fingerprint files when you add new page types to the application from the
                 Datacap Studio Zones tab.
ImageFix         Runs when you enhance a fingerprint image using the Image Processing window from the
                 Zones tab.
TASK PROFILES AND RULESETS
Each task is linked to a task profile that includes one or more rulesets. The default rulesets generated by the
Application Wizard are displayed in the Task Profiles pane on the Datacap Studio Rulemanager tab.
The default “Main Job” workflow uses all of these task
profiles. Within each task profile, rulesets run in the order
shown here, although a ruleset will not do anything if the
rules in it aren’t associated with any objects in the
document hierarchy. This is covered later.
The FingerprintAdd profile runs when you add a new fingerprint
to the application from the Zones tab.
The Imagefix profile runs when you enhance a fingerprint image
using the Image Processing window from the Zones tab.
Each ruleset defines one or more rules that you can run on specific documents, pages, or fields, or on the
entire batch. The task profile specifies only that certain rulesets are associated with that profile. Nothing runs
until you actually associate a specific rule with a specific document, page, or field, or with the batch, as
described under “Rule Execution” on page 73.
Note: The order of the rulesets within the task profile is important, since it defines the order in which
Taskmaster runs rules. For example, you can’t check the integrity of a document before you’ve created
it, so the CreateDocs ruleset must come before the Document Integrity ruleset.
Note that multiple task profiles can reference the same ruleset. In the example above, the Rulerunner and
Verify profiles both reference the Validate ruleset. This is because you typically run validation rules after data
recognition, and run the same rules again after verification by the operator.
RULESETS, RULES, AND ACTIONS
A ruleset consists of one or more rules. In the example below, the default PageID ruleset has two rules:
PageID and Set Fingerprint Params. You can see the rules associated with each ruleset in the Rulesets pane
on the Datacap Studio Rulemanager tab.
[Figure: The Rulesets pane, showing the PageID ruleset with its PageID rule and Set Fingerprint Params rule.]
Rules are assigned to process specific objects in the document hierarchy (for example, to analyze each page
and identify its type). However, the rule itself is defined by the programmed functions and actions within it.
[Figure: The PageID rule expanded to show its function and actions.]
The default PageID rule consists of one function and two actions, as shown above. The PageID function first
executes the AnalyzeImage action. If AnalyzeImage is successful (returns True), the function executes
FindFingerprint. If AnalyzeImage fails (returns False), the function fails and Taskmaster executes the
next function within the rule. In this case there isn’t one, but you could add an exception handling function to
handle the error. Rule execution is covered in more detail later (see “Rule Execution” on page 73).
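The control flow described above can be sketched in Python. This is a simplified model, not Taskmaster code: the Python functions are stand-ins for the AnalyzeImage and FindFingerprint actions, the OnError function and its flag_for_review action are hypothetical, and the sketch assumes that a rule stops at the first function that succeeds.

```python
def run_function(actions, page):
    """Run actions in order; stop and fail at the first one returning False."""
    return all(action(page) for action in actions)


def run_rule(functions, page):
    """Try each function in turn; return the name of the first that succeeds."""
    for name, actions in functions:
        if run_function(actions, page):
            return name
    return None


log = []  # records which actions ran, in order


def analyze_image(page):      # stand-in for the AnalyzeImage action
    log.append("AnalyzeImage")
    return page["readable"]


def find_fingerprint(page):   # stand-in for the FindFingerprint action
    log.append("FindFingerprint")
    page["type"] = "Rental_Agreement"
    return True


def flag_for_review(page):    # hypothetical exception-handling action
    log.append("FlagForReview")
    page["type"] = "Other"
    return True


PAGE_ID_RULE = [
    ("PageID", [analyze_image, find_fingerprint]),
    ("OnError", [flag_for_review]),  # hypothetical fallback function
]
```

With a page for which analyze_image returns True, only the PageID function runs. If analyze_image returns False, find_fingerprint is skipped and the fallback function runs instead.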
MANAGING THE WORKFLOW FROM TASKMASTER CLIENT
To open the Taskmaster Client window:
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
2. Select the application you want to open (for example, TravelDocs) and click OK.
3. Log in using User ID: admin; Password: admin; Station ID: 1.
The Taskmaster Client window provides three key functions:
• Lets operators run batches through the application workflow
• Lets administrators and privileged operators monitor the job queue
• Lets administrators configure the application and its components
RUNNING BATCHES THROUGH THE WORKFLOW
The Operations window displays shortcuts to the various tasks in the application workflow. The default
shortcuts as generated by the Application Wizard are shown below.
The shortcuts do the following:
• Runs the Main Job’s Verify task profile.
• Runs the Main Job’s Export task profile.
• Runs the Main Job’s VScan task profile.
• Runs the Fixup Job’s FixUp task profile. This is used when there are document integrity problems and
  displays the batch to an operator (see “Handling document integrity problems” on page 89).
• Runs the Main Job’s Rulerunner task profile.
• Runs the Main Job’s PageID task profile.
• Runs any task profile that does not require operator intervention. What runs
  depends on the status of batches in the job queue.
Note: The shortcuts do not map directly to the task profiles you configure in Datacap Studio. The mappings
are explained later in this section (see “Configuring shortcuts” on page 46).
MONITORING THE JOB QUEUE
During the data capture process, documents go through a workflow consisting of several discrete tasks: input,
page identification, recognition, validation, verification, etc. Taskmaster uses a queuing mechanism to move
batches of documents through the workflow. The Job Monitor lets you view the current status of all batches.
• To open the Job Monitor, click the Job Monitor button.
Each row in the job monitor represents one batch. For each batch you can see its current position in the
workflow. For example, in the screen above:
• The top batch has been scanned and is now ready for page identification (Main Job.PageID: pending).
• The next batch has been through page identification and is now ready to go through the stage that
  includes document creation, recognition, and validation (Main Job.Rulerunner: pending).
You can execute a batch by double-clicking its row number in the Job Monitor window, but typically an
operator selects a task via its shortcut (for example, Verify/Fixup) and Taskmaster runs the first batch that is
queued for that task. This lets multiple operators work from the same job queue and Taskmaster delivers
batches to them on demand.
Some tasks do not require an operator. For example, page identification, recognition, and validation are all
“background” tasks, meaning they can run without operator intervention. You can configure Taskmaster
Quattro to run background tasks automatically. Taskmaster Quattro monitors the job queue for batches that
are pending for specific background tasks. When a batch is ready, it takes the batch and processes it
automatically. We’ll cover Taskmaster Quattro later in this guide (see “Using Quattro to automate
background tasks” on page 270).
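The queuing behavior can be pictured with a toy model in Python. This is only a sketch of the idea described above, not how the Taskmaster Server implements its queue: the JobQueue class and its methods are invented for the illustration, and only the Main Job task names come from this chapter.

```python
# Task order for the default Main Job workflow, as listed in this chapter.
MAIN_JOB = ["VScan", "PageID", "Rulerunner", "Verify", "Export"]


class JobQueue:
    """Toy job queue: each entry is a batch that is pending for one task."""

    def __init__(self):
        self._pending = []  # (batch_id, task) pairs in arrival order

    def add(self, batch_id, task):
        self._pending.append((batch_id, task))

    def take(self, task):
        """Remove and return the first batch pending for task, or None."""
        for i, (batch_id, pending_task) in enumerate(self._pending):
            if pending_task == task:
                del self._pending[i]
                return batch_id
        return None

    def complete(self, batch_id, task):
        """Queue the batch for the task that follows the given one, if any."""
        next_index = MAIN_JOB.index(task) + 1
        if next_index < len(MAIN_JOB):
            self.add(batch_id, MAIN_JOB[next_index])
```

An operator who selects a task corresponds to a call to take with that task name. A background processor such as Taskmaster Quattro would repeatedly call take for the background tasks it is configured to run and call complete when each one finishes.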
WHAT’S IN THE TASKMASTER ADMINISTRATOR WINDOW
The Taskmaster Administrator window lets you configure your application and its components.
To open the Taskmaster Administrator window:
• Click the Administrator button on the Taskmaster Client toolbar.
The Taskmaster Administrator window includes the following tabs:
Tab         Description                                                   More Information
Workflow    Displays the application’s workflows, jobs, and tasks         See “Configuring jobs and tasks” on page 47
Modules     Displays the application’s task modules                       See “Configuring jobs and tasks” on page 47
Groups      Configures user groups and permitted tasks for each group     Beyond the scope of this guide
Users       Configures users and permitted tasks for each user            Beyond the scope of this guide
Stations    Configures stations and permitted tasks for each station      Beyond the scope of this guide
Shortcuts   Configures the batch selection mode and the icons displayed   See “Configuring shortcuts” on page 46
            in the Operations window
CONFIGURING SHORTCUTS
The shortcuts in the Taskmaster Client Operations window do not map directly to the task profiles you
configure in Datacap Studio.
To see which tasks are associated with a shortcut:
• Right-click the shortcut and look at the bottom entry in the popup menu.
You configure the shortcut-task mappings on the Taskmaster Administrator Shortcuts tab. In the screen
below, the Verify/Fixup shortcut is mapped to the Main Job’s Verify task.
To see which task profile is mapped to the Verify task, you must look in the task’s Setup window (see
“Mapping a task to a task profile” on page 48).
CONFIGURING JOBS AND TASKS
The Workflow and Modules tabs let you configure workflows, jobs, and tasks.
Tab        Description
Workflow   Displays the application’s workflows, jobs, and tasks:
           • The Module field displays the module associated with the selected task.
           • The Setup button opens the selected task’s setup window. Settings are saved
             to the task’s project (.bpp or .icp) file (see the Modules tab to get the project
             file name).
           • Use the Add button to create new jobs and tasks.
Modules    Displays the application’s modules:
           • The Parameters field displays the project (.bpp or .icp) file associated with
             the selected module.
On the Workflow tab you can see the three job types generated by the Application Wizard:
• Fixup Job
• Main Job
• Web Job
Each job type has one or more tasks, and each task is associated with a module. For example, the PageID task
in the Main Job workflow is associated with the rrsAssemble module.
On the Modules tab you can see the application’s task modules. Each module is associated with a project
(.bpp or .icp) file. For example, the rrsAssemble module is associated with the file rrs_assemble.bpp. This file
is stored in the application’s dco_<application_name> folder.
MAPPING A TASK TO A TASK PROFILE
When you run a task from Taskmaster Client, Taskmaster runs the associated task profile. The Batch Pilot
Setup window lets you map a task to a task profile.
1. On the Taskmaster Administrator Workflow tab, select the task (for example, PageID).
2. Click the Setup button.
The associated task profile is marked with an arrow and is highlighted in gray. The profile name is
stored in the project (.bpp or .icp) file as the TProfile entry.
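The chain of mappings described in this chapter (shortcut to task, task to module, module to project file, project file to task profile) can be pictured as a series of lookups. The Python sketch below is only an illustration of that chain. The dictionaries and the resolve_task_profile function are hypothetical and are not part of Taskmaster; the entries reuse the examples given above, and the TProfile value "PageID" for rrs_assemble.bpp is an assumption.

```python
# Illustrative model only: these tables are NOT a Taskmaster API.
# Shortcut -> (job, task) pairs, as configured on the Shortcuts tab.
SHORTCUT_TO_TASKS = {"Verify/Fixup": [("Main Job", "Verify")]}

# (job, task) -> module, as shown on the Workflow tab.
TASK_TO_MODULE = {("Main Job", "PageID"): "rrsAssemble"}

# Module -> project file, as shown on the Modules tab (Parameters field).
MODULE_TO_PROJECT = {"rrsAssemble": "rrs_assemble.bpp"}

# Project file -> TProfile entry (the value here is assumed for illustration).
PROJECT_TO_TPROFILE = {"rrs_assemble.bpp": "PageID"}


def resolve_task_profile(job, task):
    """Follow task -> module -> project file -> task profile."""
    module = TASK_TO_MODULE[(job, task)]
    project = MODULE_TO_PROJECT[module]
    return module, project, PROJECT_TO_TPROFILE[project]


print(resolve_task_profile("Main Job", "PageID"))
```

Keeping each mapping in its own table mirrors the way the settings are spread across the Shortcuts tab, the Workflow tab, the Modules tab, and the project file.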
Chapter 5
DOCUMENT INPUT
Taskmaster works primarily with TIF image files, so the first activity in any Taskmaster workflow is usually to
get the documents into the application’s input repository and into TIF format. Documents may be hardcopy
or electronic. If hardcopy, they must be scanned and the resulting files moved to the application repository;
if electronic, they may come from a variety of sources and may be in a variety of formats.
This chapter examines different ways to get documents into your application for processing. At the end of the
chapter, we’ll implement a simple form of document input, virtual scanning, within the TravelDocs application.
We’ll also show you how to set up a scanner for use with Taskmaster. This final step requires that you have a
scanner with an ISIS driver attached to the computer you’re using, but you can skip this step if you don’t.
Note: We’ll cover remote scanning using a TWAIN scanner later in this guide (see “Remote scanning” on
page 317).
ENTERING ELECTRONIC DOCUMENTS (“VIRTUAL SCANNING”)
If your application is processing documents that are already available in electronic format, you can use
Taskmaster’s “virtual scanning” capability to input the documents. Taskmaster can handle a wide range of
document types, including PDF files, fax files, and Microsoft Office documents.
The application framework generated by the Application Wizard includes a basic virtual scanning task that
copies files from the specified folder to the runtime batch folder.
Note that the Scan action copies the documents to the target location, leaving the originals in the “images”
folder. This is useful during application development and testing, but in a production environment you may
want to back up the source images and remove them from the images folder. You can do this by including the
MoveImageFileToDirectory action before the Scan action.
Library | Action | Description
VScan | MoveImageFileToDirectory | Moves the current source image to the location you specify. This action must precede the Scan action. If the Scan action is not called, the file is not moved.
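The difference between copying the source files and copying them while also clearing the “images” folder can be illustrated outside Taskmaster. The following Python sketch is an analogy under stated assumptions, not the VScan implementation: the virtual_scan function, its parameters, and the tmNNNNNN naming scheme are hypothetical, chosen to resemble the batch file names shown later in this chapter.

```python
import shutil
from pathlib import Path


def virtual_scan(images_dir, batch_dir, backup_dir=None, max_files=4):
    """Copy up to max_files source files into batch_dir.

    If backup_dir is given, each original is moved there after it has been
    copied, so the source folder does not fill up with processed files.
    Hypothetical helper for illustration; not a Taskmaster action.
    """
    images_dir, batch_dir = Path(images_dir), Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)
    sources = sorted(p for p in images_dir.iterdir() if p.is_file())
    copied = []
    for index, source in enumerate(sources[:max_files], start=1):
        # Assumed naming scheme: tm000001.tif, tm000002.tif, ...
        target = batch_dir / f"tm{index:06d}{source.suffix.lower()}"
        shutil.copy2(source, target)
        if backup_dir is not None:
            Path(backup_dir).mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(Path(backup_dir) / source.name))
        copied.append(target.name)
    return copied
```

Calling virtual_scan without backup_dir leaves the originals in place, which matches the behavior described for development and testing; passing backup_dir corresponds to the production arrangement.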
DOCUMENT CONVERSION
If your documents are not already in TIF format, you must convert them during the first stage of the
processing workflow. The action categories in the “Convert” library handle a variety of file types:
• Excel®
• Image files (JPEG, BMP, PNG, and GIF)
• Outlook®
• PDF
• Word
SCANNING HARDCOPY DOCUMENTS
Taskmaster supports two basic scanning modes:
• Local scanning using a scanner attached to and controlled from Taskmaster Client. Taskmaster Client
supports ISIS scanners only.
• Remote scanning using the Taskmaster Web client to scan and then upload the documents. Taskmaster
Web supports TWAIN scanners only.
Make sure your scanner driver is installed and the scanner is functioning properly before attempting to
configure Taskmaster to use the scanner.
LOCAL SCANNING
When you scan from Taskmaster Client using a local scanner, the scanned image files are delivered directly to
the application’s runtime batch folder. The scan task is responsible for creating the runtime batch files.
The application framework generated by the Application Wizard does not include a scan task, so before you
can scan locally you must create one. To create a scan task:
• Create a new Batch Pilot project for the scan task.
• Create a new module and link it to the Batch Pilot project.
• Remove the existing VScan task from the “Main Job” workflow or create a new workflow for scanning
(since a job can have only one batch creation task).
• Add a scan task to the workflow and link it to the new module.
• Create a shortcut for the new scan task.
Detailed instructions are provided under “Setting up a local scanner” on page 55. The instructions are specific
to the TravelDocs application, but you can generalize them for any Taskmaster application.
REMOTE SCANNING
Taskmaster’s distributed capture capability enables users to scan documents into a Taskmaster application
using the Taskmaster Web client. This is typically a 2-step process:
• Use a “web scan” task to scan the pages and save the image files locally.
• Use an upload task to upload the image files and runtime batch files to the application’s “batches” folder.
The default application framework does not include a web scan task, so you’ll need to create one. The process
is similar to the one described for local scanning, except that you can use the Batch Pilot project file
(rscan.icp) that’s generated by the Application Wizard:
• Create a new module and link it to the existing “rscan” project by entering /inet rscan.icp as the
module parameter.
Note: /inet indicates that this is a web module. If you don’t include /inet, the task shortcut will not
appear in the web client.
• Remove the existing iVScan task from the “Web Job” workflow or create a new workflow for scanning.
• Add a scan task to the web workflow and link it to the new module.
• Create a shortcut for the new scan task.
Remote scanning is covered in detail later in this guide (see “Remote scanning” on page 317).
TRAVELDOCS: CREATING A BATCH USING VSCAN
COPYING THE SAMPLE DOCUMENTS INTO THE APPLICATION’S “IMAGES” FOLDER
The VScan (“virtual scanning”) ruleset copies images from the “images” folder into the application’s runtime
batch folder.
Before you can run the application, you need to place some sample document images in the application’s
“images” folder.
The image files we’ll use are those provided in the sample images download, as summarized in the table
below.
Rental_Agreement
Optional_Insurance
Air_Ticket
Room_Receipt
Images_Page_01.tif
Images_Page_02.tif
Images_Page_06.tif
Images_Page_09.tif
Images_Page_03.tif
Images_Page_05.tif
Images_Page_07.tif
Images_Page_10.tif
Images_Page_08.tif
Images_Page_11.tif
Images_Page_04.tif
1. Locate the sample image files, Images_Page_01.tif through Images_Page_11.tif.
2. Copy the sample image files into C:\Datacap\TravelDocs\images.
Note: Don’t copy Images_Page_12.tif or Images_Page_13.tif.
MODIFYING THE VSCAN RULESET
The default VScan ruleset copies only the first four files. Here, we’ll modify the VScan ruleset to copy up to
20 files.
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the VScan ruleset and click the
Lock/Unlock ruleset (padlock) button to lock the ruleset for editing.
2. Expand the VScan ruleset completely.
3. Select the SetMaxImageFiles action.
4. In the Properties pane, under Parameters, change the StrParam value from 4 to 20.
5. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset.
RUNNING VSCAN TO GENERATE A BATCH
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar.
5. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow, which is PageID.
6. If the runtime batch hierarchy is not already visible, click the Runtime batch hierarchy tab. If you scroll
through the list you should see 11 page objects of type “Other” (since “Other” is assigned to all pages
automatically prior to page identification).
Note: If you do not see any pages, make sure you copied the sample image files as described under
“Copying the sample documents into the application’s “images” folder” on page 52.
7. There’s no point running the PageID task profile yet (we haven’t developed the rules to identify pages),
so right-click the “running” batch in the Workflow pane and choose Cancel.
EXAMINING THE FILES IN THE RUNTIME BATCH FOLDER
When you start a new batch, Taskmaster creates a runtime batch folder beneath the application’s “batches”
folder. The name of the folder matches the numeric batch ID that Taskmaster generates automatically.
C:\Datacap\TravelDocs
    batches
        20100332.001    (runtime batch folder)
Taskmaster stores all of the files associated with this batch in the runtime batch folder.
1. Open the application’s most recent batch folder (C:\Datacap\TravelDocs\batches\<batch_id>). The
folder contains the following files:
File | Description
TM00000*.tif | A copy of each of the original sample image files (copied from the “images” folder).
VScan.script | A file to aid in debugging (not covered in this guide).
VScan.xml | The runtime document hierarchy generated by the VScan task profile.
Vscan_rrs.log | The log file generated by the VScan task profile. The log file contains detailed descriptions of all the actions executed by the task profile and is useful for troubleshooting (see “Taskmaster log files” on page 180).
PageID.xml | A copy of the runtime document hierarchy ready for use by the next task profile in the workflow (PageID).
2. Open the file VScan.xml in any XML editor or text editor.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<B id="20100332.001">  <!-- Runtime batch ID -->
  <V n="TYPE">TravelDocs</V>
  <V n="LAST_RR_TPROFILE">VScan:m:eRun</V>
  <P id="TM000001">
    <V n="TYPE">Other</V>  <!-- Page type “Other” is assigned to all pages initially -->
    <V n="STATUS">49</V>  <!-- Status 49 indicates page scanned successfully -->
    <V n="IMAGEFILE">tm000001.tif</V>
    <V n="ScanSrcPath">c:\datacap\traveldocs\images\images_page_01.tif</V>
  </P>
  <P id="TM000002">
    <V n="TYPE">Other</V>
    <V n="STATUS">49</V>
    <V n="IMAGEFILE">tm000002.tif</V>
    <V n="ScanSrcPath">c:\datacap\traveldocs\images\images_page_02.tif</V>
  </P>
etc.
3. Close the file.
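If you prefer to inspect the hierarchy programmatically rather than in an editor, a few lines of Python are enough. This is a minimal sketch that assumes the structure shown above (a B element containing P elements, each with V variables keyed by the n attribute). The list_pages function is a hypothetical helper, and SAMPLE is an abbreviated copy of the listing rather than a real batch file.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the VScan.xml excerpt shown above.
SAMPLE = """<B id="20100332.001">
  <V n="TYPE">TravelDocs</V>
  <P id="TM000001">
    <V n="TYPE">Other</V>
    <V n="STATUS">49</V>
    <V n="IMAGEFILE">tm000001.tif</V>
  </P>
  <P id="TM000002">
    <V n="TYPE">Other</V>
    <V n="STATUS">49</V>
    <V n="IMAGEFILE">tm000002.tif</V>
  </P>
</B>"""


def list_pages(batch_element):
    """Return (page id, type, status, image file) for every P element."""
    pages = []
    for page in batch_element.iter("P"):
        # Collect the page-level variables into a dictionary keyed by name.
        variables = {v.get("n"): v.text for v in page.findall("V")}
        pages.append((page.get("id"), variables.get("TYPE"),
                      variables.get("STATUS"), variables.get("IMAGEFILE")))
    return pages


batch = ET.fromstring(SAMPLE)
print("Batch ID:", batch.get("id"))
for page in list_pages(batch):
    print(page)
```

To read an actual batch, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path points to the VScan.xml file in the runtime batch folder.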
TRAVELDOCS: SETTING UP A LOCAL SCANNER (OPTIONAL)
Note: If you do not have an ISIS scanner attached to your computer, you can skip this section. If you have an
ISIS scanner, make sure the ISIS driver is installed and the scanner is working before you proceed.
The default application framework does not include a scan task for scanning hardcopy documents into your
application. In this section we’ll create a scan task for the TravelDocs application.
Taskmaster does not let you include both a scan task and a virtual scan task in the same job workflow (you
can have only one batch creation task per workflow). Since we’ll need the VScan task for the remainder of
this guide, we’ll copy the Main Job workflow, delete the VScan task from the copy, and then create the scan
task. Before we do that, we’ll need to create a new Batch Pilot project for the new task and create a new
module.
CREATING A NEW BATCH PILOT PROJECT FOR THE SCAN TASK
1. Click Start > All Programs > Datacap > Batch Pilot > Batch Pilot.
2. Click File > New Project.
3. Select the file C:\Datacap\TravelDocs\dco_TravelDocs\TravelDocs.xml and click Open.
4. In the pane at the bottom of the Batch Pilot window, select the TravelDocs batch node. Then right-click
in the Form Path field and choose Pick form.
5. Browse to the C:\Datacap\BPilot\ISscan folder, select the file isscan.dcf, and click Open.
Note: This Batch Pilot .dcf file defines the default scan setup and runtime forms, as well as the default
project settings.
6. In the Batch Pilot window, click File > Save Project As.
7. Browse to the folder C:\Datacap\TravelDocs\dco_TravelDocs.
8. In the File name field, type MyISscan and click Save. This saves the project as MyISscan.bpp.
9. Close the Batch Pilot window and click Yes to save the project.
CREATING A NEW MODULE
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
2. Select the TravelDocs application and click OK.
3. Log in using User ID: admin; Password: admin; Station ID: 1.
4. In the Taskmaster Client window, click the Administrator button. Then click the Modules tab.
5. On the Modules tab, click Add to create a new module. Then on the right side of the window, enter the
values for the new module as follows:
• ID: MyISscan
• Description: MyISscan
• Type: Batch creation
• Program: Batch Pilot
• Parameters: MyISscan.bpp (do not use the browse button; type in the file name only)
• Statistics table: <leave blank>
• Batch ID field: <leave blank>
6. Click Apply to add the new module to the Task Modules list.
CREATING THE SCAN TASK
1. In the Taskmaster Administrator window, click the Workflow tab.
2. Select the Main Job workflow and click Copy. Name the new job Scan Job and click OK.
3. Expand the new Scan Job, select the VScan task and click Remove. Click Yes to confirm.
Note: A job can have only one batch creation task. Since VScan and Scan are both batch creation tasks,
we need to remove VScan.
4. Select Scan Job and click Add to create a new task. Name the task MyISscan.
5. In the right side of the Workflow tab, enter the values for the new task as follows:
• ID: MyISscan
• Description: MyISscan
• Module: MyISscan
• Task Monitor: Normal
• Queue to: Anybody anywhere
• Store: Nothing
6. Select the new MyISscan task and press Ctrl+Up Arrow to move it to the top of the workflow.
7. Click Apply. Then click Done to close the Taskmaster Administrator window.
CONFIGURING THE SCANNER SETTINGS
Note: After creating the scan task, you must close the Taskmaster Administrator window and click the
Unload all tasks button before you can proceed with scanner setup. This unloads the current tasks
from memory and reloads the Batch Pilot project settings from the .bpp file.
1. In the Taskmaster Client window, click the Unload all tasks button.
2. Click the Administrator button to open the Taskmaster Administrator window.
3. Expand the Scan Job workflow, select the MyISscan task, and click Setup. If you see any VBScript
error messages, click OK.
4. In the Settings File field, type C:\Datacap\scanner.ini.
Note: You can save the scanner settings locally or on the network. You can only share scanner settings
between scanners of the same type.
5. In the Imprint Script File and StartBatch fields, remove any existing text and leave these fields blank.
Note: Imprinting is where the scanner adds an identifier to each scanned page. A StartBatch panel is a
window that lets the user enter indexing information that is then included with the batch.
6. Click Done.
7. In the Taskmaster Administrator window, click Apply. Then click Setup to reopen the setup panel. Click
OK to any error message indicating that no scanner is selected.
8. Click Select Scanner, select your scanner, and click OK. Then click OK again.
9. Select the default scanner settings.
10. Click File > Task Settings and select the Create Batch Dir under option.
11. Click OK and then click Done.
12. In the Taskmaster Administrator window, click Done and then click Yes in the message box. Then click
Done to close the Taskmaster Administrator window.
CREATING A SHORTCUT FOR THE NEW SCAN TASK
1. In the Taskmaster Client window, click the Administrator button.
2. Click the Shortcuts tab.
3. On the Shortcuts tab, click Add to create a new shortcut.
4. In the Shortcut ID field, type Scan and click Apply. Then click OK in the message box.
5. Under Batch Selection Mode, select Manual for Hold only.
6. Under Permissions, expand Scan Job and check the MyISscan task.
7. Click Apply and then click OK in the message box.
8. In the Taskmaster Administrator window, click Done to close the Taskmaster Administrator window.
RUNNING THE SCAN TASK
1. In the Taskmaster Client window, click the Unload all tasks button.
Note: You don’t need to do this every time. We’re doing it here since we just finished the project setup
and want to make sure we have the new settings loaded.
2. Load a page into your scanner’s feeder.
3. Double-click the new Scan shortcut.
4. In the Batch Selection Mode dialog, select Auto and click OK.
5. When the page is scanned and displayed in Batch Pilot, click Finish. Then click Stop in the Task
Finished dialog.
Chapter 6
PAGE IDENTIFICATION
Page identification is one of the first steps in any Taskmaster application. All incoming pages are initially
assigned the default page type “Other,” but before Taskmaster can assemble those pages into documents and
extract data from the pages, it must determine the correct type for each page.
This chapter covers page identification methods, including fingerprint recognition, structure-based
identification, text matching, and manual page identification. It also covers image enhancement, which is
typically done prior to page identification to remove lines, shading, etc. that might interfere with the
recognition process. At the end of the chapter we’ll implement some page identification and image
enhancement techniques in the TravelDocs application.
PAGE IDENTIFICATION METHODS
Taskmaster supports several methods for page identification, including but not limited to:
• Fingerprint matching
• Structure-based identification
• Text matching
• Manual page identification
Additionally, if your application supports only a single page type, you can simply assign a static page type to all
incoming pages. This section provides an overview of these page identification methods.
FINGERPRINT MATCHING
With fingerprint matching, Taskmaster generates a “fingerprint” that describes each incoming page. The
fingerprint can include information about the relative densities of different regions of the page or the location
of text on the page. Taskmaster then compares the new fingerprint to a library of fingerprints for known page
types. When it finds a match it assigns the corresponding page type.
[Figure: the incoming page fingerprint (type unknown) is compared in turn with each entry in the fingerprint library: Car Rental #1 rental agreement (no match), Car Rental #2 rental agreement (no match), Airline #1 air ticket (no match), Hotel #1 room receipt (match).]
In the example above, the incoming page matches the Hotel #1 room receipt. Taskmaster assigns it the type
“Room_Receipt” and records the ID of the matching fingerprint in the runtime batch hierarchy. The match
will not be exact since the data on the page will most likely be different, but we’re looking for the best match.
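The idea of a “best match” rather than an exact match can be made concrete with a small Python sketch. This is a conceptual illustration only, not Taskmaster’s matching algorithm: it assumes each fingerprint is reduced to a short list of region densities, and the best_match function, the LIBRARY values, and the max_distance threshold are all hypothetical.

```python
def best_match(page_densities, library, max_distance=0.15):
    """Return (fingerprint id, page type) of the closest library entry.

    Distance is the mean absolute difference between density lists.
    If nothing is close enough, the page keeps the type "Other".
    """
    best_id, best_type, best_distance = None, "Other", float("inf")
    for fingerprint_id, (page_type, densities) in library.items():
        distance = sum(abs(a - b) for a, b in zip(page_densities, densities)) / len(densities)
        if distance < best_distance:
            best_id, best_type, best_distance = fingerprint_id, page_type, distance
    if best_distance > max_distance:
        return None, "Other"
    return best_id, best_type


# Hypothetical library: fingerprint id -> (page type, region densities).
LIBRARY = {
    1: ("Rental_Agreement", [0.30, 0.10, 0.05, 0.20]),
    2: ("Air_Ticket", [0.05, 0.40, 0.35, 0.05]),
    3: ("Room_Receipt", [0.15, 0.15, 0.30, 0.10]),
}

print(best_match([0.16, 0.14, 0.28, 0.12], LIBRARY))
```

The threshold reflects the point made above: the incoming page never matches a library entry exactly, so the comparison has to tolerate small differences while still rejecting pages that resemble nothing in the library.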
SELECTING THE FINGERPRINT CREATION MODE
Taskmaster provides two primary methods for generating page fingerprints:
• Image analysis: This scans the page image to identify the composite “blackness” of different regions of
the page. This method provides fast page identification, but requires that you perform recognition later.
• Full page recognition: This performs optical character recognition to identify the locations of text
within the page. This method takes longer, especially with pages that include handwritten text, but cuts
time from subsequent workflow tasks since the full page recognition results are available for use.
Both of these methods write the resulting information to a “CCO” file that’s stored with the original TIF
image file in the application’s “fingerprint” folder.
When deciding which fingerprint creation method to use, the key point to remember is that:
The method you use for creating library fingerprints must be the same as the method you use to
generate runtime fingerprints during page identification.
For example, if you decide to use image analysis, then you must use image analysis in the fingerprint creation
ruleset (“FingerprintAdd”) and in the page identification ruleset (“PageID”).
Note: Do not try to combine these two methods, as the recognition results will most likely not be accurate.
USING IMAGE ANALYSIS
Image analysis uses a pixel-based algorithm to generate a fingerprint (CCO) file that represents the relative
blackness of different regions of the page.
The AnalyzeImage action in the “Recog_Shared” actions library performs image analysis on an image file.
Library | Action | Description
Recog_Shared | AnalyzeImage | Converts the image (TIF) file representing the current page to a fingerprint (CCO) file.
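To give a feel for what “relative blackness of different regions” means, the following Python sketch divides a tiny black-and-white image into a grid and computes the fraction of black pixels in each cell. It is an illustration under simplifying assumptions and does not reproduce the AnalyzeImage action or the CCO file format; the region_densities function and the grid size are hypothetical.

```python
def region_densities(pixels, rows=2, cols=2):
    """Return the fraction of black pixels in each cell of a rows x cols grid.

    pixels is a list of equal-length rows containing 0 (white) or 1 (black).
    The image is assumed to be at least rows x cols pixels in size.
    """
    height, width = len(pixels), len(pixels[0])
    densities = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            black = sum(pixels[y][x] for y in range(top, bottom) for x in range(left, right))
            densities.append(black / ((bottom - top) * (right - left)))
    return densities


# A 4 x 4 image with a solid block at top left and one dot at bottom right.
IMAGE = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(region_densities(IMAGE))
```

Because the result depends only on pixel counts, this kind of fingerprint is quick to compute, which is consistent with the speed advantage described for image analysis.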
USING FULL PAGE RECOGNITION
Full page recognition, as its name suggests, uses the text and location of text on the page to generate the
fingerprint (CCO) file. Taskmaster includes three optical character recognition (OCR) engines, plus one
intelligent character recognition (ICR) engine that you can use to perform full page recognition:
• OCR_a: ABBYY FineReader OCR engine
• OCR_s: Nuance (formerly ScanSoft) OmniPage OCR engine
• OCR_sr: Newer implementation of the Nuance OmniPage OCR engine
• ICR_c: Open Text RecoStar ICR engine
Additional ICR engines are also available as options. As a general rule, the OCR engines work well with
machine printed text, whereas the ICR engine works well with hand printed as well as machine printed text.
Taskmaster includes actions libraries for each recognition engine (ocr_a, OCR_s, ocr_sr, and icr_c). Each
library includes its own version of the full page recognition action.
Library | Action | Description
ocr_a | RecognizePageOCR_A | Recognizes all characters on the current page and populates the page's fingerprint (CCO) file with the recognition results.
OCR_s | RecognizePageOCR_S | Recognizes all characters on the current page and populates the page's fingerprint (CCO) file with the recognition results.
ocr_sr | RecognizePageOCR_S | Recognizes all characters on the current page and populates the page's fingerprint (CCO) file with the recognition results.
icr_c | RecognizePageICR_C | Recognizes all characters on the current page and populates the page's fingerprint (CCO) file with the recognition results.
PERFORMING FINGERPRINT MATCHING
The action used for all fingerprint matching regardless of the creation method used is FindFingerprint.
Library | Action | Description
AutoDoc | FindFingerprint | Tries to match the current page fingerprint to a fingerprint in the application’s fingerprint library.
STRUCTURE-BASED PAGE IDENTIFICATION
Structure-based identification uses the position of a page within the batch to determine its type. If your
application handles only one page type, or if the document structure is consistent (for example, all documents
are two pages with a main page and a trailing page), you can assign page types based on position. You can do
this using the SetPageType action.
Library | Action | Description
DCO | SetPageType | Assigns a page type to the current page.
DCO | SetPageStatus | Sets the status of the current page.
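As an illustration of position-based assignment, the Python sketch below applies a repeating two-page pattern to the pages of a batch. It only models the idea and is not how the SetPageType action is invoked in a ruleset; the assign_types_by_position function and the page type names Main_Page and Trailing_Page are hypothetical.

```python
from itertools import cycle


def assign_types_by_position(page_ids, pattern=("Main_Page", "Trailing_Page")):
    """Assign page types by cycling through a fixed document pattern."""
    return {page_id: page_type for page_id, page_type in zip(page_ids, cycle(pattern))}


pages = ["TM000001", "TM000002", "TM000003", "TM000004"]
for page_id, page_type in assign_types_by_position(pages).items():
    print(page_id, page_type)
```

This works only while every document follows the same pattern; a single missing or extra page shifts every assignment that follows it.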
If a batch contains documents of varying length, you can use separator pages between documents. For an
example that uses barcoded separators, look at the Taskmaster Accounts Payable (APT) foundation
application included with Taskmaster.
When you identify a page using structure-based identification, the page is not matched to a fingerprint, and so
there are no recognition zones for your application to locate data during recognition. You can design your
application to locate data fields using keyword identification or pattern matching techniques that do not rely
on recognition zones. We’ll do this in a later chapter in this guide.
TEXT MATCHING
To perform page identification using text matching, you must first perform full page recognition. You can
then search the recognition results for a string that’s unique to each page type.
In the example below, the first function performs full page recognition and looks for the string “Pickup” on
the current page. If it finds it, it assigns the page type “Rental_Agreement”; if it doesn’t, the function fails and
the second function looks for the string “Flight.” If it finds it, it assigns the page type “Air_Ticket”; if it
doesn’t, the function fails and the third function looks for the string “Room.” If it finds it, it assigns the page
type “Room_Receipt”; if it doesn’t, the page remains with the page type “Other.”
As with the structure-based techniques, when you identify a page using text matching, the page is not
matched to a fingerprint, so you’ll have to use a recognition technique that does not rely on recognition
zones. We’ll cover this later in the chapter on text matching.
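The cascade of functions described in this section (try “Pickup”, then “Flight”, then “Room”, otherwise leave the page as “Other”) has the same shape as the short Python sketch below. The sketch assumes the recognized text is already available as a plain string; the identify_by_text function is a hypothetical stand-in and does not correspond to any Taskmaster action.

```python
# Keyword -> page type, checked in order, mirroring the three functions above.
RULES = [
    ("Pickup", "Rental_Agreement"),
    ("Flight", "Air_Ticket"),
    ("Room", "Room_Receipt"),
]


def identify_by_text(page_text, rules=RULES, default="Other"):
    """Return the page type for the first keyword found in page_text."""
    for keyword, page_type in rules:
        if keyword in page_text:  # case-sensitive substring search
            return page_type
    return default


print(identify_by_text("Pickup location: Airport"))
print(identify_by_text("Thank you for your payment"))
```

The order of the rules matters: a page containing both “Flight” and “Room” is identified by whichever rule comes first, so each keyword should be unique to its page type.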
MANUAL PAGE IDENTIFICATION
The page identification techniques described so far all identify pages automatically. It’s also possible to
configure your application to display unrecognized pages to an operator for manual identification (see
“Adding a function for manual page identification” on page 292).
IMAGE ENHANCEMENT
GOAL OF IMAGE ENHANCEMENT
The goal of image enhancement is to eliminate lines, noise, and other artifacts that can interfere with the
recognition process. This includes:
• Problems such as lines and shading that may be inherent to the forms you are processing
• Problems such as noise and misalignment that may be introduced during scanning
Taskmaster’s Image Processing tool provides a set of image enhancement capabilities you can configure to
handle various problem types, including those listed above as well as many more. However, finding the best
combination of image enhancement settings can take time, especially if your application must handle multiple
page types. Since image enhancement is done before the page type is known (in other words, before page
identification), you must set up the image processing properties in a way that works well for all page types.
The default image processing properties are designed to work well with typical printed pages that use plain
black text on a white background. Establishing settings that work well for the pages your application must
handle requires experimentation. The section “Determining appropriate image processing settings” on
page 68 describes how to go about doing this. The settings you establish are stored in the file
imagefix.ini in the application’s dco_<application_name> folder.
WHEN TO PERFORM IMAGE ENHANCEMENT
Typically, you perform image enhancement on fingerprint images when you’re setting up the fingerprint
library, and again on your document images after input but prior to page identification.
When you add fingerprints to the fingerprint library, Taskmaster asks if you want to run image enhancement.
Typically, you’ll need to experiment to find settings that work well for all page types. You may want to skip
image enhancement initially and then go back and enhance the fingerprint images later, once you’ve
determined appropriate settings. You can do this as described under “Using the new image processing
settings to enhance the fingerprint images” on page 70.
You use the same image processing settings when running image enhancement after document input. This is
done by the ImageFix ruleset. The default ImageFix ruleset includes two rules:
• The first rule (ImageFix Load Settings) reads the image processing properties from the settings (.ini) file.
• The second rule (Enhance Image) performs image processing on each page using those settings and
creates a backup of the original with a .tio extension.
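The “back up, then enhance in place” pattern used by the second rule can be sketched in Python as follows. This is an analogy rather than the ImageFix implementation: the enhance_with_backup function is hypothetical, the enhance argument stands in for whatever image processing is applied, and the assumption that the backup keeps the same base name with a .tio extension is ours.

```python
import shutil
from pathlib import Path


def enhance_with_backup(image_path, enhance):
    """Save a .tio copy of the original, then overwrite it with the result.

    enhance is any callable that takes the original bytes and returns the
    processed bytes (a placeholder for real image processing).
    """
    image_path = Path(image_path)
    backup_path = image_path.with_suffix(".tio")  # assumed naming convention
    shutil.copy2(image_path, backup_path)
    image_path.write_bytes(enhance(image_path.read_bytes()))
    return backup_path
```

Keeping the untouched original alongside the enhanced image means that a poor choice of settings can be undone by restoring the backup and running the enhancement again.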
TRAVELDOCS: CREATING THE INITIAL FINGERPRINT LIBRARY
In this section we’ll create the initial fingerprint library for the TravelDocs application.
CHANGING THE FINGERPRINT CREATION METHOD
The application framework generated by the Application Wizard uses the image analysis method for
fingerprint creation. Since all the pages we’ll be processing in the TravelDocs application are machine printed
and since we’ll require full page text recognition later, we’ll convert the application to use full page
recognition. In the sample application we’ll use the OCR_s engine.
To change the fingerprint creation method you must edit two of the rulesets defined on the Datacap Studio
Rulemanager tab. The FingerprintAdd ruleset runs whenever you add a new fingerprint to the fingerprint
library. PageID generates the runtime fingerprints and performs matching to determine the type of each
incoming page. We’ll modify these two rulesets to perform full page recognition instead of image analysis.
To modify the FingerprintAdd and PageID rulesets:
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the FingerprintAdd ruleset and
click the Lock/Unlock ruleset (padlock) button to lock the ruleset for editing.
2. Expand the FingerprintAdd ruleset completely.
3. Right-click the AnalyzeImage action and choose Remove.
4. Click the Actions library tab.
5. Expand the OCR_S library and select RecognizePageOCR_S.
6. Make sure FingerprintAdd: Other Function 1 is selected in the Rulesets pane.
7. Click the Add to function button at the left side of the Actions Library pane to add the action to
the FingerprintAdd ruleset.
8. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset.
9. Select the PageID ruleset and click the Lock/Unlock ruleset (padlock) button to lock the ruleset for
editing. Then expand the ruleset and the PageID rule.
10. Remove the AnalyzeImage action and replace it with the RecognizePageOCR_S action. If necessary,
use the toolbar buttons to move the action to the correct position within the function.
11. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset.
CREATING FINGERPRINTS FOR KNOWN PAGE TYPES
CREATING FINGERPRINT CLASSES
“Classes” let you categorize fingerprints within your application. The default framework includes two classes:
• <Global>: This class includes the generic “555” fingerprint with page type “Other.” The generic
fingerprint is useful since it lets you begin application development without any actual page fingerprints.
• <New>: The FindFingerprint action, used during page identification, lets you create fingerprints for
unrecognized pages automatically. If you call FindFingerprint with the parameter “True” and
Taskmaster doesn’t find a matching fingerprint, it adds the runtime fingerprint to the “<New>” class.
Here we’ll create a class for each document type: Car_Rental, Hotel, and Flight. Categorization by document
type is not required but provides a useful way to organize fingerprints, especially if there are many.
To create the fingerprint classes:
1. On the Datacap Studio Zones tab, in the Fingerprints pane, click the Add new item button and
choose Add fingerprint class.
2. Enter “Car_Rental” and click OK.
3. Repeat for “Flight” and “Hotel.”
ADDING INDIVIDUAL FINGERPRINTS
1. In the Fingerprints pane, right-click the new Car_Rental class and choose Add fingerprint.
2. Browse to the folder where the TravelDocs fingerprint images are located.
3. Select Car1.tif, and click Open. When asked if you want to enhance the image, click No (we’ll do this
later). It may take a few moments to add the new fingerprint since Taskmaster must OCR the full page.
4. Repeat to add Car2.tif, Car3.tif, Car4.tif, Car5.tif, and Car6.tif. Again, do not enhance the images.
5. In the Fingerprints pane, select the first car rental fingerprint and confirm that it’s a rental agreement
page. Then click the Type drop-down at the top of the pane and choose Rental_Agreement.
6. Repeat to assign page types to the remaining car rental fingerprints. Use Rental_Agreement for the
rental agreement pages and Optional_Insurance for the optional insurance pages.
7. Add Flight1.tif, Flight2.tif, and Flight3.tif to the Flight class and assign the type Air_Ticket.
8. Add Hotel1.tif, Hotel2.tif, and Hotel3.tif to the Hotel class and assign the type Room_Receipt.
Note: Do not add Hotel4.tif or Hotel5.tif.
TRAVELDOCS: ENHANCING THE SAMPLE FINGERPRINT IMAGES
DETERMINING APPROPRIATE IMAGE PROCESSING SETTINGS
Most Taskmaster applications must handle multiple page types, but image enhancement is done before the
page type is known (in other words, before page identification). You must therefore set up the image
processing properties in a way that works well for all page types.
The default image processing properties are designed to work well with typical printed pages that use plain
black text on a white background. One of the sample air ticket pages contains white text on a black
background. Since this will be the trickiest page to process, let’s look at this page first.
1. In the Fingerprints pane, expand the Flight class and select the third fingerprint (Airline #3).
2. In the Image View pane, click the Open image processing settings button at the top right.
Note: The image on the left is the fingerprint before enhancement and the image on the right is the image
after enhancement. Since we haven’t applied enhancement yet, the images are the same.
3. Click the Run image processing button to apply the default image processing properties, as defined
in the Properties pane. Most but not all of the vertical and horizontal lines disappear, the top of the page
is clipped, and the white text on a black background remains. These are problems we must fix.
4. Click the Reset image button to revert to the original image.
5. In the Properties pane, change the settings as follows:
Category                  Property             Default setting   New setting
Border Removal            Border Removal       True              False
Inverse Text Correction   Minimum Area Width   300               100
Line Removal              Minimum Length       50                30
6. Click the Save button and choose Save settings. Then click OK.
Note: When you save the settings, Taskmaster saves the new image enhancement properties in the file
C:\Datacap\TravelDocs\dco_TravelDocs\imagefix.ini. Taskmaster uses the same settings file for
the image processing that takes place prior to page identification (“ImageFix”).
7. Click the Run image processing button to apply the new image processing properties. This time all
of the vertical and horizontal lines disappear, the top of the page is not clipped, and the white text on a
black background is converted to black text on a white background.
8. Click the close button to close the Image Processing window without saving the enhanced image.
9. Next, in the Fingerprints pane, select the second air ticket fingerprint (Airline #2). This page is also
problematic with the default settings, but we can try it with the new settings.
10. In the Image View pane, click the Open image processing settings button at the top right.
11. Click the Run image processing button to apply the new image processing properties.
The horizontal lines are removed while everything else is intact, so the settings work for this page as well.
12. Click the close button to close the Image Processing window without saving the enhanced image.
USING THE NEW IMAGE PROCESSING SETTINGS TO ENHANCE THE FINGERPRINT IMAGES
Now that we’ve determined appropriate image processing settings, we can apply these to all of the sample
fingerprint files.
1. In the Fingerprints pane, expand the Car_Rental class and select the first Rental_Agreement
fingerprint.
2. In the Image View pane, click the Open image processing settings button at the top right.
3. Click the Run image processing button to apply the image processing properties.
4. Click the Save button, choose Save image, and click OK. Then click the close button to close the
Image Processing window.
5. Repeat to apply the same image processing properties to all of the other fingerprints. Make sure you
explicitly save each image after image processing.
TRAVELDOCS: RUNNING A BATCH THROUGH THE WORKFLOW
Let’s recap what we’ve done so far in developing the TravelDocs application. We:
• Created an application framework for the TravelDocs application using the Datacap Studio Application
  Wizard.
• Modified the default document hierarchy to include the document types and page types the TravelDocs
  application supports.
• Specified the required structure for documents and pages within a batch according to the business
  requirements.
• Within the document hierarchy, defined the fields of interest for each page type.
• Created the initial fingerprint library using one sample image for each known variant of each page type.
In terms of implementing the workflow, we have not yet attached any rules to the document hierarchy,
although there are some default rules attached to the default elements. However, we can run a batch through
the PageID task to make sure the application is handling page identification correctly.
PROCESSING A BATCH
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar.
5. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (PageID).
6. Click the Process rules for target object button on the main Test tab toolbar and wait while the
task profile executes. It may take a few moments as Taskmaster must perform full page OCR on all the
images in the “images” folder.
7. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (Rulerunner).
8. On the Runtime batch hierarchy tab, scroll through the list to see the page type assigned to TM000001,
TM000002, etc.
9. Since there’s no point running the Rulerunner task profile yet (we haven’t assigned any rules), right-click
the “running” batch in the Workflow pane and choose Cancel.
EXAMINING THE RUNTIME BATCH FOLDER
Open the application’s most recent batch folder (C:\Datacap\TravelDocs\batches\<batch_id>). The folder
contains the following files:
File             Description
TM00000*.tif     An image-enhanced version of each of the sample image files.
TM00000*.tio     A copy of each of the original image files.
TM00000*c.xml    The results of full page recognition for each image file.
TM00000*.cco     The fingerprint file for each of the image files.
PageID.xml       The runtime document hierarchy generated by the PageID task profile.
pageid_rrs.log   The log file generated by the PageID task profile.
VScan.xml        The runtime document hierarchy generated by the VScan task profile.
vscan_rrs.log    The log file generated by the VScan task profile.
Rulerunner.xml   A copy of the runtime document hierarchy ready for use by the next task profile in the
                 workflow (Rulerunner).
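When inspecting a batch folder by script, it can help to label each file with its role from the table above. The following is a minimal sketch, not part of Taskmaster: the function name and the role strings are hypothetical, and the patterns are simply the file names from the table, matched case-insensitively.

```python
from fnmatch import fnmatchcase

# Patterns and roles copied from the table above (compared in lower case).
# The function and the wording of the roles are illustrative assumptions.
BATCH_FILE_ROLES = [
    ("tm00000*c.xml", "full page recognition results"),
    ("tm00000*.tif", "image-enhanced page image"),
    ("tm00000*.tio", "copy of the original image"),
    ("tm00000*.cco", "fingerprint file"),
    ("pageid.xml", "runtime document hierarchy (PageID)"),
    ("pageid_rrs.log", "log file (PageID)"),
    ("vscan.xml", "runtime document hierarchy (VScan)"),
    ("vscan_rrs.log", "log file (VScan)"),
    ("rulerunner.xml", "runtime document hierarchy for the next task profile"),
]


def describe_batch_file(name):
    """Return the role of a batch-folder file name, or 'unknown'."""
    lowered = name.lower()
    for pattern, role in BATCH_FILE_ROLES:
        if fnmatchcase(lowered, pattern):
            return role
    return "unknown"
```

To label a real folder, you could call describe_batch_file on each name returned by os.listdir for the batch folder path.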
CHECKING THE CONFIDENCE LEVELS ON THE RUNTIME PAGES
During page identification, Taskmaster assigns a confidence level to each page, indicating the degree of
similarity between the runtime page and the fingerprint that matches most closely. You can see the confidence
level for each page by looking in the runtime batch file generated by the PageID task profile (PageID.xml).
1. Open the application’s most recent batch folder (C:\Datacap\TravelDocs\batches\<batch_id>).
2. Open the file PageID.xml in an XML viewer or in Notepad. This file includes the confidence level
assigned to each page in the batch, as well as the ID of the matching fingerprint.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<B id="20100321.002">
  <V n="TYPE">TravelDocs</V>
  <V n="LAST_RR_TPROFILE">PageID:m:eRun</V>
  <P id="TM000001">
    <V n="TYPE">Rental_Agreement</V>
    <V n="STATUS">49</V>
    <V n="IMAGEFILE">tm000001.tif</V>
    <V n="ScanSrcPath">c:\datacap\traveldocs\images\images_page_01.tif</V>
    <V n="RecogStatus">0</V>
    <V n="Confidence">0.9727517</V>      <!-- Confidence level -->
    <V n="Image_Offset">0,0</V>
    <V n="TemplateID">556</V>            <!-- ID of matching fingerprint -->
    <V n="Fingerprint Created">No</V>
  </P>
  etc.
The default application framework uses a confidence threshold of 0.7, so anything above 0.7 is considered a
match. (The SetProblemValue action in the PageID: Set Fingerprint Params rule specifies the minimum
required confidence level.) In the example above, the confidence level is 0.97, so the page is a good match.
If multiple fingerprints match with a confidence level above 0.7, Taskmaster selects the fingerprint with the
highest confidence value.
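If you want to review confidence levels for a whole batch rather than reading PageID.xml by eye, a short script can pull out the relevant values. The sketch below is an illustration only: the function name is hypothetical, the sample text is the excerpt shown above with a closing </B> tag added so that it parses, and the 0.7 threshold is the default stated in this section.

```python
import xml.etree.ElementTree as ET

THRESHOLD = 0.7  # default confidence threshold described above


def page_matches(xml_text, threshold=THRESHOLD):
    """Return (page id, page type, confidence, fingerprint id, is_match) per page."""
    root = ET.fromstring(xml_text)
    results = []
    for page in root.iter("P"):
        # Each <V n="..."> child holds one runtime variable of the page.
        values = {v.get("n"): (v.text or "") for v in page.findall("V")}
        confidence = float(values.get("Confidence") or 0)
        results.append((page.get("id"), values.get("TYPE"), confidence,
                        values.get("TemplateID"), confidence > threshold))
    return results


# Excerpt from the PageID.xml example above (closing tag added for this sketch).
SAMPLE = r"""<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<B id="20100321.002">
  <V n="TYPE">TravelDocs</V>
  <P id="TM000001">
    <V n="TYPE">Rental_Agreement</V>
    <V n="Confidence">0.9727517</V>
    <V n="TemplateID">556</V>
  </P>
</B>"""

for row in page_matches(SAMPLE):
    print(row)
```

For a real batch, read the text of PageID.xml from the batch folder and pass it to page_matches in place of SAMPLE.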
Chapter 7
RULE EXECUTION
Earlier we looked at the Taskmaster workflow from a high level and saw how batches move through the
workflow from task to task. We also saw how task profiles are implemented as rulesets and how rulesets are
constructed using rules and actions.
This chapter examines the mechanics of rule execution – how you associate rules with specific objects in the
document hierarchy and how Taskmaster executes rules as it processes a batch of documents. At the end of
this chapter, we’ll revisit the TravelDocs application and use Datacap Studio’s debugging tools to step
through the PageID task profile to see firsthand how Taskmaster executes rules.
ASSOCIATING RULES WITH OBJECTS
Rules run only when they are assigned to specific objects in the document hierarchy, and then only if the
parent rulesets are included in the current task profile (this is explained in more detail later in this section).
Taskmaster can run rules on specific documents, pages, and fields, as well as at the batch level.
[Figure: the document hierarchy. A batch contains documents, each document contains pages, and each page
contains fields.]
EXAMPLE 1
In the default “Rulerunner” task profile, the CreateDocs ruleset includes a rule called “Create Docs.” This
rule assembles individual pages into documents based on the structure described in the document hierarchy.
For example, the car rental document type has a rental agreement page and an insurance coverage page.
Car_Rental (Document)
    Open
    Rental_Agreement (Page)
    Optional_Insurance (Page)
From the object general information for these page types, we can see that the rental agreement page is
required (Min=1), that there can be only one rental agreement page within each car rental document (Max=1),
and that it’s always the first page (Order=1); the insurance page is optional (Min=0).
Note: To see the object general information with the rules for each page type within a document, lock
the Document Hierarchy pane, then right-click the page type and choose Manage variables.
The Create Docs rule must run at the batch level since it must assemble multiple document types.
TravelDocs (Batch)
    Open
        (global)
        VScan : VScan
        ImageFix : ImageFix Load Settings
        PageID : Set Fingerprint Params
        CreateDocs : Create Docs
        Document Integrity : Batch Document Integrity Check
        Export : Set Export Params
(The rules listed under “Open” are the batch-level rules.)
When the Create Docs rule encounters a page of type “Rental_Agreement” it creates a new document in the
runtime batch hierarchy. If a rental agreement page is followed immediately in the batch by a page of type
“Optional_Insurance,” Taskmaster adds the insurance page to the same car rental document; otherwise it
creates a new document.
Batch 20100100.001 (TravelDocs)
    Document 20100100.001.01 (Car_Rental)
        Page TM000001 (Rental_Agreement)
        Page TM000002 (Optional_Insurance)
You can see the runtime batch information in the Runtime Batch Hierarchy pane on the Datacap Studio
Test tab.
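The grouping behavior described above can be sketched in a few lines. This is a simplified model for illustration, not the Create Docs implementation: the PAGE_RULES table and the assemble function are hypothetical, and the sketch assumes that a page opens a typed document only when its Order value is 1 and that any page that cannot join the current document goes into a new document with an empty type.

```python
# Hypothetical, simplified view of the hierarchy: page type -> (document type, order).
PAGE_RULES = {
    "Rental_Agreement": ("Car_Rental", 1),
    "Optional_Insurance": ("Car_Rental", 2),
    "Air_Ticket": ("Flight", 1),
    "Room_Receipt": ("Hotel", 1),
}


def assemble(pages):
    """Group (page_id, page_type) pairs into documents, in batch order."""
    documents = []
    last_order = 0
    for page_id, page_type in pages:
        doc_type, order = PAGE_RULES.get(page_type, ("", 0))
        current = documents[-1] if documents else None
        if current and doc_type and current["type"] == doc_type and order > last_order:
            # The page continues the document that is already open.
            current["pages"].append(page_id)
        else:
            # Start a new document; it is typed only if this page can be first.
            new_type = doc_type if order == 1 else ""
            documents.append({"type": new_type, "pages": [page_id]})
        last_order = order
    return documents


batch = [("TM000001", "Rental_Agreement"), ("TM000002", "Optional_Insurance"),
         ("TM000003", "Air_Ticket")]
for document in assemble(batch):
    print(document)
```

Here the first two pages form one Car_Rental document and the air ticket starts a separate Flight document.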
EXAMPLE 2
The default Recognize ruleset includes a rule called “Recognize Page.” This rule locates each of the field
zones on the current page using the positional information in the document hierarchy and then uses OCR to
get the data within each zone.
The rule must run at the page level since:
• The fields are different for each page type (for example, the rental agreement page and the optional
  insurance page have different fields).
• The field positions are different for each variation of the page type (for example, the “Car_Type” field is
  at a different location on the page for each rental car company).
The position variables define the position of the field for each of the car rental agreement forms identified
during fingerprinting. In this example, 678, 695, and 696 are the fingerprint IDs.
Note: To see the variable information for any field, lock the Document Hierarchy pane, then right-click the
field and choose Manage variables.
The Document Hierarchy shows that the Recognize Page rule runs at the page level for each page type.
Car_Rental
    Open
    Rental_Agreement (Page)
        Open
            (global)
            CreateDocs : Create Fields
            Recognize : Recognize Page
            etc.
        etc.
    Optional_Insurance (Page)
        Open
            (global)
            CreateDocs : Create Fields
            Recognize : Recognize Page
            etc.
        etc.
Note, however, that the rule is not assigned to the page type “Other” since there are no fields defined for this
page type.
ORDER OF RULE EXECUTION
A rule only runs when it’s associated with a specific object in the document hierarchy, and then only when the
parent ruleset is included in the current task profile. The order in which rules run is determined by:
• The position of the ruleset within the current task profile
• The position of the associated object within the document hierarchy
Note that the position of a rule within its ruleset does not affect when it runs.
When you run a task profile that has more than one ruleset, Taskmaster executes the rulesets in the order they
are displayed in the Task Profiles pane on the Datacap Studio Rulemanager tab. For example, when running
the PageID task profile, Taskmaster runs the ImageFix ruleset first on the entire batch and then the PageID
ruleset on the entire batch.
When running a ruleset on a batch, Taskmaster walks the runtime batch hierarchy from the top down, one
object at a time, in the numbered order shown in the example below (Page 1 of Document 1 is processed in
its entirety before moving on to Page 2 of Document 1, etc.).
 1  Batch
 2      Document 1
 3          Page 1
 4              Field 1
 5              Field 2
 6          Page 2
 7              Field 1
 8              Field 2
 9      Document 2
10          Page 1
11              Field 1
12              Field 2
13          Page 2
14              Field 1
15              Field 2
For each object, Taskmaster runs any rule in the current ruleset that’s included in the object’s “Open”
element. For example, if you’re running the Recognize ruleset on a page of type “Rental_Agreement,”
Taskmaster executes the Recognize Page rule.
Car_Rental
    Open
    Rental_Agreement (Page)
        Open
            (global)
            CreateDocs : Create Fields
            Recognize : Recognize Page
            etc.
This is illustrated more fully in the example that follows.
Note: You can only include one rule from a given ruleset in an object’s “Open” element.
The illustration below shows a portion of the document hierarchy for the TravelDocs application. Notice that
each object also has a “Close” element. Taskmaster runs rules in the object’s “Close” element when leaving
the object. For objects at the lowest level of the hierarchy, “Close” rules run immediately after “Open” rules,
but for other objects the “Close” rules will not run until Taskmaster has processed lower level objects.
TravelDocs (Batch)
    Open
        (global)
        VScan : VScan
        ImageFix : ImageFix Load Settings
        PageID : Set Fingerprint Params
        CreateDocs : Create Docs
        Document Integrity : Batch Document Integrity Check
        Export : Set Export Params
    Other (Page)
        Open
            (global)
            ImageFix : Enhance Image
            PageID : PageID
            FingerprintAdd: FingerprintAdd
        Close
    Car_Rental (Document)
        Open
        Rental_Agreement (Page)
            Open
                (global)
                CreateDocs : Create Fields
                Recognize : Recognize Page
                Validate : Validate Page
                Routing : Routing Rule 1
                Export : Export Page Fields
            Vendor
            Pickup_Date (Field)
                Open
                    (global)
                    Validate : Validate Date
                Close
            etc.
Notes on the illustration:
• Batch “TravelDocs”: These rules run when batch processing begins, depending on the ruleset you’re
  running. For example, if you’re running the VScan ruleset, Taskmaster executes the VScan rule.
• Page “Other”: These rules run each time Taskmaster begins processing a page of type “Other” (the default
  page type assigned to all pages initially). For example, if you’re running the PageID ruleset, Taskmaster
  executes the PageID rule, which assigns the correct type to each page.
• Document “Car_Rental”: There are no document-level rules in this example.
• Page “Rental_Agreement”: These rules run each time Taskmaster begins processing a page of type
  “Rental_Agreement.” For example, if you’re running the CreateDocs ruleset, Taskmaster executes the
  Create Fields rule, which creates a page data file to store the data captured later by the Recognize ruleset.
• Field “Pickup_Date”: This rule runs when Taskmaster is processing a page of type “Rental_Agreement”
  and encounters a field of type “Pickup_Date.”
EXAMPLE 1
The PageID task profile includes a ruleset called “PageID.” The PageID ruleset includes a rule called
“PageID” whose job is to identify the type of each incoming page. It does this by comparing the page image
to the known page types using fingerprint recognition.
The Sync DCO View with Ruleset View button at the left of the Rulesets pane lets you see which
documents, pages, or fields are associated with the selected rule. For example, if you select the PageID rule
and click the button, Datacap Studio expands the document hierarchy to show you the objects associated
with the PageID rule:
Page object “Other”
    PageID ruleset
        PageID rule
In this example, the PageID rule is associated only with the page type “Other.” This means the rule will run
whenever Taskmaster processes a page of type “Other” while executing the PageID task profile. Note that
“Other” is a default page type Taskmaster assigns to all incoming pages, since all pages are initially unknown.
Taskmaster will run this rule on all pages during page identification and, assuming the page matches one of
the known types, assign the correct type (for example, “Rental_Agreement” or “Air_Ticket”).
EXAMPLE 2
The Rulerunner and Verify task profiles both include the “Validate” ruleset. In the example below, the
Validate ruleset includes a rule called “Validate Date” that returns True if a field value conforms to one of the
supported date formats.
If you select the Validate Date rule and click the Sync DCO View with Ruleset View button, Datacap
Studio expands the document hierarchy to show you which objects are associated with the rule:
Page object “Room_Receipt”
    Field object “Arrival_Date”
        Validate Date rule
    Field object “Departure_Date”
        Validate Date rule
In this example, the Validate Date rule is associated with the “Arrival_Date” and “Departure_Date” fields on
pages of type “Room_Receipt.” This means the rule will run whenever Taskmaster processes an
“Arrival_Date” or “Departure_Date” field on a page of type “Room_Receipt” while executing the
Rulerunner or Verify task profile.
SUMMARY OF ORDER OF RULE EXECUTION
The following pseudo-code describes the order of rule execution:
for each ruleset in the current task profile
    run batch-level "Open" rules
    for each document type
        run document-level "Open" rules
        for each page type in the current document type
            run page-level "Open" rules
            for each field in the current page type
                run field-level "Open" rules
                run field-level "Close" rules
            next field
            run page-level "Close" rules
        next page
        run document-level "Close" rules
    next document
    run batch-level "Close" rules
next ruleset
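The pseudo-code above can be turned into a small runnable model. This is a sketch for illustration only: the dictionary layout and function names are hypothetical and do not reflect how Taskmaster stores the hierarchy. Each object lists at most one rule per ruleset in its “Open” and “Close” elements, and the example batch uses the batch-level and page-level rules named earlier for the PageID task profile.

```python
def run_ruleset(obj, ruleset, log):
    """Depth-first walk: the Open rule, then child objects, then the Close rule."""
    open_rule = obj.get("open", {}).get(ruleset)
    if open_rule:
        log.append(f"{obj['name']} Open: {open_rule}")
    for child in obj.get("children", []):
        run_ruleset(child, ruleset, log)
    close_rule = obj.get("close", {}).get(ruleset)
    if close_rule:
        log.append(f"{obj['name']} Close: {close_rule}")


def run_task_profile(rulesets, batch):
    """Run each ruleset over the whole batch, in task profile order."""
    log = []
    for ruleset in rulesets:
        run_ruleset(batch, ruleset, log)
    return log


def other_page(name):
    # Hypothetical model of a page of type "Other" and its Open rules.
    return {"name": name,
            "open": {"ImageFix": "Enhance Image", "PageID": "PageID"}}


batch = {
    "name": "Batch",
    "open": {"ImageFix": "ImageFix Load Settings",
             "PageID": "Set Fingerprint Params"},
    "children": [other_page("TM000001"), other_page("TM000002")],
}

for entry in run_task_profile(["ImageFix", "PageID"], batch):
    print(entry)
```

The log shows the whole ImageFix pass over the batch completing before the PageID pass begins, which is the ruleset ordering described earlier in this chapter.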
TRAVELDOCS: STEPPING A BATCH THROUGH THE PAGEID TASK PROFILE
We haven’t added any new functionality to the application in this section, but we can run the VScan and
PageID rulesets again, this time stepping through the actions.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar.
5. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (PageID).
6. Click the Step in button on the main Test tab toolbar. Datacap Studio now shows that:
   • The PageID task profile is running
   • We’re executing rules on the batch object
   • The current ruleset is ImageFix (first ruleset in the PageID profile)
7. Click the Step in button a few times. Execution remains at the batch level. Since the ImageFix Load
Settings rule is assigned at the batch level, Taskmaster expands the rule and prepares to execute the
LoadSettings action.
8. Continue clicking the Step in button to execute the LoadSettings action and complete the ImageFix
Load Settings rule. The Enhance Image rule is not assigned at the batch level, so it does not run.
9. Continue clicking the Step in button until the runtime batch hierarchy indicates that the first page is
selected. Taskmaster is now ready to run page level rules on each page in the runtime hierarchy, starting
with page TM000001.
10. Continue clicking the Step in button. Since the Enhance Image rule is assigned at the page level,
Taskmaster expands this rule and prepares to execute the ImageEnhance action.
11. In the center pane of the Datacap Studio window, click the Image tab if it’s not already active. The
Image tab displays the current runtime object.
12. Click the Step in button to execute the ImageEnhance action. The Image pane displays the current
page image after image enhancement. The checkmark beside the action in the Rulesets pane indicates that
the action returned True.
13. Now that you’ve seen how to step through the workflow, right-click the batch in the Workflow pane
and choose Cancel.
Note: We’ll cover single-stepping and use of breakpoints in detail in a later chapter of this guide.
Chapter 8
DOCUMENT ASSEMBLY
We saw earlier how Taskmaster identifies incoming pages and assigns the correct page type using fingerprint
matching or one of the other identification methods. The next step is to take a batch of individual pages and
assemble them into documents according to the rules defined within the document hierarchy. Then, in
preparation for the recognition phase, we need to create a runtime data file for each page.
This chapter examines document assembly and data file creation. It also covers document integrity checking –
making sure the documents within the batch conform to the predefined rules and taking corrective action if
they don’t. At the end of the chapter, we’ll implement some of the techniques described here within the
context of the TravelDocs application.
CREATING STRUCTURED DOCUMENTS
CREATING DOCUMENTS BASED ON THE DOCUMENT HIERARCHY
The document hierarchy defines the document types your application supports, plus the page types associated
with each document type. The TravelDocs application’s document hierarchy defines three document types
and four associated page types, plus the generic page type “Other.”
Document type “Car_Rental”
    Page type “Rental_Agreement”
    Page type “Optional_Insurance”
Document type “Flight”
    Page type “Air_Ticket”
Document type “Hotel”
    Page type “Room_Receipt”
After page identification assigns the correct page type to each incoming page, your application must use the
information in the document hierarchy to determine the corresponding document. For example, a page of
type “Rental_Agreement” is part of a car rental document, whereas a page of type “Air_Ticket” is part of a
flight document.
Taskmaster then uses the information in the document hierarchy to assemble individual pages into multi-page
documents. Each page has three variables that define the structure of the parent document:
• Max: Maximum number of pages of this type for each document (0 means no maximum).
• Min: Minimum number of pages of this type for each document (0 means no minimum).
• Order: Position of this page relative to other pages in the same document (0 means any position).
For example, here are the variables we specified earlier for each of the TravelDocs pages.
Page type            Max   Min   Order   Description
Rental_Agreement     1     1     1       One per document; required; must be first
Optional_Insurance   1     0     2       One per document; optional; must be second
Air_Ticket           1     1     1       One per document; required; must be first
Room_Receipt         1     1     1       One per document; required; must be first
From the information above, we know that for each car rental document there must be one rental agreement
page and that it may be followed by an optional insurance page. If Taskmaster identifies a rental agreement
page that’s immediately followed in the batch by an optional insurance page, it groups these two pages
together as a single document. Below is a portion of the runtime data file (PageID.xml) after document
creation.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<B id="20100321.011">
  <V n="TYPE">TravelDocs</V>
  <V n="LAST_RR_TPROFILE">Rulerunner:m:eRun</V>
  <D id="20100321.011.01">                     <!-- Document ID -->
    <V n="TYPE">Car_Rental</V>                 <!-- Document type -->
    <V n="STATUS">0</V>
    <P id="TM000001">                          <!-- Page ID -->
      <V n="TYPE">Rental_Agreement</V>         <!-- Page type -->
      <V n="STATUS">49</V>
      <V n="IMAGEFILE">tm000001.tif</V>
      etc.
    </P>
    <P id="TM000002">                          <!-- Page ID -->
      <V n="TYPE">Optional_Insurance</V>       <!-- Page type -->
      <V n="STATUS">49</V>
      <V n="IMAGEFILE">tm000002.tif</V>
      etc.
    </P>
  </D>
  etc.
ASSEMBLING DOCUMENTS
The Taskmaster Actions libraries include the following action to assemble pages into documents.
Library   Action            Description
DCO       CreateDocuments   Assembles the current batch’s pages into documents based on the structure
                            defined in the document hierarchy and the min, max, and order properties.
You must execute this action within a rule that runs at the batch level. In the example application, the
Create Docs rule, which is assigned to the batch object, executes the CreateDocuments action.
CREATING THE PAGE DATA FILES
Having grouped individual pages into documents, you must create a runtime data file for each page
(TM000001.xml, TM000002.xml, etc.). The “DCO” actions library includes the following action to create a
runtime data file for the current page.
Library   Action         Description
DCO       CreateFields   Creates a page data (.xml) file for the current page. The file includes an
                         element for each field defined in the document hierarchy for the current
                         page type. Each field has an ID and three properties (TYPE, Position, and
                         STATUS) with default values.
You must execute this action within a rule that runs at the page level. In the example application, the
Create Fields rule, which is assigned to each page type, executes the CreateFields action.
Initially the data file is empty, but the CreateFields action uses the structure defined in the document
hierarchy to create a shell with an element for each data field. The shell gets populated later during
recognition.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000001">        <!-- Page data file for first page in batch (type Rental_Agreement) -->
  <F id="Pickup_Date">   <!-- Pickup_Date field (no data) -->
    <V n="TYPE">Pickup_Date</V>
    <V n="Position">0,0,0,0</V>
    <V n="STATUS">0</V>
  </F>
  <F id="Pickup_Location">   <!-- Pickup_Location field (no data) -->
    <V n="TYPE">Pickup_Location</V>
    <V n="Position">0,0,0,0</V>
    <V n="STATUS">0</V>
  </F>
  etc.
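To make the shape of the shell concrete, the sketch below builds the same structure with Python’s standard XML library. It only illustrates the file layout shown above and is not how the CreateFields action is implemented; the function name and the field list passed to it are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def create_page_shell(page_id, field_names):
    """Build an empty page data element with one <F> element per field."""
    page = ET.Element("P", id=page_id)
    for name in field_names:
        field = ET.SubElement(page, "F", id=name)
        # Default values as they appear in the example above.
        for variable, default in (("TYPE", name),
                                  ("Position", "0,0,0,0"),
                                  ("STATUS", "0")):
            ET.SubElement(field, "V", n=variable).text = default
    return page


shell = create_page_shell("TM000001", ["Pickup_Date", "Pickup_Location"])
xml_text = ET.tostring(shell, encoding="unicode")
print(xml_text)
```

The printed element contains a Pickup_Date and a Pickup_Location field, each with empty default values, ready to be filled in during recognition.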
CHECKING DOCUMENT INTEGRITY
The document hierarchy defines the required structure of the batch, but what happens when Taskmaster
encounters a batch that does not conform to the required structure? For example, a batch in the TravelDocs
application might have a rental agreement page followed by two optional insurance pages, or a batch might
have an optional insurance page that isn’t preceded by a rental agreement page.
Consider a runtime batch with the two structural integrity issues described above. Each issue leaves an
orphaned optional insurance page:
• In the first case, the CreateDocuments action grouped the first optional insurance page with the
  preceding rental agreement page and put the second insurance page in a separate document of an
  unidentified type.
• In the second case, the CreateDocuments action again placed the orphaned insurance page in a
  document of an unidentified type.
In both cases the batch has broken the document integrity rules defined in the document hierarchy. However,
the CreateDocuments action leaves the document status set to 0 (OK) for each document and the page
status set to 49 (ScanOK) for each page.
<D id="20100322.007.03">
  <V n="TYPE"></V>
  <V n="STATUS">0</V>                      <!-- Document status is “OK” (0) -->
  <P id="TM000003">
    <V n="TYPE">Optional_Insurance</V>
    <V n="STATUS">49</V>                   <!-- Page status is “ScanOK” (49) -->
    etc.
  </P>
</D>
etc.
<D id="20100322.007.07">
  <V n="TYPE"></V>
  <V n="STATUS">0</V>                      <!-- Document status is “OK” (0) -->
  <P id="TM000007">
    <V n="TYPE">Optional_Insurance</V>
    <V n="STATUS">49</V>                   <!-- Page status is “ScanOK” (49) -->
    etc.
  </P>
</D>
USING THE CHECKALLINTEGRITY ACTION
The “rrunner” actions library includes the following action to check the structural integrity of a batch.
Library   Action              Description
rrunner   CheckAllIntegrity   Returns True if the current batch meets the requirements defined in the
                              document hierarchy; returns False otherwise.
You must execute this action within a rule that runs at the batch level. In the example application, the
Batch Document Integrity Check rule, which is assigned to the batch object, executes the
CheckAllIntegrity action.
CheckAllIntegrity does not change the status variable on non-conforming documents. Instead, the action
returns False if there is a document integrity issue, and you can use this to trigger corrective action, as
described in the next section.
Note: If you search for CheckAllIntegrity in the log file generated by the Rulerunner task profile
(rulerunner_rrs.log), you can see the code returned by CheckAllIntegrity if the action returns False.
If the batch includes multiple problems, the code represents the last problem. The codes are:
1 = Has more child objects than allowed by “max” attribute
2 = Has fewer child objects than required by “min” attribute
3 = A child object is not of a type supported by parent
4 = A child object is in the wrong position as specified by the “pos” attribute
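The four codes can be illustrated with a small checker that applies Min, Max, and Order values to the pages of one document. This is a hedged sketch, not the CheckAllIntegrity implementation: the RULES table, the function name, and the interpretation of Order (Order values must not decrease from page to page) are assumptions made for the example. Unlike the log file, which reports only the last problem, the sketch returns every code that applies.

```python
# Code meanings as listed above.
INTEGRITY_CODES = {
    1: "more child objects than allowed by max",
    2: "fewer child objects than required by min",
    3: "child object type not supported by parent",
    4: "child object in the wrong position",
}

# Hypothetical rules: document type -> page type -> (min, max, order).
RULES = {
    "Car_Rental": {"Rental_Agreement": (1, 1, 1), "Optional_Insurance": (0, 1, 2)},
    "Flight": {"Air_Ticket": (1, 1, 1)},
    "Hotel": {"Room_Receipt": (1, 1, 1)},
}


def check_document(doc_type, page_types):
    """Return the sorted list of integrity codes that apply to one document."""
    rules = RULES.get(doc_type, {})
    problems = set()
    for page_type in page_types:
        if page_type not in rules:
            problems.add(3)
    for page_type, (min_count, max_count, _order) in rules.items():
        count = page_types.count(page_type)
        if max_count and count > max_count:  # 0 means no maximum
            problems.add(1)
        if count < min_count:
            problems.add(2)
    # Assumed reading of Order: positions must not decrease (0 means any position).
    orders = [rules[t][2] for t in page_types if t in rules and rules[t][2]]
    if orders != sorted(orders):
        problems.add(4)
    return sorted(problems)


print(check_document("Car_Rental", ["Rental_Agreement", "Optional_Insurance"]))
print(check_document("", ["Optional_Insurance"]))
```

The second call models an orphaned insurance page in a document of an unidentified type, which this sketch reports as an unsupported child type.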
HANDLING DOCUMENT INTEGRITY PROBLEMS
The Document Integrity ruleset demonstrates how to handle document integrity problems. In the example
application, the function “Batch Route To Fixup” executes only if CheckAllIntegrity returns False.
If CheckAllIntegrity identifies a document integrity issue (for example, the pages are in the wrong order,
a required page is missing, etc.), the application sends the batch to the “Fixup” job so an operator can fix the
problem and then return the batch to the main workflow. Moving a batch out of the current workflow in
this way is known as branching. The actions that implement branching are described below.
[Flowchart: branching on a document integrity failure]
1. Document integrity OK? If no, raise the “Document Integrity Failed” condition.
2. Execute the remaining rulesets in the current task profile.
3. Condition raised? If no, run the next task profile. If yes, move the batch to the FixUp job with status Pending.
4. The operator fixes the document integrity problems.
5. Return the batch to the main job with status Pending.
Note: Since the Rulerunner task profile generated by the Application Wizard includes recognition and
validation, Taskmaster will perform recognition and validation before branching to the FixUp job, so
some pages may have Status = 1 (indicating a problem). You will be unable to complete the FixUp
job without setting the status on all pages to BP_PAGE_OK (0). You may prefer to configure your
application to use two separate task profiles so Taskmaster branches to FixUp before recognition if
there are document integrity problems. We’ll do this in a later section of this guide (see “Creating the
new CreateDocs task and module” on page 288).
The “Batch Route To Fixup” function uses the actions below to branch to the Fixup job.
Library | Action | Description
rrunner | Task_NumberOfSplits | Specifies the number of jobs the batch is sent to before returning to the main workflow (almost always 1).
rrunner | Task_RaiseCondition | Specifies the group index (almost always 0) and the index of the condition to raise from the list on the Taskmaster Client Workflow tab (where 0 is the first condition).
[Figure: the Workflow tab, with the condition first in the list, so its index is 0.]
To view the Workflow tab, start Taskmaster Client and select TravelDocs. Then click the Administrator
button. See the section referenced below for details.
Before Taskmaster will branch to the FixUp job, you must configure the settings on the “Document Integrity
Failed” condition. The steps to do this are provided later for the TravelDocs application under “Configuring
branching” on page 93.
TRAVELDOCS: CREATING DOCUMENTS AND SETTING UP PAGE FILES
RUNNING A BATCH THROUGH THE WORKFLOW
Note: We haven’t added anything to the CreateDocs ruleset, so we’ll be running the default version generated
by the Application Wizard.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar.
5. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (PageID).
6. Click the Process rules for target object button on the main Test tab toolbar.
7. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (Rulerunner).
8. Click the Process rules for target object button on the main Test tab toolbar.
9. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow (Verify).
10. On the Runtime batch hierarchy tab, expand each document node to see how the individual pages are
now grouped into documents. The flight and hotel documents all have only one page, but two of the car
rental documents have multiple pages.
11. When you’ve finished looking at the batch structure, right-click the “running” batch in the Workflow
pane and choose Cancel.
EXAMINING THE RUNTIME BATCH FOLDER
Open the application’s most recent batch folder (C:\Datacap\TravelDocs\batches\<batch_id>). The folder
contains the following files:
File | Description
TM00000*.tif | An image-enhanced version of each of the sample image files.
TM00000*.tio | A copy of each of the original image files.
TM00000*.xml | The page data file for each image file (see below).
TM00000*c.xml | The results of full page recognition for each image file.
TM00000*.cco | The fingerprint file for each of the image files.
Rulerunner.xml | The runtime document hierarchy generated by the Rulerunner task profile.
rulerunner_rrs.log | The log file generated by the Rulerunner task profile.
PageID.xml | The runtime document hierarchy generated by the PageID task profile.
pageid_rrs.log | The log file generated by the PageID task profile.
VScan.xml | The runtime document hierarchy generated by the VScan task profile.
vscan_rrs.log | The log file generated by the VScan task profile.
Verify.xml | A copy of the runtime document hierarchy ready for use by the next task profile in the workflow (Verify).
REVIEWING THE PAGE DATA FILES
The CreateFields action in the Create Fields rule is responsible for creating a page data file for the current
page. The data file includes all fields identified for the current page type in the document hierarchy. Each field
has an ID and three properties: TYPE, Position, and STATUS.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000001">
<!-- Page data file for first page in batch (type Rental_Agreement) -->
<F id="Pickup_Date">
<!-- Pickup_Date field (no data) -->
<V n="TYPE">Pickup_Date</V>
<V n="Position">0,0,0,0</V>
<V n="STATUS">0</V>
</F>
<F id="Pickup_Location">
<!-- Pickup_Location field (no data) -->
<V n="TYPE">Pickup_Location</V>
<V n="Position">0,0,0,0</V>
<V n="STATUS">0</V>
</F>
etc.
Later, actions of various kinds can assign other values to these properties and add more properties as needed.
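To inspect a page data file outside Datacap Studio, the field properties can be read with any XML parser. The following is a minimal Python sketch, assuming the file has the shape shown in the excerpt above; the sample string is modelled on that excerpt (with a closing </P> tag added so that it parses), and read_fields is a hypothetical helper, not a Taskmaster API.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the excerpt above. The closing </P> tag is added here
# because the excerpt is abbreviated with "etc.".
SAMPLE_PAGE_XML = """<P id="TM000001">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="Position">0,0,0,0</V>
    <V n="STATUS">0</V>
  </F>
  <F id="Pickup_Location">
    <V n="TYPE">Pickup_Location</V>
    <V n="Position">0,0,0,0</V>
    <V n="STATUS">0</V>
  </F>
</P>"""


def read_fields(page_xml):
    """Map each field ID to a dict of its <V n="..."> properties."""
    page = ET.fromstring(page_xml)
    fields = {}
    for field in page.findall("F"):
        fields[field.get("id")] = {
            var.get("n"): (var.text or "") for var in field.findall("V")
        }
    return fields


fields = read_fields(SAMPLE_PAGE_XML)
for field_id, props in fields.items():
    print(field_id, props)
```

Because the properties are returned as plain dictionaries, any variables that later actions add to a field appear without changes to the sketch.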
TRAVELDOCS: HANDLING DOCUMENT INTEGRITY ISSUES
CONFIGURING BRANCHING
The Document Integrity ruleset generated by the Application Wizard is configured to identify document
integrity problems and send the batch to the FixUp task if required. However, additional setup steps are
required in Taskmaster Administrator to configure the required branching.
To configure branching in the TravelDocs application:
1. If Taskmaster Client for TravelDocs is not already open:
   • Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
   • Select the TravelDocs application and click OK.
   • Log in using User ID: admin; Password: admin; Station ID: 1.
2. Click the Administrator button to open the Taskmaster Administrator window.
3. On the Modules tab, select rrsRulerunner.
4. Click the Values field beside the Parameters label and then click the Browse […] button beside the
default value (rrs_RuleRun.bpp).
5. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_rulerun.bpp and click Open.
6. Click Apply and then click Done.
7. Click the Administrator button to reopen the Taskmaster Administrator window.
8. On the Workflow tab, expand Main Job and then expand Rulerunner.
9. Select the Document Integrity Failed condition and, if necessary, configure the values as follows:
   • Action: Branch
   • Child Job: Fixup Job
   • Parent Status: Pending
   • Child Status: Pending
   • Steps: 1
10. Click Apply (if you had to change the default values) and then click Done. The application is now
configured to branch to the FixUp job if document integrity fails and then return to the main job with
status “pending.”
11. Leave the Taskmaster Client window open, as you’ll need to use it in the next section.
RUNNING A BATCH WITH DOCUMENT INTEGRITY PROBLEMS
To introduce a document integrity problem, we’ll add an optional insurance page to the end of the
batch. Since the page won’t immediately follow a car rental agreement page, the CheckAllIntegrity action
will generate an error during document integrity checking.
1. Open C:\Datacap\TravelDocs\images.
2. Make a copy of Images_Page_02.tif (the first optional insurance page) and name the copy
Images_Page_12.tif. This will create an orphaned insurance page at the end of the batch.
3. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
4. Double-click the PageID icon. When the task completes, click Stop.
5. Double-click the Rulerunner icon. When the task completes, the Status field indicates that the task raised
additional conditions.
6. Click Stop.
7. Click the Job Monitor button to display the job queue.
Notice that the batch has two entries in the queue:
   • The first entry indicates that the batch has status “Pending” for the FixUp task.
   • The second entry indicates that the batch has status “Waiting” for the Verify task.
8. Double-click the row number for the first entry (FixUp).
9. Click Yes to execute the selected batch. The batch opens in the Batch Pilot FixUp window. The last
document is selected and the Comments field indicates that the document has an invalid member (the
orphaned insurance page).
10. Select the page TM000012 and click Delete. Then click OK to confirm. Batch Pilot deletes the page and
the parent document.
11. Expand the Batch Pilot window if necessary so the Finish button is visible.
12. Click Finish and then click OK in the “Task Finished” message box.
13. Open the Job Monitor window if necessary and press F5 to refresh the view.
The FixUp task now has status “Job done” and the batch is now pending for the Verify task.
14. Before continuing, open the C:\Datacap\TravelDocs\images folder and delete file
Images_Page_12.tif so we don’t run into this problem again.
Chapter 9
DATA RECOGNITION
Data recognition is the stage during which you locate the fields you want to capture and then convert the
fields into character-based data. The data obtained from recognition is stored in the page data files we set up
in the document assembly stage.
Earlier, in the section on page identification, we identified several techniques you can use to identify pages. Of
these, the most widely used is fingerprint matching. If you used fingerprint matching for page identification,
you’ll most likely use the fingerprint images to define the recognition zones – the fields you want to read on
each page. If you used full page recognition, as we did in the TravelDocs application, you can obtain the field
data directly from the full page recognition results; otherwise you’ll need to run the recognition engine on
each field zone to capture the data. This chapter discusses both methods.
The other recognition techniques do not use fingerprint zones to locate the field data. Instead, they use text
matching or pattern matching to analyze the page and identify the fields. We’ll look at these techniques in
later sections of this guide.
At the end of this chapter we’ll continue working on the TravelDocs application. We’ll begin by specifying
text recognition zones on the fingerprints we created earlier. Then we’ll assign rules to the document
hierarchy to implement the recognition stage. Finally, we’ll set up the recognition zones and the associated
rules to handle the checkbox options.
RECOGNIZING PAGE DATA
USING FINGERPRINTS TO IDENTIFY RECOGNITION ZONES
We saw earlier how Taskmaster uses a technique called fingerprint matching to identify incoming pages. For each
incoming page, Taskmaster generates a fingerprint file that describes the page. It then compares the new
fingerprint to a library of fingerprints for known page types. When it finds a match it assigns the
corresponding page type.
The fingerprint library has a second purpose, which is to let you identify the position of each field for each
known page type. Since there can be many variants of each page type and the position of each field is most
likely different for each variant, you must identify the recognition zones for each variant [2].
[Figure: fingerprint images with recognition zones outlined. Labels: Air_Ticket, Pickup_Date, Pickup_Location, Return_Date, Return_Location, Total_Cost.]
[2] There are other methods to locate data that do not require positional information, including text matching, which is
discussed in a later chapter of this guide. Another approach lets you add fingerprints to the fingerprint library on-the-fly.
During verification, the operator clicks in the positional information for each field and Taskmaster stores this information in
the document hierarchy. This type of automatic fingerprint generation is also discussed in a later chapter.
STORING THE RECOGNITION ZONE INFORMATION
Datacap Studio stores the coordinates for each field zone as a variable in the document hierarchy. In the
example below you can see the zones for the pickup date field on the Car Rental #1 and Car Rental #2 pages:
• The Car Rental #1 fingerprint has ID 695, so “Pos695” defines the position of the pickup date field for
Car Rental #1.
• The Car Rental #2 fingerprint has ID 678, so “Pos678” defines the position of the pickup date field for
Car Rental #2.
You can see the position coordinates for a given field by right-clicking the field name in the Document
Hierarchy pane and choosing Manage variables (the document hierarchy must be locked). Alternatively, you
can select the field and look in the Properties pane.
In this example, whenever Taskmaster identifies a page as a Car Rental #2 rental agreement, it knows the
pickup date is located at coordinates (579,353,943,415) [3].
[3] The coordinates are relative to a reference point on the page. If the incoming page does not align precisely with the
reference page, Taskmaster calculates an offset and uses this to determine the actual field positions. We’ll discuss this in more
detail later, in the chapter on pattern matching.
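The idea of applying a page offset to a stored zone can be illustrated with a short Python sketch. This is an assumption-laden illustration, not Taskmaster's algorithm: the guide does not describe how the offset is calculated, so the sketch simply translates the zone by a given (dx, dy). It reads the four coordinates as x1, y1, x2, y2 (top left and bottom right corners), which is the order this guide gives later for zone coordinates. The function names and the offset values are hypothetical.

```python
def parse_zone(pos):
    """Parse a position string such as "579,353,943,415" into four integers."""
    x1, y1, x2, y2 = (int(part) for part in pos.split(","))
    return (x1, y1, x2, y2)


def shift_zone(zone, dx, dy):
    """Translate a zone by a page offset (dx, dy), keeping its size unchanged."""
    x1, y1, x2, y2 = zone
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


zone = parse_zone("579,353,943,415")   # pickup date zone from the example
shifted = shift_zone(zone, 4, -3)      # hypothetical offset: 4 px right, 3 px up
print(shifted)                         # (583, 350, 947, 412)
```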
READING DATA FROM THE PAGE
The method you use to read data from a page depends on the method you used to generate the runtime
fingerprints:
• If you used full page recognition to generate the runtime fingerprints, you can obtain the field data
directly from the fingerprint (CCO) file.
• If you used AnalyzeImage to generate the runtime fingerprints, you must perform recognition on each
of the field zones to obtain the field data.
The actions to implement these techniques are described in the sections below.
USING FULL PAGE RECOGNITION RESULTS TO POPULATE THE PAGE DATA FILES
The Taskmaster Actions libraries include actions that take the character data from the fingerprint (CCO) file
and apply it to the runtime batch hierarchy.
Library | Action | Description
Zones | ReadZones | Loads the zone position information for the current fingerprint.
Recog_Shared | SnapCCOtoDCO | Transfers the recognition results from the current page's fingerprint (CCO) file to the appropriate field objects in the runtime batch hierarchy.
You must run ReadZones before you can run SnapCCOtoDCO.
USING ZONE OCR TO POPULATE THE PAGE DATA FILES
If you used AnalyzeImage to generate the runtime fingerprints, the following actions are available for
performing zone-based recognition.
Library | Action | Description
Zones | ReadZones | Loads the zone position information for the current fingerprint.
ocr_a | RecognizePageFieldsOCR_A | Recognizes the characters within each of the field zones using the position information in the current fingerprint.
OCR_s | RecognizePageFieldsOCR_S | Recognizes the characters within each of the field zones using the position information in the current fingerprint.
ocr_sr | RecognizePageFieldsOCR_S | Recognizes the characters within each of the field zones using the position information in the current fingerprint.
icr_c | RecognizePageFieldsICR_C | Recognizes the characters within each of the field zones using the position information in the current fingerprint.
HANDLING CHECKBOX OPTIONS
OVERVIEW OF CHECKBOX RECOGNITION METHODS
Taskmaster employs optical mark recognition (OMR) to determine whether a checkbox option is selected or
not. There are two basic techniques:
• OCR/A checkbox recognition method: This is very easy to set up and works well with non-dropout
checkboxes (where the checkbox outline remains on the page image), but not so well for drop-out
checkboxes (where the outline drops out during scanning). The OCR/A recognition engine determines
whether the specified region represents a selected checkbox (1) or a non-selected checkbox (0).
[Figure: example checkboxes labelled Selected, Selected, and Not selected.]
• Pixel threshold evaluation method: This is more difficult to set up but is more reliable for drop-out
checkboxes and can also be used to read fill-in “bubbles” on a response form. It calculates the
percentage of black pixels within a specified zone and compares the result to a pre-determined threshold
value. For example, if the threshold is 20%, any OMR zone with more than 20% black pixels is
considered selected (1), while any zone with 20% or less is considered not selected (0).
[Figure: example zones labelled > 20% black, > 20% black, and <= 20% black.]
In this section we’ll examine both checkbox recognition techniques. The field setup requirements are the
same for both, so we’ll look at these first.
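The pixel threshold evaluation described above can be sketched in a few lines of Python. This is a simplified illustration of the idea rather than the Taskmaster implementation: it assumes the image is already a grid of 0 (white) and 1 (black) values, treats the zone as half-open, and uses hypothetical function names.

```python
def percent_black(image, zone):
    """Percentage of black pixels inside zone = (x1, y1, x2, y2).

    image is a list of rows; each pixel is 1 (black) or 0 (white).
    The zone is treated as half-open: x1 <= x < x2 and y1 <= y < y2
    (an assumption made for this sketch).
    """
    x1, y1, x2, y2 = zone
    pixels = [image[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not pixels:
        raise ValueError("zone is empty")
    return 100.0 * sum(pixels) / len(pixels)


def is_selected(image, zone, threshold=20.0):
    """Return 1 if the zone has more than `threshold` percent black pixels, else 0."""
    return 1 if percent_black(image, zone) > threshold else 0


# A small test image, 5 pixels wide and 4 high: the top left 2 x 2 block is black.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(is_selected(image, (0, 0, 4, 4)))  # 4 of 16 pixels = 25% black, so 1
print(is_selected(image, (0, 0, 5, 4)))  # 4 of 20 pixels = 20% black, so 0
```

The second call shows the boundary case from the text: exactly 20% is "20% or less" and therefore counts as not selected.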
ESTABLISHING PARENT FIELDS
When processing pages with checkbox options:
• You must define the checkbox options as sub-fields of a parent field within the document hierarchy.
• You must outline the sub-fields and the parent field when drawing the recognition zones.
Earlier, we defined sub-fields for the options on the rental agreement page in the TravelDocs application.
When we define the recognition zones, as we’ll do later for the TravelDocs application, we’ll need to define
the positions of the parent fields and the sub-fields.
Although it’s tempting to draw the child field zone within the checkbox outline, recognition will only succeed
if the checkboxes on the runtime page image align perfectly with the fingerprint image. Even a slight
misalignment can result in a false positive, so the best approach is to draw the recognition zone around the
checkbox, as shown in the examples above.
[Figure: a parent field with its options as sub-fields.]
We also defined parent fields and sub-fields on the optional insurance page, although in this case we created a
parent field for each checkbox. When we define the zones, we’ll need a zone for each parent and each sub-field.
[Figure: a parent field with a single sub-field.]
SETTING THE REQUIRED VARIABLES ON THE PARENT FIELD
To process checkbox options as OMR fields, you must set the RecogType variable on the parent field equal
to 4. This tells the Taskmaster recognition engine to use mark recognition rather than character recognition.
Additionally, if the business requirements indicate that multiple options within the group can be selected, you
must set the MultiPunch variable on the parent field equal to 1.
[Figure: the parent field with its RecogType and MultiPunch variables.]
The RecogType and MultiPunch variables are not included by default, so you must add them manually to
the parent object, as described later for the TravelDocs application (see “Setting the required variables on the
Options and Insurance fields” on page 115).
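The effect of adding the two variables can be pictured as two extra <V> entries on the parent field. The Python sketch below is purely illustrative: in practice you add the variables in Datacap Studio, and the XML shape used here is an assumption modelled on the <F> and <V n="..."> elements shown elsewhere in this guide. The set_variable helper and the sample Options element are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_variable(element, name, value):
    """Set <V n="name">value</V> on element, creating the variable if missing."""
    for var in element.findall("V"):
        if var.get("n") == name:
            var.text = str(value)
            return var
    var = ET.SubElement(element, "V", {"n": name})
    var.text = str(value)
    return var


# Hypothetical parent field, modelled on the <F> elements shown in this guide.
options = ET.fromstring('<F id="Options"><V n="TYPE">Options</V></F>')
set_variable(options, "RecogType", 4)    # use mark recognition, not character recognition
set_variable(options, "MultiPunch", 1)   # allow several options in the group to be selected
print(ET.tostring(options, encoding="unicode"))
```

Calling set_variable again with an existing name updates the value in place rather than adding a duplicate variable.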
USING THE OCR/A CHECKBOX RECOGNITION METHOD
The OCR/A checkbox recognition method uses the RecognizeFieldOCR_A action to determine whether
the zone represents a selected checkbox or a non-selected checkbox. You must include this action in a rule
that’s bound to the parent zone and you must configure the OCR/A settings for the parent zone (not the
individual checkbox sub-fields).
You configure the OCR/A settings for specific zones using the OCR/A tab in the Properties pane on the
Datacap Studio Zones tab. The OCR/A tab is not displayed by default, so enable it as follows:
• Right-click any existing tab and choose Show tabs. Then select the OCR/A option.
Clicking the OCR/A tab displays the settings that the OCR/A recognition engine uses when performing
recognition on the selected field.
[Figure: an OMR parent field selected, with the OCR/A OMR settings for the selected field.]
The three OMR settings are:
• Checkmark type: Select “Square background” to read non-dropout checkboxes. This setting is stored in the
document hierarchy using the OMRType variable, where 0 is “Square background”:
<V n="OMRType">0</V>
• Length: This reflects the number of OMR sub-fields and is set automatically.
• Multipunch: This is the same as the MultiPunch variable we looked at earlier, where 1 is “Yes”:
<V n="MultiPunch">1</V>
USING THE PIXEL THRESHOLD EVALUATION METHOD
The pixel threshold evaluation method uses the RecogOMRThreshold action in the “Recog_Shared” library.
SPECIFYING THE THRESHOLD AND BACKGROUND LEVELS
The RecogOMRThreshold action takes two parameters:
• Threshold: Specifies the percentage of black pixels above which the option is considered selected.
• Background: Used to determine the confidence level and specifies the percentage that can be attributed
to the checkbox outline plus any scanner noise:
   • Any zone with a percentage below this value is considered not selected with high confidence; any zone
with a percentage between this value and the threshold value is considered not selected with low confidence.
   • Any zone with a percentage above (2 * Threshold - Background) is considered selected with high
confidence; any zone with a percentage between Threshold and (2 * Threshold - Background) is considered
selected with low confidence.
Percentage of black pixels | Option | Confidence
(2 * Threshold - Background) to 100% | Selected | High
Threshold to (2 * Threshold - Background) | Selected | Low
Background to Threshold | Not selected | Low
0% to Background | Not selected | High
However, if MultiPunch=0 (or is not specified) then only the zone with the highest percentage is selected.
For example, if the threshold value is 20 and the background value is 15 then the high confidence threshold is
(2 * 20 - 15) = 25. If you execute RecogOMRThreshold(20,15) on an OMR group field with
MultiPunch=1, then:
• Any zone with more than 25% black pixels is considered selected with high confidence
• Any zone with between 20% and 25% is considered selected with low confidence
• Any zone with between 15% and 20% is considered not selected with low confidence
• Any zone with 15% or less black pixels is considered not selected with high confidence
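The four bands in the example above can be expressed as a small decision function. The Python sketch below is a hedged illustration of the rule as worded in this section for MultiPunch=1; it is not the RecogOMRThreshold source. The handling of values that fall exactly on a boundary follows the worded example (15% or less is "not selected, high"; more than 25% is "selected, high") and is otherwise an assumption. It does not model the MultiPunch=0 case.

```python
def classify_zone(percent, threshold, background):
    """Return (selected, confidence) for one OMR zone, assuming MultiPunch=1.

    selected is 1 or 0; confidence is "high" or "low".
    """
    high_mark = 2 * threshold - background   # e.g. 2 * 20 - 15 = 25
    if percent > high_mark:
        return (1, "high")
    if percent > threshold:
        return (1, "low")
    if percent > background:
        return (0, "low")
    return (0, "high")


for pct in (30, 22, 18, 10):
    print(pct, classify_zone(pct, threshold=20, background=15))
```

With threshold 20 and background 15, the loop prints the four bands in turn: selected/high, selected/low, not selected/low, and not selected/high.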
DETERMINING THE APPROPRIATE THRESHOLD AND BACKGROUND VALUES
To determine appropriate values for the threshold and background parameters you must determine the
percentage of pixels within the OMR zone that can be attributed to the checkbox outline plus any scanner
noise. The easiest way to do this is to run a page containing checked and unchecked option boxes through the
workflow and get the pixel counts from the page data file.
When Taskmaster executes a RecogOMRThreshold action, it counts the number of black pixels within each
OMR zone and writes the resulting values to the page data file as a density string.
The DensityString has one character per OMR zone. In the example above, the “Options” field has three
OMR zones and the DensityString value is FBG. Each of the characters corresponds to a percentage value
using the following formula:
Percentage black pixels = character’s ASCII code value – 48
In the example above:
• The ASCII code for each of the three characters is 70, 66, and 71 respectively.
• The percentage of black pixels for each of the three zones is 22%, 18%, and 23% respectively.
Having obtained the percentage values, you can then refer to the original page image to see if the
corresponding checkbox is selected or not selected. The example above was obtained from a page where the
first and third options were selected, and the second option was not selected.
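The density string formula is easy to check by hand or in code. The following Python sketch applies the formula from this section (ASCII code value minus 48) to each character; decode_density_string is a hypothetical helper name, not a Taskmaster function.

```python
def decode_density_string(density):
    """Convert each character to a black-pixel percentage (ASCII code - 48)."""
    return [ord(ch) - 48 for ch in density]


print(decode_density_string("FBG"))  # [22, 18, 23]
```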
Checkbox | Percentage filled
First option (selected) | 22%
Second option (not selected) | 18%
Third option (selected) | 23%
Based on these three checkboxes, you would set the threshold and background values somewhere between 18
and 22. (Note that fractional values are permitted for the threshold and background parameters.) However, it
would be prudent to scan additional pages and check their density strings before setting final values.
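When you collect density values from several pages, the usable range for the threshold is the gap between the darkest unselected zone and the lightest selected zone. The Python sketch below shows one way to compute that gap from samples you have checked by eye; it is an illustrative helper with a hypothetical name, and choosing the final values within the range remains a judgment call.

```python
def threshold_range(samples):
    """Return (highest unselected %, lowest selected %) from labelled samples.

    samples is a list of (percent_black, is_selected) pairs taken from pages
    whose checkboxes you have inspected by eye.
    """
    unselected = [pct for pct, selected in samples if not selected]
    selected = [pct for pct, selected in samples if selected]
    if not unselected or not selected:
        raise ValueError("need at least one selected and one unselected sample")
    low, high = max(unselected), min(selected)
    if low >= high:
        raise ValueError("samples overlap; no clean threshold separates them")
    return (low, high)


# The three checkboxes from the example: 22% and 23% selected, 18% not selected.
print(threshold_range([(22, True), (18, False), (23, True)]))  # (18, 22)
```

If the function raises the overlap error, the zones probably differ in size or alignment, which is worth investigating before settling on values.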
IMPLICATIONS OF USING RECOGOMRTHRESHOLD
Since the RecogOMRThreshold action relies upon pixel counts within the OMR zone, it’s important that all
OMR zones have the same dimensions, or as close as possible.
When drawing OMR zones on the Datacap Studio Zones tab this is sometimes hard to achieve, so you may
want to establish approximate zone boundaries by drawing the bounding boxes on the Image View tab, and
then edit the coordinates in the Pos variables manually in the Properties pane (you’ll need to lock the DCO
for editing).
The coordinates correspond to the top left and bottom right corners of the bounding box, in the order x1,
y1, x2, y2, where (x1, y1) is the top left corner and (x2, y2) is the bottom right corner.
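Before editing Pos values by hand, it can be useful to check how far the zone sizes actually differ. The Python sketch below computes the width and height of each zone from its x1, y1, x2, y2 string and compares them. The helper names, the tolerance parameter, and the sample coordinates are all hypothetical.

```python
def zone_size(pos):
    """Return (width, height) for a position string "x1,y1,x2,y2"."""
    x1, y1, x2, y2 = (int(part) for part in pos.split(","))
    return (x2 - x1, y2 - y1)


def sizes_match(positions, tolerance=0):
    """True if every zone is within `tolerance` pixels of the first zone's size."""
    sizes = [zone_size(pos) for pos in positions]
    first_w, first_h = sizes[0]
    return all(
        abs(w - first_w) <= tolerance and abs(h - first_h) <= tolerance
        for w, h in sizes
    )


# Hypothetical coordinates for three checkbox zones.
zones = ["100,200,130,230", "100,240,130,270", "100,280,132,310"]
print(sizes_match(zones))               # False: the third zone is 2 px wider
print(sizes_match(zones, tolerance=2))  # True
```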
TRAVELDOCS: SPECIFYING RECOGNITION ZONES
In this section we’ll define the positions of the various fields for one variant of each page type. Eventually
we’ll need fingerprints for all variants, but to get started we’ll do just one for each page type. As you locate the
field zones on the fingerprint images, you’ll see how Datacap Studio writes the position information for each
field into the document hierarchy.
CREATING THE TEXT ZONES ON THE RENTAL_AGREEMENT PAGE
1. In the Fingerprints pane on the Datacap Studio Zones tab, select the first Rental_Agreement page
(Car Rental #1).
2. In the Image View pane, click the Zoom button to enlarge the page so you can see the fields clearly.
3. In the Document hierarchy pane, click the Lock DCO for editing button.
4. In the Document hierarchy pane, expand the Rental_Agreement page and select the Pickup_Date
field. Then use the mouse to draw a bounding box around the pickup date on the page image.
5. Repeat for the Pickup_Location, Return_Date, Return_Location, Car_Type, and Total_Cost fields.
Make sure you provide enough horizontal space around the field in case the text on the runtime page
image is longer than the text on the fingerprint image.
6. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
7. In the Document hierarchy pane, select any one of the fields you just defined (for example, the
“Car_Type” field). Then look in the Properties pane at the bottom left of the Zones tab and make sure
the zone coordinates are saved in the field’s Pos<id> variable, where <id> is the ID of the fingerprint you
selected.
CREATING THE OMR ZONES ON THE RENTAL AGREEMENT PAGE
The Rental_Agreement page type includes an Options field with three sub-fields:
• Nav_System
• Child_Seat
• Fuel_Service
These options are checkbox options that you handle using optical mark recognition (OMR). OMR zones
must always be sub-fields of a parent field. There are two approaches you can use:
• You can create a single parent field containing all of the OMR sub-fields. We’ll use this technique for the
rental agreement page.
• You can create a separate parent field for each OMR sub-field. We’ll use this technique when we do the
optional insurance page.
To create the OMR zones on the rental agreement page:
1. In the Document Hierarchy pane, click the Lock DCO button.
2. In the Document hierarchy pane, expand the Rental_Agreement page if necessary. Then select the
Options field and draw a bounding box around the Options region on the page image.
3. Expand the Options field node, select the Nav_System sub-field and draw a bounding box around the
GPS Navigation checkbox. Then repeat for the Child_Seat and Fuel_Service options. Try to make the
bounding boxes as close to the same size as possible (see “Implications of using RecogOMRThreshold”
on page 106).
4. In the Document Hierarchy pane, click the Save button.
CREATING THE ZONES FOR THE OTHER PAGE TYPES
1. In the Fingerprints pane, select the first Optional_Insurance page (Car Rental #1).
2. In the Document hierarchy pane, expand the Optional_Insurance page, select the CDW field, and
draw a bounding box around the Collision Damage Waiver option on the page image.
3. Repeat for the PAI, PEP, and ELP fields. The size of the bounding boxes around the parent fields is not
critical since it determines only what gets displayed to the operator during verification.
4. In the Document hierarchy pane, expand the CDW field node, select the CDW_Option sub-field, and
draw a bounding box around the Collision Damage Waiver checkbox on the page image. Try to make the
bounding box the same size as the ones you drew on the rental agreement page.
5. Repeat for the PAI_Option, PEP_Option, and ELP_Option sub-fields. Try to make the bounding
boxes around the checkboxes the same size as the ones you drew for the options on the previous page.
6. In the Document Hierarchy pane, click the Save button.
7. In the Fingerprints pane, select the first Air_Ticket page (Airline #1).
8. In the Document Hierarchy pane, expand the Air_Ticket page. Then select each of the fields in turn
and draw a bounding box around the corresponding region on the page image. Then click Save.
9. Repeat for each of the fields on the first Room_Receipt page (Hotel #1).
10. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
TRAVELDOCS: ASSIGNING THE DEFAULT RULES TO THE DOCUMENT HIERARCHY
When we built the document hierarchy for the TravelDocs application, we ignored the rules required to
process each element. The application framework generated by the Application Wizard attaches default rules
to default elements, but when we created the new elements we did not attach any rules.
ASSIGNING THE DEFAULT PAGE LEVEL RULES TO NEW PAGES
When we built the document hierarchy, we renamed the default page type from “Page” to
“Rental_Agreement.” The “Rental_Agreement” page therefore includes the default rules.
Rental_Agreement
   Open (global)
      CreateDocs : Create Fields
      Recognize : Recognize Page
      Validate : Validate Page
      Routing : Routing Rule 1
      Export : Export Page Fields
(These are the default page level rules.)
The other page types we created, however, have no rules attached.
Optional_Insurance
   Open
(No page level rules.)
To assign the default page level rules to the new pages:
1. On the Datacap Studio Rulemanager tab, in the Document Hierarchy pane, click the Lock DCO
button.
2. Expand the document hierarchy so you can see all the page nodes as shown below (Rental_Agreement,
Optional_Insurance, Air_Ticket, and Room_Receipt).
3. In the Rulesets pane, expand each of the rulesets so you can see the rules they contain (VScan, ImageFix
Load Settings, Enhance Image, PageID, etc.).
4. In the CreateDocs ruleset, select the Create Fields rule.
5. In the Document Hierarchy pane, select the page node Optional_Insurance. Then click the Add to
DCO button on the left side of the Rulesets pane. This adds the Create Fields rule to the
Optional_Insurance page’s Open element.
6. Repeat for the Air_Ticket and Room_Receipt pages.
7. In the Recognize ruleset, select the Recognize Page rule. Then add the rule to the
Optional_Insurance, Air_Ticket, and Room_Receipt pages.
8. Repeat to add the Validate: Validate Page, Routing: Routing Rule 1, and Export: Export Page
Fields rules to the same pages. Each of the pages should now have an Open element like the one below.
Optional_Insurance
  Open
    (global)
      CreateDocs : Create Fields
      Recognize : Recognize Page
      Validate : Validate Page
      Routing : Routing Rule 1
      Export : Export Page Fields
9. Click the Save button.
ASSIGNING THE DEFAULT FIELD LEVEL RULES TO NEW FIELDS
When we built the document hierarchy, we renamed the default field type from “Field” to “Pickup_Date.”
The “Pickup_Date” field therefore includes the default rules.
Pickup_Date
  Open
    (global)
      Clean : Fields Clean
(This is the default field level rule.)
The other fields we created, however, have no rules attached.
Return_Date
  Open
    (No field level rules)
To assign the default field level rule to the new fields:
1. In the Document Hierarchy pane, make sure the DCO is still locked for editing.
2. Expand the document hierarchy so you can see all the field nodes (Pickup_Date, Pickup_Location, etc.).
3. In the Clean ruleset, select the Fields Clean rule.
4. In the Document Hierarchy pane, select the field node Pickup_Location. Then click the Add to DCO
button on the left side of the Rulesets pane. This adds the Fields Clean rule to the
Pickup_Location field’s Open element.
5. Repeat for the remaining fields, but do not add the rule to the Options field and its subfields on the
rental agreement page, or to the fields on the optional insurance page, since they are the container groups
for the checkbox options. Also, since the Total_Cost and Return_Date field definitions are shared
across multiple pages, you will only be able to add the rule to the first instance of each field.
6. When you have finished, click the Save button and then click the Unlock DCO button.
UPDATING THE RECOGNIZE PAGE RULE
The purpose of the Recognize Page rule is to locate the fields on each page and capture the data.
[Figure: The Other_Charges page (columns Category, Num_Items, Unit_Cost, and Total_Cost; sample row Laundry, 4, $3.50, $14.00) shown beside the runtime batch hierarchy, which holds the same values as fields.]
The default Recognize Page rule uses the information in the document hierarchy to locate the field zones. It
then performs text recognition on those zones using OCR_S to extract the data.
In the TravelDocs application we did full page OCR during fingerprint creation and page identification, so it’s
not necessary to do OCR again on the individual fields. Instead, we can use the SnapCCOtoDCO action to
take the recognition data from the runtime fingerprint CCO file and apply it to the runtime batch hierarchy.
To update the Recognize Page rule:
1. On the Datacap Studio Rulemanager tab, select the Recognize ruleset and click the Lock/Unlock
ruleset (padlock) button to lock the ruleset for editing.
2. Expand the Recognize ruleset completely, as shown above.
3. Right-click the RecognizePageFieldsOCR_S action and choose Remove.
4. Click the Actions library tab.
5. Expand the Recog_Shared library and select SnapCCOtoDCO.
6. Make sure Recognize: Page Function 1 is selected in the Rulesets pane.
7. Click the Add to function button at the left side of the Actions Library pane.
8. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
RUNNING A BATCH THROUGH THE WORKFLOW
Now that we’ve attached the default rules to the document hierarchy, we can run a batch through the
workflow to see how the application is progressing.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar.
5. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow, which is PageID.
6. Click the Process rules for target object button on the main Test tab toolbar and wait while the
task profile executes. It may take a few moments as Taskmaster must perform full page OCR on all the
images in the “images” folder.
7. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow, which is Rulerunner.
8. Click the Process rules for target object button on the main Test tab toolbar and wait while the
Rulerunner task profile executes.
9. When asked if you want to release the batch, click Advance. This moves the batch to the next step in the
workflow, which is Verify.
10. On the Runtime batch hierarchy tab, expand the first car rental agreement page to see each of the
defined fields and the associated data. In the next section we’ll examine the Options checkboxes and
update the application to process them correctly.
Note: The Pickup_Date in the first sample image includes a spurious character we introduced so we can
examine confidence levels later.
11. Since we have some more work to do before we’re ready to run the Verify task profile, right-click the
batch in the Workflow pane and choose Cancel.
TRAVELDOCS: UPDATING THE APPLICATION TO HANDLE CHECKBOX OPTIONS
SETTING THE REQUIRED VARIABLES ON THE OPTIONS AND INSURANCE FIELDS
1. On the Datacap Studio Rulemanager tab in the Document Hierarchy pane, click the Lock DCO
button.
2. Expand the Car_Rental document and the Rental_Agreement page.
3. Right-click the Options field and choose Manage variables.
4. Click New, type RecogType, and press Enter. Then click New, type MultiPunch, and press Enter.
Note: Variables are case sensitive, so make sure you capitalize RecogType and MultiPunch as shown here.
5. Enter the values RecogType=4 and MultiPunch=1. Then click Done.
6. Expand the Optional_Insurance page.
7. Right-click the CDW field and choose Manage variables.
8. Click New, type RecogType, and press Enter. Note that MultiPunch is not required, since the parent
field has only one sub-field.
9. Enter the value RecogType=4 and click Done.
10. Repeat for the other parent fields (PAI, PEP, and ELP).
11. In the Document Hierarchy pane, click Save and then click Unlock DCO.
SPECIFYING THE CHECKMARK TYPE
When using the OCR_A engine for checkmark recognition, you must specify whether or not the checkmarks
have bounding boxes. The default setting is for no bounding box (“Clear background”). Since all of our
checkmarks have square bounding boxes, we need to change the “Checkmark type” setting on the OCR/A
settings tab. The settings tabs for the various recognition engines are displayed along the bottom of the
Properties pane on the Datacap Studio Zones tab.
1. Click the Datacap Studio Zones tab.
2. In the Document Hierarchy pane, click the Lock DCO button.
3. Expand the Rental_Agreement page and select the Options field.
4. Look at the bottom of the Properties pane. If the OCR/A tab is not visible, right-click any existing tab
and choose Show tabs and select the OCR/A option.
5. Click the OCR/A tab.
6. In the Properties pane, scroll down to the OMR section and set the Checkmark type to Square
background.
7. In the Document Hierarchy pane, click the Save button.
8. Expand the Optional_Insurance page and do the same for the CDW, PAI, PEP, and ELP fields.
9. In the Document Hierarchy pane, click Save and then click Unlock DCO.
CREATING A RULE TO RECOGNIZE THE OMR FIELDS
1. In the Rulesets pane, select the Recognize ruleset and click the Lock/Unlock ruleset (padlock) button
to lock the ruleset for editing.
2. Right-click the Recognize ruleset and choose Add Rule.
3. Rename the new rule from Rule1 to Recognize OMR Fields.
4. Rename the default function from Function1 to Recognition: OMR.
5. Click the Actions library tab.
6. Expand the ocr_a library and select RecognizeFieldOCR_A.
7. Make sure the Recognition: OMR function is selected in the Rulesets pane.
8. Click the Add to function button at the left side of the Actions Library pane.
9. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
ADDING THE “RECOGNIZE OMR FIELDS” RULE TO THE DOCUMENT HIERARCHY
1. In the Document Hierarchy pane, click the Lock DCO button.
2. Expand the document hierarchy so the Options field on the Rental_Agreement page and the CDW, PAI,
PEP, and ELP parent fields on the Optional_Insurance page are visible.
3. In the Recognize ruleset, select the Recognize OMR Fields rule.
4. In the Document Hierarchy pane, select the Options field node. Then click the Add to DCO
button on the left side of the Rulesets pane. This adds the Recognize OMR Fields rule to the Options
field’s Open element.
5. Repeat to add the rule to each of the parent fields on the Optional_Insurance page.
6. In the Document Hierarchy pane, click the Save button and then click Unlock DCO.
RUNNING A BATCH THROUGH THE WORKFLOW
1. Run a batch through the VScan, PageID, and Rulerunner task profiles as described earlier (see “Running
a batch through the workflow” on page 114).
2. When the Rulerunner task completes and you have advanced the batch to the Verify task, expand the first
car rental page on the Runtime batch hierarchy tab to see the fields and the associated data.
In the example above, the value of the Options field is 001, indicating that Taskmaster interpreted the
first option as “Not selected,” the second option as “Not selected,” and the third option as “Selected.”
3. Expand the first optional insurance page and make sure the values are correct.
4. Right-click the batch in the Workflow pane and choose Cancel.
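The Options value can be read one character per checkbox. The following is a minimal sketch in Python, not part of Taskmaster: the function name decode_options is hypothetical, and the sub-field order (Nav_System, Child_Seat, Fuel_Service) is assumed to match the order of the checkboxes in the Options zone.

```python
# Sub-field names, assumed to be in the same order as the characters
# of the Options value (one character per checkbox).
OPTION_NAMES = ["Nav_System", "Child_Seat", "Fuel_Service"]


def decode_options(value, names=OPTION_NAMES):
    """Map each character of an OMR value to True (selected) or False."""
    if len(value) != len(names):
        raise ValueError("expected one character per option")
    if set(value) - {"0", "1"}:
        raise ValueError("an OMR value may contain only 0 and 1")
    return {name: ch == "1" for name, ch in zip(names, value)}


print(decode_options("001"))
```

For the value 001, only Fuel_Service is reported as selected, which matches the interpretation in step 2.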
TRAVELDOCS: USING PIXEL THRESHOLD CHECKBOX RECOGNITION (OPTIONAL)
As we saw in the previous section, the OCR/A checkbox recognition method works well for the TravelDocs
application and is therefore the preferred method since it’s easy to set up. In this optional section, we’ll go
through the pixel threshold method to demonstrate how it works.
UPDATING THE RECOGNIZE OMR FIELDS RULE TO USE RECOGOMRTHRESHOLD
1. In the Rulesets pane, select the Recognize ruleset and click the Lock/Unlock ruleset (padlock) button
to lock the ruleset for editing.
2. Expand the Recognize ruleset, the Recognize OMR Fields rule, and the Recognition: OMR
function.
3. Right-click the RecognizeFieldOCR_A action and choose Remove.
4. Click the Actions library tab.
5. Expand the Recog_Shared library and select RecogOMRThreshold.
6. Make sure the Recognition: OMR function is selected in the Rulesets pane.
7. Click the Add to function button at the left side of the Actions Library pane.
8. Select the RecogOMRThreshold action and set the parameters in the Properties pane as follows:
• Threshold = 20
• Background = 20
Note: These are our “starter” values. We’ll calculate the proper values in the next section.
9. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
DETERMINING APPROPRIATE THRESHOLD AND BACKGROUND SETTINGS
When we set the parameters on the RecogOMRThreshold action we used the values 20, 20, where the first
number is the threshold parameter and the second number is the background parameter. Obtaining optimum
values is a little tricky and typically requires that you run multiple sample pages through the workflow and
then check the density strings and confidence levels.
CHECKING THE OPTION VALUES AND OBTAINING THE DENSITY STRING VALUES
1. Run a batch through the VScan, PageID, and Rulerunner task profiles as described earlier (see “Running
a batch through the workflow” on page 114).
2. When the Rulerunner task completes and you have advanced the batch to the Verify task, expand the first
car rental page on the Runtime batch hierarchy tab to see the fields and the associated data.
In the example above, the value of the Options field is 001, indicating that Taskmaster interpreted the
first option as “Not selected,” the second option as “Not selected,” and the third option as “Selected.”
3. Select the Options field in the runtime batch hierarchy. Then click the Image tab in the center pane of
the Datacap Studio window if it is not already visible. The Options zone is highlighted.
In this example, the first option is not selected, the second is not selected, and the third is selected. In this
case Taskmaster determined the values correctly. However, depending on the amount of space you left
around the checkboxes, your version may or may not work. Even if the 20,20 values worked, you must
continue in order to determine optimum parameter values.
4. Open the most recent runtime batch folder in C:\Datacap\TravelDocs\batches.
5. Open the file tm000001.xml and scroll almost to the bottom so you can see the Options field data.
6. Make a note of the DensityString value (“ACG” in this example) and then close the file.
Note: The three lines following the density string line represent the three checkbox options.
The values 48 and 49 are the ASCII values for 0 (not selected) and 1 (selected) respectively. The
“cn” attributes represent the confidence level, where 10 is high confidence and 5 is low confidence.
7. Repeat the steps above to check the values for the Insurance options and make a note of the
DensityString value in tm000002.xml.
From tm000002.xml:
CDW: <V n="DensityString">C</V>
PAI: <V n="DensityString">C</V>
PEP: <V n="DensityString">B</V>
ELP: <V n="DensityString">G</V>
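If you prefer not to read the page data file by eye, the same values can be pulled out with a few lines of script. The sketch below uses Python’s standard xml.etree.ElementTree module and is only an illustration: the sample fragment is hand-written to resemble the Options field data described above (the cr coordinates are placeholders), and the function name read_omr_field is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment modeled on the Options field data in a page data
# file. The real file contains more variables; cr values are placeholders.
SAMPLE_FIELD_XML = """
<F id="Options">
  <V n="DensityString">ACG</V>
  <C cn="10" cr="0,0,0,0">48</C>
  <C cn="10" cr="0,0,0,0">48</C>
  <C cn="10" cr="0,0,0,0">49</C>
</F>
"""


def read_omr_field(field_xml):
    """Return the density string and a (character, confidence) pair per option."""
    field = ET.fromstring(field_xml.strip())
    density = field.findtext("V[@n='DensityString']")
    options = [
        (chr(int(c.text)), int(c.get("cn")))  # 48 -> "0", 49 -> "1"
        for c in field.findall("C")
    ]
    return density, options


density, options = read_omr_field(SAMPLE_FIELD_XML)
print(density)
for value, confidence in options:
    print(value, confidence)
```

Each C element yields one option: the element text is the ASCII code of the recognized character, and the cn attribute is its confidence.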
INTERPRETING THE DENSITY STRING VALUES
Using the density string characters and the formula presented earlier (see “Determining the appropriate
threshold and background values” on page 105), we can calculate the percentage of black pixels for each
OMR zone. Using the examples on the previous page yields the results shown below.
Parent field   Sub-field      Density value   Percentage filled
Options        Nav_System     A               17%
               Child_Seat     C               19%
               Fuel_Service   G               23%
Insurance      CDW            C               19%
               PAI            C               19%
               PEP            B               18%
               ELP            G               23%
Using the data above, we might reasonably determine that threshold = 20.5 and background = 20 represent
good values. Using the (2 * threshold – background) formula sets the high confidence threshold at 21.
However, it would be prudent to scan additional pages and check their density strings before setting final
values.
Parent field   Sub-field      Percentage filled   Result in runtime hierarchy
Options        Nav_System     17%                 Option: Selected; Confidence: High
               Child_Seat     19%                 Option: Not selected; Confidence: High
               Fuel_Service   23%                 Option: Selected; Confidence: High
Insurance      CDW            19%                 Option: Selected; Confidence: High
               PAI            19%                 Option: Not selected; Confidence: High
               PEP            18%                 Option: Not selected; Confidence: High
               ELP            23%                 Option: Not selected; Confidence: High
Since the OMR zones you drew are most likely slightly different from the ones that generated the data above,
your values will likely be different. Use the density characters you obtained to determine appropriate values
for the threshold and background parameters, and update the Recognize OMR Fields rule. Then run a new
batch and review the results, including the confidence values in the page data files.
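The arithmetic in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Taskmaster’s implementation: the character-to-percentage mapping is inferred from the values in the first table (A = 17%, B = 18%, C = 19%, G = 23%), the formula on page 105 remains the authority, and treating a zone as selected when its fill percentage exceeds the threshold is our simplification. All function names are hypothetical.

```python
def density_to_percent(ch):
    """Convert a density character to a fill percentage.

    Assumption: inferred from the table (A=17, B=18, C=19, G=23),
    so each letter is one percentage point above the previous one.
    """
    return ord(ch) - ord("A") + 17


def high_confidence_cutoff(threshold, background):
    """Apply the (2 * threshold - background) formula from the text."""
    return 2 * threshold - background


def classify(density_string, threshold):
    """Return a 0/1 string, one character per OMR zone.

    Assumption: a zone counts as selected when its fill percentage
    is above the threshold.
    """
    return "".join(
        "1" if density_to_percent(ch) > threshold else "0"
        for ch in density_string
    )


print([density_to_percent(ch) for ch in "ACG"])
print(high_confidence_cutoff(20.5, 20))
print(classify("ACG", 20.5))
```

With threshold = 20.5 and background = 20, the cutoff works out to 21, and the density string ACG for the Options field classifies as 001, the value shown in the runtime batch hierarchy earlier. Substitute your own density strings to try other parameter values before updating the rule.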
Chapter 10
DATA VALIDATION
Data validation is where you determine if the data you’ve captured meets the rules for data integrity as defined
in the business requirements. For example, when we established the business rules for the TravelDocs
application, we decided to test if the cost fields are in a valid currency format, if the car type on the rental
agreement page is one of a set of predefined values, and if the total cost on the air ticket page equals the air
fare plus taxes.
A validation failure does not necessarily mean the original page contains invalid data – it could mean the
recognition engine failed to recognize one or more characters correctly. Whatever the reason for the error,
you can set the page status to make sure the page is displayed to an operator for verification.
This chapter covers some basic validation techniques and identifies some of the key actions that are available
for performing validation. At the end of the chapter, we’ll update the TravelDocs application to implement
the validation rules we defined earlier in the business requirements.
VALIDATING DATA
The purpose of validation is to determine whether captured data conforms to specified business rules. For
example:
• Does an expense lie within permitted limits?
• Are dates valid and within a permitted range?
• Is the total cost calculated correctly?
• Does the vendor information match the information stored in a database of approved vendors?
• Does a field value match one of a set of permitted values?
Taskmaster performs validation using rules you create and attach to specific items in the document hierarchy.
For example, to check whether an expense lies within permitted limits, you might first create a rule that:
• Ensures the expense field contains numeric data in a valid currency format
• Determines if the value is less than or equal to the maximum permitted limit
• Performs exception handling if the value is invalid or above the permitted limit
You can then attach the rule to the expense field in the document hierarchy.
Total_Cost                                  (Field)
  Open
    (global)
      Validate : Validate Expense Field     (Validation rule)
  Close
CHECKING THAT DATA FORMATS ARE VALID (CURRENCY, DATES, FIELD LENGTH, ETC.)
Before you can apply specific business rules to a field, you must typically make sure the data format is valid.
For example, there’s no point testing whether an expense lies within permitted limits until you’ve determined
that the field contains a valid currency value.
The application’s business requirements should specify valid formats for all the fields your application is
testing. Below are examples of three acceptable currency values as defined by the business requirements for
the TravelDocs application:
• $477.82
• 824.83
• 254.40 USD
The “Validations” actions library includes several actions that test a field’s format, including:
Library      Action                  Description
Validations  IsFieldCurrency         Returns True if the field is numeric and includes a 2-digit decimal
                                     amount; returns False otherwise. The action ignores any leading
                                     currency symbol (for example, $).
Validations  IsFieldDate             Returns True if the field is in one of the supported date formats;
                                     returns False otherwise.
Validations  IsFieldLengthMax        Returns True if the field contains no more than the specified number of
                                     characters; returns False otherwise.
Validations  IsFieldLengthMin        Returns True if the field contains at least the specified number of
                                     characters; returns False otherwise.
Validations  IsFieldPercentAlpha     Returns True if the field contains no numeric or special characters;
                                     returns False otherwise.
Validations  IsFieldPercentNumeric   Returns True if the field contains no alphabetic or special characters;
                                     returns False otherwise.
For detailed information on these and other actions in the “Validations” library, select the action on the
Actions Library tab and click the Display information button.
VALIDATING CALCULATED FIELDS
A calculated field gets its value from one or more independent fields. For example, on the TravelDocs air
ticket, the total cost equals the airfare plus taxes and fees.
You can perform validation to ensure that a calculated value is correct. In the air ticket example, you might
use validation to ensure that the total cost is correct based on the airfare and tax fields. Note that an error
does not necessarily mean that the value was calculated incorrectly on the original page – it could mean one of
the values was recognized incorrectly. Whatever the reason for the error, you can set the page status to make
sure the page is displayed to an operator for verification.
The “Validations” actions library includes a Calculate action that lets you perform arithmetic operations
on numeric field values.
Library      Action      Description
Validations  Calculate   Returns True if the arithmetic expression is valid; returns False otherwise.
The example below returns True if the total cost equals the airfare plus taxes.
Calculate("'Total_Cost' = 'Airfare' + 'Taxes'")
Note that the fields you reference must contain numeric data, otherwise the action will fail.
Since a calculation involves multiple field values, and since the Calculate action operates only on child
objects, you cannot perform the calculation at the field level. Instead, you must perform the calculation at the
page level.
Page                    (Perform calculation at page level)
  Field (Airfare)
  Field (Taxes)
  Field (Total_Cost)
Note: You can perform calculations on values from different pages or even different documents within the
batch. To do this you must use variables, which are beyond the scope of this introduction.
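To make the page-level check concrete, here is a minimal Python sketch of the comparison the Calculate expression describes. It is not how Taskmaster evaluates the expression; the function name total_matches and the dictionary of field values are hypothetical stand-ins for the page’s child fields. Decimal is used rather than float so that currency amounts compare exactly.

```python
from decimal import Decimal, InvalidOperation


def total_matches(fields):
    """Return True when Total_Cost equals Airfare plus Taxes.

    `fields` maps field names to their recognized text. Any missing or
    non-numeric field makes the check fail, mirroring the requirement
    that the referenced fields contain numeric data.
    """
    try:
        airfare = Decimal(fields["Airfare"])
        taxes = Decimal(fields["Taxes"])
        total = Decimal(fields["Total_Cost"])
    except (KeyError, InvalidOperation):
        return False
    return total == airfare + taxes


page = {"Airfare": "412.00", "Taxes": "65.82", "Total_Cost": "477.82"}
print(total_matches(page))
```

A field that still carries a suffix such as “USD” fails the check in this sketch, which is why format cleanup has to happen before the calculation.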
WHEN TO PERFORM VALIDATION ON CALCULATED FIELDS
The Calculate action works only on numeric fields. If your Validation ruleset includes rules to validate and
possibly correct the format of individual fields (for example, removing a “USD” suffix from a currency field),
you must run the page-level validation after you have executed all field-level validations. You can do this using
the page’s “Close” element.
Air_Ticket
  Open
  Item_Cost
    Open
      (global)
        Validate : Validate Currency Field
    Close
  Taxes
    Open
      (global)
        Validate : Validate Currency Field
    Close
  Total_Cost
    Open
      (global)
        Validate : Validate Currency Field
    Close
  Close
    (global)
      Validate : Validate Total Cost
(This rule runs when Taskmaster has finished processing all of the page’s child fields.)
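The ordering can be pictured as a depth-first walk of the hierarchy: an object’s Open rules, then its children, then its Close rules. The Python sketch below illustrates that idea only. The nested-dictionary layout and the function name rule_order are hypothetical, and the walk is our reading of the hierarchy above rather than a description of Taskmaster internals.

```python
def rule_order(node):
    """List 'object: rule' entries in the order a depth-first walk reaches them."""
    name = node["name"]
    order = [f"{name}: {rule}" for rule in node.get("open", [])]
    for child in node.get("children", []):
        order.extend(rule_order(child))
    # Close rules come last, after every child has been processed.
    order.extend(f"{name}: {rule}" for rule in node.get("close", []))
    return order


AIR_TICKET = {
    "name": "Air_Ticket",
    "children": [
        {"name": "Item_Cost", "open": ["Validate Currency Field"]},
        {"name": "Taxes", "open": ["Validate Currency Field"]},
        {"name": "Total_Cost", "open": ["Validate Currency Field"]},
    ],
    "close": ["Validate Total Cost"],
}

for entry in rule_order(AIR_TICKET):
    print(entry)
```

The three field-level currency rules are listed first and the page-level total cost rule last, which is the order the calculation needs.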
DISPLAYING VALIDATION FAILURES TO AN OPERATOR
Taskmaster maintains a Status variable for each object in the runtime batch hierarchy. As it executes rules, it
updates the status accordingly:
• A status of 0 indicates operations on this object were successful
• A status of 1 indicates a problem or potential problem
Taskmaster uses a few other status codes (for example, 49 indicates that a page was scanned successfully), but
in terms of determining which pages are displayed to an operator it’s generally the status code 1 that is most
important. By default, Taskmaster displays all pages, but later we’ll configure the TravelDocs application to
display only pages with a status of 1 (see “Determining which pages to display to the operator” on page 159).
Problems that result in a status code of 1 include:
• Unrecognized or low confidence characters: By default, if a page has any low confidence characters,
Taskmaster sets the page status to 1 and displays fields with low confidence characters in yellow in the
verify panel. Confidence levels are discussed in more detail later (see “Understanding confidence levels
and setting the page status” on page 147).
• Validation failures: If a field fails validation, Taskmaster sets the field status to 1 and the page status to 1
and displays the field in red in the verify panel.
Setting the status correctly depends on the Status_Preserve_OFF action (in the “rrunner” library). This
action and the related Status_Preserve_ON action determine whether or not validation rules can update
an object’s status.
Library   Action                Description
rrunner   Status_Preserve_OFF   Turns the “Status Preserve” setting of a page and its child fields
                                to OFF, meaning validation rules can update an object’s status if
                                a validation fails.
rrunner   Status_Preserve_ON    Turns the “Status Preserve” setting of a page and its child fields
                                to ON, meaning validation rules cannot change an object’s status.
In most situations, you want to make sure “Status Preserve” is turned OFF at the start of validation.
The default page-level rule “Validate Page” in the Validate ruleset generated by the Application Wizard sets
“Status Preserve” OFF for you.
The rule is attached to the default page type, but you must attach it manually to any new pages you create.
READING THE STATUS VARIABLE
You can’t check the status variable for a page or field from within Datacap Studio, so you must read the
runtime batch data files:
• You can get the status of each page from the task profile’s data file (for example, Rulerunner.xml).
• You can get the status of each field from the page data files (for example, tm000001.xml).

Rulerunner.xml (page status):
<P id="TM000001">                           (Page definition)
  <V n="TYPE">Rental_Agreement</V>
  <V n="STATUS">1</V>
  <V n="IMAGEFILE">tm000001.tif</V>
  etc.
</P>

tm000001.xml (field status):
<F id="Pickup_Date">                        (Field definition)
  <V n="TYPE">Pickup_Date</V>
  <V n="Position">194,402,563,458</V>
  <V n="STATUS">0</V>
  <C cn="10" cr="203,416,225,438">77</C>
  <C cn="10" cr="230,423,245,438">111</C>
  etc.
</F>
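When a batch has many fields, a short script can list the statuses for you. The following Python sketch uses the standard xml.etree.ElementTree module. It is an illustration only: the sample XML is hand-written in the style of the fragments above, the wrapping of several F elements inside one P element is an assumption about the file layout, and the function name field_statuses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a page data file. The real file
# layout may differ; this wrapper structure is an assumption.
SAMPLE_PAGE_XML = """
<P id="TM000001">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="STATUS">0</V>
  </F>
  <F id="Total_Cost">
    <V n="TYPE">Total_Cost</V>
    <V n="STATUS">1</V>
  </F>
</P>
"""


def field_statuses(xml_text):
    """Return {field id: status} for every F element that has a STATUS variable."""
    root = ET.fromstring(xml_text.strip())
    statuses = {}
    for field in root.iter("F"):
        status = field.findtext("V[@n='STATUS']")
        if status is not None:
            statuses[field.get("id")] = int(status)
    return statuses


for field_id, status in field_statuses(SAMPLE_PAGE_XML).items():
    print(field_id, status)
```

To read a real file, pass its contents to the function (for example, the text of tm000001.xml from the runtime batch folder) and look for entries with a status of 1.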
USING EXTERNAL DATA SOURCES DURING VALIDATION
Sometimes you may need to compare runtime field values to an external data source to determine if the
values on a page are valid. For example, you may need to determine if the vendor information matches the
information stored in a database of approved vendors.
The “Lookup” library includes actions for connecting to external data sources and executing SQL statements.
The available actions include:
Library   Action            Description
Lookup    OpenConnection    Uses a data source name or connection string to open a connection to
                            a database.
Lookup    ExecuteSQL        Executes a SQL statement. Returns True if the SQL statement executes
                            successfully and any SELECT statement returns a value.
Lookup    CloseConnection   Closes an open database connection.
There are other actions you can use to do things like populate other fields in the runtime hierarchy using data
from the database, but these are beyond the scope of this guide.
For an example of how to use an external data source to perform validation, see “Using a lookup database to
validate the car type” on page 134.
HANDLING VALIDATION ERRORS
Although we’ve seen how validation actions set the object’s Status variable if there’s an error and how the
Status variable determines which pages Taskmaster displays to the operator, it’s sometimes helpful to
perform additional error handling within the application.
The rule below has two functions and each function has two actions. Suppose that Function1 contains
validation actions and Function2 is there to handle validation errors.
Rule: Validation rule
  Function1: Perform validation
    Action1
    Action2
  Function2: Handle validation errors
    Action3
    Action4
Using the rules for execution of functions and actions within a rule:
• If Action1 returns False, Taskmaster skips Action2 and executes Function2.
• If Action1 returns True and Action2 returns False, Taskmaster executes Function2.
• If Action1 and Action2 both return True, Taskmaster does not execute Function2.
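The three cases above can be reproduced with a small simulation. The Python sketch below is a model of the behavior as described in this section, not Taskmaster code; run_rule and validation_rule are hypothetical names, and Action3 and Action4 are assumed to return True.

```python
def run_rule(functions):
    """Simulate one rule and return the names of the actions that ran.

    `functions` is a list of functions; each function is a list of
    (action_name, action) pairs, where `action` is a zero-argument
    callable returning True or False.
    """
    executed = []
    for actions in functions:
        function_succeeded = True
        for name, action in actions:
            executed.append(name)
            if not action():
                # A False result ends this function; its remaining
                # actions are skipped and the next function is tried.
                function_succeeded = False
                break
        if function_succeeded:
            # A function whose actions all returned True ends the rule.
            break
    return executed


def validation_rule(action1_result, action2_result):
    """Build the two-function rule from the text with fixed action results."""
    perform_validation = [
        ("Action1", lambda: action1_result),
        ("Action2", lambda: action2_result),
    ]
    handle_errors = [
        ("Action3", lambda: True),
        ("Action4", lambda: True),
    ]
    return [perform_validation, handle_errors]


print(run_rule(validation_rule(False, True)))
print(run_rule(validation_rule(True, False)))
print(run_rule(validation_rule(True, True)))
```

The error-handling function runs only in the first two cases, so any actions placed in Function2 act as the “else” branch of the validation.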
TRAVELDOCS: UPDATING THE APPLICATION TO PERFORM VALIDATION
VALIDATING THE CURRENCY FIELDS
The currency field validation rule we’ll create here uses the following actions.
Action                Description
DeleteAllAlpha        Deletes all alphabetic characters from the field. We’ll use this to remove the “USD”
                      suffix used by one of the airlines.
DeleteSelectedChars   Deletes the specified characters from the field. We’ll use this to remove any spaces.
IsFieldCurrency       Returns True if the field is numeric and includes a 2-digit decimal amount; returns
                      False otherwise. The action ignores any leading currency symbol (for example, $).
We’ll assign the rule to all of the currency fields in the document hierarchy.
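Before building the rule, it can help to see what the three actions do to the sample values. The Python sketch below imitates them based only on the descriptions in the table; the function names are hypothetical, the real actions may handle more cases, and the currency test here accepts only an optional leading dollar sign as a simplification.

```python
import re


def delete_all_alpha(value):
    """Remove every alphabetic character (imitates DeleteAllAlpha)."""
    return "".join(ch for ch in value if not ch.isalpha())


def delete_selected_chars(value, chars):
    """Remove each character listed in `chars` (imitates DeleteSelectedChars)."""
    return "".join(ch for ch in value if ch not in chars)


def is_field_currency(value):
    """True for digits with a 2-digit decimal part and an optional leading $.

    Simplification: only the dollar sign is treated as a currency symbol.
    """
    return re.fullmatch(r"\$?\d+\.\d{2}", value) is not None


def validate_currency(value):
    """Apply the three steps in rule order; return (cleaned value, is valid)."""
    cleaned = delete_selected_chars(delete_all_alpha(value), " ")
    return cleaned, is_field_currency(cleaned)


for sample in ["$477.82", "824.83", "254.40 USD", "12.5"]:
    print(sample, validate_currency(sample))
```

The value “254.40 USD” passes only because the two cleanup steps run first, which is why the order of the actions in the function matters.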
CREATING THE “VALIDATE CURRENCY FIELD” RULE
1. On the Datacap Studio Rulemanager tab in the Rulesets pane, select the Validate ruleset and click the
Lock/Unlock ruleset (padlock) button to lock the ruleset for editing.
2. Right-click the Validate ruleset and choose Add Rule.
3. Rename the new rule from Rule1 to Validate Currency Field.
4. Rename the default function from Function1 to Validation: Currency.
5. Click the Actions library tab.
6. Expand the Validations library and select the DeleteAllAlpha action.
7. Make sure the Validation: Currency function is selected in the Rulesets pane.
8. Click the Add to function button at the left side of the Actions Library pane.
9. On the Actions library tab, select the DeleteSelectedChars action and click Add to Function.
10. On the Actions library tab, select the IsFieldCurrency action and click Add to Function.
11. Select the DeleteSelectedChars action and set the strParam parameter to ' ' (a single space) in the
Properties pane.
12. In the Rulesets pane, click the Save button.
ADDING THE “VALIDATE CURRENCY FIELD” RULE TO THE DOCUMENT HIERARCHY
1. In the Document Hierarchy pane, click the Lock DCO button.
2. Expand the Car_Rental > Rental_Agreement page so the fields are visible.
3. In the Validate ruleset, select the Validate Currency Field rule.
4. In the Document Hierarchy pane, select the Total_Cost field node. Then click the Add to DCO
button on the left side of the Rulesets pane. This adds the Validate Currency Field rule to the Total_Cost
field’s Open element.
5. Expand the Flight > Air_Ticket page so the fields are visible.
6. Select the Airfare field and click the Add to DCO button to add the Validate Currency Field rule to the
Airfare field’s Open element.
7. Select the Taxes field and click the Add to DCO button to add the Validate Currency Field rule to the
Taxes field’s Open element.
Note: Since the Total_Cost field definition is shared across pages, the Validate Currency Field rule is already
included in the Total_Cost field on the Air_Ticket and Room_Receipt pages.
8. In the Document Hierarchy pane, click the Save button.
VALIDATING THE FLIGHT COST
The rule we’ll create here to validate the total cost field uses the following action.
Action      Description
Calculate   Returns True if the arithmetic expression is valid; returns False otherwise.
We’ll assign the rule to the Air_Ticket page’s “Close” element in the document hierarchy.
CREATING THE “VALIDATE FLIGHT COST” RULE
1. Make sure the Validate ruleset is locked for editing.
2. Right-click the Validate ruleset and choose Add Rule.
3. Rename the new rule from Rule1 to Validate Flight Cost.
4. Rename the default function from Function1 to Validation: Flight Cost.
5. On the Actions library tab, select the Calculate action (also in the Validations library).
6. Make sure the Validation: Flight Cost function is selected in the Rulesets pane.
7. Click the Add to function button at the left side of the Actions Library pane.
8. Select the Calculate action and in the Properties pane set the strParam parameter to:
'Airfare' + 'Taxes' = 'Total_Cost'
Note: These are the names of the fields as specified in the document hierarchy.
9. In the Rulesets pane, click the Save button.
ADDING THE FLIGHT COST RULE TO THE DOCUMENT HIERARCHY
1. In the Document Hierarchy pane, make sure the document hierarchy is locked for editing.
2. Select the Close element at the end of the Air_Ticket page definition.
3. In the Validate ruleset, select the Validate Flight Cost rule.
4. Click the Add to DCO button to add the Validate Flight Cost rule to the Air_Ticket page's Close
element.
5. In the Document Hierarchy pane, click the Save button.
USING A LOOKUP DATABASE TO VALIDATE THE CAR TYPE
In this section, we'll use the car type field on the rental agreement page to demonstrate how to test whether a
field contains a value that's in a list of permitted values. The list of permitted car types we'll use is:
• Compact
• Standard
• Full size
• SUV
• Other
For each car type field, we'll test the field value and set the field status to 1 (indicating a problem) if the car
type doesn't match one of these permitted types.
In order to perform the database lookup, we'll use the actions below.
Library   Action            Description
Lookup    OpenConnection    Uses a data source name or connection string to open a connection to a database.
Lookup    ExecuteSQL        Executes a SQL statement. Returns True if the SQL statement executes successfully and any SELECT statement returns a value.
Lookup    CloseConnection   Closes an open database connection.
CREATING THE LOOKUP DATABASE TABLE
Note: If you don't have Microsoft Access, copy the file TravelDocsLook.mdb from the sample images
download into the C:\Datacap\TravelDocs folder.
1. Open the file C:\Datacap\TravelDocs\TravelDocsLook.mdb in Microsoft Access.
2. Create a new table called Car_Types.
3. Create a new field called Car_Type of type “Text.”
4. Enter the permitted values (Compact, Standard, Full size, SUV, and Other) as shown below.
5. Save the new table.
CREATING THE “VALIDATE CAR TYPE” RULE
1. In the Rulesets pane, make sure the Validate ruleset is locked for editing.
2. Right-click the Validate ruleset and choose Add Rule.
3. Rename the new rule from Rule1 to Validate Car Type.
4. Rename the default function from Function1 to Validation: Car Type.
5. On the Actions library tab, expand the Lookup library and select the OpenConnection action.
6. Make sure the Validation: Car Type function is selected in the Rulesets pane.
7. Click the Add to function button at the left side of the Actions Library pane.
8. Select the ExecuteSQL action and click Add to Function.
9. Select the CloseConnection action and click Add to Function.
10. Select the OpenConnection action and in the Properties pane set the strParam parameter to:
@APPVAR(*/lookupdb:cs)
Note: This is a Taskmaster smart parameter that obtains the connection string for the application's
lookup database from the application configuration file.
11. Select the ExecuteSQL action and in the Properties pane set the sStringIn parameter to:
"SELECT Car_Type FROM Car_Types WHERE Car_Type='%s';",Car_Type
Note: It's important to get the syntax exactly as it is here. You can copy and paste from here if necessary.
12. In the Rulesets pane, click Save. Then click the Lock ruleset button and choose Publish ruleset.
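The open, query, and close sequence that this rule performs can be sketched in Python. The sketch is an analogy rather than the Lookup library itself: it assumes an in-memory SQLite table standing in for the Car_Types table in TravelDocsLook.mdb, and the function names are hypothetical.

```python
import sqlite3


def build_lookup_db():
    """Create an in-memory stand-in for the Car_Types lookup table (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Car_Types (Car_Type TEXT)")
    conn.executemany(
        "INSERT INTO Car_Types (Car_Type) VALUES (?)",
        [("Compact",), ("Standard",), ("Full size",), ("SUV",), ("Other",)],
    )
    conn.commit()
    return conn


def car_type_status(conn, car_type):
    """Return 0 if car_type is a permitted value, 1 (problem) otherwise."""
    row = conn.execute(
        "SELECT Car_Type FROM Car_Types WHERE Car_Type = ?", (car_type,)
    ).fetchone()
    # Mirrors the rule: a SELECT that returns no value means validation fails.
    return 0 if row is not None else 1


conn = build_lookup_db()                         # corresponds to OpenConnection
status_ok = car_type_status(conn, "SUV")         # corresponds to ExecuteSQL
status_bad = car_type_status(conn, "Limousine")
conn.close()                                     # corresponds to CloseConnection
```

The sketch binds the field value as a query parameter instead of substituting it into the SQL text, which is the usual way to express the same lookup in Python.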
ADDING THE “VALIDATE CAR TYPE” RULE TO THE DOCUMENT HIERARCHY
1. In the Document Hierarchy pane, make sure the document hierarchy is locked for editing.
2. Expand the Car_Rental > Rental_Agreement page so the fields are visible.
3. In the Validate ruleset, select the Validate Car Type rule.
4. In the Document Hierarchy pane, select the Car_Type field node. Then click the Add to DCO
button on the left side of the Rulesets pane. This adds the Validate Car Type rule to the Car_Type field's
Open element.
5. In the Document Hierarchy pane, click the Save button.
CREATING A DICTIONARY OF VALID CAR TYPES
The database lookup in the previous section determines whether or not the value in the car type field is one
of the permitted values. If it isn't, Taskmaster sets the field status to 1. This causes the verification panel to
display the field in red, indicating to the operator that there's a problem.
In the verification panel you may want to present the operator with a list of valid car types. This way, the
operator can select a valid type, instead of typing one in manually.
The DotEdit and Taskmaster Web verification interfaces both let you populate a drop-down list directly from
the database using a SQL statement embedded in the field's SELECT variable. You need to create the variable
by first locking the document hierarchy for editing, right-clicking the Car_Type field, and choosing Manage
Variables. You can then add the SELECT variable and set it to the following value:
<SQL flist='Car_Type' dsn="*/lookupdb:cs">SELECT Car_Type FROM Car_Types</SQL>
This uses a SQL query (SELECT <column> FROM <table>) to get the list of valid car types from the
application's lookup database (dsn="*/lookupdb:cs"). It then creates a drop-down list in the specified
field (flist='<field>') containing the values returned by the query.
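To make the three parts of the SELECT variable concrete, the following Python sketch splits the value into its target field, data source setting, and query, and then runs the query to build the list of choices. It is an illustration only: the function names are hypothetical, and an in-memory SQLite table is assumed in place of the lookup database that the dsn setting would resolve to.

```python
import sqlite3
import xml.etree.ElementTree as ET

SELECT_VARIABLE = (
    "<SQL flist='Car_Type' dsn=\"*/lookupdb:cs\">"
    "SELECT Car_Type FROM Car_Types</SQL>"
)


def parse_select_variable(value):
    """Split a SELECT variable into its target field, data source, and query."""
    element = ET.fromstring(value)
    return element.get("flist"), element.get("dsn"), element.text.strip()


def dropdown_values(conn, query):
    """Run the query and return the first column of every row as list items."""
    return [row[0] for row in conn.execute(query)]


# Stand-in lookup database (assumption; the real one is found through the dsn setting).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car_Types (Car_Type TEXT)")
conn.executemany(
    "INSERT INTO Car_Types VALUES (?)",
    [("Compact",), ("Standard",), ("Full size",), ("SUV",), ("Other",)],
)

field, dsn, query = parse_select_variable(SELECT_VARIABLE)
choices = dropdown_values(conn, query)
conn.close()
```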
 Another variable, Lookup, is functionally very similar, except that it displays the list of available choices
in a popup window instead of a drop-down list. For this to work with Batch Pilot, you must copy the
file Lookup.dcf from C:\Datacap\BPilot\Verify to your application‟s dco_<app>\verify folder.
Note, though, that Batch Pilot does not support the SELECT variable. To create a selection list that works for
all verification interfaces, you can create a dictionary containing the same valid car types (Compact, Standard,
Full size, SUV, and Other) and attach it to the “Car_Type” field, as described below.
CREATING THE DICTIONARY
1. Make sure the document hierarchy is locked for editing.
2. Click the Dictionaries button at the top of the Document Hierarchy pane.
3. Click the Edit dictionary button and choose Add dictionary.
4. Change the dictionary name from <new_dictionary> to Car_Types.
5. Right-click the new dictionary and choose Add word.
6. Change the name from <new word> to Compact and the value from value to Compact.
7. Repeat to add Standard, Full size, SUV, and Other to the dictionary. The completed dictionary should
look like the one below.
8. Click the Save button.
ATTACHING THE DICTIONARY TO THE “CAR_TYPE” FIELD
1. Make sure the document hierarchy is locked for editing.
2. Expand the Car_Rental > Rental_Agreement page so the fields are visible.
3. Right-click the Car_Type field and choose Manage variables.
4. Click New, type DICT, and press Enter.
 Variables are case sensitive, so make sure you capitalize DICT as shown here.
5. Enter the value Car_Types. Then click Done.
6. In the Document Hierarchy pane, click the Save button and then click Unlock DCO.
RUNNING A BATCH THROUGH THE WORKFLOW
Next, we'll run a batch through the workflow to see how the application is progressing.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar. When asked if you
want to release the batch, click Advance. This moves the batch to the next step in the workflow, which is
PageID.
5. Click the Process rules for target object button on the main Test tab toolbar. When asked if you
want to release the batch, click Advance. This moves the batch to the next step in the workflow, which is
Rulerunner.
6. Click the Process rules for target object button on the main Test tab toolbar and wait while the
task profile executes. When asked if you want to release the batch, click Advance. This moves the batch
to the next step in the workflow, which is Verify.
7. Since we're not yet ready to run the Verify task profile, right-click the batch in the Workflow pane and
choose Cancel.
EXAMINING THE PAGE AND FIELD STATUS VALUES
The validation rules we created in this section affect the status that Taskmaster assigns to the status variable
for each page and field.
To see the page status, open Rulerunner.xml in the application's most recent batch folder. This file includes
the status of each page in the batch.
<D id="20100323.007.01">
  <V n="TYPE">Car_Rental</V>
  <V n="STATUS">0</V>
  <P id="TM000001">
    <V n="TYPE">Rental_Agreement</V>
    <V n="STATUS">1</V>    <!-- Problem on page TM000001 -->
    etc.
  </P>
  <P id="TM000002">
    <V n="TYPE">Optional_Insurance</V>
    <V n="STATUS">0</V>    <!-- Page OK -->
    etc.
  </P>
</D>
<D id="20100323.007.02">
  <V n="TYPE">Car_Rental</V>
  <V n="STATUS">1</V>
  <P id="TM000003">
    <V n="TYPE">Rental_Agreement</V>
    <V n="STATUS">1</V>    <!-- Problem on page TM000003 -->
    etc.
  </P>
</D>
etc.
<D id="20100323.007.04">
  <V n="TYPE">Flight</V>
  <V n="STATUS">1</V>
  <P id="TM000006">
    <V n="TYPE">Air_Ticket</V>
    <V n="STATUS">1</V>    <!-- Problem on page TM000006 -->
    etc.
  </P>
</D>
etc.
The three problem pages shown above have Status = 1 for different reasons. To see the nature of the
problems we need to look at the individual page files: tm000001.xml, tm000003.xml, and tm000006.xml.
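If a batch contains many pages, a short script can list the problem pages instead of reading the file by eye. The Python sketch below is a minimal illustration based on the excerpt above. It assumes the D elements sit under a single root element (shown here as a hypothetical B element, since the excerpt does not show the root), and the helper names are not part of Taskmaster.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample modeled on the excerpt; the <B> root is an assumption.
BATCH_XML = """
<B>
  <D id="20100323.007.01">
    <V n="TYPE">Car_Rental</V>
    <V n="STATUS">0</V>
    <P id="TM000001">
      <V n="TYPE">Rental_Agreement</V>
      <V n="STATUS">1</V>
    </P>
    <P id="TM000002">
      <V n="TYPE">Optional_Insurance</V>
      <V n="STATUS">0</V>
    </P>
  </D>
  <D id="20100323.007.04">
    <V n="TYPE">Flight</V>
    <V n="STATUS">1</V>
    <P id="TM000006">
      <V n="TYPE">Air_Ticket</V>
      <V n="STATUS">1</V>
    </P>
  </D>
</B>
"""


def variable(node, name):
    """Return the text of the node's own <V n="name"> child, or None."""
    for var in node.findall("V"):
        if var.get("n") == name:
            return var.text
    return None


def problem_pages(batch_xml):
    """Return (page id, page type) for every page whose STATUS is 1."""
    root = ET.fromstring(batch_xml)
    return [
        (page.get("id"), variable(page, "TYPE"))
        for page in root.iter("P")
        if variable(page, "STATUS") == "1"
    ]


pages = problem_pages(BATCH_XML)
```

To run it against a real batch, you would read the file contents into a string and pass that to problem_pages, adjusting for whatever root element the file actually uses.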
TM000001
A portion of tm000001.xml is shown below.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000001">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="Position">189,403,567,465</V>
    <V n="STATUS">0</V>
    <C cn="7" cr="200,416,220,440">84</C>     <!-- Low confidence character -->
    <C cn="4" cr="218,415,226,430">114</C>    <!-- Low confidence character -->
    <C cn="10" cr="218,423,230,438">117</C>
    etc.
  </F>
  <F id="Pickup_Location">
    <V n="TYPE">Pickup_Location</V>
    <V n="Position">195,537,558,592</V>
    <V n="STATUS">0</V>
    <C cn="10" cr="203,549,216,570">66</C>
    <C cn="10" cr="219,555,234,570">111</C>
    etc.
  </F>
  <F id="Return_Date">
    <V n="TYPE">Return_Date</V>
    <V n="Position">580,403,942,465</V>
    <V n="STATUS">0</V>
    <C cn="6" cr="593,416,604,438">70</C>     <!-- Low confidence character -->
    <C cn="6" cr="606,423,615,438">114</C>    <!-- Low confidence character -->
    <C cn="7" cr="619,416,621,438">105</C>    <!-- Low confidence character -->
    <C cn="10" cr="625,434,630,441">44</C>
    <C cn="10" cr="690,416,691,438">32</C>
    etc.
  </F>
  etc.
All of the fields in TM000001 have Status = 0 (OK), but the pickup date and return date fields have low
confidence characters. By default, any character with a confidence level below 8 is considered low confidence
and is displayed to an operator for verification.
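The same check can be expressed in a few lines of Python. The sketch below is illustrative only: it uses an abbreviated copy of the Pickup_Date field from tm000001.xml, treats the text of each C element as the character's ASCII code, and applies the default threshold of 8 mentioned above. The function and constant names are assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Pickup_Date field from tm000001.xml.
PAGE_XML = """
<P id="TM000001">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="STATUS">0</V>
    <C cn="7" cr="200,416,220,440">84</C>
    <C cn="4" cr="218,415,226,430">114</C>
    <C cn="10" cr="218,423,230,438">117</C>
  </F>
</P>
"""

LOW_CONFIDENCE_BELOW = 8


def low_confidence_characters(page_xml, threshold=LOW_CONFIDENCE_BELOW):
    """Return (field id, character, confidence) for each character below threshold."""
    page = ET.fromstring(page_xml)
    flagged = []
    for field in page.iter("F"):
        for char in field.iter("C"):
            confidence = int(char.get("cn"))
            if confidence < threshold:
                # The element text holds the character's ASCII code.
                flagged.append((field.get("id"), chr(int(char.text)), confidence))
    return flagged


flagged = low_confidence_characters(PAGE_XML)
```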
TM000003
A portion of tm000003.xml is shown below.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000003">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="Position">0,0,0,0</V>    <!-- No recognition zone coordinates (and no field data) -->
    <V n="STATUS">0</V>
  </F>
  <F id="Pickup_Location">
    <V n="TYPE">Pickup_Location</V>
    <V n="Position">0,0,0,0</V>    <!-- No recognition zone coordinates (and no field data) -->
    <V n="STATUS">0</V>
  </F>
TM000003 has no data associated with any of the fields. This is because we only defined recognition zones
for the first fingerprint of each page type. Page TM000003 is the rental agreement page for Car Rental #2 and
has no recognition zones. We'll fix this and then run the batch again.
TM000006
A portion of tm000006.xml is shown below.
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000006">
  etc.
  <F id="Airfare">
    <V n="TYPE">Airfare</V>
    <V n="Position">359,805,527,854</V>
    <V n="STATUS">1</V>    <!-- Problem in field (failed validation) -->
    <V n="MESSAGE">Failed By Calculate Action On Field &apos;TM000006&apos;.</V>
    etc.
  </F>
  <F id="Taxes">
    <V n="TYPE">Taxes</V>
    <V n="Position">359,861,525,905</V>
    <V n="STATUS">1</V>    <!-- Problem in field (failed validation) -->
    <V n="MESSAGE">Failed By Calculate Action On Field &apos;TM000006&apos;.</V>
    etc.
  </F>
  <F id="Total_Cost">
    <V n="TYPE">Total_Cost</V>
    <V n="Position">361,912,527,961</V>
    <V n="STATUS">1</V>    <!-- Problem in field (failed validation) -->
    <V n="MESSAGE">Failed By Calculate Action On Field &apos;TM000006&apos;.</V>
    etc.
  </F>
</P>
In TM000006, the Calculate('Airfare' + 'Taxes' = 'Total_Cost') validation action failed. Since
Taskmaster cannot know which of the field values is incorrect, it flags all three fields.
CREATING RECOGNITION ZONES FOR THE REMAINING FINGERPRINTS
Earlier we created recognition zones for the first fingerprint of each page type. Now that we know our
application is basically working, we'll create the recognition zones for the remaining fingerprints.
Refer to the section “TravelDocs: Specifying recognition zones” starting on page 107 for instructions on
how to create the recognition zones for the different page types.
You'll need to create recognition zones for each of the following fingerprints:
• Rental_Agreement (Car Rental #2)
• Optional_Insurance (Car Rental #2)
• Rental_Agreement (Car Rental #3)
• Optional_Insurance (Car Rental #3)
• Room_Receipt (Hotel #2)
• Room_Receipt (Hotel #3)
• Air_Ticket (Airline #2)
• Air_Ticket (Airline #3)
Note: As you draw the zones, click the Save button in the Document Hierarchy pane often!
IMPORTANT NOTE ABOUT DRAWING THE CHECKBOX RECOGNITION ZONES
To get accurate recognition on the checkbox options, it's important that all the checkbox recognition zones
on all fingerprints be as close to the same size as possible. When drawing zones on the Zones tab this is
sometimes hard to achieve, so you may want to establish approximate zone boundaries by drawing the
bounding boxes on the Image View tab, and then edit the coordinates in the Pos variables manually in the
Properties pane. Refer to “Implications of using RecogOMRThreshold” on page 106 for more information.
RUNNING A BATCH THROUGH THE WORKFLOW
Now that you've defined all of the required recognition zones, you can run a batch through the workflow.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Click the Process rules for target object button on the main Test tab toolbar. When asked if you
want to release the batch, click Advance. This moves the batch to the next step in the workflow, which is
PageID.
5. Click the Process rules for target object button on the main Test tab toolbar. When asked if you
want to release the batch, click Advance. This moves the batch to the next step in the workflow, which is
Rulerunner.
6. Click the Process rules for target object button on the main Test tab toolbar and wait while the
task profile executes. When asked if you want to release the batch, click Advance. This moves the batch
to the next step in the workflow, which is Verify.
7. Review each of the pages in the Runtime batch hierarchy pane to make sure recognition was successful.
Then review the batch and page XML files in the runtime batch folder.
INTERPRETING THE PAGE AND FIELD STATUS CODES IN THE TRAVELDOCS APPLICATION
The table below describes how to interpret the status code for each of the fields we validated, as well as each
of the pages in the runtime batch.
Document     Page                 Field        STATUS = 0                           STATUS = 1
Car_Rental   Rental_Agreement     (page)       Page OK                              Page contains unrecognized or low confidence characters, or a field with Status = 1
                                  Car_Type     Field OK                             Field value is not one of the permitted values
                                  Total_Cost   Field OK                             Field value is not currency
             Optional_Insurance   (page)       Page OK                              Page contains unrecognized or low confidence characters, or a field with Status = 1
                                  Total_Cost   Field OK                             Field value is not currency
Hotel        Room_Receipt         (page)       Page OK                              Page contains unrecognized or low confidence characters, or a field with Status = 1
                                  Total_Cost   Field OK                             Field value is not currency
Flight       Air_Ticket           (page)       Page OK                              Page contains unrecognized or low confidence characters, or a field with Status = 1
                                  Airfare      Field and all calculated fields OK   Field value is invalid or calculated fields do not add correctly
                                  Taxes        Field and all calculated fields OK   Field value is invalid or calculated fields do not add correctly
                                  Total_Cost   Field and all calculated fields OK   Field value is invalid or calculated fields do not add correctly
Chapter 11
DATA VERIFICATION
During verification, you display pages to an operator for manual checking and possibly correction. There are
three primary reasons to display pages to an operator:
• The batch failed document integrity checking (covered earlier in the chapter on document
assembly).
• A page contains one or more characters or OMR fields that were marked “low confidence” by the
recognition engine.
• A validation rule failed, indicating that there's a problem with the integrity of the data.
This chapter covers the last two cases. It begins with an overview of Taskmaster's three verification user
interfaces: Batch Pilot, DotEdit, and Taskmaster Web. It then goes on to discuss character recognition and
confidence values, and how you determine which fields require operator verification due to low confidence
recognition. We'll look next at handling validation errors and how to let an operator override a validation
error. Finally, we'll update the TravelDocs application and create user interface panels to do verification using
Batch Pilot, DotEdit, and Taskmaster Web.
VERIFYING DATA
During verification, Taskmaster displays pages to an operator to confirm and, if necessary, correct problem
fields. Problem fields include:
• Character fields with one or more low confidence characters
• OMR fields with low confidence values
• Fields with validation errors
OPTIONS FOR DATA VERIFICATION
Taskmaster includes three different user interface options for verification:
• Batch Pilot: “Thick client” verification (requires installation of Taskmaster Client).
• DotEdit: Alternative “thick client” verification (also requires installation of Taskmaster Client).
• Taskmaster Web client: “Thin client” verification (runs in a web browser).
Taskmaster applications can support any or all verification options simultaneously. All verification clients
access the same job queue and provide similar functionality in terms of identifying problems, correcting them,
and submitting the batch to the next stage in the workflow.
BATCH PILOT
Batch Pilot panels are VBScript forms created using the Batch Pilot GUI designer. The Batch Pilot GUI
designer is covered later under “Using Batch Pilot for verification” on page 151.
DOTEDIT
DotEdit panels are .NET forms. The default field-at-a-time interface (shown above) is generated
automatically from the application's document hierarchy and is covered later under “Using DotEdit for
verification” on page 161.
You can also create custom panels using the DotEdit panel builder, which is distributed as a Microsoft Visual
Studio project. Custom panels typically display all of a page's fields simultaneously, like Batch Pilot panels.
Creating custom panels is beyond the scope of this guide, but is covered in detail in the IBM Datacap
Taskmaster Capture Creating Custom DotEdit Panels Guide.
TASKMASTER WEB CLIENT
Taskmaster Web generates verification panels automatically from the document hierarchy, although it's also
possible to create “static” layouts and add other custom functionality. The web page for the “Prelayout”
verification client (shown above) includes:
• An image pane (top) that displays the current page
• A data entry panel (bottom left) that displays image snippets and controls for checking and correcting the
data fields
• A batch tree view (bottom right) for restructuring the batch
The Prelayout web verification client is covered in more detail in a later chapter (see “Verification using the
Prelayout web client” on page 324).
Note: Taskmaster includes other web verification clients that are covered later in the chapter “Taskmaster
Web and Remote Scanning” starting on page 315.
Taskmaster Web requires additional Microsoft IIS configuration that is covered later under “Using
Taskmaster Web for verification” starting on page 163. Taskmaster Web is functionally similar to Batch Pilot
and DotEdit in that the operator must review each problem page, make any necessary corrections, and submit
the batch when complete.
UNDERSTANDING CONFIDENCE LEVELS AND SETTING THE PAGE STATUS
CONFIDENCE LEVELS
During recognition, Taskmaster assigns a confidence level to each character and OMR field. Confidence
levels range from 1 (lowest confidence) to 10 (highest confidence). You can see the confidence level for each
character or OMR field in the object's “cn” attribute in the page data file.
<F id="Pickup_Date">
  <V n="TYPE">Pickup_Date</V>
  <V n="Position">189,403,567,465</V>
  <V n="STATUS">0</V>
  <C cn="7" cr="205,414,219,439">84</C>      <!-- ASCII 'T' [low confidence] -->
  <C cn="4" cr="205,414,219,439">114</C>     <!-- ASCII 'r' [low confidence] -->
  <C cn="10" cr="224,423,236,438">117</C>    <!-- ASCII 'u' [high confidence] -->
  <C cn="10" cr="241,423,255,438">101</C>    <!-- ASCII 'e' [high confidence] -->
  <C cn="10" cr="256,423,266,438">115</C>    <!-- ASCII 's' [high confidence] -->
  <C cn="10" cr="270,434,275,441">44</C>     <!-- ASCII ',' [high confidence] -->
  <C cn="10" cr="334,416,335,438">32</C>     <!-- ASCII ' ' [high confidence] -->
  <C cn="10" cr="288,416,304,438">68</C>     <!-- ASCII 'D' [high confidence] -->
  <C cn="10" cr="308,423,320,438">101</C>    <!-- ASCII 'e' [high confidence] -->
  etc.
</F>
The confidence level determines how Taskmaster displays the character and the parent field within the
verification panel:
• The three verification clients all display fields containing low confidence characters in yellow, where low
confidence in this case is anything less than 10.
• Within the field, Batch Pilot and the Taskmaster Web client display the low confidence characters in red,
while DotEdit highlights the problem characters in yellow within the image snippet. Low confidence in
this case is anything less than 10 or the field's ReqConf value (see “Overriding the default confidence value on
specific fields” on page 148).
[Screen shots: the same field as displayed in Batch Pilot, Taskmaster Web, and DotEdit]
SETTING THE PAGE STATUS
The confidence level does not directly affect the page status. For example, you might have a page where every
character has a confidence level of 1 (lowest confidence) but the page status could still be 0 (OK). In order to
set the page status based on the confidence level of the characters on the page, use the ChkConfidence
action.
Library   Action          Description
DCO       ChkConfidence   Checks the confidence level of all characters. If the confidence level on any character is less than the value specified in parameter 1, the action assigns the status value in parameter 2 to the page.
The default “Routing” ruleset generated by the Application Wizard uses this action to set the page status
accordingly.
Routing Rule 1 is assigned to each page in the document hierarchy and works as follows:
• ChkDCOStatus checks the page status and returns True if the page status is 1. A status of 1 typically
signifies that there's an error on the page. If the action returns True, Function 2 does not execute.
• ChkDCOStatus returns False if the page status is 0 (or any other value other than 1). A status of 0
typically signifies that the page contains no errors. If the action returns False, Function 2 executes.
• ChkConfidence examines the characters on the current page and sets the page status to 1 if any
character has a confidence level of less than 8 (or the field's ReqConf value as described below).
Following execution of Routing Rule 1, any page with a validation error or a character with a confidence level
less than 8 will have Status = 1. You can configure Taskmaster to display only pages with Status = 1 as
described under “Determining which pages to display to the operator” on page 159.
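The decision logic of Routing Rule 1 can be summarized in a short Python sketch. This is a paraphrase of the behavior described above, not the ruleset itself: the function name, the dictionary layout used for fields, and the parameter names are all assumptions made for the example.

```python
def route_page(page_status, fields, min_confidence=8, problem_status=1):
    """Return the page status after applying the routing logic.

    fields maps a field name to a dict with "confidences" (a list of
    per-character confidence levels) and an optional "ReqConf" override.
    This data layout is hypothetical.
    """
    # ChkDCOStatus: a page already marked as a problem is left alone.
    if page_status == 1:
        return page_status
    # ChkConfidence: flag the page if any character falls below the threshold,
    # using the field's own ReqConf value when one is defined.
    for field in fields.values():
        threshold = field.get("ReqConf", min_confidence)
        if any(cn < threshold for cn in field["confidences"]):
            return problem_status
    return page_status
```

For example, a page whose only low character has confidence 7 is flagged with the default threshold of 8, but is left at status 0 if that field carries ReqConf=6.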
OVERRIDING THE DEFAULT CONFIDENCE VALUE ON SPECIFIC FIELDS
When determining which characters to display in red, the Batch Pilot and Taskmaster Web clients use a
confidence level of 10, unless the field has its own ReqConf value.
Similarly, the ChkConfidence action uses the confidence value specified in parameter 1, unless the field has
its own ReqConf value. For example, if you specify 8 as the parameter value but a field has ReqConf=6,
ChkConfidence uses the value 6 for that field.
To set the confidence level on a specific field:
1. In the Document Hierarchy pane, lock the document hierarchy for editing.
2. Right-click the field and choose Manage variables.
3. If the field has a ReqConf variable, assign the appropriate value; otherwise click New, type ReqConf,
press Enter, and then assign the value.
4. Click Done and then click the Save button in the document hierarchy pane.
OVERRIDING VALIDATION FAILURES
By default, all validations are overrideable, meaning the operator can submit a batch that contains validation
errors by choosing to override them.
Depending on the business requirements, this may or may not be appropriate. For example, if the validation
error stems from a calculation error on the original page, then it might not be appropriate for the operator to
modify the field values. In this case, the operator must be able to override the error and submit the batch.
The application should set the batch status so the batch is sent to an exception handling task (this kind of
exception handling is beyond the scope of this guide). When the operator overrides a validation error,
Taskmaster sets the page status to 73 and the document status to 142.
In other situations, you may want to prevent the operator from overriding validation errors. In this case you
can use the SetIsOverrideable action.
Library       Action              Description
Validations   SetIsOverrideable   If set to False, specifies that if validation on the current object fails, the operator cannot override the error; if set to True, the operator can override the error.
For example, to prevent the operator from overriding an error in the Validate Car Type rule, you can insert
SetIsOverrideable("False") as shown below.
In this case, the operator must modify the page data by selecting a valid car type so the validation passes.
If the operator attempts to submit a page that failed this validation, the following message is displayed.
TRAVELDOCS: VERIFYING THE BATCH
This section covers verification using Batch Pilot, DotEdit, and Taskmaster Web.
SETTING THE CAR TYPE FIELD TO NON-OVERRIDEABLE
Before we run a batch through the workflow, we'll set the SetIsOverrideable property on the Car Type field
to False. This will prevent the operator from overriding an invalid car type.
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the Validate ruleset and click the
Lock/Unlock ruleset button to lock the ruleset for editing.
2. Expand the Validate Car Type rule completely.
3. Select the Validation: Car Type function.
4. Expand the Validations library and select SetIsOverrideable.
5. Click the Add to function button at the left side of the Actions Library pane to add the action to the
Validate ruleset.
6. In the Properties pane, set StrParam to False.
7. Use the  button to move the new action to the beginning of the function.
8. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
USING BATCH PILOT FOR VERIFICATION
CREATING THE RENTAL AGREEMENT PANEL
1. Click Start > All Programs > Datacap > Batch Pilot > Batch Pilot.
2. Click File > Open Project.
3. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_verify.bpp and click Open.
4. In the bottom section of the Batch Pilot window, expand TravelDocs, and then expand Car_Rental.
5. Right-click the Rental_Agreement page and choose AutoForm. AutoForm reads the document
hierarchy (setup DCO) and displays an image snippet control and an edit or listbox control for each of
the defined fields.
6. Click File > Save Form.
7. Open the dco_TravelDocs\verify folder and save the form as Rental_Agreement.dcf.
8. In the bottom section of the Batch Pilot window, right-click the Rental_Agreement page and choose
Pick form.
9. From the dco_TravelDocs\verify folder select Rental_Agreement.dcf and click Open. This links the
form to the page type.
10. Click File > Save Project.
CREATING THE REMAINING PANELS
1. In the bottom section of the Batch Pilot window, select the Optional_Insurance page.
2. Right-click in the Optional_Insurance page and choose AutoForm. Batch Pilot creates a form for the
Optional_Insurance page.
3. Click File > Save Form.
4. Open the dco_TravelDocs\verify folder and save the form as Optional_Insurance.dcf.
5. In the bottom section of the Batch Pilot window, right-click the Optional_Insurance page and choose
Pick form.
6. From the dco_TravelDocs\verify folder select Optional_Insurance.dcf and click Open. This links the
form to the page type.
7. Repeat these steps for Air_Ticket and Room_Receipt. When you've created the forms and linked them
to the appropriate page type, the bottom section of the Batch Pilot window should look like the one
below.
8. Click File > Save Project and then close the Batch Pilot window.
PREPARING A BATCH FOR VERIFICATION
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
2. Select the TravelDocs application and click OK.
3. Log in using User ID: admin, Password: admin, and Station: 1.
4. Double-click the VScan icon and wait for the task to complete. Then click Stop.
5. Double-click the PageID icon and wait for the task to complete. Then click Stop.
6. Double-click the Rulerunner icon and wait for the task to complete. Then click Stop.
REVIEWING THE BATCH IN BATCH PILOT
1. In the Taskmaster Client window, double-click the Verify/Fixup icon. Batch Pilot displays the first
rental agreement page. Note that Batch Pilot displays fields with low confidence characters in yellow.
Note: The page image is not shown in this screen shot. You can toggle the image pane off and on using
the Toggle ImageBar button.
You can see that the Options image snippet is much too small. Additionally, the checkbox options are
labeled Item #1, Item #2, and Item #3 instead of the actual option names. We'll fix these issues later.
However, the rest of the fields are OK.
2. Click the Next Problem button. Batch Pilot displays the first optional insurance page.
Again, the image snippets are a bit small and the checkbox options are listed as “blank” and “Item #1.”
Notice also that Batch Pilot displayed the page, even though there are no problems. We'll fix these issues
later.
3. Click the Next Problem button until you reach the rental agreement page with the validation error
on the Car Type field.
4. Click the Next Problem button. Batch Pilot should prevent you from leaving the page due to the
validation error.
5. Click the drop-down list beside the Car_Type image snippet and choose Other.
6. Press Alt+v to re-run the page validations. The message should indicate that the validations passed.
7. Click the Next Problem button until you reach the first air ticket page. Note that Batch Pilot displays
fields with validation errors in red.
8. Click the Next Problem button and click Yes to override the validation failure.
9. Click the Next Problem button until you reach the first room receipt page.
Notice again that Batch Pilot displayed the page, even though there are no problems. We’ll fix this later.
10. Click File > Quit Task and click OK to put the batch on hold. Then click Stop.
MODIFYING THE PANELS
1. Click Start > All Programs > Datacap > Batch Pilot > Batch Pilot.
2. Click File > Open Project.
3. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_verify.bpp and click Open.
4. In the bottom section of the Batch Pilot window, expand TravelDocs, and expand Car_Rental.
5. Right-click the Rental_Agreement page and choose View Form.
6. Move the Total_Cost fields down and enlarge the image snippet control for the Options field, as shown
below
7. Click File > Save Form.
8. In the bottom section of the Batch Pilot window, right-click the Optional_Insurance page and choose
View Form.
9. Enlarge each of the image snippet controls as shown below.
10. Click File > Save Form.
11. Click File > Exit.
CREATING DICTIONARIES FOR THE CHECKBOX OPTIONS
1. On the Datacap Studio Rulemanager tab, in the Document Hierarchy pane, click the Lock DCO
button to lock the document hierarchy for editing.
2. Click the Dictionaries button at the top of the Document Hierarchy pane.
3. Click the Edit dictionary button and choose Add dictionary.
4. Change the dictionary name from <new_dictionary> to Options.
5. Right-click the new dictionary and choose Add word.
6. Change the name from <new word> to Navigation System and the value from value to Navigation
System.
7. Repeat to add Child Seat and Fuel Service to the dictionary.
8. Click the Edit dictionary button and choose Add dictionary.
9. Create another dictionary called Checkbox and add one word with name Selected and value Selected.
The completed dictionary should look like the one below.
10. Click the Save button in the popup and then click the Save button in the Document Hierarchy pane.
ATTACHING THE DICTIONARIES TO THE CHECKBOX FIELDS
1. Make sure the document hierarchy is still locked for editing.
2. Expand the Car_Rental > Rental_Agreement page so the fields are visible.
3. Right-click the Options field and choose Manage variables.
4. Click New, type DICT, and press Enter.
5. Enter the value Options. Then click Done.
6. Expand the Car_Rental > Optional_Insurance page so the fields are visible.
7. Right-click the CDW field and choose Manage variables.
8. Click New, type DICT, and press Enter.
9. Enter the value Checkbox. Then click Done.
10. Repeat for the PAI, PEP, and ELP fields.
11. In the Document Hierarchy pane, click the Save button and then click Unlock DCO.
REVIEWING THE MODIFIED PANELS
1. In the Taskmaster Client window, double-click the Verify/Fixup icon. Taskmaster displays the Job
Monitor with the batch we placed on hold.
2. Double-click the row ID for the batch that is on hold (“1”) to view the first rental agreement page in
Batch Pilot. The rental options are now clear and the options are labeled properly in the listbox.
3. Click the Next Problem button to view the first optional insurance page. The insurance options are also
now clear and the drop-down options are “blank” and “Selected.”
4. Click File > Quit Task and click OK to put the batch on hold. Then click Stop.
DETERMINING WHICH PAGES TO DISPLAY TO THE OPERATOR
By default, Batch Pilot displays all pages to the operator, regardless of whether there is a problem on a given
page. In this section, we’ll configure Batch Pilot to display only pages with Status=1 (indicating a problem).
1. In the Taskmaster Client window, click the Administrator button.
2. On the Workflow tab, expand Main Job and select Verify. Then click the Setup button.
3. In the Batch Pilot window, click File > Task Settings and then click the Filters tab.
4. Under Type, select Rental_Agreement. Then select Level: PAGE, Property: STATUS, Problem
Value: 1 and click Add. This filter means that if we’re in Verify and the page has a status of “1” (a problem),
the page is shown to an operator.
5. Repeat for Optional_Insurance, Air_Ticket, and Room_Receipt. Then click OK.
6. Click File > Quit and click Yes to save the project.
7. Click Apply and then click Done to close the Administrator window.
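The effect of the filters configured above can be pictured with a small sketch. This is only an illustration of the selection logic, not how Batch Pilot is implemented: the Page class, the FILTERS table, and the page IDs and status values are hypothetical, and the behavior for page types that have no filter is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str      # hypothetical page identifier
    page_type: str    # page type from the document hierarchy
    status: int       # 1 marks a problem page in this walkthrough


# Hypothetical stand-in for the Filters tab: page type -> problem value.
FILTERS = {
    "Rental_Agreement": 1,
    "Optional_Insurance": 1,
    "Air_Ticket": 1,
    "Room_Receipt": 1,
}


def pages_to_show(pages):
    """Return only the pages whose STATUS equals the problem value for their type."""
    return [p for p in pages if FILTERS.get(p.page_type) == p.status]


# Made-up batch: only the third page has a problem status.
batch = [
    Page("TM000001", "Rental_Agreement", 0),
    Page("TM000002", "Optional_Insurance", 0),
    Page("TM000003", "Rental_Agreement", 1),
]
shown_ids = [p.page_id for p in pages_to_show(batch)]
print(shown_ids)
```

With the filters in place, the operator would see only the problem page rather than every page in the batch.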
SUBMITTING THE BATCH
1. Prepare a new batch for verification (see “Preparing a batch for verification” on page 153).
2. In the Taskmaster Client window, double-click the Verify/Fixup icon.
3. If Taskmaster displays the Job Monitor window:
   • Select the job that’s on hold.
   • Press Delete, click Yes, and click Yes to All.
   • Press F5 to refresh the Job Monitor window and confirm the job is gone.
   • Close the Job Monitor window.
   • Double-click the Verify/FixUp icon again to run the pending batch.
4. Use the Next Problem button to advance through the batch and check all the problem fields:
   • Make corrections to any low confidence fields as necessary.
   • Correct the validation failure on the third car rental page.
   • Leave any other fields with validation failures since the mistakes are on the original images. Click Yes
to override the validation failures.
5. When you reach the end of the batch, click Yes to finish the batch and then click Stop. The batch is now
marked as pending for export.
USING DOTEDIT FOR VERIFICATION
CREATING DICTIONARIES FOR THE CHECKBOX OPTIONS
If you did not complete the previous section on using Batch Pilot for verification, follow the steps under
“Creating dictionaries for the checkbox options” and “Attaching the dictionaries to the checkbox fields” on
page 157.
PREPARING A BATCH FOR VERIFICATION
1. If Taskmaster Client is not already running:
   • Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
   • Select the TravelDocs application and click OK.
   • Log in using User ID: admin, Password: admin, and Station: 1.
2. Double-click the VScan icon and wait for the task to complete. Then click Stop.
3. Double-click the PageID icon and wait for the task to complete. Then click Stop.
4. Double-click the Rulerunner icon and wait for the task to complete. Then click Stop. The batch is now
pending verification.
OPENING THE BATCH IN DOTEDIT
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster DotEdit.
2. In the Application field, type TravelDocs.
3. Enter User ID: admin, Password: admin, and Station: 1 and click Login.
4. In the Shortcut field, select Verify/FixUp and click Start:
   • If there are no older batches on hold, click OK to run the next pending batch.
   • If there are older batches on hold, click Run Pending! to run the pending batch (the one you just
created in Taskmaster Client).
REVIEWING THE BATCH IN DOTEDIT
The default DotEdit field-at-a-time interface is shown above. When you start DotEdit, the verify panel
displays the first problem field (marked with an icon in the grid).
Within DotEdit you can navigate directly to the next problem by clicking the Next Problem button, or you
can display any page in the batch by selecting it in the Batch View pane.
You cannot modify the default field-at-a-time interface. Instead, if you want to modify the DotEdit interface
you must create a custom panel for each page. This is beyond the scope of this introduction but is described
in detail in the IBM Datacap Taskmaster Capture Creating Custom DotEdit Panels Guide.
SUBMITTING THE BATCH
1. Review each problem page and:
   • Make corrections to any low confidence fields if necessary.
   • Correct the “Car Type” validation failure on the third car rental page (select “Other”).
   • Leave the other fields with validation failures since the mistakes are on the original images.
2. Click the Submit button at the bottom of the verification panel to advance to the next problem page.
When asked if you want to override the validation failures, click OK.
3. When you reach the end, click OK to finish the batch. The batch is now marked as pending for export.
4. Close the DotEdit window.
USING TASKMASTER WEB FOR VERIFICATION
SETTING UP THE TASKMASTER WEB SERVER
This section provides an outline of the steps required to set up the Taskmaster Web server using Microsoft
Internet Information Services (IIS) 7.5 and Windows 7 Professional. For complete setup instructions, including steps
for other operating systems, refer to the IBM Datacap Taskmaster Capture Installation and Configuration Guide.
1. Click Start > All Programs > Datacap > Support > Web Configuration > Taskmaster Web Server
Configuration. The message box will let you know if the required IIS components are installed.
2. If the message indicates you’re missing required components, open the Windows Control Panel
Programs and Features window, click Turn Windows features on or off, and install the missing
components:
   • The Management Console is under Internet Information Services > Web Management Tools.
   • The ASP and ASP.NET components are under Internet Information Services > World Wide
Web Services > Application Development Features.
   • The Static Content component is under Internet Information Services > World Wide Web
Services > Common HTTP Features.
After installing any required components, run the Taskmaster Web Server Configuration tool again. Then
click OK to continue.
3. In the Taskmaster Web Server Configuration window, click Configure to set up the Taskmaster Web
application in IIS.
4. Start the Internet Information Services (IIS) Manager (Control Panel > Administrative Tools >
Internet Information Services Manager, or type inetmgr).
5. Expand Sites > Default Web Site and confirm that the TaskRun folder shortcut and the tmweb.net
application are visible.
SETTING UP THE TASKMASTER WEB CLIENT
1. Click Start > All Programs > Datacap > Support > Web Configuration > Taskmaster Web Client
Configuration.
2. Click Configure to set up the required security options in Internet Explorer®.
CREATING DICTIONARIES FOR THE CHECKBOX OPTIONS
If you didn’t complete the earlier section on using Batch Pilot for verification, follow the steps under
“Creating dictionaries for the checkbox options” and “Attaching the dictionaries to the checkbox fields” on
page 157.
PREPARING A BATCH FOR VERIFICATION
1. If Taskmaster Client (the “thick” client) is not already running:
   • Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
   • Select the TravelDocs application and click OK.
   • Log in using User ID: admin, Password: admin, and Station: 1.
2. Double-click the VScan icon and wait for the task to complete. Then click Stop.
3. Double-click the PageID icon and wait for the task to complete. Then click Stop.
4. Double-click the Rulerunner icon and wait for the task to complete. Then click Stop. The batch is now
pending verification.
OPENING THE BATCH IN TASKMASTER WEB
1. Start Internet Explorer (not Internet Explorer 64-bit).
2. Enter the URL for the Taskmaster Web client:
http://localhost/tmweb.net
3. In the Application field, type TravelDocs.
4. Enter User ID: admin, Password: admin, and Station: 2.
Note: We’re using station 2 since the Taskmaster “thick” client is logged on as station 1.
5. Click Login.
6. On the Taskmaster Web “Run Shortcut” page, click the Verify/FixUp shortcut.
REVIEWING AND SUBMITTING THE BATCH
The Taskmaster Web client interface is shown above. Taskmaster Web lays out the page with the fields and
image snippets automatically. Modifying the default layout is beyond the scope of this guide.
1. Review each problem page and:
   • Make corrections to any low confidence fields if necessary.
   • Correct the “Car Type” validation failure on the third car rental page (select “Other”).
   • Leave the other fields with validation failures since the mistakes are on the original images.
2. Click the Submit button at the top of the verification panel to advance to the next problem page. When
asked if you want to override the validation failures, click OK.
Note: If you’re unable to get past the page with the validation error after clicking OK, click Hold to put
the batch on hold. Then open C:\Datacap\TravelDocs\dco_TravelDocs\rrs_verify.bpp in a
text editor and in the [iCap] section set DPS=0,2. After making the change and saving the file, reopen
the batch from the Taskmaster Web Monitor tab by clicking the QID field. For information
about the DPS and other settings, see “Configuring the page and field status settings” on page 326.
3. When you reach the end, click OK to finish the batch. The batch is now marked as pending for export.
For additional information about configuring the web verification client, see “Verification using the Prelayout
web client” on page 324.
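The DPS change described in the note under step 2 is normally made by hand in a text editor. Purely as an illustration, the sketch below applies the same change to an in-memory string, assuming the [iCap] section follows plain INI conventions. SAMPLE_BPP is a made-up fragment, not the real contents of rrs_verify.bpp, and a real project file may contain constructs that Python's configparser rejects or rewrites, so back up the file before trying anything like this.

```python
import configparser
import io

# Hypothetical fragment standing in for the [iCap] section of a .bpp file.
SAMPLE_BPP = """[iCap]
DPS=0
"""


def set_dps(bpp_text, value="0,2"):
    """Return the INI text with DPS in the [iCap] section set to the given value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep key names such as DPS in their original case
    parser.read_string(bpp_text)
    if not parser.has_section("iCap"):
        parser.add_section("iCap")
    parser.set("iCap", "DPS", value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


updated = set_dps(SAMPLE_BPP)
print(updated)
```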
Chapter 12
DATA EXPORT
Taskmaster can export data to a text file, an XML file, a database, a document management system, or a
custom business process. The default output format is a text file, but in this chapter we’ll look briefly at
actions you can use to export data to a database and an XML file. Export to document management systems
is beyond the scope of this guide.
In the last section in this chapter, we’ll update the TravelDocs application to export the captured data to a
Microsoft Access database and an XML file.
EXPORTING DATA
EXPORTING TO A TEXT FILE
The framework generated by the Application Wizard exports all captured data to a text file located in the
application’s “export” folder. The default Export ruleset includes two rules:
   • Set Export Params: This rule is attached to the batch’s “Open” element. It sets the export path and
filename, and writes the header information to the file.
   • Export Page Fields: This rule is attached to each page and writes all field values from the current page
to the export file.
The resulting file looks like the one below, where each line represents one page and the fields are separated by
commas.
*****************
Export for batch #20100334.019,12/01/2010,08:47:58
(Header information: the two lines above)
,Tues, Dec 7, 2010,Boston (BOS),Fri, Dec 10, 2010,Boston (BOS),Compact,001,$345.70
,0,0,0,1
,Mon, Dec 6, 2010,San Francisco (SFO),Fri, Dec 10, 2010,San Francisco
(SFO),SUV,010,$489.31
,Boston (BOS),Pittsburgh (PIT),17NOV10,Pittsburgh (PIT),Boston
(BOS),21NOV10,313.17,64.56,477.73
,Newark, NJ (EWR),Charlotte, NC (CLT),MON NOV 15, 2010,Charlotte, NC (CLT),Newark, NJ
(EWR),WED NOV 17, 2010,$524.76,$53.23,$577.99
,Dec 21, 2010,Dec 24, 2010,$293.03
,Nov 30, 2010,Dec 2, 2010,$243.07
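One practical point about the sample above: the field values themselves contain commas (for example, the dates), so splitting a line on commas does not recover the original fields. The sketch below shows this on the first room receipt line, which appears to hold two dates and an amount. The regular expression is an assumption tailored to that one line's layout, not a general parser for the export file.

```python
import re

# One page line copied from the sample export above.
line = ",Dec 21, 2010,Dec 24, 2010,$293.03"

# Naive split: the commas inside the dates produce extra pieces.
parts = line.split(",")
print(len(parts), parts)

# Layout-specific pattern (an assumption): date, date, dollar amount.
pattern = r",(\w{3} \d{1,2}, \d{4}),(\w{3} \d{1,2}, \d{4}),(\$[\d.]+)"
match = re.fullmatch(pattern, line)
fields = match.groups() if match else ()
print(fields)
```

If downstream systems need to parse the export reliably, a structured target such as a database or an XML file (both covered later in this chapter) avoids the ambiguity.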
The “Export” actions library includes actions typically used for exporting captured data to a text file. A few of
the key export actions are outlined in the table below.
Library | Action | Description
Export | SetExportPath | Specifies the path to the export file’s location. Typically you’ll reference the export path in the application’s configuration file using the @APPPATH(export) smart parameter.
Export | SetFileName | Specifies the name for the export file (do not include the file extension).
Export | SetExtensionName | Specifies the extension for the export file.
Export | ExportAllFields | Writes all field values on the current page to the export file.
Export | ExportFieldValue | Writes the specified field’s value to the export file, for example, ExportFieldValue(Return_Date).
Export | CloseExportFile | Closes the export file.
EXPORTING TO A DATABASE
Taskmaster can export data to any Microsoft Access, SQL Server, or Oracle database using the actions in the
“ExportDB” library. A few of the commonly-used export actions are outlined in the table below.
Library | Action | Description
ExportDB | ExportOpenConnection | Opens a connection to the specified export database.
ExportDB | SetTableName | Specifies the name of the table to which data is to be exported.
ExportDB | ExportFieldToColumn | Gets the value of the specified field on the current page and adds it to the specified column in the internal data record. You build the record in memory before committing it to the database using AddRecord.
ExportDB | AddRecord | Inserts the assembled data record into the export table specified by the previous SetTableName action.
ExportDB | ExportCloseConnection | Closes an open export database connection.
For a complete example, see “Creating the ExportDB ruleset” on page 171.
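The pattern behind these actions (open a connection, name the table, build a record in memory one column at a time, then insert it in a single step) can be sketched in Python. This is an analogy only: the ExportRecord class and its method names are hypothetical, SQLite stands in for the export database, and none of this is the ExportDB library's actual implementation.

```python
import sqlite3


class ExportRecord:
    """Hypothetical stand-in: buffer column values, then insert them as one row."""

    def __init__(self, connection):
        self.connection = connection
        self.table = None
        self.columns = {}

    def set_table_name(self, table):
        self.table = table

    def field_to_column(self, page, field, column):
        # Copy one field value from the page into the in-memory record.
        self.columns[column] = page[field]

    def add_record(self):
        # Table and column names are trusted constants in this sketch.
        names = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        self.connection.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
            list(self.columns.values()),
        )
        self.connection.commit()
        self.columns = {}              # start a fresh record for the next page


conn = sqlite3.connect(":memory:")     # stand-in for the export database
conn.execute("CREATE TABLE Rental_Agreement (Car_Type TEXT, Total_Cost TEXT)")

page = {"Car_Type": "Compact", "Total_Cost": "$345.70"}   # made-up page data
record = ExportRecord(conn)
record.set_table_name("Rental_Agreement")
record.field_to_column(page, "Car_Type", "Car_Type")
record.field_to_column(page, "Total_Cost", "Total_Cost")
record.add_record()

rows = conn.execute("SELECT Car_Type, Total_Cost FROM Rental_Agreement").fetchall()
print(rows)
conn.close()
```

Buffering the record and inserting it once means a page either contributes a complete row or none at all, which is the reason the record is assembled before AddRecord is called.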
EXPORTING TO AN XML FILE
The “ExportXML” library includes actions you can use to write data to an XML file. A few of the commonly used export actions are outlined in the table below.
Library | Action | Description
ExportXML | xml_SetExportPath | Specifies the path to the XML file storage location.
ExportXML | xml_SetFileName | Specifies the name for the XML file (do not include the .xml extension).
ExportXML | xml_NewNode | Creates a new child node under the specified parent node, creating the parent node if necessary.
ExportXML | xml_SetNodeValue | Sets the value of the specified node.
ExportXML | xml_SaveFile | Commits all unsaved nodes and saves the XML file to disk.
For a complete example, see “Creating the ExportXML ruleset” on page 174.
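As a rough picture of what the node actions do, the sketch below mimics "create a child under a parent, creating the parent if necessary" and "set a node's value" using Python's standard ElementTree module. The XmlExport class and its methods are hypothetical names chosen for illustration. They are not the ExportXML library, and details such as how repeated node names are resolved are assumptions.

```python
import xml.etree.ElementTree as ET


class XmlExport:
    """Hypothetical stand-in for the node-building pattern described above."""

    def __init__(self):
        self.root = None
        self.nodes = {}                # most recently created node for each name

    def new_node(self, name, parent):
        if parent not in self.nodes:
            # Create the parent if necessary: as the root, or under the root.
            if self.root is None:
                self.root = ET.Element(parent)
                self.nodes[parent] = self.root
            else:
                self.nodes[parent] = ET.SubElement(self.root, parent)
        node = ET.SubElement(self.nodes[parent], name)
        self.nodes[name] = node
        return node

    def set_node_value(self, name, value):
        self.nodes[name].text = value

    def to_string(self):
        return ET.tostring(self.root, encoding="unicode")


export = XmlExport()
export.new_node("TM000001", "Rental_Agreements")   # page node under the root
export.new_node("Car_Type", "TM000001")            # field node under the page
export.set_node_value("Car_Type", "Compact")
xml_text = export.to_string()
print(xml_text)
```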
EXPORTING TO A DOCUMENT MANAGEMENT SYSTEM
Several action libraries are available for exporting captured data to various document management systems.
Export to document management systems is beyond the scope of this guide.
TRAVELDOCS: EXPORTING DATA TO A DATABASE
In this section we’ll update the TravelDocs application to export data from each rental agreement page to an
export database.
CREATING THE EXPORT DATABASE
The Application Wizard does not create an export database by default. You can create a Taskmaster-compatible
export database using Microsoft Access, SQL Server, or Oracle. The simplest thing to do is to
copy the Access export database from the 1040EZ sample application and then modify it, as described here.
Note: If you don’t have Microsoft Access, copy the file TravelDocsExport.mdb from the sample images
download into the C:\Datacap\TravelDocs folder. This file includes the Rental_Agreement table.
1. Copy the file C:\Datacap\1040ez\1040ezExport.mdb to C:\Datacap\TravelDocs.
2. In C:\Datacap\TravelDocs, rename the file TravelDocsExport.mdb.
3. Open the file TravelDocsExport.mdb in Microsoft Access.
4. Create a new table called Rental_Agreement.
5. Create a new field for BatchID and for each of the fields defined for the rental agreement page, as shown
below. Make all fields of type “Text” and then save the table.
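If you prefer to define the table with a SQL statement rather than in the Access table designer, the sketch below generates a CREATE TABLE statement for the same layout. The column list is taken from the export parameters used later in this chapter. The TEXT(255) size and the bracketed identifiers are assumptions about Access SQL syntax, and the helper function name is hypothetical, so check the statement against your database before running it.

```python
# Columns: BatchID plus the rental agreement fields exported later in this chapter.
COLUMNS = [
    "BatchID",
    "Pickup_Date",
    "Pickup_Location",
    "Return_Date",
    "Return_Location",
    "Car_Type",
    "Options",
    "Total_Cost",
]


def create_table_sql(table, columns, text_size=255):
    """Build a CREATE TABLE statement in which every column is a text column."""
    column_defs = ", ".join(f"[{name}] TEXT({text_size})" for name in columns)
    return f"CREATE TABLE [{table}] ({column_defs})"


ddl = create_table_sql("Rental_Agreement", COLUMNS)
print(ddl)
```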
CONFIGURING THE EXPORT DATABASE IN THE TASKMASTER APPLICATION MANAGER
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Application Manager.
2. Select the TravelDocs application from the list on the left.
3. Click the Browse […] button beside the Export database field.
4. In the Database type field, select Microsoft Access. Then in the Database field select the file
C:\Datacap\TravelDocs\TravelDocsExport.mdb. Database authentication is not required.
5. Click OK and then close the Taskmaster Application Manager window.
CREATING THE EXPORTDB RULESET
1. In the Rulesets pane, right-click the TravelDocs node and choose Add Ruleset.
2. Rename the new ruleset from Ruleset1 to ExportDB.
3. Rename the default rule from Rule1 to Export Rental Agreement Data.
4. Rename the default function from Function1 to Export Data.
5. Click the Actions library tab and expand the ExportDB library.
6. Select each of the actions shown in the table below and add it to the Export Data function using
the Add to function button. Then set the action parameters as shown in the table.
Action | Parameter
ExportOpenConnection | @APPVAR(*/exportdb:cs)
SetTableName | Rental_Agreement
ExportBatchIDToColumn | BatchID
ExportFieldToColumn | Pickup_Date,Pickup_Date
ExportFieldToColumn | Pickup_Location,Pickup_Location
ExportFieldToColumn | Return_Date,Return_Date
ExportFieldToColumn | Return_Location,Return_Location
ExportFieldToColumn | Car_Type,Car_Type
ExportFieldToColumn | Options,Options
ExportFieldToColumn | Total_Cost,Total_Cost
AddRecord | (none)
ExportCloseConnection | (none)
7. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The finished ruleset should look like the one below.
ADDING THE RULESET TO THE EXPORT TASK PROFILE
1. In the Rulesets pane, select the ExportDB ruleset.
2. Click the Task profiles tab and click the Lock/Unlock task profiles button.
3. Select the Export task profile and click the Add ruleset to profile button at the left of the Task
profiles pane.
4. Expand the Export task profile and make sure the ExportDB ruleset is listed.
5. In the Task profiles pane, click the Save button and then click the Lock/Unlock task profiles button.
ATTACHING THE EXPORT RENTAL AGREEMENT DATA RULE TO THE RENTAL AGREEMENT PAGE
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so the Rental_Agreement page is visible. Then select the
Rental_Agreement page.
3. In the Rulesets pane, select the Export Rental Agreement Data rule and click the Add to DCO button.
4. With the Export Rental Agreement Data rule still highlighted, click the Sync DCO view with Ruleset
view button and make sure the new rule is now included in the rental agreement page’s Open
element.
5. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
RUNNING A BATCH THROUGH THE WORKFLOW
1. Use the Connection Wizard to reopen the TravelDocs application.
Note: This forces Datacap Studio to reload the information from the application configuration (.app) file.
If you don’t do this, the export database connection string may not be in the cached copy of the
.app file and the new ruleset may fail.
2. Click the Datacap Studio Test tab.
3. In the Rulesets pane, expand the ExportDB ruleset. Then right-click the Export Rental Agreement
Data rule and choose Set breakpoint. Execution will stop when Taskmaster reaches the rule.
4. In the Workflow pane, select the VScan task profile under Main Job.
5. Click the New button to start a new batch.
6. Use the Process rules for target object button and the Advance button to move the batch through
the VScan, PageID, Rulerunner, Verify, and Export tasks. Execution stops at the breakpoint.
7. Click the Step in button to single-step into the function and start executing the actions. As each line
executes, make sure there’s a checkmark beside the action, indicating that the action returned True.
Note: If ExportOpenConnection fails (as indicated by a ! beside the action), make sure you set up the
export database correctly and added the connection string to the Taskmaster Application Manager.
Then use the Connection Wizard to re-open the TravelDocs application.
8. Click the Process rules for target object button to resume normal execution. You’ll need to click
the button again each time you hit the Export Rental Agreement Data rule (for each rental agreement
page). Then click Advance.
9. Open the file C:\Datacap\TravelDocs\TravelDocsExport.mdb and review the exported data in the
Rental_Agreement table.
TRAVELDOCS: EXPORTING DATA TO AN XML FILE
Here we’ll update the TravelDocs application to export data from each rental agreement page to an XML file.
If you wanted to export data from the other pages as well, you’d need a separate rule for each page type.
CREATING THE EXPORTXML RULESET
We’ll need three separate rules:
   • One rule attached to the batch’s “Open” element to set the XML export path and file name.
   • One rule attached to the rental agreement page that writes the data for the current page.
   • One rule attached to the batch’s “Close” element to save the XML file.
To create the ExportXML ruleset:
1. In the Rulesets pane, right-click the TravelDocs node and choose Add Ruleset.
2. Rename the new ruleset from Ruleset1 to ExportXML.
3. Rename the default rule from Rule1 to Open XML File.
4. Rename the default function from Function1 to Open XML.
5. Click the Actions library tab and expand the ExportXML library.
6. Select and add each of the actions shown in the table below to the Open XML function using
the Add to function button. Then set the action parameters as shown in the table below.
Action | Parameter
xml_SetExportPath | @APPPATH(export)
xml_SetFileName | @BatchID
Note: @APPPATH(export) is a smart parameter that gets the export path from the application
configuration file. @BatchID is a smart parameter that returns the current batch ID.
7. Right-click the ExportXML ruleset and choose Add Rule.
8. Rename the new rule from Rule1 to Export Rental Agreement XML.
9. Rename the default function from Function1 to Export XML.
10. Select and add each of the actions shown in the table below to the Export XML function
using the Add to function button. Then set the action parameters as shown in the table below.
Action | Parameter
xml_NewNode | @ID,Rental_Agreements
xml_NewNode | Pickup_Date,@ID
xml_SetNodeValue | Pickup_Date, @P\Pickup_Date
xml_NewNode | Pickup_Location,@ID
xml_SetNodeValue | Pickup_Location, @P\Pickup_Location
xml_NewNode | Return_Date,@ID
xml_SetNodeValue | Return_Date, @P\Return_Date
xml_NewNode | Return_Location,@ID
xml_SetNodeValue | Return_Location, @P\Return_Location
xml_NewNode | Car_Type,@ID
xml_SetNodeValue | Car_Type, @P\Car_Type
xml_NewNode | Options,@ID
xml_SetNodeValue | Options, @P\Options
xml_NewNode | Total_Cost,@ID
xml_SetNodeValue | Total_Cost, @P\Total_Cost
Note: @ID gets the ID of the current object. @P\ gets the value of the specified field on the current page.
11. Right-click the ExportXML ruleset and choose Add Rule.
12. Rename the new rule from Rule1 to Close XML File.
13. Rename the default function from Function1 to Close XML.
14. Select and add the action shown below to the Close XML function using the Add to function
button. This action has no parameter.
Action | Parameter
xml_SaveFile | (none)
15. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The finished ruleset should look like the one below.
ADDING THE RULESET TO THE EXPORT TASK PROFILE
1. In the Rulesets pane, select the ExportXML ruleset.
2. Click the Task profiles tab and click the Lock/Unlock task profiles button.
3. Select the Export task profile and click the Add ruleset to profile button at the left of the Task
profiles pane.
4. Expand the Export task profile and make sure the ExportXML ruleset is listed.
5. In the Task profiles pane, click the Save button and then click the Lock/Unlock task profiles button.
ATTACHING THE EXPORT XML RULES TO THE DOCUMENT HIERARCHY
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the batch and select the batch’s Open element.
3. In the Rulesets pane, select the Open XML File rule and click the Add to DCO button.
4. Select the batch’s Close element.
5. In the Rulesets pane, select the Close XML File rule and click the Add to DCO button.
6. In the Document hierarchy pane, expand the Car_Rental document node and select the
Rental_Agreement page.
7. In the Rulesets pane, select the Export Rental Agreement XML rule and click the Add to DCO button.
8. Select the Open XML File rule, click the Sync DCO view with Ruleset view button, and make
sure the rule is now included in the batch’s Open element.
9. Repeat for the Export Rental Agreement XML and Close XML File rules.
10. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
RUNNING A BATCH THROUGH THE WORKFLOW
1. Click the Datacap Studio Test tab.
2. In the Breakpoints pane, click the Remove all breakpoints button.
3. In the Workflow pane, select the VScan task profile under Main Job.
4. Click the New button to start a new batch.
5. Use the Process rules for target object button and the Advance button to move the batch through
the entire workflow.
6. Open the file C:\Datacap\TravelDocs\export\<batch_id>.xml and review the exported XML data.
<?xml version='1.0' ?>
<Rental_Agreements>
<TM000001>
<Pickup_Date>Tues, Dec 7, 2010</Pickup_Date>
<Pickup_Location>Boston (BOS)</Pickup_Location>
<Return_Date>Fri, Dec 10, 2010</Return_Date>
<Return_Location>Boston (BOS)</Return_Location>
<Car_Type>Compact</Car_Type>
<Options>Fuel Service</Options>
<Total_Cost>345.70</Total_Cost>
</TM000001>
<TM000003>
<Pickup_Date>Mon, Dec 6, 2010</Pickup_Date>
<Pickup_Location>San Francisco (SFO)</Pickup_Location>
<Return_Date>Fri, Dec 10, 2010</Return_Date>
<Return_Location>San Francisco (SFO)</Return_Location>
<Car_Type>SUV</Car_Type>
<Options>Child Seat</Options>
<Total_Cost>489.31</Total_Cost>
</TM000003>
<TM000004>
<Pickup_Date>Mon, Dec 13, 2010</Pickup_Date>
<Pickup_Location>Newark (EWR)</Pickup_Location>
<Return_Date>Thur, Dec 16, 2010</Return_Date>
<Return_Location>Newark (EWR)</Return_Location>
<Car_Type>Luxury</Car_Type>
<Options>Navigation System Child Seat Fuel Service</Options>
<Total_Cost>387.40</Total_Cost>
</TM000004>
</Rental_Agreements>
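To check an exported file like the one above, it can help to load it and inspect the values programmatically. The following sketch parses an abbreviated copy of the sample with Python's standard ElementTree module. The embedded SAMPLE string is a shortened version of the listing (two pages, two fields each), and reading from a string instead of the file under the export folder is a simplification.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated copy of the exported XML shown above.
SAMPLE = """<Rental_Agreements>
  <TM000001>
    <Car_Type>Compact</Car_Type>
    <Total_Cost>345.70</Total_Cost>
  </TM000001>
  <TM000003>
    <Car_Type>SUV</Car_Type>
    <Total_Cost>489.31</Total_Cost>
  </TM000003>
</Rental_Agreements>"""

root = ET.fromstring(SAMPLE)

# One dictionary of field values per page node, keyed by the page ID.
pages = {page.tag: {field.tag: field.text for field in page} for page in root}

# Decimal avoids floating-point rounding when adding currency amounts.
total = sum(Decimal(fields["Total_Cost"]) for fields in pages.values())

print(pages["TM000003"]["Car_Type"])
print(total)
```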
Chapter 13
APPLICATION DEBUGGING
In this chapter we’ll look at two key runtime log files: the Rulerunner Service (RRS) log and the task log. The
RRS log provides detailed information about each action as it executes; the task log documents internal calls
and is used mostly by Datacap support. Since the RRS log is most helpful to application developers, we’ll
cover this in more detail.
We’ll also look at Datacap Studio’s integrated debugging functionality. We looked briefly at this in earlier
chapters, but here we’ll look in more detail at the debugging features that let you control the execution
environment so you can monitor your application at runtime.
TASKMASTER LOG FILES
Taskmaster generates two types of log file during task execution:
   • Rulerunner Service (RRS) log files: include detailed information about each action as it executes
   • Task log files: document mostly internal calls and are typically most helpful to Datacap support
We’ll look at each of these in turn. Additionally, RV2 and Rulerunner Quattro can generate their own log
files. We’ll look at Quattro logging in a later chapter (see “Quattro logging” on page 271).
ENABLING LOGGING FOR BATCH PILOT TASKS
Logging for Batch Pilot tasks is controlled through the Taskmaster Administrator window.
1. In the Taskmaster Client window, click the Administrator button.
2. On the Workflow tab, select the task you want to configure.
3. With the task selected (VScan in this example), click Setup to display the Batch Pilot Setup window.
4. In the Batch Pilot window, click File > Task Settings and then click the Log tab.
The Log tab is where you configure logging options for the selected task. These options are described
more fully under “Logging options.”
Setting the Severity to 3 or higher (on the scale of 0-9) enables the RRS log.
Setting the Severity to 1 or higher (on the scale of 0-9) enables the task log. The amount of
information depends on the severity level.
The fields in the Log File group apply mostly to the task log.
LOGGING OPTIONS
The Log tab controls both the Rulerunner Service (RRS) log and the task log. However, it affects only tasks
that you run from Taskmaster Client. When running tasks from Datacap Studio, the RRS log is always
enabled and the task log is always disabled.
                                     Execute task from Taskmaster Client   Execute task from Datacap Studio
                                     RRS log        Task log               RRS log        Task log
Logging disabled (Severity = None)   No             No                     Yes            No
Logging enabled (Severity = 1)       No             Yes                    Yes            No
Logging enabled (Severity = 2)       No             Yes                    Yes            No
Logging enabled (Severity = 3-9)     Yes            Yes                    Yes            No
The default severity levels for the default tasks are as follows:
Task         Default severity
VScan        0 (None)
PageID       9 (All)
Rulerunner   9 (All)
Verify       9 (All)
Export       9 (All)
By default you’ll get an RRS log and a task log for each task except VScan when running a task from
Taskmaster Client. When developing your application, you’ll find the RRS log to be more helpful since it
provides detailed descriptions of actions as they are executed (see “Rulerunner Service (RRS) log files” on
page 183). The task log provides information that is typically most helpful to IBM technical support since it
documents mostly internal calls, although it can be useful for debugging certain issues such as scanning errors.
The RRS log is either enabled or disabled – in other words, the severity level does not affect how much
information is written to the log file. The RRS log file is always named <task_name>_rrs.log (for example,
vscan_rrs.log) and is always written to the current batch folder.
For the task log, the higher the severity level, the more information is written to the log. Severity levels 1-5
write very little information; severity levels 6-7 write more; and severity levels 8-9 write the most information.
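The rules above can be summarized in a few lines of Python. This is only an illustrative sketch of the behavior described in this section, not Taskmaster code: the function names are hypothetical, and lowercasing the task name is an assumption based solely on the vscan_rrs.log example.

```python
def logs_generated(severity, run_from="client"):
    """Return (rrs_log, task_log) flags for one task run.

    severity: None (logging disabled) or an integer from 0 to 9.
    run_from: "client" for Taskmaster Client, "studio" for Datacap Studio.
    """
    if run_from == "studio":
        # From Datacap Studio the RRS log is always enabled and the
        # task log is always disabled, whatever the Log tab says.
        return (True, False)
    level = severity or 0
    if not 0 <= level <= 9:
        raise ValueError("severity must be None or between 0 and 9")
    # RRS log needs severity 3 or higher; task log needs 1 or higher.
    return (level >= 3, level >= 1)


def rrs_log_name(task_name):
    """Build <task_name>_rrs.log (lowercase is an assumption)."""
    return f"{task_name.lower()}_rrs.log"


if __name__ == "__main__":
    for level in (None, 1, 2, 3, 9):
        print(level, logs_generated(level), logs_generated(level, "studio"))
    print(rrs_log_name("VScan"))
```

Note that the severity level only switches the RRS log on or off; it changes the amount of detail in the task log alone.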
The following options on the Log tab apply only to the task log.
Name: Name for the task log file. The RRS log file is always <task_name>_rrs.log.
Directory: By default the task log is saved in the current batch folder, but you can specify a fixed
location. The RRS log is always written to the current batch folder.
Overwrite Old File: Overwrites the existing task log with new data each time the task runs, rather than
letting the file grow. This applies mostly when writing to a log in a fixed location, since the file
can quickly grow large, although VScan runs repeatedly until all images are imported.
Flush Buffer: Saves the contents of the log file each time a new line is added. Because this decreases
performance, leave it disabled unless the log you’re trying to read is cut off.
The other options enable the fields shown in the sample task log entry below.
Date         Time               Application ID   Message Number   Severity   Message (always written)
01/06/2011   10:29:54(57.736)   TASK             1232             (1)        Path.ini was not found
Only the Date (writes the current time) and Message Number fields apply to the RRS log.
ENABLING LOGGING FOR TASKMASTER WEB TASKS
The logging settings in the Taskmaster Administrator window don’t affect tasks executed from Taskmaster
Web. If you want to enable RRS logging for web clients, you must configure this separately. RRS logging
from web clients is disabled by default.
Note: RRS logging is only useful for tasks that run rules, so if your web client isn’t associated with a task
profile you won’t get an RRS log file.
To enable logging from web clients, edit the [RRC] section of the task’s .bpp (or .icp) file. The setting that
controls RRS logging is the ServiceLog setting:
ServiceLog setting   Result
0 or 1               No RRS log file
2                    RRS log file with action logging but no action parameters displayed
3 or 4               RRS log file with action logging and action parameters displayed
5 or higher          RRS log file with action logging and complete DCO navigation
In most situations, ServiceLog=3 provides enough information to help you debug rule-related issues.
In the example below, for the Verify task, the service log level is set to 3 (ServiceLog=3). Since the task is
associated with a task profile (TProfile=Verify), the web client will generate an RRS log file.
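If you want to check these settings without opening the file by hand, a short script can read them. The sketch below assumes that the [RRC] section follows ordinary INI syntax that Python's configparser can parse; the sample text contains only the two settings named in this section, and a real .bpp file holds other content that this sketch does not model. The function name is hypothetical.

```python
import configparser

# Minimal stand-in for the [RRC] section of a task's .bpp file.
SAMPLE_BPP = """
[RRC]
TProfile=Verify
ServiceLog=3
"""


def describe_service_log(bpp_text):
    """Return the RRS logging result for the given .bpp text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the original case of setting names
    parser.read_string(bpp_text)

    # RRS logging from web clients is disabled by default.
    level = parser.getint("RRC", "ServiceLog", fallback=0)
    profile = parser.get("RRC", "TProfile", fallback="").strip()

    if level < 2 or not profile:
        # No task profile means no rules run, so there is no RRS log.
        return "No RRS log file"
    if level == 2:
        return "RRS log file with action logging but no action parameters displayed"
    if level <= 4:
        return "RRS log file with action logging and action parameters displayed"
    return "RRS log file with action logging and complete DCO navigation"


if __name__ == "__main__":
    print(describe_service_log(SAMPLE_BPP))
```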
RULERUNNER SERVICE (RRS) LOG FILES
Rulerunner is the Taskmaster component responsible for executing rules and actions. As it executes each
action, Rulerunner writes detailed logging information to a Rulerunner Service (RRS) log file (<task>_rrs.log).
Rulerunner generates an RRS log file whenever you run a task from Datacap Studio. When running a task
from Taskmaster Client, you must set the severity level to 3 or higher if you want to generate an RRS log (see
“Enabling logging for Batch Pilot tasks” on page 180).
Each task profile generates its own Rulerunner Service log file. If you look in the most recent batch folder
under “batches” for the TravelDocs application, you’ll see a log file for each of the task profiles in the
“Main Job” workflow.
Each log file contains detailed descriptions of the actions executed by the task profile and is useful for
application troubleshooting.
EXAMPLE 1
Here is the vscan_rrs.log entry showing execution of the SetSourceDirectory action in the VScan ruleset:
[1] action SetSourceDirectory(bool=false,bool=true,str="@APPPATH(vscanimagedir)")
[2] 1 Smart Parameter element found
[3] Parsing Smart Parameter element {0} value: "@APPPATH(vscanimagedir)"
[4] @APPPATH key root value: 'vscanimagedir'
[5] @APPPATH looking for workflow key: '*/dco_TravelDocs/vscanimagedir'
[6] @APPPATH workflow key found: 'C:\Datacap\TravelDocs\images'
[7] Smart Parameter return value: 'C:\Datacap\TravelDocs\images'
[8] looking for:C:\Datacap\TravelDocs\images
[9] Action changes: Directory with source images: C:\Datacap\TravelDocs\images
[10] result 0[x0] = true
[11] action returned true
[12] execute statement On Action True
[13] executing code:
[14] Call OnActionEnd()
[15] /execute statement On Action True
[16] /action
[17] execute statement On Action Start
[18] executing code:
[19] Call OnActionStart()
[20] /execute statement On Action Start
By looking through the Rulerunner Service log file you can see precisely how Rulerunner interprets and
executes each action. In the SetSourceDirectory example above, Rulerunner:
• Identifies the @APPPATH(vscanimagedir) parameter as a smart parameter [line 2]
• Identifies the key value as “vscanimagedir” [line 4]
• Looks up the specified key value in the application configuration [line 5]
• Retrieves the value C:\Datacap\TravelDocs\images [line 6]
• Sets the image source directory to the specified location [line 9]
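When a log is long, it can help to pull out just the smart parameter resolutions. The following sketch does this with regular expressions. It assumes the log lines have exactly the wording shown in the SetSourceDirectory example; the function name is hypothetical and the patterns have not been checked against other Taskmaster versions.

```python
import re

# A few lines copied from the SetSourceDirectory example.
SAMPLE_LOG = r"""
action SetSourceDirectory(bool=false,bool=true,str="@APPPATH(vscanimagedir)")
1 Smart Parameter element found
Parsing Smart Parameter element {0} value: "@APPPATH(vscanimagedir)"
@APPPATH key root value: 'vscanimagedir'
@APPPATH workflow key found: 'C:\Datacap\TravelDocs\images'
Smart Parameter return value: 'C:\Datacap\TravelDocs\images'
"""

PARSING = re.compile(r'Parsing Smart Parameter element \{\d+\} value: "(.*)"')
RETURNED = re.compile(r"Smart Parameter return value: '(.*)'")


def smart_parameter_results(log_text):
    """Map each smart parameter to the value Rulerunner resolved it to."""
    results = {}
    current = None
    for line in log_text.splitlines():
        parsed = PARSING.search(line)
        if parsed:
            current = parsed.group(1)
            continue
        returned = RETURNED.search(line)
        if returned and current is not None:
            results[current] = returned.group(1)
            current = None
    return results


if __name__ == "__main__":
    for parameter, value in smart_parameter_results(SAMPLE_LOG).items():
        print(f"{parameter} -> {value!r}")
```

An empty return value for a parameter is a quick sign that the key lookup failed.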
EXAMPLE 2
In the previous example, the action executed successfully and returned true [line 11]. In this next example,
we’ll introduce an invalid key name in the action parameter (“imagedir” is not a valid key):
SetSourceDirectory("@APPPATH(imagedir)")
This time the batch aborts. If you run the task from Datacap Studio, you’ll see an error message.
In this case we can look at the end of the log file to determine the cause of the error [line 6].
[1] action SetSourceDirectory (bool=false,bool=false,str="@APPPATH(imagedir)")
[2] 1 Smart Parameter element found
[3] Parsing Smart Parameter element {0} value: "@APPPATH(imagedir)"
[4] @APPPATH key root value: 'imagedir'
[5] @APPPATH looking for workflow key: '*/dco_TravelDocs/imagedir'
[6] @APPPATH workflow key not found.          ← Key not found
[7] @APPPATH looking for appname key: '*/dco_TravelDocs/imagedir'
[8] @APPPATH appname key not found.
[9] @APPPATH looking for general key: '*/imagedir'
[10] @APPPATH general key not found.
[11] Smart Parameter return value: ''
[12] looking for:@APPPATH(imagedir)
[13] Error: Folder '@APPPATH(imagedir)' does not exist          ← Folder does not exist
.
.
[14] Error (385875969=hex:17000001). In CIMainAlgorithm::execute4DCO: Aborting: Action [SetSourceDirectory] requested abort [api source:]
[15] EXCEPTION: code="385875969" msg="Aborting: Action [SetSourceDirectory] requested abort" loc="CIMainAlgorithm::execute4DCO" API=""
[16] execute statement On Process End
[17] executing code:
[18] Quit()
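To find the cause of an abort quickly, you can scan the log for the first lines that report a problem. The sketch below is one way to do that. The search patterns ("not found", "Error", "EXCEPTION") are assumptions taken only from the wording in the example above, and the function name is hypothetical.

```python
import re

# The first part of the failing SetSourceDirectory example.
SAMPLE_LOG = r"""action SetSourceDirectory (bool=false,bool=false,str="@APPPATH(imagedir)")
1 Smart Parameter element found
Parsing Smart Parameter element {0} value: "@APPPATH(imagedir)"
@APPPATH key root value: 'imagedir'
@APPPATH looking for workflow key: '*/dco_TravelDocs/imagedir'
@APPPATH workflow key not found.
Smart Parameter return value: ''
looking for:@APPPATH(imagedir)
Error: Folder '@APPPATH(imagedir)' does not exist
"""

# "element found" must not match, so look for "not found" specifically.
FAILURE = re.compile(r"not found|^Error|^EXCEPTION")


def failure_lines(log_text):
    """Return (line_number, text) pairs for lines that report a problem."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        text = line.strip()
        if FAILURE.search(text):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    for number, text in failure_lines(SAMPLE_LOG):
        print(f"[{number}] {text}")
```

The earliest hit is usually the root cause; later errors, such as the missing folder here, tend to follow from it.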
TASK LOG FILES
Note: Taskmaster does not generate task logs when you execute tasks from Datacap Studio.
When running a task from Taskmaster Client, you must set the severity level to 1 or higher (on the scale of
0-9) if you want to generate a task log (see “Enabling logging for Batch Pilot tasks” on page 180). Logging is
enabled by default for all tasks except VScan. The severity level and the options on the Log tab determine
how much information is written to the log file (see “Logging options” on page 181).
The task log file is given the name specified on the Log tab in the task’s “Settings” dialog box (see “Enabling
logging for Batch Pilot tasks” on page 180) and is saved in the batch folder unless you specify otherwise.
The task log provides information that is typically most helpful to IBM technical support since it documents
mostly internal calls. An example is shown below.
LogFile opened at 10:44:09 01/06/11 for <>
01/06/2011 Batch Directory: C:\Datacap\TravelDocs\batches\20110006.004
01/06/2011 Batch ID: 20110006.004
01/06/2011 Station ID: 1
01/06/2011 Operator: admin
01/06/2011 Job Name: Main Job
01/06/2011 Module ID : rrsVscan
01/06/2011 Task ID: VScan
01/06/2011 Page File:
01/06/2011 C:\Datacap\BPilot\bpilot.exe, Version: 8.00.7 ::()
01/06/2011 Task Started. Task VScan, Module rrsVscan, Version 8.00.7, Batch 20110006.004, station 1, operator admin.
01/06/2011 Path.ini file was not found
01/06/2011 CVerTask::ReadDataPriv()
01/06/2011 CVerTask::ReadProblems()
01/06/2011 CVerTask::ReadProblemsFor()
01/06/2011 CVerTask::ReadProblemsFor()
01/06/2011 CVerTask::ReadProblemsFor()
01/06/2011 CVerTask::TaskModule
01/06/2011 CVerTask::TaskModule()
01/06/2011 TaskModule::C:\Datacap\TravelDocs\dco_TravelDocs\RRS_VScan.bpp()
01/06/2011 CVerTask::GetSpecialForm()
01/06/2011 CVerTask::GetSpecialForm()
01/06/2011 File total read : 48
01/06/2011 Set Batch Directory to C:\Datacap\TravelDocs\batches\20110006.004
01/06/2011 Page File set to rrsvscan.xml
01/06/2011 CPlaneDoc::StopScript()
01/06/2011 CModuleEdit::OnScriptStopping()
01/06/2011 CMainFrame::AdjustMode()
01/06/2011 CMainFrame::ResetMenu()
etc.
DEBUGGING YOUR APPLICATION FROM THE DATACAP STUDIO TEST TAB
The RRS log is an excellent way to review exactly what happened if your application aborts or fails to
generate the expected output. You may, however, want to monitor the application during execution to
determine if rules are executing as you expect. You can do this by running the application from the Datacap
Studio Test tab using the debugging features.
We’ve done most of our task execution from the Test tab throughout this guide and we looked quickly at
single-stepping and breakpoints in earlier chapters:
• In the chapter on “Rule Execution,” we stepped through the PageID task profile to see how rules execute
based on the objects they’re bound to in the document hierarchy.
• In the chapter on “Data Export,” we set a breakpoint to pause execution in a rule that connects to a
database so we could check for successful execution.
In this section we’ll look in more detail at the execution and debugging features available on the Test tab.
USING BREAKPOINTS
A breakpoint halts task execution at a predetermined ruleset, rule or action, or when a task starts processing a
specific document, page, or field.
BREAKPOINT TYPES
There are two types of breakpoint:
• “Breakpoints”: These halt execution whenever the Rulerunner execution
manager encounters the specified element, regardless of context.
• “Full breakpoints”: These halt execution when the Rulerunner execution
manager encounters the specified element within the same context.
To see the difference more clearly, let’s look at an example. The TravelDocs application includes two calls to
ExportCloseConnection(): one in the “Export Rental Agreement Data” rule and one in the “Export
Other Close Database” rule.
[Figure: two instances of the same action]
• If you set a “breakpoint” on the first instance of ExportCloseConnection, execution stops
whenever ExportCloseConnection is called from anywhere in the application.
• If you set a “full breakpoint” on the first instance of ExportCloseConnection, execution stops
only when ExportCloseConnection is called from the Export Data function in the Export Rental
Agreement Data rule in the ExportDB ruleset.
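The difference between the two breakpoint types comes down to what is compared when execution reaches an element. The sketch below models that comparison. It is a conceptual illustration only, not how Datacap Studio stores breakpoints: the class, the tuple layout of the context, and the function name "Function1" used for the second rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    element: str          # name of the action, rule, page, and so on
    context: tuple = ()   # empty for a "breakpoint"; full path for a "full breakpoint"

    def matches(self, element, context):
        """Return True if execution should halt at this element."""
        if element != self.element:
            return False
        # A plain breakpoint ignores context; a full breakpoint requires
        # the same ruleset, rule, and function.
        return not self.context or tuple(context) == self.context


# The two places where the action is called (second function name is assumed).
FIRST_CALL = ("ExportDB", "Export Rental Agreement Data", "Export Data")
SECOND_CALL = ("ExportDB", "Export Other Close Database", "Function1")

plain = Breakpoint("ExportCloseConnection")
full = Breakpoint("ExportCloseConnection", FIRST_CALL)

if __name__ == "__main__":
    for call in (FIRST_CALL, SECOND_CALL):
        print(call[1],
              "| breakpoint:", plain.matches("ExportCloseConnection", call),
              "| full breakpoint:", full.matches("ExportCloseConnection", call))
```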
You can see the difference in the way the breakpoints are defined by looking in the Breakpoints pane. This
pane displays all of the breakpoints that are currently defined.
[Figure: Breakpoints pane entries for a “Breakpoint” and a “Full Breakpoint”]
The image below shows the difference between a “breakpoint” and a “full breakpoint” when the breakpoint
element is a document, page, or field.
[Figure: Breakpoints pane entries for a “Breakpoint” and a “Full Breakpoint” on a document, page, or field]
SETTING BREAKPOINTS
To set a breakpoint on a ruleset, rule, function, or action:
• In the Rulesets pane on the Datacap Studio Test tab, right-click the item and choose Set breakpoint or
Set full breakpoint. Note that for rulesets and rules you can only set “breakpoints.”
To set a breakpoint on a document, page, or field:
• In the Runtime batch hierarchy pane on the Datacap Studio Test tab, right-click the item and choose
Set breakpoint or Set full breakpoint. Note that the document, page, or field must already exist in the
runtime hierarchy before you can set a breakpoint on it.
DISABLING AND CLEARING BREAKPOINTS
The Breakpoints pane displays all of the breakpoints that are currently defined.
The checkbox to the left of each breakpoint indicates whether the breakpoint is enabled or disabled. By
default, breakpoints are enabled when you add them.
You can selectively enable or disable individual breakpoints using the checkboxes, or you can use the buttons
on the left of the Breakpoints pane to enable, disable, or remove breakpoints.
Enable all breakpoints: Enables all the breakpoints displayed in the Breakpoints pane.
Disable all breakpoints: Disables all the breakpoints displayed in the Breakpoints pane.
Remove selected breakpoints: Removes the highlighted breakpoints. To select multiple breakpoints,
hold down the Ctrl key.
Remove all breakpoints: Removes all breakpoints from the Breakpoints pane.
SETTING GENERIC BREAKPOINTS
The Test tab also lets you halt execution if any rule or action fails. Two additional buttons at the left of the
Breakpoints pane let you do this.
Stop on a failed action: Select this button to add a generic breakpoint that halts execution
whenever an action fails.
Stop on a failed rule: Select this button to add a generic breakpoint that halts execution
whenever a rule fails.
SINGLE-STEPPING THROUGH YOUR CODE
Single-stepping is a useful way to determine whether the functions and actions within a rule are executing as
intended. As you step through each line, an icon beside each action shows whether it returned True or
False (!).
[Figure: an action that returned True]
If an action returns False, you can look in the batch log to see why (see “Examining log files from the Test
tab” on page 190).
Reminder:
• If all actions within a function return True, Taskmaster skips any remaining functions in the current rule.
• If an action returns False, Taskmaster skips any remaining actions within that function and executes the
next function (if there is one).
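The two reminders above amount to a short-circuit evaluation, which the following sketch models. It is a simplified illustration rather than Rulerunner's implementation: each action is represented as a (name, result) pair with a fixed outcome, and the True/False value returned for the rule as a whole is an assumption made for the example.

```python
def run_rule(functions):
    """Simulate one rule and return (rule_succeeded, actions_executed).

    functions: list of functions; each function is a list of
    (action_name, returns_true) pairs with predetermined outcomes.
    """
    executed = []
    for actions in functions:
        function_succeeded = True
        for name, returns_true in actions:
            executed.append(name)
            if not returns_true:
                # A False action skips the rest of this function.
                function_succeeded = False
                break
        if function_succeeded:
            # All actions returned True: skip the remaining functions.
            return True, executed
    return False, executed


if __name__ == "__main__":
    rule = [
        [("A", True), ("B", False), ("C", True)],  # C is never reached
        [("D", True)],                             # succeeds, so the rule stops here
        [("E", True)],                             # never reached
    ]
    print(run_rule(rule))
```

When you single-step through a rule, the actions you see highlighted should follow the same order as the executed list in this model.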
The Test tab provides the following buttons for stepping through code:
Step in: Steps into the next line of code. If the next line calls a rule or function, Step in opens
the rule or function and halts inside it. If the next line is an action, Step in opens the
action and you must click it again to close the action.
For example, if execution is halted at a function, use this button to step into the
function and execute each of its actions in turn.
Step/Step over: Executes the next line of code as well as any lower level functions and actions, and
then halts. If the next line is an action, Step over works like Step in and opens the
action.
For example, if execution is halted at a function, use this button to execute the
function, including all of its actions.
Step out: Steps through the next line of code. If the next line is a rule or function, Step out
works like Step over and executes any lower level functions and actions. If the next
line is an action, Step out executes and closes the action.
EXAMINING LOG FILES FROM THE TEST TAB
The Output tab in the center pane lets you view output that’s written to the following log files:
Studio log: This is the Datacap Studio exception log. It doesn’t contain anything useful for application
developers.
Batch log: This contains the same information as the Rulerunner Service (RRS) log discussed earlier. The
major benefit of viewing the log on the Output tab is that you can view the message stream
whenever you pause execution.
Interbatch log
To use the batch log to view the Rulerunner message stream:
1. Click the Output tab in the center pane of the Test tab.
2. If Batch log is not already selected, click the down-arrow and select it from the list of available logs.
The Output pane refreshes automatically whenever you stop at a breakpoint or single-step through a line of
code, or when the current task profile completes, although you’ll need to scroll to the bottom each time to see
the latest messages. In the example below, the “Stop on a failed action” option is selected
(a[@res="false"]) and you can see the action that returned False (PopulateZNLineItemField).
[Figure: batch log output showing the action that returned False]
Chapter 14
HANDLING LINE ITEM GRIDS
The techniques we’ve looked at so far have relied upon data being at predictable locations on the page. In this
section we’ll look at line item grids, which are by nature unpredictable. When we receive something like an
invoice, we don’t know how many items may be on it: there could be just one, or there could be a hundred,
possibly spanning multiple pages.
Taskmaster includes actions to handle line item grids. For these to work, you define the region on the page
that may contain line items and define the structure of one line item. Taskmaster can then scan the region and
locate all of the individual line items.
At the end of this chapter we’ll update the TravelDocs application to demonstrate various techniques relating
to line item grids, including recognition, validation, verification, and export.
DEFINING THE DOCUMENT HIERARCHY FOR LINE ITEM GRIDS
A line item grid is a structure of repeating items, each of which typically contains several fields. To set up the
document hierarchy, you need to:
• Define the region of the page that may include line items
• Define the structure of one line item in terms of its fields
[Figure: a page whose grid region holds four line items, each with Item, Description, and Cost values
(4352 Widget $9.95; 7845 Widget $4.95; 9122 Widget $8.25; 1734 Widget $7.50), shown beside the matching
document hierarchy: Page > Grid_region (field) > Line_item (field) > Item, Description, Cost (fields)]
Note that you define only a single line item in the document hierarchy. At runtime, Taskmaster expands the
runtime hierarchy as needed to accommodate however many line items it finds.
<P id="TM000001">                        ← Page
  <F id="Grid_region">                   ← Grid region
    <F id="Line_item0">                  ← First line item
      <F id="Item"> etc. </F>            ← Item field
      <F id="Description"> etc. </F>     ← Description field
      <F id="Cost"> etc. </F>            ← Cost field
    </F>
    <F id="Line_item1">                  ← Second line item
      <F id="Item"> etc. </F>            ← Item field
      <F id="Description"> etc. </F>     ← Description field
      <F id="Cost"> etc. </F>            ← Cost field
    </F>
    <F id="Line_item2">                  ← Third line item
      <F id="Item"> etc. </F>            ← Item field
      <F id="Description"> etc. </F>     ← Description field
      <F id="Cost"> etc. </F>            ← Cost field
    </F>
etc.
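To make the expansion of a single line item definition concrete, the sketch below builds a similar structure with Python's standard xml.etree.ElementTree module. It only imitates the shape of the runtime hierarchy: storing each value as element text is a simplification, since the real page data file holds more than is shown here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_grid(page_id, grid_id, line_item_id, field_ids, rows):
    """Create one numbered line item element per row of recognized data."""
    page = ET.Element("P", id=page_id)
    grid = ET.SubElement(page, "F", id=grid_id)
    for index, row in enumerate(rows):
        # The single Line_item definition becomes Line_item0, Line_item1, ...
        item = ET.SubElement(grid, "F", id=f"{line_item_id}{index}")
        for field_id in field_ids:
            field = ET.SubElement(item, "F", id=field_id)
            field.text = row.get(field_id, "")
    return page


if __name__ == "__main__":
    rows = [
        {"Item": "1176", "Description": "Widget", "Cost": "$6.95"},
        {"Item": "9122", "Description": "Widget", "Cost": "$8.25"},
    ]
    page = expand_grid("TM000001", "Grid_region", "Line_item",
                       ["Item", "Description", "Cost"], rows)
    print(ET.tostring(page, encoding="unicode"))
```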
CREATING RULES TO RECOGNIZE LINE ITEMS
Although iterating through an undefined number of line items sounds complicated, Taskmaster includes
actions that make it fairly straightforward.
Library   Action                    Description
Zones     ScanDetails               Searches a line item grid object looking for line items. Assign this action
                                    to the grid region in the document hierarchy.
Zones     ScanLineItem              Searches a line item object looking for fields. Assign this action to each
                                    line item in the document hierarchy.
Zones     PopulateZNLineItemField   Populates the page data file with the recognized value in the zone for
                                    the current line item child field. Assign this action to each line item child
                                    field in the document hierarchy.
These three actions automate the process of reading line item grids. The key is to make sure the actions
operate at the correct level in the document hierarchy by assigning rules as shown in the example below.
Document hierarchy (rules attached to each object):
Open
Grid_region
  Open
    (global)
    Recognize : Recognize Line Item Grid
  Line_item
    Open
      (global)
      Recognize : Recognize Line Item
    Item
      Open
        (global)
        Recognize : Recognize Line Item Field
    Description
      Open
        (global)
        Recognize : Recognize Line Item Field
      Close
    Cost
      Open
        (global)
        Recognize : Recognize Line Item Field
      Close
etc.

Recognize ruleset:
Recognize
  Recognize Line Item Grid
    Function1
      ScanDetails()
  Recognize Line Item
    Function1
      ScanLineItem()
  Recognize Line Item Field
    Function1
      PopulateZNLineItemField()
As Taskmaster executes each rule, first on the grid and then on each line item and field in the grid, it builds a
data structure in memory. The data gets written to the page data file upon completion of the current task.
ScanDetails(): runs once only (at the grid level)
ScanLineItem(): runs once for each line item
PopulateZNLineItemField(): runs once for each field

Example grid:
Item   Description   Cost
1176   Widget        $6.95
9122   Widget        $8.25

Resulting data structure (in memory) after each action:
After ScanDetails()    After ScanLineItem()    After PopulateZNLineItemField()
Grid_region            Grid_region             Grid_region
  Line_item0             Line_item0              Line_item0
  Line_item1               Item                    Item = 1176
                           Description             Description = Widget
                           Cost                    Cost = $6.95
                         Line_item1              Line_item1
                           Item                    Item = 9122
                           Description             Description = Widget
                           Cost                    Cost = $8.25
In this way the three actions can read all the items in a grid of arbitrary length.
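The way the three actions nest can be pictured as three loops, one per level of the hierarchy. The sketch below imitates that traversal on plain text and counts how often each level runs. It is a conceptual model only: splitting a line on whitespace stands in for zonal recognition, and the function is hypothetical rather than a reimplementation of the Zones actions.

```python
FIELDS = ("Item", "Description", "Cost")


def read_grid(lines):
    """Build the in-memory grid structure and count calls per level."""
    calls = {"ScanDetails": 0, "ScanLineItem": 0, "PopulateZNLineItemField": 0}

    calls["ScanDetails"] += 1              # grid level: runs once
    grid = {}
    for index, line in enumerate(lines):
        calls["ScanLineItem"] += 1         # runs once for each line item
        item = {}
        for name, word in zip(FIELDS, line.split()):
            calls["PopulateZNLineItemField"] += 1   # runs once for each field
            item[name] = word
        grid[f"Line_item{index}"] = item
    return grid, calls


if __name__ == "__main__":
    grid, calls = read_grid(["1176 Widget $6.95", "9122 Widget $8.25"])
    print(grid)
    print(calls)
```

With two line items of three fields each, the field-level step runs six times, which is why the rule containing PopulateZNLineItemField must be bound to every child field.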
USING TEXT MATCHING TO LOCATE FIELDS
Frequently, a line item grid will include a total at the bottom. There may be other fields, like sales tax and
shipping costs. You don’t know ahead of time how many line items will be in the grid, so you don’t know
where the other fields will be located.
Example grid with two line items:
Item    Description   Cost
1176    Widget        $6.95
9122    Widget        $8.25
Total                 $15.20

Example grid with four line items:
Item    Description   Cost
4352    Widget        $9.95
7845    Widget        $4.95
9122    Widget        $8.25
1734    Widget        $7.50
Total                 $30.65
Since the location of the other fields is unpredictable, we can’t use positional information to read these fields.
Instead, we can use text matching to locate an adjacent label and then read the text beside the label.
Taskmaster provides many actions for locating text on a page and we’ll look at them in more detail later in the
chapter on text matching. Typically when working with line item grids, you’re seeking the last instance of a
word or phrase like “Total” or “Sales Tax.” The “Locate” actions library has several useful actions, including
those outlined below.
Library   Action            Description
Locate    FindLastRegEx     Locates the last occurrence of a word or phrase on the current page.
Locate    FindLastKeyList   Locates the last occurrence of a word or phrase that’s contained in the specified
                            keyword file. A keyword file is a text file with a .key extension that contains a list
                            of similar words and phrases, for example:
                            Sales tax
                            Tax
Locate    GoRightWord       Moves ‘n’ words to the right of the location of a previously found word.
Locate    UpdateField       Updates the current field in the page data file with the value of the located word.
For detailed information on these and other actions in the “Locate” library, select the action on the Actions
Library tab and click the Display information button.
The example below locates the last instance of the word “Total” on the current page, moves one word to the
right, makes sure the value is a currency value, and then updates the current field in the page data file. The
rule must be attached to the “Total” field in the document hierarchy.
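The sequence described above (find the last label, move right, check the value, update the field) can be sketched in ordinary Python. The code below is an analogy for the logic only, not the Locate actions themselves: the page is reduced to a flat list of words, and the function name and the currency pattern are assumptions chosen for the example.

```python
import re

# Accepts values such as 15.20, $15.20, or $1,234.50 (an assumed format).
CURRENCY = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d{2})?")


def find_last_value(words, label, offset=1):
    """Return the currency value 'offset' words right of the last label, or None."""
    positions = [i for i, word in enumerate(words) if word.lower() == label.lower()]
    if not positions:
        return None                      # label not found on the page
    target = positions[-1] + offset      # last occurrence, then move right
    if target >= len(words):
        return None
    candidate = words[target]
    # Only accept the word if it looks like a currency value.
    return candidate if CURRENCY.fullmatch(candidate) else None


if __name__ == "__main__":
    page_words = ("Item Description Cost 1176 Widget $6.95 "
                  "9122 Widget $8.25 Total $15.20").split()
    print(find_last_value(page_words, "Total"))
```

Searching for the last occurrence matters because a label such as “Total” may also appear higher on the page, for example in a column heading.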
REMOVING NON-LINE ITEMS FROM THE PAGE DATA FILE
Since the region defined for the line item grid may include fields that are not line items (for example, the
“Total” field in the previous example), Taskmaster may create line items for items that are not actually line
items. In the example below, although there are only two line items on the invoice, Taskmaster created a third
line item for the “Total” field.
[Figure: invoice grid with columns Item, Description, Cost; rows 1176 Widget $6.95 and 9122 Widget $8.25;
Total $15.20]
<P id="TM000001">
  <F id="Grid_region">
    <F id="Line_item0">                  ← First line item
      <F id="Item"> etc.</F>             ← Has the value “1176”
      <F id="Description"> etc.</F>      ← Has the value “Widget”
      <F id="Cost"> etc.</F>             ← Has the value “6.95”
    </F>
    <F id="Line_item1">                  ← Second line item
      <F id="Item"> etc.</F>             ← Has the value “9122”
      <F id="Description"> etc.</F>      ← Has the value “Widget”
      <F id="Cost"> etc.</F>             ← Has the value “8.25”
    </F>
    <F id="Line_item2">                  ← Third line item (not a line item)
      <F id="Item"> etc.</F>             ← No data
      <F id="Description"> etc.</F>      ← Has the value “Total”
      <F id="Cost"> etc.</F>             ← Has the value “$15.20”
    </F>
etc.
If the page you’re processing has non-line item fields within the grid region, you may need to identify those
non-line items and remove them. Taskmaster includes an action to do this for you.
Library       Action           Description
Validations   CheckSubFields   Confirms values exist in child fields of specified parent field. Deletes the parent
                               field if any of the specified child fields have no values.
In the example above, you’d create a rule that uses the CheckSubFields action to determine if the “Item,”
“Description,” and “Cost” fields all have values. If they don’t, the action deletes the line item. The rule would
look like this:
It’s important to attach the rule to the line item grid object in the document hierarchy and to make sure it
runs after recognition is complete. To do this, you can do either one of the following:
• Include the rule in the “Recognize” ruleset and attach it to the line item grid object’s “Close” element.
• Include the rule within a separate task profile that executes after recognition but before validation (for
example, the “Clean” ruleset) and attach it to the line item grid object’s “Open” element.
Using either method, in the example above Taskmaster recognizes that the “Item” field in the third line item
has no value and therefore deletes it, leaving only the two real line items.
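The effect of this cleanup step can be shown with a few lines of Python. The sketch below mirrors the behavior described for CheckSubFields on a simple dictionary; it is an illustration of the filtering logic only, and the function name and data layout are hypothetical.

```python
def remove_incomplete_items(grid, required=("Item", "Description", "Cost")):
    """Keep only line items in which every required child field has a value."""
    return {
        name: fields
        for name, fields in grid.items()
        if all(fields.get(child, "").strip() for child in required)
    }


if __name__ == "__main__":
    grid = {
        "Line_item0": {"Item": "1176", "Description": "Widget", "Cost": "6.95"},
        "Line_item1": {"Item": "9122", "Description": "Widget", "Cost": "8.25"},
        # The "Total" row was picked up as a line item but has no Item value.
        "Line_item2": {"Item": "", "Description": "Total", "Cost": "$15.20"},
    }
    print(list(remove_incomplete_items(grid)))
```

Because the check depends on recognized values, it only gives correct results once recognition of the whole grid has finished, which is why the rule is attached to the grid's “Close” element or to a later task profile.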
EXPORTING DATA FROM A LINE ITEM GRID
You can export line item grid data using the export actions we used in an earlier section of this guide.
However, since the data exists at different levels in the runtime hierarchy, you’ll need different rules for each
level. For example, to export the data from an invoice, you typically need:
• A rule to set up the export file or open the database
• A rule to export the header information
• A rule to export each of the line items
• A rule to export any trailing items (for example, the invoice total field)
• A rule to close the file or database
An example of exporting data to a database is provided later in this chapter (see “Exporting to a database” on
page 219). We’ll demonstrate how to export to an XML file in the following chapter (see “Exporting to an
XML file” on page 236), since this requires the use of “smart parameters” that we haven’t covered yet.
TRAVELDOCS: ADDING NEW PAGES CONTAINING LINE ITEM GRIDS
In this section, we’ll update the TravelDocs application to process the “Meals” and “Other_Charges” pages
of the Hotel document type. These pages both include line item grids of undefined length.
UPDATING THE DOCUMENT HIERARCHY
When we set up the document hierarchy earlier, we skipped the “Meals” and “Other_Charges” pages. The
business requirements specify the following rules for the structure of the new page types:
Hotel           Number                    Required?   Order
Meals           Any number per document   No          Any position in the document
Other_Charges   Any number per document   No          Any position in the document
Within the document hierarchy, the following variables define the structure of the pages within the document:
Hotel           Max   Min   Order
Meals           0     0     0
Other_Charges   0     0     0
ADDING PAGES TO THE DOCUMENT HIERARCHY
1. In the Document Hierarchy pane, click the Lock DCO for editing button to lock the document
hierarchy for editing.
2. Expand the tree so you can see the document types.
3. Right click on the Hotel document node and choose Add multiple > Pages. Then type 2 in the box
and press Enter.
4. Rename the new pages from Page1 and Page2 to Meals and Other_Charges. The document hierarchy
should look like the one below.
5. Click the Save button.
6. Right click on the Meals page node and choose Manage variables.
7. Make sure the Max, Min, and Order values are as specified in the table above (the Meals page is 0, 0, 0).
Then click Done.
8. Right click on the Other_Charges page node and choose Manage variables. Then enter the Max, Min,
and Order values as specified in the table above (the Other_Charges page is also 0, 0, 0) and click Done.
9. Click the Save button.
CREATING DATA FIELDS
The business requirements specification defines the following fields for each new page type:
Meals                     Other Charges
Meals_Grid                Other_Charges_Grid
  Meals_Line_Item           Other_Charges_Line_Item
    Date                      Date
    Description               Category
    Cost                      Quantity
Meals_Total                   Unit_Cost
                              Total
                          Other_Charges_Total
1. Make sure the document hierarchy is still locked for editing.
2. Right click on the Meals page and choose Add multiple > Fields. Then type 2 in the box and press
Enter.
3. Rename the new fields Meals_Grid and Meals_Total.
4. Right click on the Meals_Grid field and choose Add > Field.
5. Rename the new field Meals_Line_Item.
6. Right click on the Meals_Line_Item field and choose Add multiple > Fields. Then type 3 in the box
and press Enter.
7. Rename the new fields Date, Description, and Cost.
8. Repeat to add the fields and sub-fields for the Other_Charges page, as shown above.
Note: When you add the Date field, click Yes to inherit all rules and properties.
9. Click the Save button. The finished hierarchy should look like the one shown on the next page.
DOCUMENT HIERARCHY FOR THE NEW HOTEL PAGES
ATTACHING THE EXISTING PAGE RULES TO THE NEW PAGES
Here, we’ll add the same rules we used on the other pages in the TravelDocs application to the new pages.
1. Make sure the document hierarchy is still locked for editing.
2. Expand the document hierarchy so the Meals and Other_Charges pages are visible and then select the
Meals page.
3. In the Rulesets pane, expand the CreateDocs ruleset, select the Create Fields rule, and click the Add to
DCO button. Taskmaster adds the rule to the Meals page’s “Open” element.
4. Select the Other_Charges page and click the Add to DCO button. Taskmaster adds the rule to the
Other_Charges page’s “Open” element.
5. Repeat to add the Recognize: Recognize Page, Validate: Validate Page, Routing: Routing Rule 1,
and Export: Export Page Fields rules to each page. Each page should now have an Open element like
the one below.
6. Click the Save button and then click the Unlock DCO button.
CREATING THE PAGE FINGERPRINTS
The next step is to add fingerprints for the new page types to the fingerprint library.
1. Click the Datacap Studio Zones tab.
2. In the Fingerprints pane, right-click the Hotel class and choose Add fingerprint.
3. Browse to the folder where the TravelDocs fingerprint images are located.
4. Select Hotel4.tif, and click Open. When asked if you want to enhance the image, click Yes.
5. In the Image Processing window, click the Run image processing button to apply the application’s
image processing properties. Make sure the lines disappear from the processed page.
6. Click the Save button, choose Save image, and click OK. Then close the Image Processing
window.
7. With the new fingerprint selected, click the Type field and choose Meals.
8. Repeat to add Hotel5.tif and enhance the page image in the same way, but set the type to
Other_Charges.
DEFINING THE RECOGNITION ZONES
Next, we need to define the recognition zones for each of the new page types.
DEFINING ZONES ON THE “MEALS” PAGE
1. In the Fingerprints pane, select the Meals page.
2. In the Image View pane, click the Zoom button to enlarge the page so you can see the fields clearly.
3. In the Document hierarchy pane, click the Lock DCO for editing button.
4. In the Document hierarchy pane, expand the Meals page so you can see all of the fields and subfields.
5. Select the Meals_Total field and draw a bounding box around the total cost beneath the line item grid.
Note: If you create the Meals_Grid zone first, you won’t be able to draw the Meals_Total zone inside it.
That’s why we’re doing the Meals_Total field first.
6. Select the Meals_Grid field. Then draw a bounding box around the grid items but extend the bounding
box to the bottom of the page, since we don’t know how many line items there may be on the actual page
images.
7. Select the Meals_Line_Item field and draw a bounding box around the first line item.
8. Select the Meals_Line_Item > Date field and draw a box around the date in the first line item.
9. Repeat to create zones for the Description and Cost fields.
10. In the Document hierarchy pane, click the Save button.
DEFINING ZONES ON THE “OTHER_CHARGES” PAGE
1. In the Fingerprints pane, select the Other_Charges page.
2. Make sure the document hierarchy is still locked for editing. Then expand the Other_Charges page
completely so you can see all of the fields and subfields.
3. Select the Other_Charges_Total field and draw a bounding box around the total cost beneath the line
item grid.
4. Select the Other_Charges_Grid field and draw a bounding box around the grid items, extending the
bounding box to the bottom of the page as you did for the Meals page.
5. Select the Other_Charges_Line_Item field and draw a bounding box around the first line item.
6. Select each of the sub-fields in turn and draw a bounding box around each field in the first line item.
7. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
TRAVELDOCS: RECOGNIZING LINE ITEM GRID DATA
Recognizing the data with a line item grid requires several rules, with each rule attached to a different object
in the document hierarchy. We’ll need the following:
• A rule attached to the line item grid object to scan all line items
• A rule attached to the line item object to scan each line item
• A rule attached to each field within the line item to read each field
• A rule attached to the grid total field to locate and read the total cost
CREATING THE RECOGNITION RULES FOR THE LINE ITEMS
1. Click the Datacap Studio Rulemanager tab.
2. In the Rulesets pane, select the Recognize ruleset and click the Lock/Unlock ruleset button.
3. Right-click the Recognize ruleset and choose Add Rule. Rename the new rule Recognize Line Item
Grid.
4. Right-click the Recognize ruleset and choose Add Rule. Rename the new rule Recognize Line Item.
5. Right-click the Recognize ruleset and choose Add Rule. Rename the new rule Recognize Line Item
Field.
6. Under the Recognize Line Item Grid rule, select Function 1. Then click the Actions library tab,
expand the Zones library, select the ScanDetails action, and click the Add to function button.
7. Under the Recognize Line Item rule, select Function 1. Then select the ScanLineItem action and
click the Add to function button.
8. Under the Recognize Line Item Field rule, select Function 1. Then select the
PopulateZNLineItemField action and click the Add to function button.
9. Click the Save button. The Recognize ruleset should look like the one shown below.
CREATING THE RECOGNITION RULE FOR THE GRID TOTAL
1. Right-click the Recognize ruleset and choose Add Rule. Rename the new rule Recognize Grid Total
Field.
2. Under the Recognize Grid Total Field rule, select Function 1.
3. On the Actions library tab, expand the Locate library, select the FindLastRegEx action, and click the
Add to function button.
4. Also in the Locate library, select the GoRightWord action and click the Add to function button.
5. Also in the Locate library, select the UpdateField action and click the Add to function button.
6. In the Rulesets pane, select the FindLastRegEx action. Then in the Properties pane, set strParam to
Total.
7. In the Rulesets pane, select the GoRightWord action. Then in the Properties pane, set strParam to 1.
8. Click the Save button and then click the Lock/Unlock ruleset button and choose Publish ruleset. The
rule should look like the one shown below.
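As a rough illustration of what this three-action sequence is meant to achieve, the Python sketch below finds the last word matching "Total" and reads the word immediately to its right. It is not Taskmaster code: the function names, the flat word list, and the sample values are hypothetical, and it assumes the actions behave as their names and parameters suggest.

```python
import re


def find_last_regex(words, pattern):
    """Return the index of the last word that matches pattern, or None."""
    last = None
    for index, word in enumerate(words):
        if re.search(pattern, word):
            last = index
    return last


def go_right_word(words, index, count):
    """Move count words to the right; return None if that runs off the end."""
    target = index + count
    return target if target < len(words) else None


def read_grid_total(words):
    """Locate the last 'Total' label and return the word that follows it."""
    position = find_last_regex(words, "Total")
    if position is None:
        return None
    position = go_right_word(words, position, 1)
    return None if position is None else words[position]


# Hypothetical recognized words in reading order (not taken from the sample images).
page_words = ["Quantity", "Unit_Cost", "Total", "1", "4.95", "4.95", "Total", "238.75"]
grid_total = read_grid_total(page_words)
```

Searching for the last match matters here because the word "Total" can also appear as a column heading above the grid; only the final occurrence labels the grid total.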
ATTACHING THE RULES TO THE DOCUMENT HIERARCHY
In this section, we’ll do the following:
• Attach the Recognize Line Item Grid rule to the Meals_Grid and Other_Charges_Grid fields.
• Attach the Recognize Line Item rule to the Meals_Line_Item and Other_Charges_Line_Item fields.
• Attach the Recognize Line Item Field rule to each field within the line items.
• Attach the Recognize Grid Total Field rule to the Meals_Total and Other_Charges_Total fields.
To attach the rules to the document hierarchy:
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so you can see all of the fields and sub-fields on the Meals and
Other_Charges pages.
3. Select the Meals_Grid field. Then in the Rulesets pane, select the Recognize Line Item Grid rule and
click the Add to DCO button. Taskmaster adds the rule to the field’s “Open” element.
4. Select the Other_Charges_Grid field and click the Add to DCO button.
5. Select the Meals_Line_Item field. Then in the Rulesets pane, select the Recognize Line Item rule and
click the Add to DCO button.
6. Select the Other_Charges_Line_Item field and click the Add to DCO button.
7. Select the Date field under the Meals_Line_Item. Then in the Rulesets pane, select the Recognize Line
Item Field rule and click the Add to DCO button.
8. Repeat to attach the Recognize Line Item Field rule to the Description and Cost fields under the
Meals_Line_Item and the Date, Category, Quantity, Unit_Cost, and Total fields under the
Other_Charges_Line_Item.
9. Select the Meals_Total field. Then in the Rulesets pane, select the Recognize Grid Total Field rule
and click the Add to DCO button.
10. Select the Other_Charges_Total field and click the Add to DCO button.
11. In the Document hierarchy pane, click the Save button and then click Unlock DCO.
RUNNING A BATCH THROUGH THE WORKFLOW
1. Copy the files Images_Page_12 and Images_Page_13 from the sample images download location into
the TravelDocs application’s “images” folder.
2. Click the Datacap Studio Test tab.
3. In the Workflow pane, select the VScan task profile under Main Job.
4. Click the New button to start a new batch.
5. Use the Process rules for target object button and the Advance button to move the batch through
to the Verify task (do not run the Verify task).
6. In the Runtime batch hierarchy pane, expand the last hotel document and make sure pages TM000012
and TM000013 were identified correctly.
7. Expand the Meals page and then expand each of the line items in turn to make sure each line item
contains data.
When you expand the last line item, you should see a value for the Cost field only. This is the “Total”
field at the bottom of the grid. Since this is not a real line item, we’ll have to create a rule to eliminate
non-line items. We’ll do this in the next section.
8. Expand the Other_Charges page and then expand each line item in turn to make sure each line item
contains data. When you expand the last line item, you should see a value for the Total field only. This is
the “Total” field at the bottom of the grid that will also have to be eliminated.
9. In the Workflow pane, right-click the batch and choose Cancel.
CREATING RULES TO REMOVE THE NON-LINE ITEMS
1. Click the Datacap Studio Rulemanager tab.
2. In the Rulesets pane, select the Clean ruleset and click the Lock/Unlock ruleset button.
3. Right-click the Clean ruleset and choose Add Rule. Rename the new rule Remove Non Line Items
(Meals).
4. Under the Remove Non Line Items (Meals) rule, select Function 1.
5. On the Actions library tab, expand the Validations library, select the CheckSubFields action, and click
the Add to function button.
6. In the Rulesets pane, select the CheckSubFields action. Then in the Properties pane, set strParam to:
'Date' AND 'Description' AND 'Cost'
7. Right-click the Clean ruleset and choose Add Rule. Rename the new rule Remove Non Line Items
(Other Charges).
8. Under the Remove Non Line Items (Other Charges) rule, select Function 1.
9. Select the CheckSubFields action, and click the Add to function button.
10. In the Rulesets pane, select the new CheckSubFields action and set strParam to:
'Date' AND 'Category' AND 'Quantity' AND 'Unit_Cost' AND 'Total'
11. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish ruleset. The
ruleset should look like the one shown below.
12. In the Document hierarchy pane, click the Lock DCO for editing button.
13. Expand the Meals and Other_Charges pages.
14. Select the Meals_Grid field. Then in the Rulesets pane, select the Remove Non Line Items (Meals)
rule and click the Add to DCO button. Taskmaster adds the rule to the field’s “Open” element.
15. Select the Other_Charges_Grid field. Then in the Rulesets pane, select the Remove Non Line Items
(Other Charges) rule and click the Add to DCO button.
16. Click the Save button and then click the Unlock DCO button.
17. Run another batch through the workflow as described on the previous page and make sure the non-line
items are no longer included. Cancel the batch when you’re done.
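The effect of the two clean-up rules can be pictured with a small Python sketch. It is only an illustration of the idea, assuming that a row survives when every sub-field named in strParam holds a value; the function names and the sample rows are hypothetical and are not part of Taskmaster.

```python
def has_all_subfields(line_item, required):
    """Return True when every required sub-field holds a non-blank value."""
    return all(str(line_item.get(name, "")).strip() for name in required)


def remove_non_line_items(line_items, required):
    """Keep only the rows in which all required sub-fields are populated."""
    return [item for item in line_items if has_all_subfields(item, required)]


# Hypothetical Meals rows; the last one mimics the "Total" row picked up by the grid scan.
meals_rows = [
    {"Date": "06/01/2011", "Description": "Breakfast", "Cost": "12.50"},
    {"Date": "06/01/2011", "Description": "Dinner", "Cost": "31.00"},
    {"Date": "", "Description": "", "Cost": "43.50"},
]
cleaned = remove_non_line_items(meals_rows, ["Date", "Description", "Cost"])
```

The "Total" row is dropped because only its Cost sub-field is populated, which matches what we saw when we expanded the last line item in the test batch.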
TRAVELDOCS: VALIDATING LINE ITEM GRID DATA
The grid on the “Other_Charges” page includes calculated fields:
• The first “Total” field (in each line item) represents the “Quantity” multiplied by the “Unit Cost.”
• The second “Total” field (beneath the grid) represents the sum of all the line item totals.
In this section we’ll add validations to the “Other_Charges” page to make sure these calculations are correct.
VALIDATING THE LINE ITEM TOTALS
We used the “Calculate” action earlier in this guide when we checked that the total cost of the air ticket equals
the airfare plus taxes.
Library        Action       Description
Validations    Calculate    Returns True if the arithmetic expression is valid; returns False otherwise.
When we used the action before, we performed the calculation at the page level, since the page was the next
level up in the document hierarchy. This time, however, the next level up in the hierarchy is the line item.
Other_Charges_Line_Item
    Quantity
    Unit_Cost
    Total
Due to the way Taskmaster processes line item grids, we can’t attach calculations to line item objects. Instead,
what we can do is create a “dummy” field within the line item and attach the rule to this field.
Other_Charges_Line_Item
    Quantity
    Unit_Cost
    Total
    Validation    <-- Perform calculation here
CREATING THE VALIDATION RULE
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the Validate ruleset and click the
Lock/Unlock ruleset button.
2. Right-click the Validate ruleset and choose Add Rule. Rename the new rule Validate Other Charge.
3. Under the Validate Other Charge rule, select Function 1.
4. On the Actions library tab, expand the Validations library, select the Calculate action, and click the
Add to function button.
5. In the Rulesets pane, select the Calculate action. Then in the Properties pane, set strParam to:
'Quantity' * 'Unit_Cost' = 'Total'
6. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish ruleset. The
rule should look like the one shown below.
ATTACHING THE RULE TO THE DOCUMENT HIERARCHY
Here, we’ll create the “dummy” field as a sub-field of the “Other_Charges_Line_Item” field and attach the
new rule to it.
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the Other_Charges page completely, so you can see all the individual fields.
3. Right-click the Other_Charges_Line_Item field and choose Add > Field. Then rename the new field
Validation.
4. Select the Validation field. Then in the Rulesets pane, select the Validate Other Charge rule and click
the Add to DCO button. Taskmaster adds the rule to the field’s “Open” element.
5. Click the Save button.
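To make the line item check concrete, here is a minimal Python sketch of the comparison that the expression 'Quantity' * 'Unit_Cost' = 'Total' describes. It is an illustration only, not the Calculate action itself: the function name and dictionary layout are hypothetical. The sample values (1, 4.95, and 9.9) are the ones that appear in the failure message for the first line item when the sample batch is run.

```python
from decimal import Decimal


def line_item_total_is_valid(item):
    """Check that Quantity multiplied by Unit_Cost equals Total for one line item."""
    quantity = Decimal(item["Quantity"])
    unit_cost = Decimal(item["Unit_Cost"])
    total = Decimal(item["Total"])
    return quantity * unit_cost == total


# First line item on the sample page: 1 * 4.95 does not equal 9.9, so the check fails.
first_item = {"Quantity": "1", "Unit_Cost": "4.95", "Total": "9.9"}
first_item_ok = line_item_total_is_valid(first_item)
```

The sketch uses Decimal rather than floating point so that currency values such as 4.95 compare exactly; how Taskmaster handles rounding internally is not covered here.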
VALIDATING THE GRID TOTAL
This validation is to ensure that the grid total (“Other_Charges_Total”) equals the sum of the line item totals
(“Total”).
Other_Charges
    Other_Charges_Total
    Other_Charges_Grid
        Other_Charges_Line_Item
            Quantity
            Unit_Cost
            Total
            Validation
There are a few ways you could do this calculation. Here, we’ll attach the following rule to the page’s “Close”
element:
The validation action looks a little unusual:
Calculate("'Total' = 'Other_Charges_Total'")
The reference to “Total” sums all of the child fields labeled “Total.” The action then compares the sum to the
“Other_Charges_Total” field.
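The summing behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the Calculate action: the function name and the individual line item values are hypothetical, chosen so that they add up to 138.75, while 238.75 is the grid total reported for the sample page.

```python
from decimal import Decimal


def grid_total_is_valid(line_items, grid_total):
    """Check that the line item Total values add up to the grid total."""
    summed = sum((Decimal(item["Total"]) for item in line_items), Decimal("0"))
    return summed == Decimal(grid_total)


# Hypothetical line items; only the Total sub-field matters for this check.
items = [{"Total": "9.90"}, {"Total": "28.85"}, {"Total": "100.00"}]
matches_printed_total = grid_total_is_valid(items, "238.75")
matches_summed_total = grid_total_is_valid(items, "138.75")
```

Because the rule is attached to the page’s “Close” element, the comparison runs only after every line item on the page has been processed.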
CREATING THE VALIDATION RULE
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the Validate ruleset and click the
Lock/Unlock ruleset button.
2. Right-click the Validate ruleset and choose Add Rule. Rename the new rule Validate Other Charges
Total.
3. Under the Validate Other Charges Total rule, select Function 1.
4. On the Actions library tab, expand the Validations library, select the Calculate action, and click the
Add to function button.
5. In the Rulesets pane, select the Calculate action. Then in the Properties pane, set strParam to:
'Total' = 'Other_Charges_Total'
6. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish ruleset. The
rule should look like the one shown below.
ATTACHING THE RULE TO THE DOCUMENT HIERARCHY
Here we’ll attach the new validation rule to the Other_Charges page’s “Close” element.
1. Make sure the document hierarchy is still locked for editing.
2. If necessary, expand the Other_Charges page so you can see the page’s “Close” element.
3. Select the “Close” element. Then in the Rulesets pane, select the Validate Other Charges Total rule
and click the Add to DCO button. Taskmaster adds the rule to the “Close” element.
4. Click the Save button and then click the Unlock DCO button.
RUNNING A BATCH THROUGH THE WORKFLOW
The “Other Charges” page in the sample images collection includes calculation errors to trigger validation
failures:
• Line item total incorrect
• Grid total incorrect
When we run the batch through the workflow we should see these failures.
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Use the Process rules for target object button and the Advance button to move the batch through
to the Verify task (do not run the Verify task).
5. Open the application’s most recent batch folder (C:\Datacap\TravelDocs\batches\<batch_id>). Then
open the file Rulerunner.xml and scroll down to the data for page TM000013.
<P id="TM000013">
    <V n="TYPE">Other_Charges</V>
    <V n="STATUS">1</V>
    etc.
    <V n="MESSAGE">Failed Calculation:FormatNumber(138.75 ,8,0,0)=FormatNumber( 238.75,8,0,0)</V>
    <V n="DATAFILE">tm000013.xml</V>
</P>
The page status is ‘1’ (indicating a problem) and the message indicates that the calculation relating to the
“Other_Charges_Total” field failed.
6. Open the file tm000013.xml. Notice that there’s a calculation failure in the first line item and that
Taskmaster flags all the fields involved in the calculation.
<F id="Other_Charges_Line_Item0">
    <V n="TYPE">Other_Charges_Line_Item</V>
    etc.
    <F id="Quantity">
        etc.
        <V n="STATUS">1</V>
        <V n="MESSAGE">Failed By Calculate Action On Field &apos;Validation&apos;.</V>
        etc.
    </F>
    <F id="Unit_Cost">
        etc.
        <V n="STATUS">1</V>
        <V n="MESSAGE">Failed By Calculate Action On Field &apos;Validation&apos;.</V>
        etc.
    </F>
    <F id="Total">
        etc.
        <V n="STATUS">1</V>
        <V n="MESSAGE">Failed By Calculate Action On Field &apos;Validation&apos;.</V>
        etc.
    </F>
    <F id="Validation">
        etc.
        <V n="STATUS">1</V>
        <V n="MESSAGE">Failed Calculation:FormatNumber(1 * 4.95 ,8,0,0)=FormatNumber( 9.9,8,0,0)</V>
    </F>
</F>
7. In Datacap Studio, cancel the batch when you’re done.
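If you would rather not scan the page data file by eye, a short script can list the flagged fields. The Python sketch below is one way to do it with the standard library. The embedded XML is a hypothetical, trimmed-down stand-in for tm000013.xml (the real file contains many more elements), the function name is our own, and the STATUS value of 0 for a clean field is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt modeled on the page data file shown above.
SAMPLE = """
<P id="TM000013">
  <F id="Other_Charges_Line_Item0">
    <V n="TYPE">Other_Charges_Line_Item</V>
    <F id="Quantity">
      <V n="STATUS">1</V>
      <V n="MESSAGE">Failed By Calculate Action On Field &apos;Validation&apos;.</V>
    </F>
    <F id="Date">
      <V n="STATUS">0</V>
    </F>
  </F>
</P>
"""


def failed_fields(xml_text):
    """Return (field id, message) pairs for every field whose own STATUS is 1."""
    root = ET.fromstring(xml_text)
    failures = []
    for field in root.iter("F"):
        # find() looks at direct children only, so nested fields are judged separately.
        status = field.find("V[@n='STATUS']")
        if status is not None and (status.text or "").strip() == "1":
            message = field.find("V[@n='MESSAGE']")
            failures.append((field.get("id"), message.text if message is not None else ""))
    return failures


problems = failed_fields(SAMPLE)
```

To run it against a real batch, read the file contents into a string and pass that to failed_fields instead of SAMPLE.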
TRAVELDOCS: CREATING VERIFICATION PANELS FOR THE LINE ITEM GRID PAGES
USING BATCH PILOT FOR VERIFICATION
CREATING THE NEW PANEL
1. Click Start > All Programs > Datacap > Batch Pilot > Batch Pilot.
2. Click File > Open Project.
3. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_verify.bpp and click Open.
4. In the bottom section of the Batch Pilot window, expand TravelDocs and then expand Hotel.
5. Right-click the Meals page and choose AutoForm.
6. If any of the labels are clipped, as they are in the example above, stretch the label controls so all the text is
visible.
7. Click File > Save Form.
8. Open the dco_TravelDocs\verify folder and save the form as Meals.dcf.
9. In the bottom section of the Batch Pilot window, right-click the Meals page and choose Pick form.
10. From the dco_TravelDocs\verify folder select Meals.dcf and click Open. This links the form to the
page type.
11. In the bottom section of the Batch Pilot window, right-click the Other_Charges page and choose
AutoForm.
12. If any of the labels are clipped, as they are in the example above, stretch the label controls so all the text is
visible.
13. Delete the controls associated with the Validation field (the label, the DCImage control, and the DCEdit
control), since this field is for internal use and we don’t want to display it to the operator. Then resize the
parent containers and move the Other_Charges_Total field up.
14. Click File > Save Form.
15. Open the dco_TravelDocs\verify folder and save the form as Other_Charges.dcf.
16. In the bottom section of the Batch Pilot window, right-click the Other_Charges page and choose Pick
form.
17. From the dco_TravelDocs\verify folder select Other_Charges.dcf and click Open. This links the
form to the page type.
18. Click File > Save Project and then close the Batch Pilot window.
PREPARING A BATCH FOR VERIFICATION
1. If Taskmaster Client is not already started:
   • Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
   • Select TravelDocs, click OK, and log in using User ID: admin, Password: admin, and Station: 1.
2. Double-click the VScan icon and wait for the task to complete. Then click Stop.
3. Double-click the PageID icon and wait for the task to complete. Then click Stop.
4. Double-click the Rulerunner icon and wait for the task to complete. Then click Stop.
REVIEWING THE BATCH IN BATCH PILOT
1. In the Taskmaster Client window, double-click the Verify/Fixup icon. Batch Pilot displays the first
rental agreement page.
2. Click the Next Problem button until you reach the rental agreement page with the validation error
on the “Car_Type” field.
3. Select a valid car type (for example, “Other”) and then click the Next Problem button until you
reach the air ticket page with the validation error.
4. Click the Next Problem button and click Yes to override the validation failure. Then click the Next
Problem button until you reach the Meals page.
5. Click the > button within the Meals_Line_Item section to view the other line items.
6. Click the Next Problem button again to display the Other Charges page.
The red fields indicate validation failures relating to the first line item and the grid total field.
7. Click the Next Problem button and click Yes to override the validation failures. Then click Yes to
put the batch on hold and click Stop.
USING DOTEDIT FOR VERIFICATION
Note: This section uses the batch you put on hold in the previous section. If you skipped the section, prepare
a batch from Taskmaster Client as described under “Preparing a batch for verification” on page 216.
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster DotEdit.
2. In the Application field, type TravelDocs.
3. Enter User ID: admin, Password: admin, and Station: 1 and click Login.
4. In the Shortcut field, select Verify/FixUp and click Start. Then open the most recent batch by double-clicking it.
5. In the Batch View pane, expand the third Hotel document and select the Other_Charges page. Then
select Other_Charges_Line_Item0 in the grid. DotEdit displays the corresponding line item in the
snippet view beneath the grid.
6. You can see the fields with validation errors marked in red. By default, the application is configured so
you can override validation failures, so click Submit and then click OK to override the errors.
Note: DotEdit’s default field-at-a-time interface might be confusing for operators if line item grids are
involved. If you want to modify the DotEdit interface you must create a custom panel for each
page. This is beyond the scope of this guide but is described in the IBM Datacap Taskmaster Capture
Creating Custom DotEdit Panels Guide.
7. When asked if you want to continue from the start of the batch, click No and then close DotEdit.
TRAVELDOCS: EXPORTING LINE ITEM GRID DATA TO A DATABASE
EXPORTING TO A DATABASE
We’ll update the application to export data from the “Other Charges” page to a table in the export database.
CREATING THE EXPORT DATABASE TABLE
Note: If you don’t have Microsoft Access and used the sample Access file earlier (see “Creating the export
database” on page 170), the file you copied has the required table so you can skip this section.
1. Open the file TravelDocsExport.mdb in Microsoft Access.
2. Create a new table called Other_Charges.
3. Create a new field for BatchID and for each of the fields defined for the Other Charges page, as shown
below. Make all fields of type “Text.”
4. Save the new table.
ADDING RULES TO THE EXPORTDB RULESET
1. In the Datacap Studio Rulesets pane, select the ExportDB ruleset and click the Lock/Unlock ruleset
for editing button.
2. Right-click the ExportDB ruleset and choose Add Rule. Rename the new rule Export Other Open
Database.
3. Repeat to create three additional rules:
   • Export Other Line Item
   • Export Other Total
   • Export Other Close Database
4. Click the Actions library tab and expand the ExportDB library.
5. Expand the Export Other Open Database rule and select Function1.
6. Select and add each of the actions shown in the table below to Function1 using the Add to function
button. Then set the action parameters as shown in the table.
Action                   Parameter
ExportOpenConnection     @APPVAR(*/exportdb:cs)
SetTableName             Other_Charges
7. Expand the Export Other Line Item rule and select Function1.
8. Select and add each of the actions shown in the table below to Function1 using the Add to function
button. Then set the action parameters as shown in the table below.
Action                   Parameter
ExportBatchIDToColumn    BatchID
ExportFieldToColumn      Date,Charge_Date
ExportFieldToColumn      Category,Category
ExportFieldToColumn      Quantity,Quantity
ExportFieldToColumn      Unit_Cost,Unit_Cost
ExportFieldToColumn      Total,Total
AddRecord
9. Expand the Export Other Total rule and select Function1.
10. Select and add each of the actions shown below to Function1 using the Add to function button. Note
that the second action is ExportToColumn. Then set the action parameters as shown in the table.
Action                   Parameter
ExportBatchIDToColumn    BatchID
ExportToColumn           Total
AddRecord
11. Expand the Export Other Close Database rule and select Function1.
12. Select and add the action shown in the table below to Function1 using the Add to function button.
Action                   Parameter
ExportCloseConnection
13. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The finished ruleset should look like the one below.
ATTACHING THE EXPORT OTHER RULES TO THE DOCUMENT HIERARCHY
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so all the fields in the Other_Charges page are visible.
3. Select the Other_Charges page. Then select the Export Other Open Database rule and click the Add
to DCO button.
4. Select the Other_Charges_Line_Item field. Then select the Export Other Line Item rule and click the
Add to DCO button.
5. Select the Other_Charges_Total field. Then select the Export Other Total rule and click the Add to
DCO button.
6. Select the Other_Charges page’s “Close” element. Then select the Export Other Close Database rule
and click the Add to DCO button.
7. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button. The
rules should be linked to the document hierarchy as shown below.
Other_Charges page
    Open
        (global)
        ExportDB : Export Other Open Database
    Other_Charges_Grid
        Open
        Other_Charges_Line_Item
            Open
                (global)
                ExportDB : Export Other Line Item
            Close
        Close
    Other_Charges_Total
        Open
            (global)
            ExportDB : Export Other Total
        Close
    Close
        (global)
        ExportDB : Export Other Close Database
etc.

ExportDB
    Export Other Open Database
        Function1
    Export Other Line Item
        Function1
    Export Other Total
        Function1
    Export Other Close Database
        Function1
RUNNING A BATCH THROUGH THE WORKFLOW
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Use the Process rules for target object button and the Advance button to move the batch through
the VScan, PageID, Rulerunner, Verify, and Export tasks.
5. Open the file C:\Datacap\TravelDocs\TravelDocsExport.mdb and review the exported data in the
Other_Charges table.
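To give a feel for the rows you should expect to find, the Python sketch below mimics the four export rules using an in-memory SQLite database in place of the Access file. Everything in it is illustrative: the function name, the batch ID, and the sample line item are hypothetical, the column list is an assumption based on the action parameters shown earlier, and it assumes that the Export Other Total rule writes the grid total into the Total column of a separate record.

```python
import sqlite3


def export_other_charges(conn, batch_id, line_items, grid_total):
    """Write one row per line item, then one row that holds the grid total."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Other_Charges ("
        "BatchID TEXT, Charge_Date TEXT, Category TEXT, "
        "Quantity TEXT, Unit_Cost TEXT, Total TEXT)"
    )
    for item in line_items:
        # Mirrors the Export Other Line Item rule: batch ID plus five mapped fields.
        conn.execute(
            "INSERT INTO Other_Charges VALUES (?, ?, ?, ?, ?, ?)",
            (batch_id, item["Date"], item["Category"],
             item["Quantity"], item["Unit_Cost"], item["Total"]),
        )
    # Mirrors the Export Other Total rule: only BatchID and Total are filled in.
    conn.execute(
        "INSERT INTO Other_Charges (BatchID, Total) VALUES (?, ?)",
        (batch_id, grid_total),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
export_other_charges(
    conn,
    "20110617.001",  # hypothetical batch ID
    [{"Date": "06/01/2011", "Category": "Phone", "Quantity": "1",
      "Unit_Cost": "4.95", "Total": "4.95"}],
    "4.95",
)
rows = conn.execute(
    "SELECT BatchID, Charge_Date, Total FROM Other_Charges ORDER BY rowid"
).fetchall()
```

Under these assumptions the table receives one record per line item followed by a record in which only BatchID and Total are populated.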
Chapter 15
USING SMART PARAMETERS
“Smart parameters” are action arguments that get evaluated at runtime. We’ve used a few smart parameters
already in the TravelDocs application. For example, when we created the ExportXML ruleset we used
@BatchID to set the export filename equal to the ID of the current batch:
xml_SetFileName("@BatchID")
Since the value of @BatchID is different each time we run a batch through the workflow, we can create a
unique export file for each batch.
Taskmaster provides a variety of special variables (@<variable_name>) that you can use within smart
parameters to access dynamic information at runtime. You can use them to:
• Get information from the application configuration file
• Get (or set) the value of an object in the document hierarchy (typically a variable or a field value)
• Get job information such as the task name, ID of the operator running the batch, etc.
• Get system information such as the current date, time, etc.
Smart parameters can also include strings, navigation elements, and combinations of strings, navigation
elements, and special variables.
In this chapter, we’ll look in more detail at smart parameters. At the end of the chapter, we’ll update the
TravelDocs application to export the line item grid data to an XML file, since this requires the use of a variety
of smart parameters.
GENERAL STRUCTURE OF A SMART PARAMETER
A smart parameter may include any number of elements that are combined at runtime to produce a single
action argument. Elements may include special variables, string constants, and navigation elements.
Note: Smart parameters do not work with all actions. Check the action help in Datacap Studio for
compatibility information.
EXAMPLE 1
The example below shows an action with a single smart parameter argument that includes three smart
parameter elements:
SetSourceDirectory("@APPPATH(vscanimagedir)+\+Input")
@APPPATH(vscanimagedir)    Special variable that gets a setting from the application
                           configuration (.app) file (see “Using special variables to
                           access application configuration settings” on page 226)
\                          String constant
Input                      String constant
Elements are combined using the ‘+’ sign. At runtime, Taskmaster first evaluates any special variables and
then concatenates the elements to create a single string that becomes the action argument.
07:13:25.53 3 Smart Parameter elements found
07:13:25.53 Parsing Smart Parameter element {0} value: "@APPPATH(vscanimagedir)"
07:13:25.54 @APPPATH key root value: 'vscanimagedir'
07:13:25.54 @APPPATH looking for workflow key: '*/dco_TravelDocs/vscanimagedir'
07:13:25.54 workflow key found: 'C:\Datacap\TravelDocs\images'
07:13:25.54 Parsing Smart Parameter element {1} value: "\"
07:13:25.54 Parsing Smart Parameter element {2} value: "Input"
07:13:25.54 Smart Parameter return value: 'C:\Datacap\TravelDocs\images'
07:13:25.54 looking for:C:\Datacap\TravelDocs\images\Input
07:13:25.55 Action changes: Directory with source images: C:\Datacap\TravelDocs\images\Input
Note that it would be incorrect to specify the argument as @APPPATH(vscanimagedir)+\Input. The '\'
sign when followed by a string represents a navigation element (see “Using navigation elements to access the
runtime hierarchy” on page 234). In this example, we don't want to specify a navigation element. Instead, we
want to concatenate the result of @APPPATH(vscanimagedir) with the string “\Input” and can do this
using +\+Input.
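The evaluation order described in Example 1 can be modeled in a few lines of Python. This is only a simplified sketch, not Taskmaster's parser: the function name evaluate_smart_parameter and the settings dictionary are hypothetical, only @APPPATH is handled, and navigation elements are not modeled.

```python
import re

def evaluate_smart_parameter(param, app_settings):
    """Split on '+', resolve @APPPATH(key) elements, then concatenate."""
    resolved = []
    for element in param.split("+"):
        match = re.fullmatch(r"@APPPATH\((.+)\)", element)
        if match:
            # Special variable: look the key up in the (hypothetical) settings.
            resolved.append(app_settings[match.group(1)])
        else:
            # Anything else is treated as a string constant in this sketch.
            resolved.append(element)
    return "".join(resolved)

settings = {"vscanimagedir": r"C:\Datacap\TravelDocs\images"}
result = evaluate_smart_parameter(r"@APPPATH(vscanimagedir)+\+Input", settings)
print(result)
```

Because the backslash is its own element between two '+' signs, it is joined as a plain string, which is the behavior the +\+Input form relies on.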
EXAMPLE 2
The next example shows an action with two smart parameter arguments. Each argument includes one smart
parameter element:
rrSet ("..\Pickup_Location","@B.FieldValue")

Element               Description
..\Pickup_Location    Navigation element that references another field on the
                      same page within the runtime hierarchy
@B.FieldValue         Special variable that references a batch level custom
                      variable within the runtime hierarchy

The portion of the log file below shows how Taskmaster evaluates the first argument by recognizing it as a
navigation element. It then goes to the referenced element within the runtime hierarchy and retrieves the
field's value. In this example, the action is bound to a field, so ..\Pickup_Location references another
field at the same level on the same page.
08:17:30.892 action rrSet (str="..\Pickup_Location",str="@B.FieldValue")
08:17:30.892 execute statement On Action Start
08:17:30.892 executing code:
08:17:30.892 Call OnActionStart()
08:17:30.892 /execute statement On Action Start
08:17:30.892 1 Smart Parameter element found
08:17:30.892 Parsing Smart Parameter element {0} value: "..\Pickup_Location"
08:17:30.892 DCO Parent Navigation key match (starts with '\' or '..\'). Calling DCONavGetValue(..\Pickup_Location)
08:17:30.892 Finding Child 'Pickup_Location' -->
08:17:30.893 Found child 'Pickup_Location'
08:17:30.895 Finding Dictionary assigned to DCO Node:'Pickup_Location'
08:17:30.895 This DCO does not have an assigned Dictionary or is not an OMR type Field.
08:17:30.895 Smart Parameter return value: 'Orlando (MCO)'
08:17:30.896 Setting '20110054.002.FieldValue' value to 'Orlando (MCO)'.
USING SPECIAL VARIABLES TO ACCESS APPLICATION CONFIGURATION SETTINGS
The application configuration file (or “.app file”) stores the application's paths, connection strings, and other
settings. Do not attempt to modify this file directly – instead, use the Taskmaster Application Manager. We
used the Application Manager earlier when we configured the export database (see “Configuring the export
database in the Taskmaster Application Manager” on page 170).
The .app file is stored in the root of the application folder. For example, the TravelDocs application's
configuration file is C:\Datacap\TravelDocs\TravelDocs.app:
<app name="TravelDocs" ver="7" modder="Pete.DC14.DATACAP" dt="12/22/10.753
11:41:06.753 " src_ver="1">
  <k name="tmservers">
    <k name="tms" ip="127.0.0.1" port="2402" retry="3"/>
  </k>
  <k name="runtime" v="batches"/>
  <k name="tmengine" cs="<encoded_connection_string>"/>
  <k name="tmadmin" cs="<encoded_connection_string>"/>
  <k name="dco_TravelDocs">
    <k name="setupdco" v="TravelDocs.xml"/>
    <k name="rules" v="rules"/>
    <k name="imagefix" v="imagefix.ini"/>
    <k name="UseFPXML" v="False"/>
    <k name="fingerprintconn" cs="<encoded_connection_string>"/>
    <k name="vscanimagedir" v="C:\Datacap\TravelDocs\images"/>
    <k name="exportdb" cs="<encoded_connection_string>"/>
  </k>
  <k name="fingerprint" v="fingerprint"/>
  <k name="export" v="export"/>
</app>
Note: Connection strings may contain usernames and passwords, so they are encoded when they're written to
the .app file. Taskmaster encodes and decodes these automatically, so no special handling is required
when accessing these from your application using smart parameters. For information about storing
other action parameters in the .app file as encoded strings, see “Storing passwords, connection strings,
and other parameters in the .app file” on page 228.
Applications can access settings in the configuration file using the following special variables:
Smart Parameter    Description
@APPPATH           Retrieves the path to a file or folder from the application configuration file.
@APPVAR            Retrieves a connection string, value, or other attribute from the application
                   configuration file.

Note: For detailed information about these and other special variables, see the “Smart Parameter Special
Variable Reference” on page 431.
For each special variable, you specify a “key” representing the setting you want to get from the configuration
file. When we used the @APPPATH special variable earlier in this guide, we specified export as the key
(@APPPATH(export)).
DETERMINING THE CORRECT KEY NAME
You can obtain the correct special variable key name from the Taskmaster Application Manager:
1. Start the Taskmaster Application Manager (Start > All Programs > Datacap > Taskmaster Client >
Taskmaster Application Manager).
2. Move the mouse pointer over the field. The smart parameter and key name are displayed in the balloon
help.
The example above shows how to get the path to the application's “images” folder. The dco_*[1] prefix is
only required if the application has multiple workflows. Substitute *[1] with the workflow name, for
example:
@APPPATH(dco_Workflow2/vscanimagedir)
If there is only one instance, you can use * instead, for example:
@APPPATH(*/vscanimagedir)
Keys for connection strings are slightly more complicated, since the connection string is stored in the “cs”
attribute rather than the “v” attribute. The “v” attribute is the default attribute, so you don't need to specify
the attribute name. To obtain the value of a different attribute you must specify the attribute name using
:<attribute_name>, as shown in the example below.
We used this syntax earlier to obtain the connection string for the application's lookup database:
@APPVAR(*/lookupdb:cs)
Details of these special variables and a listing of key names are provided in Appendix A (see “Special variables
for accessing the application configuration file” on page 432).
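The key syntax can be illustrated with a short Python sketch that resolves a key path, with an optional :<attribute_name> suffix, against nested <k> elements. This is an illustration only, not how Taskmaster reads the file: the lookup function, the trimmed sample document, and the interpretation of * as "any dco_ section" are assumptions, and the encoding of connection strings is not modeled.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for a .app file; real files store encoded strings.
SAMPLE_APP = r"""<app name="TravelDocs">
  <k name="runtime" v="batches"/>
  <k name="dco_TravelDocs">
    <k name="vscanimagedir" v="C:\Datacap\TravelDocs\images"/>
    <k name="exportdb" cs="ENCODED-PLACEHOLDER"/>
  </k>
  <k name="export" v="export"/>
</app>"""

def lookup(root, key):
    """Resolve 'a/b[:attr]' against nested <k> elements; attr defaults to 'v'."""
    path, _, attr = key.partition(":")
    nodes = [root]
    for name in path.split("/"):
        matched = []
        for node in nodes:
            for child in node.findall("k"):
                child_name = child.get("name", "")
                # Assumption: '*' stands for any workflow (dco_...) section.
                if child_name == name or (name == "*" and child_name.startswith("dco_")):
                    matched.append(child)
        nodes = matched
    return nodes[0].get(attr or "v") if nodes else None

root = ET.fromstring(SAMPLE_APP)
print(lookup(root, "*/vscanimagedir"))   # default "v" attribute
print(lookup(root, "*/exportdb:cs"))     # explicit "cs" attribute
```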
STORING PASSWORDS, CONNECTION STRINGS, AND OTHER PARAMETERS IN THE .APP FILE
Note: The functionality described in this section is available in Taskmaster Capture 8.0.1 or higher.
The sample .app file shown earlier illustrates how Taskmaster encodes the standard Taskmaster database
connection strings (engine, admin, fingerprint, lookup, and export) before writing them to the .app file.
You can use the .app file to store other action parameters as encoded strings. You can then use smart
parameters to access the strings from your actions. This avoids having to specify sensitive information like
passwords as action parameters.
Instead of:
ex_login("svr/exch.asmx","user@company.com","secret")
Use:
ex_login("svr/exch.asmx","user@company.com",@APPVAR(values/adv/pwd))
You can also use the .app file to store other action parameters that may not be sensitive but that you don't
want to hardcode into your actions. For example, you might choose to store a machine-specific path as a
custom value so you can change it easily if you move the application to a different machine.
To store passwords, connection strings, and other parameters in the .app file, use the Taskmaster Application
Manager's Custom values tab:
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Application Manager.
2. Click the Custom values tab and select your application from the list on the left.
3. Click the Add new value/CS name button beneath the field you want to use:
Field                                 Description
General string values                 Use this field for action parameters you don’t want to hardcode in
                                      your actions. For example, instead of specifying a machine-specific
                                      path as an action parameter, enter the path here and reference it
                                      from your actions as described in the next section. Although
                                      Taskmaster encodes the values when saving them to the .app file, do
                                      not use this field for passwords as the strings are visible to anyone
                                      using the Taskmaster Application Manager (use “Advanced values”
                                      below).
Data source connection string         Use this field to store non-Taskmaster data source connection strings.
values                                Type or paste your connection string into this field and reference it
                                      from your actions as described in the next section.
Taskmaster data source connection     Use this field to store Taskmaster data source connection strings. Click
string values                         the […] button to create the connection string using Taskmaster
                                      supported providers and reference it from your actions as described in
                                      the next section.
Advanced values                       Use this field to store passwords or other strings you don’t want to
                                      reveal through the Application Manager. Values you type here are
                                      masked. Reference the value from your actions as described in the
                                      next section.
4. Enter the value name and the value. Note that advanced values are masked whereas other values aren't.
5. Close the Taskmaster Application Manager window.
Note: If you change any of the settings in the application configuration file while Datacap Studio is open, use
the Connection Wizard button (top right of the Datacap Studio window) to reopen your application
before executing tasks from the Datacap Studio Test tab. Reconnecting to the application forces
Datacap Studio to reload the information from the application configuration (.app) file.
REFERENCING PASSWORDS, CONNECTION STRINGS, AND OTHER PARAMETERS FROM YOUR ACTIONS
To reference the custom values from your actions, you need to know the proper key path. You can get this
from the help text on the Custom Values tab in the Taskmaster Application Manager.
The text at the beginning of each section shows how to reference the value from an action. For example, for
values defined in the “Advanced values” section, use @APPVAR(values/adv/<value_name>). Using the
example above, you can reference the value as:
@APPVAR(values/adv/Mail password)
The table below shows how to reference the values for each field type.
Field                                   Smart parameter
General string values                   @APPVAR(values/gen/<value_name>)
                                        Example: @APPVAR(values/gen/MyParameter1)
Data source connection string values    @APPVAR(values/dsn/<value_name>:cs)
                                        Example: @APPVAR(values/dsn/MyDatabase1:cs)
Taskmaster data source connection       @APPVAR(values/tmdsn/<value_name>:cs)
string values                           Example: @APPVAR(values/tmdsn/MyTMDatabase1:cs)
Advanced values                         @APPVAR(values/adv/<value_name>)
                                        Example: @APPVAR(values/adv/MyPassword1)
Note: The ":cs" suffix is required to access connection strings defined in the “Data source connection
string” and “Taskmaster data source connection string” fields.
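As a quick way to keep the four reference forms straight, the following Python sketch builds the @APPVAR string for a given field type and value name. The helper name appvar_reference and the section labels used as dictionary keys are hypothetical; only the resulting reference strings follow the syntax documented in this section.

```python
# Key path segment and whether the ":cs" suffix is needed, per field type.
SECTIONS = {
    "general": ("gen", False),
    "data source": ("dsn", True),
    "taskmaster data source": ("tmdsn", True),
    "advanced": ("adv", False),
}

def appvar_reference(section, value_name):
    """Return the @APPVAR smart parameter for a custom value."""
    segment, is_connection_string = SECTIONS[section]
    suffix = ":cs" if is_connection_string else ""
    return f"@APPVAR(values/{segment}/{value_name}{suffix})"

print(appvar_reference("advanced", "Mail password"))
print(appvar_reference("data source", "MyDatabase1"))
```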
ACCESSING THE RUNTIME HIERARCHY
We've looked at the runtime batch hierarchy at various times throughout this guide, both indirectly through
the Datacap Studio Test tab and directly by opening the runtime XML files.
Runtime batch hierarchy XML file:
<B id="20110003.001">
  <V n="TYPE">TravelDocs</V>
  <D id="20110003.001.01">
    <V n="TYPE">Car_Rental</V>
    <P id="TM000001">
      <V n="TYPE">Rental_Agreement</V>
      <V n="DATAFILE">tm000001.xml</V>
    </P>
    <P id="TM000002">
      <V n="TYPE">Optional_Insurance</V>
      <V n="DATAFILE">tm000002.xml</V>
    </P>
  </D>
  etc.
  <D id="20110003.001.04">
    <V n="TYPE">Flight</V>
    <P id="TM000006">
      <V n="TYPE">Air_Ticket</V>
      <V n="DATAFILE">tm000006.xml</V>
    </P>
  </D>
  etc.

Runtime page data XML file:
<P id="TM000006">
  <F id="Outbound_From">
    <V n="TYPE">Outbound_From</V>
    <C cn="10" cr="668,426,681,448">66</C>      ASCII ‘B’
    <C cn="10" cr="684,433,699,448">111</C>     ASCII ‘o’
    etc.
  </F>
  <F id="Outbound_To">
    <V n="TYPE">Outbound_To</V>
    <C cn="10" cr="1068,426,1080,448">80</C>    ASCII ‘P’
    <C cn="10" cr="1084,426,1086,448">105</C>   ASCII ‘i’
    etc.
  </F>
  <F id="Outbound_Date">
    <V n="TYPE">Outbound_Date</V>
    <C cn="7" cr="203,426,215,448">49</C>       ASCII ‘1’
    <C cn="7" cr="219,426,233,448">55</C>       ASCII ‘7’
    etc.
  </F>
  etc.
In this section we'll examine how to access the information in the runtime hierarchy using smart parameters.
EXAMPLES OF USING SPECIAL VARIABLES TO ACCESS THE RUNTIME HIERARCHY
We've already used special variables to access data within the runtime hierarchy. Here are some examples
from the ExportXML ruleset.
USING @BATCHID TO GET THE CURRENT BATCH ID
xml_SetFileName("@BatchID")
evaluates at runtime to:
xml_SetFileName("20110003.001")

Runtime batch hierarchy XML file:
<B id="20110003.001">
  <V n="TYPE">TravelDocs</V>
  <D id="20110003.001.01">
    <V n="TYPE">Car_Rental</V>
    <P id="TM000001">
    etc.
USING @ID TO GET THE ID OF THE CURRENT PAGE
xml_NewNode("@ID,Rental_Agreements")
evaluates at runtime to:
xml_NewNode("TM000001,Rental_Agreements")

Runtime batch hierarchy XML file:
<B id="20110003.001">
  <V n="TYPE">TravelDocs</V>
  <D id="20110003.001.01">
    <V n="TYPE">Car_Rental</V>
    <P id="TM000001">
    etc.
USING @P\<FIELD_NAME> TO GET THE VALUE OF A FIELD ON THE CURRENT PAGE
xml_SetNodeValue("Pickup_Date, @P\Pickup_Date")
evaluates at runtime to:
xml_SetNodeValue("Pickup_Date, Tues, Dec 7, 2010")

Runtime page data XML file:
<P id="TM000001">
  <F id="Pickup_Date">
    <V n="TYPE">Pickup_Date</V>
    <V n="Position">179,394,543,462</V>
    <V n="STATUS">0</V>
    <C cn="7" cr="200,416,220,440">84</C>      ASCII ‘T’
    <C cn="10" cr="226,425,240,440">117</C>    ASCII ‘u’
    <C cn="10" cr="245,425,258,440">101</C>    ASCII ‘e’
    <C cn="10" cr="260,425,270,440">115</C>    ASCII ‘s’
    <C cn="10" cr="273,435,278,444">44</C>     ASCII ‘,’
    <C cn="10" cr="336,419,337,440">32</C>     ASCII space
    <C cn="10" cr="290,419,306,440">68</C>     ASCII ‘D’
    <C cn="10" cr="310,425,324,440">101</C>    ASCII ‘e’
    <C cn="10" cr="325,425,336,440">99</C>     ASCII ‘c’
    <C cn="10" cr="370,419,371,444">32</C>     ASCII space
    <C cn="10" cr="349,419,363,440">55</C>     ASCII ‘7’
    <C cn="10" cr="365,435,370,444">44</C>     ASCII ‘,’
    <C cn="10" cr="445,419,446,440">32</C>     ASCII space
    <C cn="10" cr="381,419,395,440">50</C>     ASCII ‘2’
    <C cn="10" cr="396,419,411,440">48</C>     ASCII ‘0’
    <C cn="10" cr="415,419,428,440">49</C>     ASCII ‘1’
    <C cn="10" cr="430,419,445,440">48</C>     ASCII ‘0’
    etc.
SUMMARY OF SPECIAL VARIABLES FOR ACCESSING THE RUNTIME HIERARCHY
A full listing of special variables for accessing the runtime hierarchy is provided in Appendix A (see “Special
variables for accessing the runtime hierarchy” on page 434). Note that you can use @ID and
@B/@D/@P.<variable_name> special variables to get most of the batch, document, and page information
from the runtime batch file, for example:
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<B id="20110003.001">                                  @ID                   BATCH
  <V n="TYPE">TravelDocs</V>                           @B.TYPE
  <V n="LAST_RR_TPROFILE">Rulerunner:m:eRun</V>        @B.LAST_RR_PROFILE
  <V n="STATUS">1</V>                                  @B.STATUS
  <D id="20110003.001.01">                             @ID                   DOCUMENT
    <V n="TYPE">Car_Rental</V>                         @D.TYPE
    <V n="STATUS">0</V>                                @D.STATUS
    <P id="TM000001">                                  @ID                   PAGE
      <V n="TYPE">Rental_Agreement</V>                 @P.TYPE
      <V n="STATUS">1</V>                              @P.STATUS
      <V n="IMAGEFILE">tm000001.tif</V>                @P.IMAGEFILE
      <V n="ScanSrcPath">c:\...\images\page_01.tif</V> @P.ScanSrcPath
      <V n="RecogStatus">0</V>                         @P.RecogStatus
      <V n="Confidence">0.9660463</V>                  @P.Confidence
      <V n="TemplateID">556</V>                        @P.TemplateID
      <V n="DATAFILE">tm000001.xml</V>                 @P.DATAFILE
    </P>
    etc.
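To make the @B, @D, and @P lookups concrete, here is a small Python sketch that resolves a <level>.<variable_name> reference for a given page by walking up from the page node to the matching level. It is a model of the idea only: the resolve function and the trimmed sample XML are hypothetical, and Taskmaster's own lookup may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the runtime batch file shown in this section.
BATCH_XML = """<B id="20110003.001">
  <V n="TYPE">TravelDocs</V>
  <V n="STATUS">1</V>
  <D id="20110003.001.01">
    <V n="TYPE">Car_Rental</V>
    <V n="STATUS">0</V>
    <P id="TM000001">
      <V n="TYPE">Rental_Agreement</V>
      <V n="DATAFILE">tm000001.xml</V>
    </P>
  </D>
</B>"""

def resolve(special, batch, page_id):
    """Resolve '@B.NAME', '@D.NAME' or '@P.NAME' relative to the given page."""
    level, _, variable = special.lstrip("@").partition(".")
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in batch.iter() for child in parent}
    node = next((p for p in batch.iter("P") if p.get("id") == page_id), None)
    while node is not None and node.tag != level:
        node = parents.get(node)
    if node is None:
        return None
    for v in node.findall("V"):
        if v.get("n") == variable:
            return v.text
    return None

batch = ET.fromstring(BATCH_XML)
for name in ("@B.TYPE", "@D.TYPE", "@P.TYPE", "@P.DATAFILE"):
    print(name, "->", resolve(name, batch, "TM000001"))
```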
Similarly, you can get most of the field information from the runtime page data file using the @ID,
@F.<variable_name>, and @P\<field_name> special variables, for example:
<?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?>
<P id="TM000001">
  <F id="Pickup_Date">                           @ID                   FIELD
    <V n="TYPE">Pickup_Date</V>                  @F.TYPE
    <V n="Position">179,394,543,462</V>          @F.Position
    <V n="STATUS">0</V>                          @F.STATUS
    <C cn="7" cr="200,416,220,440">84</C>        @P\Pickup_Date (the field's text,
    <C cn="10" cr="226,425,240,440">117</C>      built from all of the field's
    <C cn="10" cr="245,425,258,440">101</C>      <C> elements)
    <C cn="10" cr="260,425,270,440">115</C>
  </F>
  etc.
USING NAVIGATION ELEMENTS TO ACCESS THE RUNTIME HIERARCHY
As shown in the previous section, you can reference various objects within the runtime hierarchy using the
special variables @B, @D, @P, and @F. You can also reference field values and variables using navigation elements.
Smart parameters support the following navigation elements:
\<field_name>      References a field one level beneath the current object
..\<field_name>    References a field at the same level as the current object
When you reference a field by specifying just the field's name, Taskmaster retrieves the field's text value. You
can obtain the value of a variable associated with the field by appending .<variable_name>, for example:
..\Car_Type.TYPE
Note that you can't use this syntax to access variables on parent objects.
EXAMPLES
In these examples, the rr_Get action is bound to a field:
• In the first example, the smart parameter returns the text of the “Car_Type” field on the current page.
  Action: rr_Get("..\Car_Type")
  Return value: SUV
• In the second example, the smart parameter returns the value of the “Car_Type” field's “TYPE” variable.
  Action: rr_Get("..\Car_Type.TYPE")
  Return value: Car_Type
<F id="Car_Type">
  <V n="TYPE">Car_Type</V>
  <C cn="10" cr="588,748,600,769">83</C> ASCII ‘S’
  <C cn="10" cr="605,748,620,769">85</C> ASCII ‘U’
  <C cn="10" cr="625,748,643,769">86</C> ASCII ‘V’
</F>
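The two return values can be reproduced with a short Python sketch that resolves a ..\ navigation element against a page data fragment. This is a simplified model under stated assumptions: the function name resolve_sibling is hypothetical, only the ..\ form is handled, and the field text is assumed to be the character codes of the field's <C> elements joined in order, as the ASCII annotations in the listing suggest.

```python
import xml.etree.ElementTree as ET

PAGE_XML = """<P id="TM000001">
  <F id="Car_Type">
    <V n="TYPE">Car_Type</V>
    <C cn="10" cr="588,748,600,769">83</C>
    <C cn="10" cr="605,748,620,769">85</C>
    <C cn="10" cr="625,748,643,769">86</C>
  </F>
</P>"""

def resolve_sibling(page, expression):
    r"""Resolve '..\<field>' or '..\<field>.<variable>' against one page."""
    if not expression.startswith("..\\"):
        raise ValueError("this sketch only handles '..\\' navigation elements")
    field_name, _, variable = expression[3:].partition(".")
    for field in page.findall("F"):
        if field.get("id") != field_name:
            continue
        if variable:
            # Variable lookup: find the <V> element with the matching name.
            for v in field.findall("V"):
                if v.get("n") == variable:
                    return v.text
            return None
        # Field text: each <C> element holds one character code.
        return "".join(chr(int(c.text)) for c in field.findall("C"))
    return None

page = ET.fromstring(PAGE_XML)
print(resolve_sibling(page, r"..\Car_Type"))
print(resolve_sibling(page, r"..\Car_Type.TYPE"))
```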
USING OTHER SPECIAL VARIABLES
ACCESSING JOB AND TASK INFORMATION
Taskmaster includes several special variables for accessing job and task information. These include:
• The ID and name of the current job
• The ID and name of the current task
• The username and station associated with the job
A listing of these special variables is provided in Appendix A (see “Special variables for accessing job and
task information” on page 438).
ACCESSING OTHER INFORMATION
Taskmaster includes additional special variables for accessing other information, including:
• The current date and time
• Information from the DCO and Pilot objects
• Settings defined in the application's “Paths.ini” file
A listing of these special variables is provided in Appendix A (see “Miscellaneous special variables” on
page 440).
TRAVELDOCS: EXPORTING LINE ITEM GRID DATA TO AN XML FILE
In the previous chapter, we added functionality to the TravelDocs application to export the “Other Charges”
grid data to a database (see “Exporting to a database” on page 219). In this section, we'll update the
application to export the same data to an XML file. The steps here use smart parameters to access the
runtime document hierarchy. We'll also set up a custom variable to store data within the document hierarchy.
EXPORTING TO AN XML FILE
ADDING RULES TO THE EXPORTXML RULESET
1. In the Datacap Studio Rulesets pane, select the ExportXML ruleset and click the Lock/Unlock
ruleset for editing button.
2. Right-click the ExportXML ruleset and choose Add Rule. Rename the new rule Export Other XML
Page Node.
3. Right-click the ExportXML ruleset and choose Add Rule. Rename the new rule Export Other XML
Line Item.
4. Right-click the ExportXML ruleset and choose Add Rule. Rename the new rule Export Other XML
Total Cost. The ExportXML ruleset should have the rules shown below.
5. Expand the Open XML File rule and the Open XML function.
6. Click the Actions library tab and expand the ExportXML library.
7. Select and add each of the actions shown in the table below to the end of the Open XML
function using the Add to function button. Then set the action parameters as shown in the table.
Action         Parameter
xml_NewNode    Car_Rentals,BatchID_+@BatchID
xml_NewNode    Rental_Agreements,Car_Rentals
xml_NewNode    Flights,BatchID_+@BatchID
xml_NewNode    Hotels,BatchID_+@BatchID
xml_NewNode    Other_Charges,Hotels
Note: These actions and their parameters are described on the next page.
Make sure you add these actions after the existing actions. Together these actions set up the following
structure for the XML:
<BatchID_batch_id>
  <Car_Rentals>
    <Rental_Agreements/>
  </Car_Rentals>
  <Flights/>
  <Hotels>               Parent node
    <Other_Charges/>     New node
  </Hotels>
</BatchID_batch_id>
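The effect of the five xml_NewNode calls can be mimicked in Python to check the resulting structure. This sketch is not the ExportXML library: the build_skeleton function is hypothetical, and looking up each parent by name in a dictionary is an assumption about how the "new node, parent node" argument pair is resolved.

```python
import xml.etree.ElementTree as ET

def build_skeleton(batch_id):
    """Create the export skeleton from (new node, parent node) pairs."""
    root_name = "BatchID_" + batch_id
    root = ET.Element(root_name)
    nodes = {root_name: root}  # assumption: nodes are found by name
    # Same order and arguments as the xml_NewNode actions in the table.
    for new, parent in [
        ("Car_Rentals", root_name),
        ("Rental_Agreements", "Car_Rentals"),
        ("Flights", root_name),
        ("Hotels", root_name),
        ("Other_Charges", "Hotels"),
    ]:
        nodes[new] = ET.SubElement(nodes[parent], new)
    return root

skeleton = build_skeleton("20110003.001")
print(ET.tostring(skeleton, encoding="unicode"))
```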
8. Expand the Export Other XML Page Node rule and select Function1.
9. Select and add each of the actions shown in the table below to Function1 using the Add to function
button. Then set the action parameters as shown in the table.
Library      Action                       Parameter
ExportXML    xml_NewNode                  @ID,Other_Charges
rrunner      rrSet (do not use rr_Set)    varSource = @ID
                                          varTarget = @P.ID
Note: xml_NewNode("@ID,Other_Charges") creates a new XML node using the ID of the current
page (for example, <TM000013>) beneath the <Other_Charges> node.
rrSet("@ID", "@P.ID") stores the ID of the current page in a variable called “ID” within the
runtime hierarchy (for example, <V n="ID">TM000013</V>).
10. Expand the Export Other XML Line Item rule and select Function1.
11. Select and add each of the actions shown in the table below to Function1 using the Add to function
button. Then set the action parameters as shown in the table.
Library      Action                   Parameter
ExportXML    xml_NewNode              Item,@P.ID
ExportXML    xml_SetAttributeValue    Item,Category,@F\Category
ExportXML    xml_SetAttributeValue    Item,Cost,@F\Total
Note: xml_NewNode("Item,@P.ID") creates a new node <Item> that is a child of the node we created
in the Export Other XML Page Node rule. @P.ID identifies the parent node using the page's “ID”
variable we saved earlier using rrSet (since you can't reference the ID of a parent object using a
smart parameter).
xml_SetAttributeValue("Item,Category,@F\Category") creates a Category attribute on
the current <Item> node and sets the value to the value of the current line item's “Category” field
(for example, <Item Category="Internet"/>).
xml_SetAttributeValue("Item,Cost,@F\Total") is the same except that the attribute is
Cost and the value is the current line item's “Total” field (for example, <Item Cost="9.90"/>).
12. Expand the Export Other XML Total Cost rule and select Function1.
13. Select and add each of the actions shown below to Function1 using the Add to function button. Then
set the action parameters as shown in the table.
Library      Action              Parameter
ExportXML    xml_NewNode         Total_Cost,@P.ID
ExportXML    xml_SetNodeValue    Total_Cost, @P\Other_Charges_Total
Note: xml_NewNode("Total_Cost,@P.ID") creates a new XML node called <Total_Cost>
beneath the page node we created in the Export Other XML Page Node rule. xml_SetNodeValue
sets the value of this node to the value of the current page's “Other_Charges_Total” field (for
example, <Total_Cost>$238.75</Total_Cost>).
14. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The finished ruleset should look like the one on the next page.
COMPLETE EXPORTXML RULESET
ATTACHING THE EXPORT OTHER XML RULES TO THE DOCUMENT HIERARCHY
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so the fields in the Other_Charges page are visible.
3. Select the Other_Charges page. Then select the Export Other XML Page Node rule and click the
Add to DCO button.
4. Select the Other_Charges_Line_Item field. Then select the Export Other XML Line Item rule and
click the Add to DCO button.
5. Select the Other_Charges_Total field. Then select the Export Other XML Total Cost rule and click
the Add to DCO button.
6. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button. The
rules should be linked to the document hierarchy as shown below.
Document hierarchy:
Other_Charges
    Open
        (global)
        ExportXML : Export Other XML Page Node
    Other_Charges_Grid
        Open
        Other_Charges_Line_item
            Open
                (global)
                ExportXML : Export Other XML Line Item
            Close
        Close
    Other_Charges_Total
        Open
            (global)
            ExportXML : Export Other XML Total Cost
        Close
    etc.

Rules in the ExportXML ruleset:
ExportXML
    Export Other XML Page Node
        Function1
    Export Other XML Line Item
        Function1
    Export Other XML Total Cost
        Function1
RUNNING A BATCH THROUGH THE WORKFLOW
1. Click the Datacap Studio Test tab.
2. In the Workflow pane, select the VScan task profile under Main Job.
3. Click the New button to start a new batch.
4. Use the Process rules for target object button and the Advance button to move the batch through
the entire workflow.
5. Open the file C:\Datacap\TravelDocs\export\<batch_id>.xml and review the exported XML data.
<?xml version='1.0' ?>
<BatchID_20100351.006>
<Flights/>
<Car_Rentals>
<Rental_Agreements>
<TM000001>
<Pickup_Date>Tues, Dec 7, 2010</Pickup_Date>
etc.
</TM000001>
<TM000003>
<Pickup_Date>Mon, Dec 6, 2010</Pickup_Date>
etc.
</TM000003>
<TM000004>
<Pickup_Date>Mon, Dec 13, 2010</Pickup_Date>
etc.
</TM000004>
</Rental_Agreements>
</Car_Rentals>
<Hotels>
<Other_Charges>
<TM000013>
<Item Category="Internet" Cost="$9.90"/>
<Item Category="Laundry" Cost="$18.00"/>
<Item Category="Internet" Cost="$4.95"/>
<Item Category="Newspaper" Cost="$2.00"/>
<Item Category="Mini bar" Cost="$8.00"/>
<Item Category="Internet" Cost="$4.95"/>
<Item Category="Newspaper" Cost="$2.00"/>
<Item Category="Mini bar" Cost="$8.00"/>
<Item Category="Internet" Cost="$4.95"/>
<Item Category="Newspaper" Cost="$2.00"/>
<Item Category="Parking" Cost="$74.00"/>
<Total_Cost>$238.75</Total_Cost>
</TM000013>
</Other_Charges>
</Hotels>
</BatchID_20100351.006>
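The Other_Charges portion of this output follows directly from the three rules added to the ExportXML ruleset: one page node, one <Item> per line item, and one <Total_Cost> node. The Python sketch below models that shape with shortened, hypothetical sample data. The function export_other_charges is not part of Taskmaster, and the comments map each step to the action it stands in for.

```python
import xml.etree.ElementTree as ET

def export_other_charges(parent, page_id, line_items, total):
    """Mirror the Page Node, Line Item, and Total Cost rules for one page."""
    # Export Other XML Page Node: xml_NewNode("@ID,Other_Charges")
    page_node = ET.SubElement(parent, page_id)
    for category, cost in line_items:
        # Export Other XML Line Item: xml_NewNode plus two xml_SetAttributeValue calls
        ET.SubElement(page_node, "Item", Category=category, Cost=cost)
    # Export Other XML Total Cost: xml_NewNode plus xml_SetNodeValue
    ET.SubElement(page_node, "Total_Cost").text = total
    return page_node

other_charges = ET.Element("Other_Charges")
sample_items = [("Internet", "$9.90"), ("Laundry", "$18.00")]  # hypothetical data
page_node = export_other_charges(other_charges, "TM000013", sample_items, "$27.90")
print(ET.tostring(other_charges, encoding="unicode"))
```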
Chapter 16
TEXT MATCHING
Up until now, we've used fingerprints to identify pages and recognition zones to locate data on those pages.
The one exception was when we used the RegExFind action to locate the Total Cost field when processing
line item grids. You saw in that example how you can use text matching to locate data that's not in a
predictable location on the page.
You can add flexibility to your applications by using text matching to identify pages and locate data. To use
text matching, you first perform full page OCR on the incoming page. You can then search the recognition
results for specific text. For example, if a page contains the words “Car rental agreement” then the chances
are very high that it's a car rental agreement page; similarly, if you locate a string “Pickup date” then the
chances are high that beside (or below) the string you'll find the actual pickup date.
In this chapter, we'll look at techniques you can use to identify pages and locate data using text matching. At
the end of the chapter, we'll update the TravelDocs application to identify and process a new car rental
agreement page using text matching.
IDENTIFYING PAGES USING TEXT MATCHING
In the earlier chapter on page identification, we showed how you can identify pages by searching the
recognition results for a string that's unique to each page type.
• Run full page OCR
• Look for the word “Car”
• If found, set the page type to “Rental_Agreement”
• If the previous function fails, look for the word “Flight”
• If found, set the page type to “Air_Ticket”
• If the previous function fails, look for the word “Room”
• If found, set the page type to “Room_Receipt”
Note: Remember that if all the actions in a given function return True, Taskmaster does not execute any other
functions in the current rule. So in the example above, if we get a match on the word “Car” and are able
to set the page type successfully, the PageID rule exits without performing any of the other tests.
Text matching uses the full page recognition results, so you must perform full page OCR (or ICR) before
executing any of the text matching actions. You can then use the WordFind action to determine if a specific
string is present and the SetPageType action to set the page type accordingly.
Library    Action         Description
Locate     WordFind       Locates the first (or next) occurrence of the specified word or phrase
                          on the current page.
DCO        SetPageType    Assigns a page type to the current page in the runtime hierarchy.
Note that the WordFind action is case sensitive. Additionally, if different variants of a given page type have
different unique identifiers, you may need to use a more flexible matching technique, such as regular
expressions or keyword lists. These are both covered later in this chapter (see “Using regular expressions” on
page 243 and “Using keyword lists” on page 244).
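The first-match-wins behavior of the PageID rule can be sketched in Python as an ordered list of tests. This is only a model of the control flow: identify_page and PAGE_RULES are hypothetical names, and the plain substring test stands in for WordFind, which may match on word boundaries rather than substrings.

```python
# (text to look for, page type to assign), tested in order.
PAGE_RULES = [
    ("Car", "Rental_Agreement"),
    ("Flight", "Air_Ticket"),
    ("Room", "Room_Receipt"),
]

def identify_page(ocr_text, rules=PAGE_RULES):
    """Return the page type of the first rule whose text is found, else None."""
    for keyword, page_type in rules:
        # Case-sensitive, like WordFind. Note that a substring test would
        # also match longer words such as "Carpet"; this is a simplification.
        if keyword in ocr_text:
            return page_type  # later rules are not evaluated
    return None

print(identify_page("Car rental agreement"))
print(identify_page("Flight itinerary and Room upgrade"))
```

The second call returns the flight page type because the "Flight" test succeeds before the "Room" test is ever reached.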
LOCATING DATA USING TEXT MATCHING
One way to locate data without using recognition zones is to search the full page recognition results for some
static text that's adjacent to the field you want to read. Forms typically have a label beside each field. Once
you've located the label, you can locate the adjacent data and then update the runtime hierarchy.
Label
Data
In the example above, we first locate the word “Date,” then go to the right to obtain the matching data, and
then write the information to the runtime hierarchy. We'll go through each of these steps in turn.
LOCATING LABELS
LOCATING SIMPLE STRINGS
The “Locate” library includes several actions you can use to locate specific text strings on the current page,
including those described below.
Library    Action          Description
Locate     WordFind        Locates the first (or next) occurrence of the specified word or phrase on
                           the current page.
Locate     FindLastWord    Locates the last occurrence of the specified word or phrase on the current
                           page.
For detailed information on these and other actions in the “Locate” library, select the action on the Actions
Library tab and click the Display information button.
We used WordFind in the previous page identification example. You can use it, or FindLastWord, to locate
any word or phrase on the current page, although these actions require an exact match. If you need more
flexible matching, you can use regular expressions or keyword lists, as described in the sections that follow.
USING REGULAR EXPRESSIONS
If you're using text matching on pages that have some variability, you may be unable to locate a label using a
simple text string. For example, one car rental company might use the label “Date,” another might use the
label “Pickup date,” while another might use “Pickup Date.” In an example like this you can use the “Locate”
library's regular expression actions, including those described below.
Library    Action           Description
Locate     RegExFind        Same as WordFind, except supports regular expressions.
Locate     FindLastRegEx    Same as FindLastWord, except supports regular expressions.
Here's one way you can search for any of the three pickup date labels in the example above using a single
regular expression.
RegExFind("(Date)|(Pickup [Dd]ate)")
The “Locate” library's regular expression actions use the VBScript regular expression rules. To cover these is
beyond the scope of this guide, but they are described in detail in the following MSDN® article:
http://msdn.microsoft.com/en-us/library/1400241x%28v=vs.85%29.aspx
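You can try the pattern outside Taskmaster before using it in a rule. The sketch below uses Python's re module rather than the VBScript engine, which is an assumption of convenience: this particular pattern uses only alternation, grouping, and a character class, which both engines support. The sample strings are hypothetical.

```python
import re

# Same pattern as the RegExFind example.
LABEL_PATTERN = re.compile(r"(Date)|(Pickup [Dd]ate)")

samples = [
    "Date: 12/07/2010",
    "Pickup date: 12/07/2010",
    "Pickup Date 12/07/2010",
]

# search() returns the leftmost match in each string.
matches = [LABEL_PATTERN.search(text).group(0) for text in samples]
print(matches)

# The pattern is case sensitive, so an all-lowercase label is not found.
print(LABEL_PATTERN.search("pickup date: 12/07/2010"))
```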
USING KEYWORD LISTS
Some labels may have more variability than you can address using regular expressions. For example, if you're
processing invoices from different vendors, one company might use the label “Total Cost,” another might use
“Amount Due,” and another might use “Invoice total.”
Total Cost: $1,523.49
Amount Due: $76.89
Invoice total: $349.13
Although you could conceivably use a regular expression in this example, it gets unwieldy if there are too
many variations, and a keyword list may be a better option. The “Locate” library provides actions to perform
text matching using either a keyword text file or a database.
Library   Action            Description
Locate    FindKeyList       Locates the first (or next) occurrence of a word or phrase that matches
                            one of the entries in a keyword file.
Locate    FindLastKeyList   Locates the last occurrence of a word or phrase that matches one of
                            the entries in a keyword file.
Locate    FindDBList        Locates a word that matches one of a list of words obtained from a
                            SQL query.
In the invoice example above, you can include all three labels as well as any others in a keyword text file, and
then use FindKeyList to locate the matching field. The keyword file must be a plain text file with the
extension .key and must be located in the application’s dco_<application_name> folder, unless you specify the
full path to an alternative location.
TotalCost.key
Total Cost
Total cost
TOTAL COST
Amount Due
Amount due
AMOUNT DUE
Total Amount
Total amount
TOTAL AMOUNT
Total Amount Due
Total amount due
TOTAL AMOUNT DUE
Invoice Total
Invoice total
INVOICE TOTAL
FindKeyList("TotalCost.key")
Note that you can include regular expressions in the keyword list, although we haven’t done so in this
example.
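The idea behind keyword-list matching can be sketched in a few lines of Python. This is a minimal sketch, not the Taskmaster implementation: the function names are hypothetical, the keyword text is an abbreviated copy of the list above, and preferring the longest keyword when two entries start at the same position is an assumption.

```python
# Abbreviated stand-in for the contents of a .key file: one entry per line.
KEY_FILE_TEXT = """Total Cost
Amount Due
Total Amount
Total Amount Due
Invoice total
"""

def load_keywords(text):
    """Split keyword-file content into non-empty entries."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def find_first_keyword(page_text, keywords):
    """Return (position, keyword) for the earliest match, or None."""
    hits = [(page_text.find(kw), kw) for kw in keywords if kw in page_text]
    if not hits:
        return None
    # Earliest position wins; ties go to the longest keyword (assumption).
    return min(hits, key=lambda hit: (hit[0], -len(hit[1])))

keywords = load_keywords(KEY_FILE_TEXT)
```

For example, find_first_keyword("Invoice total: $349.13", keywords) reports a match at position 0, and a line with none of the listed labels returns None.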
LOCATING THE FIELD DATA
After locating the label, you’ll need to locate the adjacent field data. Typically it’s to the right of the label, but
it may also be above or below the label. Additionally, you may need to group words together if the data you’re
searching for includes spaces.
The full page recognition engine organizes the recognition results in the CCO file as a coordinate-based grid
of lines and words. Each word is assigned a different position in the grid.
Line \ Word   1           2         3        4           5         6       7      8     9     10
1             Car         Rental    #4
2             Pickup      Details   Return   Details
3             Date:       Mon,      Jan      10,         2011      Date:   Fri,   Jan   14,   2011
4             Time:       11:00AM   Time:    04:00PM
5             Location:   Orlando   (MCO)    Location:   Orlando   (MCO)
This structure lets you move around the recognition results using the “Locate” library’s navigation actions.
The library also includes actions for grouping words together.
Library   Action                  Description
Locate    GoRightWord             Moves the specified number of words to the right of the previously
                                  found word or phrase.
Locate    GoDown(Up)Line          Moves down (up) the specified number of lines from the previously
                                  found word or phrase and selects the first word.
Locate    GroupWordsRIGHT(LEFT)   Groups words to the right (left) of the previously found word if they
                                  are no more than the specified number of character widths apart.
Locate    GroupWords              Groups words to the left and right of the previously found word if they
                                  are no more than the specified number of character widths apart.
The example below searches for the word “Date” and then goes one word to the right to obtain the data. The
GroupWordsRIGHT action is required since without it we’d get only the first word (“Mon,” in the Car Rental
#4 example above). The parameter ‘2’ says to group words that are two or fewer character widths apart.
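The find, move right, and group sequence can be modelled as follows. This is a hedged sketch only: the Word class, the function names, the 13-pixel character width, and the horizontal positions of the two “Date:” labels and of “Fri,” are assumptions made for illustration (the positions of “Mon,” through “2011” are taken from the UpdateField example later in this chapter).

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    left: int   # left edge in pixels
    right: int  # right edge in pixels

CHAR_WIDTH = 13  # assumed average character width in pixels

# Line 3 of the Car Rental #4 grid (label positions are made up).
LINE_3 = [
    Word("Date:", 440, 510), Word("Mon,", 543, 610), Word("Jan", 620, 660),
    Word("10,", 674, 710), Word("2011", 723, 785),
    Word("Date:", 1090, 1160), Word("Fri,", 1190, 1240),
]

def find_word(line, text, start=0):
    """Index of the first word at or after start that contains text, else -1."""
    for i in range(start, len(line)):
        if text in line[i].text:
            return i
    return -1

def go_right_word(line, index, count=1):
    """Index count words to the right, or -1 if that runs off the line."""
    target = index + count
    return target if target < len(line) else -1

def group_words_right(line, index, max_chars):
    """Join the word at index with neighbours to its right while gaps stay small."""
    parts = [line[index].text]
    while index + 1 < len(line):
        gap = line[index + 1].left - line[index].right
        if gap > max_chars * CHAR_WIDTH:
            break
        index += 1
        parts.append(line[index].text)
    return " ".join(parts)

label = find_word(LINE_3, "Date")
first_value = go_right_word(LINE_3, label, 1)
without_grouping = LINE_3[first_value].text
with_grouping = group_words_right(LINE_3, first_value, 2)
```

Without grouping, only “Mon,” is selected; with a grouping distance of two character widths, the words up to “2011” are joined and the distant second “Date:” label is left out.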
UPDATING THE RUNTIME DATA FILE WITH THE RECOGNIZED TEXT
Once you’ve located the data field, you need to write the data to the runtime hierarchy. You do this using the
“Locate” library’s UpdateField action.
Library   Action        Description
Locate    UpdateField   Updates the page data file with the recognized value and position
                        of the located word.
The CreateFields action we used earlier in the CreateDocs ruleset is responsible for setting up each field in
the runtime hierarchy. Initially both the field position and the field data are empty, as shown in the first
example below. The second example shows the field after it’s been populated using UpdateField. The position
information is used later to display the corresponding image snippet to the operator during verification.
After CreateFields():
<F id="Pickup_Date">
<V n="TYPE">Pickup_Date</V>
<V n="Position">0,0,0,0</V>
<V n="STATUS">0</V>
</F>
After UpdateField():
<F id="Pickup_Date">
<V n="TYPE">Pickup_Date</V>
<V n="Position">539,419,789,452</V>
<V n="STATUS">0</V>
<C cn="10" cr="543,423,565,444">77</C> <!-- M -->
<C cn="10" cr="570,429,585,444">111</C> <!-- o -->
<C cn="10" cr="588,429,600,444">110</C> <!-- n -->
<C cn="9" cr="605,440,610,448">44</C> <!-- , -->
<C cn="9" cr="0,0,0,0">32</C> <!-- (space) -->
<C cn="9" cr="620,423,628,444">74</C> <!-- J -->
<C cn="10" cr="630,429,643,444">97</C> <!-- a -->
<C cn="10" cr="648,429,660,444">110</C> <!-- n -->
<C cn="9" cr="0,0,0,0">32</C> <!-- (space) -->
<C cn="10" cr="674,423,685,444">49</C> <!-- 1 -->
<C cn="10" cr="689,423,704,444">48</C> <!-- 0 -->
<C cn="9" cr="705,440,710,448">44</C> <!-- , -->
<C cn="9" cr="0,0,0,0">32</C> <!-- (space) -->
<C cn="10" cr="723,423,735,444">50</C> <!-- 2 -->
<C cn="10" cr="739,423,754,444">48</C> <!-- 0 -->
<C cn="10" cr="756,423,769,444">49</C> <!-- 1 -->
<C cn="10" cr="774,423,785,444">49</C> <!-- 1 -->
</F>
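To see how the populated field encodes its value, the sketch below reads the field XML shown above with Python's standard xml.etree.ElementTree module and rebuilds the recognized text from the character codes in the C elements. The function names are hypothetical, and treating each C value as a character code is inferred from the example itself (77 is “M”, 111 is “o”, and so on).

```python
import xml.etree.ElementTree as ET

FIELD_XML = """<F id="Pickup_Date">
<V n="TYPE">Pickup_Date</V>
<V n="Position">539,419,789,452</V>
<V n="STATUS">0</V>
<C cn="10" cr="543,423,565,444">77</C>
<C cn="10" cr="570,429,585,444">111</C>
<C cn="10" cr="588,429,600,444">110</C>
<C cn="9" cr="605,440,610,448">44</C>
<C cn="9" cr="0,0,0,0">32</C>
<C cn="9" cr="620,423,628,444">74</C>
<C cn="10" cr="630,429,643,444">97</C>
<C cn="10" cr="648,429,660,444">110</C>
<C cn="9" cr="0,0,0,0">32</C>
<C cn="10" cr="674,423,685,444">49</C>
<C cn="10" cr="689,423,704,444">48</C>
<C cn="9" cr="705,440,710,448">44</C>
<C cn="9" cr="0,0,0,0">32</C>
<C cn="10" cr="723,423,735,444">50</C>
<C cn="10" cr="739,423,754,444">48</C>
<C cn="10" cr="756,423,769,444">49</C>
<C cn="10" cr="774,423,785,444">49</C>
</F>"""

def field_text(field):
    """Rebuild the field value from the character codes in its C elements."""
    return "".join(chr(int(c.text)) for c in field.findall("C"))

def field_position(field):
    """Return the Position variable as a tuple of four integers, or None."""
    for v in field.findall("V"):
        if v.get("n") == "Position":
            return tuple(int(p) for p in v.text.split(","))
    return None

field = ET.fromstring(FIELD_XML)
```

Calling field_text(field) gives the pickup date as a string, and field_position(field) gives the zone used for the verification snippet.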
LIMITATIONS OF USING TEXT MATCHING FOR DATA RECOGNITION
While text matching provides a quick way to read data from a non-fingerprinted page, it does have limitations,
most notably if the pages you are processing include checkbox options or line item grids.
There are ways around these limitations, but these are beyond the scope of this guide.
TRAVELDOCS: UPDATING THE APPLICATION TO USE TEXT MATCHING
IDENTIFYING UNRECOGNIZED PAGES USING TEXT MATCHING
In this section, we’ll create a rule to recognize car rental pages by looking for text unique to that page type.
1. In the Datacap Studio Rulesets pane, select the PageID ruleset and click the Lock/Unlock ruleset for
editing button.
2. Expand the PageID rule.
3. Change the name of the existing function from PageID: Other Function 1 to Identify using
Fingerprint. Then change the parameter on the FindFingerprint action to False.
Note: Setting the parameter to False ensures that Taskmaster does not automatically generate a fingerprint
file for unrecognized pages. Note that if the current page does not match one of the existing
fingerprints, this action will fail and Taskmaster will execute the next function, if there is one.
4. Right-click the PageID rule and choose Add Function. Then rename the new function Identify using
Text Match.
5. Click the Actions library tab and add the actions shown in the table below to the Identify using Text
Match function using the Add to function button. Then set the action parameters as shown.
Library   Action        Parameter
Locate    RegExFind     Car
Locate    RegExFind     Pickup
DCO       SetPageType   Rental_Agreement
rrunner   rrSet         varSource = Text
                        varTarget = @P.MatchType
Note: We test for “Car” and “Pickup” to avoid mismatches on optional insurance pages. rrSet sets up a
page variable we’ll need later to identify which pages to process using text matching.
The complete function should look like the one below.
Note: If the “Identify using Fingerprint” function succeeds in identifying the page, “Identify using Text
Match” will not execute.
6. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
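The fall-through behavior of the two PageID functions can be modelled with the short Python sketch below. It is a simplified reading of the steps above, not Taskmaster code: every function name is a hypothetical stand-in, pages are plain dictionaries, and the assumption that a function succeeds only when all of its actions succeed is inferred from the notes in this section.

```python
import re

def run_function(actions, page):
    """Run actions in order; stop and fail at the first action that fails."""
    return all(action(page) for action in actions)

def run_rule(functions, page):
    """Try each function in turn and stop at the first one that succeeds."""
    for name, actions in functions:
        if run_function(actions, page):
            return name
    return None

def find_fingerprint(page):
    # Stand-in: succeeds only when the page matches an existing fingerprint.
    return page.get("fingerprint_match", False)

def regex_find(pattern):
    return lambda page: re.search(pattern, page["text"]) is not None

def set_value(variable, value):
    def action(page):
        page[variable] = value
        return True
    return action

PAGE_ID_RULE = [
    ("Identify using Fingerprint", [find_fingerprint]),
    ("Identify using Text Match", [
        regex_find("Car"),
        regex_find("Pickup"),
        set_value("TYPE", "Rental_Agreement"),   # plays the role of SetPageType
        set_value("MatchType", "Text"),          # plays the role of rrSet
    ]),
]

rental = {"text": "Car Rental #4 Pickup Details Return Details"}
insurance = {"text": "Car Rental Optional Insurance"}
rental_result = run_rule(PAGE_ID_RULE, rental)
insurance_result = run_rule(PAGE_ID_RULE, insurance)
```

In this model the rental page falls through to the text match function and is typed as Rental_Agreement, while the insurance page fails the “Pickup” test and is left unidentified.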
RECOGNIZING DATA USING TEXT MATCHING
Next, we’ll add rules to the Recognize ruleset to locate the data on those rental agreement pages we identified
using text matching. We want to avoid running the rules on pages identified using fingerprint matching and
we can use the variable we set up earlier to make that distinction.
1. In the Datacap Studio Rulesets pane, select the Recognize ruleset and click the Lock/Unlock ruleset
for editing button.
2. Right-click the Recognize ruleset and choose Add Rule. Do this six times to add six new rules. Rename
the rules as follows:
• Recognize Pickup Date
• Recognize Pickup Location
• Recognize Return Date
• Recognize Return Location
• Recognize Car Type
• Recognize Total Cost
3. Add the actions shown in the table below to Recognize Pickup Date > Function1 and set the action
parameters as shown.
Library   Action            Parameter
rrunner   rrCompare         object1 = @P.MatchType
                            object2 = Text
Locate    RegExFind         Date
Locate    GoRightWord       1
Locate    GroupWordsRIGHT   2
Locate    UpdateField
4. Add the actions shown in the table below to Recognize Pickup Location > Function1 and set the
parameters as shown.
Library   Action            Parameter
rrunner   rrCompare         object1 = @P.MatchType
                            object2 = Text
Locate    RegExFind         Location
Locate    GoRightWord       1
Locate    GroupWordsRIGHT   2
Locate    UpdateField
5. Add the actions shown in the table below to Recognize Return Date > Function1 and set the
parameters as shown.
Library   Action            Parameter
rrunner   rrCompare         object1 = @P.MatchType
                            object2 = Text
Locate    FindLastRegEx     Date
Locate    GoRightWord       1
Locate    GroupWordsRIGHT   2
Locate    UpdateField
6. Add the actions shown in the table below to Recognize Return Location > Function1 and set the
parameters as shown.
Library   Action            Parameter
rrunner   rrCompare         object1 = @P.MatchType
                            object2 = Text
Locate    FindLastRegEx     Location
Locate    GoRightWord       1
Locate    GroupWordsRIGHT   2
Locate    UpdateField
7. Add the actions shown in the table below to Recognize Car Type > Function1 and set the parameters
as shown.
Library   Action            Parameter
rrunner   rrCompare         object1 = @P.MatchType
                            object2 = Text
Locate    RegExFind         Car Type
Locate    GoRightWord       1
Locate    GroupWordsRIGHT   2
Locate    UpdateField
8. Add the actions shown in the table below to Recognize Total Cost > Function1 and set the parameters
as shown.
Library   Action        Parameter
rrunner   rrCompare     object1 = @P.MatchType
                        object2 = Text
Locate    RegExFind     Total Cost
Locate    GoRightWord   1
Locate    UpdateField
9. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset. The new rules should look like those shown on the next page.
Note: Using text matching to locate and recognize OMR (checkbox) fields is beyond the scope of this
guide. When we run the ruleset in the next section, it will leave the OMR options in their default
state (‘0’).
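The reason the pickup rules use RegExFind while the return rules use FindLastRegEx can be seen in the sketch below, which searches the Car Rental #4 word grid for the first and the last “Date” label. This is a hypothetical model in Python: the page is a plain dictionary, word_after_label is an invented helper, and grouping of the following words is omitted for brevity.

```python
import re

PAGE = {
    "MatchType": "Text",  # set by rrSet during page identification
    "lines": [
        ["Car", "Rental", "#4"],
        ["Pickup", "Details", "Return", "Details"],
        ["Date:", "Mon,", "Jan", "10,", "2011", "Date:", "Fri,", "Jan", "14,", "2011"],
        ["Time:", "11:00AM", "Time:", "04:00PM"],
        ["Location:", "Orlando", "(MCO)", "Location:", "Orlando", "(MCO)"],
    ],
}

def word_after_label(page, pattern, last=False):
    """Return the word to the right of the first (or last) label match."""
    # Mirrors the rrCompare test: skip pages not identified by text matching.
    if page.get("MatchType") != "Text":
        return None
    hits = [(row, col)
            for row, line in enumerate(page["lines"])
            for col, word in enumerate(line)
            if re.search(pattern, word)]
    if not hits:
        return None
    row, col = hits[-1] if last else hits[0]
    line = page["lines"][row]
    return line[col + 1] if col + 1 < len(line) else None

pickup_start = word_after_label(PAGE, "Date")             # first match
return_start = word_after_label(PAGE, "Date", last=True)  # last match
```

The first match leads to the pickup date and the last match leads to the return date, which is why the two rules differ only in the find action they use.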
NEW RECOGNIZE ACTIONS
ATTACHING THE RULES TO THE DOCUMENT HIERARCHY
In this section, we’ll attach each of the new “Recognize” rules to the corresponding field on the
“Rental_Agreement” page definition.
1. In the Document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so you can see the fields on the Rental_Agreement page.
3. Select the Pickup_Date field. Then in the Rulesets pane, select the Recognize Pickup Date rule and
click the Add to DCO
button.
4. Select the Pickup_Location field. Then in the Rulesets pane, select the Recognize Pickup Location
rule and click the Add to DCO
button.
5. Select the Return_Date field. Then in the Rulesets pane, select the Recognize Return Date rule and
click the Add to DCO
button.
6. Select the Return_Location field. Then in the Rulesets pane, select the Recognize Return Location
rule and click the Add to DCO
button.
7. Select the Car_Type field. Then in the Rulesets pane, select the Recognize Car Type rule and click the
Add to DCO
button.
8. Select the Total_Cost field. Then in the Rulesets pane, select the Recognize Total Cost rule and click
the Add to DCO
button.
9. In the Document hierarchy pane, click the Save button and then click Unlock DCO.
RUNNING A BATCH THROUGH THE WORKFLOW
We’ve provided an image file for a new rental agreement page that we’ll run here without fingerprinting.
1. Copy the file CarRental.tif from the sample images download location into the TravelDocs application’s
“images” folder.
2. On the Datacap Studio Test tab, select the VScan task profile and click the New button.
3. Click the Process rules for target object button and click Advance to move the batch through the
VScan and PageID task profiles.
4. On the Runtime batch hierarchy tab, select the first page (TM000001) and make sure the Car Rental
#4 page is displayed on the Image tab. Confirm that the page type is “Rental_Agreement.”
5. Click the Process rules for target object button and click Advance to move the batch through the
Rulerunner task profile.
6. On the Runtime batch hierarchy tab, expand the first Rental_Agreement page to confirm that the data
was recognized correctly (except for the options).
7. In the Workflow pane, right-click the batch (it should be in the Verify task) and choose Hold.
8. Start Taskmaster Client, select the TravelDocs application, and log on.
9. In the Taskmaster Client window, click the Show Job Monitor button. Locate the most recent batch (it
should be at the top of the list with status “hold”).
10. Double-click the batch’s row number at the left of the row and choose Yes to execute the selected batch.
11. Review the data on the Car Rental #4 page. Then click File > Quit Task and click OK to put the batch
back on hold.
12. In Datacap Studio, in the Workflow pane, right-click the batch and choose Pick to change its status back
to “Running.” Then click the Process rules for target object button and click Advance to move
the batch through the remainder of the workflow.
Chapter 17
PATTERN MATCHING
Standard fingerprint matching can compensate for minor page misalignment, but the offsets it can handle are
quite small. This means if an incoming page is misaligned relative to the fingerprint, Taskmaster may fail to
identify the page. Even if Taskmaster does identify the page successfully, recognition may fail if the fields are
not registered accurately. This is most problematic if the page contains OMR or hand print boxed data, but it
can happen with any page, especially if the image is distorted during copying, scanning, or faxing.
To address this problem, Taskmaster provides pattern matching actions you can use to identify pages and
adjust misaligned or distorted images. These actions use reference patterns, or anchor objects, that you define on
the page fingerprints, and they attempt to match those patterns to regions on the runtime pages.
This chapter covers two pattern matching techniques: one that uses geometric patterns like page registration
marks or vendor logos; the other that uses text-based patterns. At the end of the chapter, we’ll update the
TravelDocs application to handle misaligned pages using pattern matching.
ABOUT PATTERN MATCHING
Pattern matching uses anchor objects that you define on the page fingerprints. These anchor objects can be
geometric patterns (like page registration marks or vendor logos) or text-based patterns. They can act both
as identification markers used during page identification and as reference points used during registration
(realignment of the image).
Taskmaster’s pattern matching actions analyze the runtime pages looking for geometric or text-based patterns
that match those in the page fingerprints. It’s not unlike standard fingerprint identification, except that it uses
only a selected region of the fingerprint image. However, the major advantage is that you can use the
difference between the pattern’s location on the fingerprint and its location on the runtime page to correct
registration problems.
Whether you use geometric pattern matching or text-based pattern matching, the basic concept is the same
and is illustrated in the example below. Here, the cross-hatched region is the anchor object. In the fingerprint,
the anchor is located 1.0 inches from the top and left edges of the page. In the scanned page, the anchor
pattern is 1.5 inches from the top and 1.4 inches from the left edge of the page.
[Figure: Fingerprint image, with the anchor 1.0” from the top and 1.0” from the left edge; Scanned page, with
the anchor 1.5” from the top and 1.4” from the left edge]
A misalignment of this magnitude will almost certainly cause fingerprint matching to fail if you’re using one
of the standard fingerprint matching techniques. Pattern matching, on the other hand, first attempts to locate
the anchor object and, if successful, computes the offsets required to bring the page back into alignment. As
such, it can handle much larger registration problems. In the example above, the image must be moved 0.4
inches to the left and 0.5 inches up, so the required image offset values are -80, -100.
Note: Taskmaster processes pages at an effective resolution of 200 x 200 pixels per inch, so 0.4 inches is
equivalent to 80 pixels and 0.5 inches is equivalent to 100 pixels.
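The arithmetic in this example can be checked with a few lines of Python. The function name and the (left, top) ordering of the inch measurements are assumptions made for this sketch; the 200 pixels per inch figure comes from the note above.

```python
PIXELS_PER_INCH = 200  # effective resolution stated in the note above

def image_offset(fingerprint_in, scanned_in, ppi=PIXELS_PER_INCH):
    """Pixel offset (x, y) that moves the scanned anchor onto the fingerprint anchor.

    Both arguments are (distance from left edge, distance from top edge) in inches.
    """
    fx, fy = fingerprint_in
    sx, sy = scanned_in
    return (round((fx - sx) * ppi), round((fy - sy) * ppi))

# Fingerprint anchor at 1.0" from left and top; scanned anchor at 1.4" from left, 1.5" from top.
offset = image_offset((1.0, 1.0), (1.4, 1.5))
```

Negative values mean the image must move left and up, which matches the offset values quoted in the text.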
Additionally, if you’re using geometric pattern matching and have multiple anchor objects, Taskmaster can
perform interpolative realignment, where field positions are adjusted based on their proximity to each of the
anchor objects.
Although geometric pattern matching and text-based pattern matching are conceptually the same, the
implementations are slightly different and use different actions.
CONSIDERATIONS FOR USING PATTERN MATCHING
WHEN IS PATTERN MATCHING A GOOD CHOICE FOR PAGE IDENTIFICATION?
If your application must handle a variety of page types that are very similar, standard fingerprint identification
(FindFingerprint) may generate mismatches. Because pattern matching uses a smaller defined region of
the page, you can select an area that’s unique to each page type and thus avoid mismatches.
WHAT MAKES A GOOD ANCHOR PATTERN?
For pattern matching to work, you must have a good anchor pattern. If you’re using geometric pattern
matching, the image should be composed of simple solid regions, so avoid anything with shaded areas. The
images below provide some good and bad examples.
[Figure: examples of good and bad anchor patterns]
For text-based pattern matching, the only requirement is that the text pattern be unique to the page type.
WHAT TYPES OF PAGES TYPICALLY REQUIRE PATTERN MATCHING FOR IMAGE REGISTRATION?
The standard fingerprint matching action, FindFingerprint, can tolerate some minor page misalignment
relative to the fingerprint image (see “Auto registration when using the FindFingerprint action” on page 256).
Pages that have more serious alignment problems or are inconsistently proportioned require pattern matching
for registration. Faxed images are especially vulnerable because the sending and receiving fax machines may
pull the paper through at different speeds, resulting in longer or shorter images. Additionally, pages with
OMR fields require accurate registration, especially if the boxes are closely spaced.
IS THERE A WAY TO REGISTER PAGES MANUALLY?
In certain situations, you may wish to perform image registration manually. You can do this using the AIndex
web client. For details, see “Manual page identification and registration” on page 338.
AUTO REGISTRATION WHEN USING THE FINDFINGERPRINT ACTION
The standard fingerprint matching action, FindFingerprint, can tolerate some minor page misalignment.
When it detects a match, it automatically computes the offsets required to correct the image registration and
stores these in the page’s “Image_Offset” variable. The ReadZones action uses these offsets when setting the
runtime field positions.
The example below shows the runtime data for a page that Taskmaster identified correctly using
FindFingerprint and registered automatically.
<P id="TM000003">
<V n="TYPE">Air_Ticket</V>
<V n="STATUS">49</V>
<V n="IMAGEFILE">tm000003.tif</V>
<V n="ScanSrcPath">c:\datacap\traveldocs\images\problem_images_page_3.tif</V>
<V n="RecogStatus">0</V>
<V n="Confidence">0.8689438</V> <!-- Confidence level high enough for a match -->
<V n="Image_Offset">-24,-20</V> <!-- Offsets calculated automatically by FindFingerprint -->
<V n="TemplateID">566</V> <!-- Matching fingerprint ID -->
<V n="Fingerprint Created">No</V>
</P>
Note that FindFingerprint calculates the offsets without using anchor objects. However, the offsets it can
handle are quite small. The CalculateOffset action in the “Autodoc” library lets you increase the size of the
offsets it can handle, but this slows down the matching process.
Library   Action            Description
Autodoc   CalculateOffset   Sets the maximum offset supported when matching pages to
                            fingerprints.
SETTING UP ANCHOR OBJECTS
An anchor object is a region of a fingerprint image that’s used during pattern matching. Whether you’re using
a geometric pattern or a text-based pattern, the process is the same. Keep in mind though that if you’re using
pattern matching to identify pages, you need to specify a pattern that’s unique to each fingerprint.
The anchor object’s coordinates and other information are stored in the document hierarchy, so you must
create a field-level object to store the anchor object’s details. Aside from the anchor object’s coordinates, the
other key element is the PatternMatch variable, which identifies the object as a pattern match anchor.
Additionally, you typically set the anchor object’s STATUS variable to ‘-1’ so the field is not displayed during
verification.
The example below shows the XML definition of an anchor field. We created this anchor as follows:
• In the Datacap Studio Document Hierarchy pane, we added a field object called “Anchor_Object_1” to each
page definition.
• We set the object’s PatternMatch variable to ‘1’ to identify the object as an anchor object.
• On the Zones tab, we drew a recognition zone around the anchor on each of the fingerprints (three in
this example).
We’ll go through the steps later when we update the TravelDocs application.
<F type="Anchor_Object_1">
<V n="ID">0</V>
<V n="TYPE">Field</V>
<V n="STATUS">-1</V> <!-- STATUS = -1 keeps the anchor field hidden -->
<V n="Position">0,0,0,0</V>
<V n="MIN_TYPES">0</V>
<V n="MAX_TYPES">0</V>
<V n="ReqConf">8</V>
<V n="rules"></V>
<V n="PatternMatch">1</V> <!-- PatternMatch = 1 defines the object as an anchor -->
<V n="Pos565">183,195,276,282</V> <!-- Anchor object coordinates for fingerprint 565 -->
<V n="Pos566">636,185,730,284</V> <!-- Anchor object coordinates for fingerprint 566 -->
<V n="Pos567">1094,185,1181,279</V> <!-- Anchor object coordinates for fingerprint 567 -->
</F>
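The image below shows the anchor zone for fingerprint 565 with its offsets from the top and left edge of the
page. Here we’re using a geometric pattern, but the process is the same for text-based patterns.
[Figure: anchor zone for fingerprint 565, spanning 183 to 276 pixels from the left edge and 195 to 282 pixels
from the top edge]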
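As an illustration of how the per-fingerprint anchor zones are laid out in the field definition, the sketch below reads an abbreviated copy of the anchor field XML with Python's xml.etree.ElementTree module. The function name is hypothetical, and reading each Pos value as left, top, right, bottom is an interpretation based on the figure for fingerprint 565.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the anchor field definition shown above.
ANCHOR_XML = """<F type="Anchor_Object_1">
<V n="STATUS">-1</V>
<V n="Position">0,0,0,0</V>
<V n="ReqConf">8</V>
<V n="PatternMatch">1</V>
<V n="Pos565">183,195,276,282</V>
<V n="Pos566">636,185,730,284</V>
<V n="Pos567">1094,185,1181,279</V>
</F>"""

def anchor_zones(field):
    """Map fingerprint ID to (left, top, right, bottom) for an anchor field."""
    variables = {v.get("n"): (v.text or "") for v in field.findall("V")}
    if variables.get("PatternMatch") != "1":
        return {}  # not a pattern match anchor
    zones = {}
    for name, value in variables.items():
        # Only names of the form Pos<fingerprint ID> hold anchor zones.
        if name.startswith("Pos") and name[3:].isdigit():
            zones[int(name[3:])] = tuple(int(p) for p in value.split(","))
    return zones

zones = anchor_zones(ET.fromstring(ANCHOR_XML))
```

The result has one entry per fingerprint, so the zone for fingerprint 565 can be looked up as zones[565].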
SETTING THE REQUIRED CONFIDENCE LEVEL FOR PATTERN MATCHING
The default confidence level for pattern matching is 8. The way you change this value depends on whether
you’re doing geometric pattern matching (using the PatternMatch_* actions) or text-based pattern
matching (using the pat_RecogMatch_Id action).
GEOMETRIC PATTERN MATCHING
You set the required confidence level for geometric pattern matching in the anchor object’s ReqConf variable.
You can change the default confidence level by right-clicking the anchor object in the Document Hierarchy
pane and choosing Manage Variables.
• PatternMatch = 1 indicates that the object is an anchor
• ReqConf indicates that the required confidence level is 8
Alternatively, you can change the ReqConf variable through the Zones tab’s Properties pane.
TEXT-BASED PATTERN MATCHING
The pat_RecogMatch_Id action does not use the ReqConf variable. Instead, it uses the confidence level
established using the SetMatchConfidence action.
Library        Action               Description
PatternMatch   SetMatchConfidence   Sets the confidence threshold for pattern matching.
USING GEOMETRIC PATTERN MATCHING
Geometric pattern matching uses graphical images, like page registration marks or vendor logos. You can use
geometric pattern matching to identify pages and correct registration problems.
The table below identifies some of the key actions in the “PatternMatch” library that are used for geometric
pattern matching.
Library        Action                  Description
PatternMatch   PatternMatch_Identify   Identifies a page using geometric pattern matching, sets the page
                                       type and image offsets, and creates the page data file.
PatternMatch   pat_RegisterZones       Use after running PatternMatch_Identify if you have multiple
                                       anchors on the page. This action adjusts the positions of all fields
                                       based on the positions of multiple anchor fields.
HOW THE PATTERNMATCH_IDENTIFY ACTION WORKS
When Taskmaster executes the PatternMatch_Identify action, it gathers all of the anchor objects from
all of the fingerprints, then looks for a match on the current page. It doesn’t search the entire page – instead,
it searches a region 200 pixels greater in each direction than the zone defined for the anchor object. If it finds
a match that meets the required confidence level, it sets the page type and computes the offset values.
Note: You can change the size of the search region by setting the anchor field’s METRIC variable. For
example, METRIC=200,300 increases the width by 200 pixels in each direction and the height by 300
pixels in each direction.
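The search region can be sketched as a simple rectangle expansion. This is a minimal sketch under stated assumptions: the function name is hypothetical, zones are read as left, top, right, bottom, and clipping to a 1700 x 2200 pixel page (8.5 x 11 inches at 200 pixels per inch) is an assumption that happens to agree with the search areas reported in the logs in this chapter.

```python
def search_region(zone, metric=(200, 200), page_size=(1700, 2200)):
    """Expand an anchor zone by the METRIC margins and clip it to the page.

    zone      -- (left, top, right, bottom) of the anchor on the fingerprint
    metric    -- (horizontal margin, vertical margin) in pixels
    page_size -- (width, height) in pixels; an assumed page size
    """
    left, top, right, bottom = zone
    dx, dy = metric
    width, height = page_size
    return (max(0, left - dx), max(0, top - dy),
            min(width, right + dx), min(height, bottom + dy))

default_region = search_region((678, 191, 1044, 296))
taller_region = search_region((678, 191, 1044, 296), metric=(200, 300))
```

With the default margins, a zone of 678,191,1044,296 expands to the search area 478,0,1244,496; with METRIC=200,300 only the bottom edge changes, because the top edge is already clipped at the top of the page.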
The RRS log entries below, taken from pageid_rrs.log, illustrate how PatternMatch_Identify works.
Created PatternMatch Object
Aquired PM lock
Loading Patterns...
Vendor_Logo : Pattern Found for '565' with zone (171,194,563,302)
Vendor_Logo : Pattern Found for '566' with zone (678,191,1044,296)
Vendor_Logo : Pattern Found for '567' with zone (1187,206,1524,286)
Opening 'provider=microsoft.jet.oledb.4.0;data source=C:\Datacap\TravelDocs\TravelDocsFingerprint.mdb;persist security info=false'
Fingerprint/Rules Database connection established.
Search Image: 'c:\datacap\traveldocs\batches\20110018.013\tm000001.tif'
Matching ID# 566 Conf: 10. X: 240 Y: 271. Search Area: 478,0,1244,496 Field ReqConf:8 --> TRUE
Calculated offset is: (40,80)
Releasing PM lock
• The PatternMatch_Identify action finds a “Vendor_Logo” pattern defined in each of three
fingerprints: 565, 566, and 567.
• It matches the pattern on fingerprint 566 to a region on the current page with a confidence level of 10
(the highest possible).
• The search area for fingerprint 566 is 200 pixels greater in each direction than the zone defined for the
anchor object.
• It calculates the x and y offsets between the position of the anchor object and the position of the
matching pattern on the page.
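The numbers in the log above are consistent with the X and Y values being the position of the match measured from the top left corner of the search area. Under that reading, which is an inference from this one log rather than documented behavior, the calculated offset can be reproduced as follows (the function name is hypothetical).

```python
def calculated_offset(zone, search_area, match_xy):
    """Offset of the matched pattern relative to the anchor zone.

    zone        -- (left, top, right, bottom) of the anchor on the fingerprint
    search_area -- (left, top, right, bottom) searched on the runtime page
    match_xy    -- (X, Y) from the log, assumed relative to the search area
    """
    x, y = match_xy
    page_x = search_area[0] + x  # match position in page coordinates
    page_y = search_area[1] + y
    return (page_x - zone[0], page_y - zone[1])

offset = calculated_offset((678, 191, 1044, 296), (478, 0, 1244, 496), (240, 271))
```

For fingerprint 566 this gives the same (40,80) offset that the log reports.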
USING MULTIPLE ANCHORS
The examples we’ve looked at so far used a single anchor object on each page. To improve accuracy, you can
specify multiple anchor objects. For example, by defining anchor objects at the top left and bottom right of
the page, you can improve the resulting registration. PatternMatch_Identify does only a simple averaging
of the offsets, while the pat_RegisterZones action lets you perform interpolative registration, where field
positions are adjusted based on their proximity to each of the anchor objects.
In the example below, Taskmaster located an anchor object at the top left and the bottom right of the page
and calculated an offset value for each anchor. It then averaged these values to determine the offset required
to bring the page image into best alignment and wrote these values to the page’s Image_Offset variable.
RRS log:
Matching ID# 570 Conf: 10. X: 280 Y: 274. Search Area: 0,0,564,513 Field ReqConf:8 --> TRUE
Calculated offset is: (100,100)   <-- Offset for first anchor object (top left)
Matching ID# 570 Conf: 10. X: 301 Y: 262. Search Area: 1158,1640,1700,2162 Field ReqConf:8 --> TRUE
Calculated offset is: (101,62)   <-- Offset for second anchor object (bottom right)
Runtime page data:
<P id="TM000018">
<V n="TYPE">Test</V>
<V n="STATUS">49</V>
<V n="IMAGEFILE">tm000018.tif</V>
<V n="ScanSrcPath">c:\datacap\traveldocs\images\refstopbottom.tif</V>
<V n="RecogStatus">0</V>
<V n="LC_Confidence">8.571231E-02</V>
<V n="LC_Image_Offset">-16,0</V>
<V n="LC_TemplateID">562</V>
<V n="Fingerprint Created">No</V>
<V n="Confidence">8.571231E-02</V>
<V n="TemplateID">570</V>
<V n="PatternConfidence">10</V>
<V n="Image_Offset">-100,-81</V> <!-- Average required offset -->
<V n="DATAFILE">tm000018.xml</V>
</P>
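The relationship between the two anchor offsets in the log and the page's Image_Offset value can be checked with the sketch below. The function name is hypothetical, and two details are assumptions inferred from this single example: the sign is reversed when the average is stored, and fractional averages are truncated.

```python
def image_offset_from_anchors(anchor_offsets):
    """Average the per-anchor offsets and reverse the sign.

    anchor_offsets -- list of (x, y) offsets, one per matched anchor
    """
    count = len(anchor_offsets)
    avg_x = sum(x for x, _ in anchor_offsets) / count
    avg_y = sum(y for _, y in anchor_offsets) / count
    # Truncation toward zero is an assumption; 100.5 becomes 100 here.
    return (-int(avg_x), -int(avg_y))

page_offset = image_offset_from_anchors([(100, 100), (101, 62)])
```

The two offsets average to 100.5 and 81, which corresponds to the stored Image_Offset of -100,-81.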
In addition to storing the page image offset, PatternMatch_Identify also creates a page data file and
stores the zone offset for each anchor in the field’s Zone_Offset variable:
<V n="Zone_Offset">100,100</V> <!-- Offset for first anchor -->
USING PAT_REGISTERZONES TO ADJUST THE POSITIONS OF INDIVIDUAL FIELDS
As you’ve just seen, PatternMatch_Identify computes the offset for the entire page and stores the value
in the page’s Image_Offset variable.
<V n="Image_Offset">-100,-81</V>
It also creates the page data file and stores the offset for each anchor in the field’s Zone_Offset variable.
<V n="Zone_Offset">100,100</V> <!-- Offset for first anchor -->
If you’re using multiple anchors, you can use the pat_RegisterZones action to compute the optimum
offsets for individual fields. This lets you handle situations where the difference between the fingerprint and
the runtime image varies across the page.
Library        Action              Description
PatternMatch   pat_RegisterZones   Adjusts the positions of all fields on the current page based on the
                                   positions of the page’s anchor fields.
Note: pat_RegisterZones is an alternative to ReadZones – you shouldn’t use both on the same page.
In the example below, we’re executing pat_RegisterZones immediately after PatternMatch_Identify.
The RRS log entries below, taken from pageid_rrs.log, illustrate how pat_RegisterZones works.
Created PatternMatch Object
Aquired PM lock
Anchor Anchor_1 found.  Looking for offset...
Expected 1384,313,1529,392 Image_Offset Zone_Offset 81,100
Anchor Anchor_2 found.  Looking for offset...
Expected 697,1988,1006,2074 Image_Offset Zone_Offset 102,101
Register using 2 anchors
Set Arrival_Date from Position to 411,405,713,484 to 507,506,812,585
Set Departure_Date from Position to 1154,412,1450,488 to 1246,513,1532,589
Set Total_Cost from Position to 1150,781,1331,860 to 1242,882,1417,961
• The pat_RegisterZones action locates an anchor object (“Anchor_1”) on the current page.
• It retrieves the zone offset for this field that was computed earlier by PatternMatch_Identify.
• It locates a second anchor object (“Anchor_2”) on the current page.
• It retrieves the zone offset for this field that was computed earlier by PatternMatch_Identify.
• It uses the two zone offsets to compute the new positions for each of the three data fields on the
current page using an algorithm that takes into account the proximity of each field to each anchor
zone.
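The guide does not describe the algorithm that pat_RegisterZones uses, so the sketch below substitutes a generic technique, inverse-distance weighting, purely to illustrate what “taking proximity into account” can mean. It is not Taskmaster's algorithm and does not reproduce the positions in the log above; all function names are hypothetical.

```python
import math

def center(zone):
    """Center point of a (left, top, right, bottom) zone."""
    left, top, right, bottom = zone
    return ((left + right) / 2, (top + bottom) / 2)

def interpolated_offset(field_zone, anchors):
    """Blend anchor offsets, giving nearer anchors more weight.

    anchors -- list of (expected_zone, (dx, dy)) pairs
    """
    fx, fy = center(field_zone)
    weighted = []
    for zone, offset in anchors:
        ax, ay = center(zone)
        distance = math.hypot(fx - ax, fy - ay)
        if distance == 0:
            return offset  # the field sits exactly on this anchor
        weighted.append((1 / distance, offset))
    total = sum(weight for weight, _ in weighted)
    dx = sum(weight * offset[0] for weight, offset in weighted) / total
    dy = sum(weight * offset[1] for weight, offset in weighted) / total
    return (round(dx), round(dy))

def shift(zone, offset):
    """Move a zone by (dx, dy) without resizing it."""
    dx, dy = offset
    left, top, right, bottom = zone
    return (left + dx, top + dy, right + dx, bottom + dy)

ANCHORS = [((1384, 313, 1529, 392), (81, 100)),
           ((697, 1988, 1006, 2074), (102, 101))]
arrival_offset = interpolated_offset((411, 405, 713, 484), ANCHORS)
```

With this weighting, a field that lies near the first anchor receives an offset close to 81,100, and a field near the second anchor receives an offset close to 102,101.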
USING TEXT-BASED PATTERN MATCHING
Text-based pattern matching works much like geometric pattern matching except that the anchor objects are
text strings. You set up anchor fields in the document hierarchy and then define the anchor zones on each
fingerprint. In the example below, we’ve set up two anchor zones for one of the hotel pages.
On the right, you can see the properties for one of the text anchor zones:
• The properties are for the field Text_Anchor_1.
• PatternMatch = 1 indicates that the field is an anchor object.
• ReqConf = 8 indicates that a match requires a confidence level of 8 or higher.
• Pos572 defines the co-ordinates of the anchor zone on fingerprint 572.
The following action supports page identification and image realignment using text-based pattern matching:
Library        Action              Description
PatternMatch   pat_RecogMatch_Id   Identifies a page using text-based pattern matching, and sets the
                                   page type and image offsets. This action uses the patterns (anchor
                                   objects) from all fingerprints in the fingerprint library.
HOW THE PAT_RECOGMATCH_ID ACTION WORKS
When Taskmaster executes the pat_RecogMatch_Id action, it gathers all the anchor objects from the
fingerprint library and looks for a match on the current page. For each anchor object, Taskmaster searches
the current page in a region 400 pixels greater in each direction than the text zone defined in the fingerprint.
If it finds a match that meets the required confidence level, it sets the page type and computes the offset
values.
Note: You can use the METRIC variable to change the size of the search region used by pat_RecogMatch_Id.
The RRS log entries below illustrate how pat_RecogMatch_Id works.
Created PatternMatch Object
Aquired PM lock
Opening 'provider=microsoft.jet.oledb.4.0;data
source=C:\Datacap\TravelDocs\TravelDocsFingerprint.mdb;persist security info=false'
Fingerprint/Rules Database connection established.
#572, path:'C:\Datacap\TravelDocs\fingerprint\572.cco'
FPZone:'1384,313,1529,392' TXTZone:'1408,334,1498,359' Value:'Room'
FPZone:'697,1988,1006,2074' TXTZone:'758,2018,811,2036' Value:'Hotel #3'
---------------------------ANCHOR TEXT:'Room'--->'R[oO0][oO0]m'
SEARCH AREA:'Hotel #3
Room
Check out Wed Nov 24 2010
speed internet
$109 95
$329 85'
microwave
fridge

METRIC:'400,400' 

Matched Value >>Room<< 
Check FingerPrintID# 572
Search Area: 1008,0,1700,759
Match Confidence: 9.
Offset(-80,-100)
---------------------------ANCHOR TEXT:'Hotel #3' ---> '[Z2][oO0][\(\)iItl1][oO0][\
]*H[\(\)iItl1][\(\)iItl1][\(\)iItl1][\(\)iItl1][oO0]p[\
]*H[oO0][\(\)iItl1]e[\(\)iItl1]s'
METRIC:'400,400'
SEARCH AREA:'Hotel #3' 
Matched Value >>Hotel[SPACE CHARACTER]#3<<
Check FingerPrintID# 572
Search Area: 358,1618,1211,2200
Match Confidence: 9.
Offset(-100,-101)
RecogMatch FingerPrint#:572
PAGETYPE:Room_Receipt


The action finds two anchor zones defined in fingerprint 572: “Room” and “Hotel #3”

It computes a bounding region 400 pixels greater in each direction than the region (“TXTZone”)
defined in the fingerprint CCO file for the first anchor value (“Room”).

It identifies all the text within that bounding region on the current page.

It locates the anchor value “Room” within the search region.

It computes the offset by comparing the word's position on the page to the position in the fingerprint.

It repeats the process for the second anchor value.

It sets the page's template ID and type, since at least one of the zones matched.
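The search-region step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the action's actual code: the function name is hypothetical, and the page size of 1700 x 2200 pixels is an assumption inferred from the way the "Search Area" values in the log are clipped.

```python
def search_region(txt_zone, page_size, margin=400):
    """Expand a text zone by `margin` pixels on every side, clamped to the page.

    txt_zone is (left, top, right, bottom); page_size is (width, height).
    """
    left, top, right, bottom = txt_zone
    width, height = page_size
    return (
        max(left - margin, 0),
        max(top - margin, 0),
        min(right + margin, width),
        min(bottom + margin, height),
    )


# Assumed page size (inferred from the log, not stated in it).
PAGE_SIZE = (1700, 2200)

# TXTZone values for the two anchors in fingerprint 572.
print(search_region((1408, 334, 1498, 359), PAGE_SIZE))   # (1008, 0, 1700, 759)
print(search_region((758, 2018, 811, 2036), PAGE_SIZE))   # (358, 1618, 1211, 2200)
```

Both results agree with the "Search Area" lines in the log for the “Room” and “Hotel #3” anchors.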
DETERMINING THE RUNTIME FIELD POSITIONS USING THE ANCHOR OFFSETS
The pat_RecogMatch_Id action uses the anchor offsets to determine the Image_Offset value it writes to the
runtime page file:
<V n="Image_Offset">-100,-101</V>
The value here (-100,-101) is from the same “Hotel #3” page shown on the previous page. In this example,
you can see that pat_RecogMatch_Id used the offset value from “Text_Anchor_2.”
Note: pat_RecogMatch_Id does not create a page data file and does not store the individual offset for each
anchor field.
Later on, when Taskmaster executes the ReadZones action, it uses the Image_Offset value to compute the
position of each runtime field, for example:
<F id="Arrival_Date">
<V n="TYPE">Arrival_Date</V>
<V n="Position">511,506,813,585</V>
<V n="STATUS">0</V>
etc.
By comparing the field positions defined in the fingerprint (you can see these in the Properties pane) with the
field positions in the runtime page file, you can see how Taskmaster used the offset value (-100,-101) to
compute the position of each data field.
Field | Fingerprint 572 | Runtime page
Arrival_Date | 411,405,713,484 | 511,506,813,585
Departure_Date | 1154,412,1450,488 | 1254,513,1550,589
Total_Cost | 1150,781,1331,860 | 1250,882,1431,961
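The arithmetic behind the table can be checked with a short Python sketch. The sign convention used here (runtime position = fingerprint position minus Image_Offset) is inferred from the values in the table rather than taken from the ReadZones documentation, and the function name is hypothetical.

```python
def apply_image_offset(position, image_offset):
    """Shift a zone (left, top, right, bottom) by subtracting the x,y offset."""
    left, top, right, bottom = position
    dx, dy = image_offset
    return (left - dx, top - dy, right - dx, bottom - dy)


IMAGE_OFFSET = (-100, -101)

# Field positions defined in fingerprint 572.
fingerprint_572 = {
    "Arrival_Date": (411, 405, 713, 484),
    "Departure_Date": (1154, 412, 1450, 488),
    "Total_Cost": (1150, 781, 1331, 860),
}

for field, position in fingerprint_572.items():
    print(field, apply_image_offset(position, IMAGE_OFFSET))
```

Each printed position matches the "Runtime page" column above.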
ADJUSTING THE POSITIONS OF INDIVIDUAL FIELDS BASED ON MULTIPLE ANCHORS
The pat_RegisterZones action does not work on pages identified using pat_RecogMatch_Id because
pat_RecogMatch_Id does not save the individual anchor field offsets.
TRAVELDOCS: USING GEOMETRIC PATTERN MATCHING TO IDENTIFY PAGES
SETTING UP THE PATTERN MATCH ANCHOR OBJECTS
In this section, we'll create pattern matching zones for each of the air ticket pages. We'll use the vendor logo
at the top of each page as the anchor object we're trying to match. Before we can define the zones, we'll need
to add a new field to the document hierarchy for the anchor object.
1. Click the Datacap Studio Zones tab.
2. In the Document hierarchy pane, click the Lock DCO for editing button and then expand the Flight
document.
3. Right-click the Air_Ticket page and choose Add > Field.
4. Rename the new field Vendor_Logo and then use the  button to move it to the top of the list.
5. Right-click the Vendor_Logo field and choose Anchor field. This identifies the field as an anchor object
by setting the PatternMatch variable to '1'.
6. In the Properties pane, set the STATUS variable to '-1'.
7. In the Document hierarchy pane, click the Save button but leave the document hierarchy locked for
editing.
8. In the Fingerprints pane, expand the Flight class and select the first air ticket page (Airline #1).
9. In the Document hierarchy pane, select the Vendor_Logo field. Then use the mouse to draw a
bounding box around the vendor logo on the page image.
10. Repeat for the remaining air ticket pages (Airline #2 and Airline #3).
11. In the Document hierarchy pane, click the Save button and then click the Unlock DCO button.
UPDATING THE PAGEID RULE TO USE PATTERN MATCHING
Next, we'll update the PageID rule to identify unrecognized pages using pattern matching. We'll put the
pattern matching functionality at the end so it will execute only if standard fingerprint matching and text
matching both fail (we'll be using a new sample image that has offsets large enough that standard fingerprint
matching will fail). Note that most applications use one page identification method – we're doing this for
illustration purposes only.
1. Click the Datacap Studio Rulemanager tab.
2. In the Rulesets pane, select the PageID ruleset and click the Lock/Unlock ruleset button to lock the
ruleset for editing.
3. Expand the PageID ruleset. Then right-click the PageID rule and choose Add Function.
4. Rename the new function Identify using Pattern Match.
5. Click the Actions library tab.
6. Select and add each of the actions shown in the table below to the Identify using Pattern Match
function using the Add to function button. Then set the action parameters as shown in the table.
Library | Action | Parameter
PatternMatch | PatternMatch_Identify |
rrunner | rrSet | varSource = GeometricPattern; varTarget = @P.MatchType
Note: rrSet("GeometricPattern", "@P.MatchType") stores the string “GeometricPattern” in a
page variable called “MatchType” within the runtime hierarchy. We'll use this later so we can see
which pages were identified using which PageID function.
7. Add the rrSet action to the Identify using Fingerprint function and set the action parameter as shown.
Library | Action | Parameter
rrunner | rrSet | varSource = Fingerprint; varTarget = @P.MatchType
The PageID functions should look like the ones below.
8. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
RUNNING A BATCH THROUGH THE WORKFLOW
Now we can test the pattern matching functionality. We've provided an additional page image file for an
“Airline #2” air ticket where the image is offset by enough to cause standard fingerprint matching to fail.
1. Copy the file OffsetAirTicket.tif into the TravelDocs application's “images” folder. If you open the
page and compare it to the original Airline #2 ticket, you'll see that the image is offset by 0.5 inches in
both the x and y directions.
Original page
OffsetAirTicket.tif
2. Click the Datacap Studio Test tab.
3. In the Workflow pane, select the VScan task profile under Main Job.
4. Click the New button to start a new batch.
5. Click the Process rules for target object  button. Then wait for the VScan task profile to execute
and click Advance when it completes.
6. Click the Process rules for target object  button again. Then wait for the Page ID task profile to
execute and click Advance when it completes.
7. On the Runtime batch hierarchy tab, scroll to the bottom and make sure the last page is identified with
type Air_Ticket. Then select it and confirm that the new Airline #2 ticket is displayed on the Image tab.
8. Click the Process rules for target object  button and click Advance to move the batch through the
remainder of the workflow.
REVIEWING THE RUNTIME BATCH FILES
1. Open the TravelDocs application's most recent “batches” folder.
2. Open the file PageID.xml and scroll to the bottom to view the details for the last page.
<P id="TM000015">
<V n="TYPE">Air_Ticket</V>
<V n="STATUS">49</V>
<V n="IMAGEFILE">tm000015.tif</V>
<V n="ScanSrcPath">c:\datacap\traveldocs\images\offsetairticket.tif</V>
<V n="RecogStatus">0</V>
<V n="LC_Confidence">0.4774427</V>  ← Confidence level from fingerprint matching
<V n="LC_Image_Offset">-24,0</V>
<V n="LC_TemplateID">569</V>  ← Closest matching fingerprint
<V n="Fingerprint Created">No</V>
<V n="Confidence">0.4774427</V>
<V n="TemplateID">566</V>  ← Matching fingerprint based on pattern matching
<V n="PatternConfidence">10</V>  ← Confidence level from pattern matching
<V n="Image_Offset">-100,-100</V>  ← Offset relative to fingerprint 566
<V n="MatchType">GeometricPattern</V>  ← Page identified using geometric pattern matching
<V n="DATAFILE">tm000015.xml</V>
</P>
In the example above you can see that the closest matching fingerprint was fingerprint 569 (one of the hotel
pages) but the confidence level is only 0.477 – not enough for a match. This caused the FindFingerprint
action to fail.
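If you want to inspect these page variables outside Datacap Studio, a small script can read them from the page element. The sketch below uses Python's standard XML parser on an excerpt of the element shown above; the helper name is hypothetical and the script is not part of Taskmaster.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <P> element from PageID.xml (some variables omitted).
PAGE_XML = """
<P id="TM000015">
  <V n="TYPE">Air_Ticket</V>
  <V n="LC_Confidence">0.4774427</V>
  <V n="LC_TemplateID">569</V>
  <V n="TemplateID">566</V>
  <V n="PatternConfidence">10</V>
  <V n="Image_Offset">-100,-100</V>
  <V n="MatchType">GeometricPattern</V>
</P>
"""


def page_variables(page_xml):
    """Return the <V> variables of one <P> element as a dict keyed by name."""
    page = ET.fromstring(page_xml.strip())
    return {v.get("n"): (v.text or "") for v in page.findall("V")}


variables = page_variables(PAGE_XML)
dx, dy = (int(n) for n in variables["Image_Offset"].split(","))
print(variables["TYPE"], variables["TemplateID"], variables["MatchType"], (dx, dy))
```

Comparing LC_TemplateID (569) with TemplateID (566) is a quick way to confirm that the final match came from pattern matching rather than standard fingerprint matching.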
You can see the full PageID rule on the right. Since
FindFingerprint failed, Taskmaster executed the
“Identify using Text Match” function, but this failed as well
since the Airline #2 ticket does not include the word “Car.”
It then executed the “Identify using Pattern Match” function
and found a perfect match with the fingerprint 566 (the
original Airline #2 fingerprint).
Note that the match is perfect because the vendor logo is
identical on both pages – not because the data on both pages
is identical.
If you open the export text file (C:\Datacap\TravelDocs\export\<batch_id>.txt), you can see that Taskmaster
managed to locate all the data fields using the offsets it calculated during pattern matching and recognized the
data successfully.
,U Airline #2,Newark, NJ (EWR),Dayton, OH (DAY),MON JAN 10, 2011,Dayton, OH
(DAY),Newark, NJ (EWR),THUR JAN 13, 2011,360.56,33.23,393.79
Chapter 18
WORKFLOW AUTOMATION, ROUTING, AND AUTOMATIC FINGERPRINT GENERATION
So far we've moved batches through the workflow manually by launching tasks from Taskmaster Client or
the Datacap Studio Test tab. In this chapter we'll introduce workflow automation using Rulerunner Quattro.
Instead of launching background tasks like PageID, Rulerunner, and Export manually, you can configure
Quattro to monitor the job queue and run these tasks automatically whenever batches are pending.
Additionally, most of our workflow processing so far has been linear, meaning we've moved each batch from
task to task in the same sequence (VScan → PageID → Rulerunner etc.). The one exception was when we
diverted a batch out of the standard workflow and into the FixUp task to address document integrity
problems. In this chapter we'll look at conditional routing. This lets you route a batch to a specific job if it
requires some special handling. The example we'll use is manual page identification, which we'll do if the
automated identification methods fail to recognize a page.
The last new topic we'll cover in this chapter is automatic fingerprint generation. This is useful when you
want to let operators add new page types to the fingerprint library automatically, instead of defining them
using Datacap Studio. This way the fingerprint and recognition zones are saved and can be used next time
you need to process a page of the same type.
At the end of the chapter, we'll configure the TravelDocs application to use Quattro for background task
processing. We'll also update the application to handle unidentified pages by routing the batch to a new web
client interface called “ProtoId” for manual page identification, and we'll add functionality to generate
fingerprints for these pages automatically.
USING QUATTRO TO AUTOMATE BACKGROUND TASKS
So far in this guide, we've initiated each workflow task manually – typically by launching the task from
Taskmaster Client or from the Datacap Studio Test tab.
In a typical production environment, the goal is to automate batch processing to the greatest extent possible.
This means identifying “background” tasks and running them automatically.
Background tasks are tasks that don't require operator intervention. The TravelDocs application has three
background tasks:

PageID – identifies each page and assigns a page type

Rulerunner – combines pages into documents, and performs recognition and validation

Export – writes structured data to a data repository
Each of these tasks runs a set of rules, updates the runtime batch hierarchy, and marks the task as complete.
The Taskmaster queuing engine then readies the batch for the next task in the workflow. If the rules detect
any problems (for example, the batch failed document integrity checking), the task can raise an error
condition and you can divert the batch out of the normal processing workflow for special handling.
We'll consider routing and exception handling later in this chapter, but for now let's focus on moving a batch
through the standard workflow with as little operator intervention as possible.
ABOUT RULERUNNER QUATTRO
Rulerunner Quattro is Taskmaster's background processing engine. You can configure Quattro to monitor
the job queue and launch designated background tasks automatically whenever batches are pending.
For example, in the TravelDocs application, the VScan task and the Verify task are manual tasks that require
an operator. The other three tasks don't require any manual input, so we can run them in the background
under the control of Quattro.
Manual tasks: (V)Scan, Verify
Background tasks (run by Quattro): PageID, Rulerunner, Export
CONFIGURING QUATTRO
Configuring Quattro is a 2-step process:

Define the tasks you want to run as background tasks in the Taskmaster Application Manager.

Configure the background tasks in the Rulerunner Quattro Manager.
We'll do this later for the TravelDocs application (see “Defining the background tasks in the Taskmaster
Application Manager” and “Setting up the background tasks in the Quattro Manager” starting on page 282).
For additional Quattro configuration information, refer to the IBM Datacap Taskmaster Capture Installation and
Configuration Guide.
RUNNING QUATTRO
Quattro runs in the background as a Windows service. Although you can configure Quattro to work with
multiple Taskmaster applications, we'll consider the case where it's configured for just one.
Quattro monitors the application's job queue looking for jobs it's configured to run. We've used the
Taskmaster Client's Job Monitor window to monitor the queue, but the actual job queue is maintained in the
“queue” table in the Taskmaster “engine” database. By default, the engine database is a Microsoft Access
database located in the root of the application folder. For example, the TravelDocs application's engine
database is C:\Datacap\TravelDocs\TravelDocsEng.mdb. The example below shows the “queue” table in
TravelDocsEng.mdb with two jobs pending for background tasks.
Note: Taskmaster supports other database types, including Microsoft SQL Server and Oracle. For more
information, see the chapter “Moving Your Application into Production” starting on page 381.
Quattro polls the job queue every 10 seconds looking for pending jobs. When it finds one, it “grabs” the
batch and processes it. Upon completion, the batch is readied for the next task in the workflow. In this way
you can move batches through any number of sequential background tasks automatically.
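The polling behavior can be pictured with a small simulation. The sketch below is a conceptual model only: the in-memory queue, the field names, and the functions are all hypothetical, and none of it reflects how Quattro or the engine database is actually implemented.

```python
import time

# Task order and background tasks for the TravelDocs example.
WORKFLOW = ["VScan", "PageID", "Rulerunner", "Verify", "Export"]
BACKGROUND = {"PageID", "Rulerunner", "Export"}


def poll_once(queue, run_task):
    """Run each pending background job once, then ready it for the next task."""
    ran = 0
    for job in queue:
        if job["status"] == "pending" and job["task"] in BACKGROUND:
            run_task(job["batch"], job["task"])
            next_index = WORKFLOW.index(job["task"]) + 1
            if next_index < len(WORKFLOW):
                job["task"] = WORKFLOW[next_index]
            else:
                job["status"] = "complete"
            ran += 1
    return ran


def run_service(queue, run_task, interval=10, sleep=time.sleep, max_polls=3):
    """Poll the queue a fixed number of times, pausing `interval` seconds between polls."""
    for _ in range(max_polls):
        poll_once(queue, run_task)
        sleep(interval)


# Demo: one batch waiting for PageID; the sleep is stubbed out so the demo runs instantly.
demo_queue = [{"batch": "B1", "task": "PageID", "status": "pending"}]
history = []
run_service(demo_queue, lambda batch, task: history.append((batch, task)),
            sleep=lambda seconds: None)
print(history)        # [('B1', 'PageID'), ('B1', 'Rulerunner')]
print(demo_queue[0])  # the batch is now waiting at the manual Verify task
```

In this model the batch stops at Verify because that task is not in the background set, which mirrors the way a manual task waits for an operator.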
QUATTRO LOGGING
Quattro provides extensive logging options that you configure through the Rulerunner Quattro Manager.
We'll be using the Quattro log later when we use Quattro with the TravelDocs application (see “Enabling
Quattro logging” on page 284 and “Examining the Quattro log” on page 285).
USING BRANCHING AND SPLITTING TO ROUTE DOCUMENTS
So far, most of our batch processing has been done by going through the workflow linearly.
VScan → PageID → Rulerunner → Verify → Export
The one exception was when we used a branch to divert a batch out of the standard workflow and into the
FixUp task to address document integrity problems.
VScan → PageID → Rulerunner → Verify → Export (with a branch out to FixUp)
In this section, we'll look in more detail at conditional branching and splitting, and look at ways to route a
batch or a portion of a batch to a separate job.
Note: You can't raise condition flags from tasks running under Taskmaster Web (the task will exit normally).
BRANCHING VERSUS SPLITTING
The two basic actions for workflow routing are branches and splits:

Branch: The entire batch is sent from the main job to a child job. When the child job completes, the
batch returns to the main job.

Split: Documents within the batch are split off from the parent batch and placed into one or more child
batches. The child batches are sent to a child job for processing and do not return to the main job.
[Diagram: a main job (Task 1, Task 2, Task 3) and a child job (Task A, Task B), showing a BRANCH of the parent batch and a SPLIT into child batches.]
In the diagram above, Task 1 in the main job raises a branch condition and sends the entire batch to Task
A in the child job. Task A then returns the batch to Task 2 in the main job. Task 2 in the main job raises
a split condition and creates two child batches, which are sent to Task B in the child job. The parent
batch continues to Task 3 in the main job.
RAISING CONDITION FLAGS
Branching and splitting are both initiated using a condition flag raised during rule execution:

For branching, you use the Task_RaiseCondition action to raise the task's condition flag.

For splitting, you use the SplitBatch action, which raises the task's condition flag implicitly.
We‟ll look at branching and splitting separately in the sections that follow.
BRANCHING
We saw earlier, in the “Document Assembly” chapter, how to use the Task_RaiseCondition action to
raise a condition flag that determines what happens when the task profile completes. In the example below,
the “Batch Route To Fixup” function executes only if CheckAllIntegrity returns false.
 Function executes only if CheckAllIntegrity returns False
 Raise condition (group index = 0; condition index = 0)

The Task_NumberOfSplits action is required and specifies the number of jobs the batch is sent to
before returning to the main workflow (almost always 1).

The Task_RaiseCondition action specifies the group index (almost always 0) and condition's index
value. In the example below, the Rulerunner task has one condition so the index for this condition is 0.
Branch to FixUp job when
condition is raised
Rulerunner task has
one condition (index=0)
In this example, the batch is diverted to the Fixup job so an operator can fix the document integrity problem.
Keep in mind that the branch doesn't actually happen until the current task profile completes.
Later in this chapter, we'll update the TravelDocs application to branch when there are pages requiring
manual identification.
SPLITTING
The SplitBatch action implements batch splitting and also raises the condition flag.
Library | Action | Description
Split | SplitBatch | Creates one or more child batches based on the value of the specified document-level variable.
Note: You must run the SplitBatch action at the batch level.
Unlike the branching action, you don't specify a condition index – the implied condition index is always 0 (the
first condition). In the example below, the Rulerunner task has three conditions defined. The SplitBatch
action always raises the first one: “Split Condition.”
The SplitBatch action uses the document-level variable specified in the action parameter to determine if a
document is split off to a child batch or remains in the parent batch. In the example below, we're using a
document variable called “Split.”
SplitBatch("@D.Split")
This means any document that has a variable “Split” with any value assigned is split off into a child batch.
Furthermore, the value of the “Split” variable determines which child batch the document goes into. So
documents with <V n="Split">1</V> go into child batch 1 while documents with <V n="Split">2</V>
go into child batch 2, and so on.
[Diagram: the parent batch keeps documents with no Split variable; one child batch receives documents with Split = 1 and another receives documents with Split = 2.]
Note: The values need not be numeric. Also, if the variable's value is the same for all documents, then you'll
get a single child batch.
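The grouping rule can be illustrated with a short Python sketch. It models only the rule described above (documents grouped by the value of a document-level variable); the data structures and function name are hypothetical, and treating an empty value as "no value assigned" is an assumption.

```python
from collections import defaultdict


def split_batch(documents, variable="Split"):
    """Group documents into a parent batch and child batches by a variable's value."""
    parent = []
    children = defaultdict(list)
    for doc in documents:
        value = doc["variables"].get(variable)
        if value is None or value == "":
            parent.append(doc["id"])           # no value assigned: stays in the parent batch
        else:
            children[value].append(doc["id"])  # one child batch per distinct value
    return parent, dict(children)


documents = [
    {"id": "D1", "variables": {}},
    {"id": "D2", "variables": {"Split": "1"}},
    {"id": "D3", "variables": {"Split": "2"}},
    {"id": "D4", "variables": {"Split": "1"}},
    {"id": "D5", "variables": {"Split": "Manual"}},  # values need not be numeric
]

parent, children = split_batch(documents)
print(parent)    # ['D1']
print(children)  # {'1': ['D2', 'D4'], '2': ['D3'], 'Manual': ['D5']}
```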
At the end of this chapter, we'll implement splitting for the TravelDocs application to split off documents
containing pages that weren't recognized during page identification (see “Updating the Routing ruleset to split
the batch” on page 309).
DEFINING A CONDITION AND THE ASSOCIATED ACTION
A task can have any number of conditions associated with it, although it's important to remember that
splitting uses only the first condition (branching actions can use any). To define a condition:
1. Select the task on the Taskmaster Administrator Workflow tab and click Setup.
2. In the Batch Pilot Setup window, click File > Task Settings.
3. On the General tab, select the Job router option, enter the condition name in the field beneath the Add
and Remove buttons, and click Add.
4. Click OK and then click Done to close the Setup window.
Note: For more detailed steps, see the example under “Adding the conditional branch to the PageID task” on
page 294.
Before you can define the action associated with the condition, you must reopen the Taskmaster
Administrator window. You can then select the condition and define the action.
Define action here (see
below for details)
The following fields define what happens at the end of the task profile if the associated condition is raised:
Field | Description
Action | Available actions are: Branch – sends the current batch to the specified job, then returns to the main workflow; Jump – skips the next <steps> tasks in the main workflow; Split – used with the SplitBatch action to send a child batch to the specified job; Stop – stops processing the batch (status = “batch stopped”); Hub
Child Job | Determines where the batch is sent (used for Branch, Split, and Hub only)
Parent status | The batch status when the batch returns
Child status |
Steps | When used with Jump: The number of workflow tasks to jump. When used with Branch: The return point after branching (0 = same task, 1 = next task, etc.)
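One way to read the Steps field is as an index calculation over the main workflow. The sketch below encodes that reading for Branch and Jump; it is an interpretation of the descriptions in the table, with a hypothetical function name, and is not taken from the Taskmaster implementation.

```python
WORKFLOW = ["VScan", "PageID", "Rulerunner", "Verify", "Export"]


def resume_index(current, action, steps):
    """Return the index of the main-workflow task that runs next.

    Branch: steps is the return point (0 = same task, 1 = next task, ...).
    Jump:   steps is the number of tasks skipped after the current one.
    """
    if action == "Branch":
        return current + steps
    if action == "Jump":
        return current + 1 + steps
    raise ValueError(f"unsupported action for this sketch: {action}")


rulerunner = WORKFLOW.index("Rulerunner")
print(WORKFLOW[resume_index(rulerunner, "Branch", 0)])  # Rulerunner (same task runs again)
print(WORKFLOW[resume_index(rulerunner, "Branch", 1)])  # Verify (next task)

page_id = WORKFLOW.index("PageID")
print(WORKFLOW[resume_index(page_id, "Jump", 1)])       # Verify (Rulerunner is skipped)
```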
CREATING JOBS TO HANDLE SPECIAL CONDITIONS
When we used branching earlier, we sent any batches with document integrity problems to the FixUp job.
The FixUp job is generated automatically by the Application Wizard, so all we had to do was configure
branching to use the existing job.
Sometimes, however, you'll need to create a new job to handle a specific condition. For example, later in this
chapter we'll update the TravelDocs application to let an operator identify pages manually. Since there's no
existing job to do this, we'll have to create one from scratch. We can then branch to the new job if the batch
contains unidentified pages and then return to the main job when done.
A new job requires at least three items:

A job definition with at least one task

A module associated with each task

A project file (.bpp or .icp) associated with each module
Workflow tab
Modules tab
In practice, you must create these items in reverse, since you can't create a module without a project file or a
task without a module. The sections that follow describe how to:

Create a project file

Create a module and associate it with a project file

Create a job and task

Associate the task with a module

Run rules from a task
CREATING A PROJECT FILE
A project file has either a .bpp or a .icp extension. In practice the extension doesn't matter, but the
convention is:

Use .bpp with tasks that use Batch Pilot forms (although you can specify Taskmaster Web forms too).

Use .icp with tasks that use only Taskmaster Web forms.
The project file includes one or more sections that define various task-specific settings. A few key sections are
described below.
Section | Description
[General] | General task settings
[Batch] | Specifies the task’s Batch Pilot setup form and the default batch level runtime form. The setup form lets you select the module’s task profile (if there is one), configure logging, etc.
[Page] | Specifies any runtime Batch Pilot forms. For an example, see rrs_verify for the TravelDocs application, which defines all the page-specific forms we created earlier.
[iCap] | Specifies the web page displayed when running the task from Taskmaster Web. To enable the task for web use, you must specify Enabled=1 in this section.
[RRC] | Specifies the task profile associated with the module (if there is one), and whether the profile runs using the local Rulerunner service (RRS) or a web Rulerunner service (WRRS)
The steps to create a project file from scratch using Batch Pilot are described later (see “Creating the
ManualPageID project file” on page 295). An example for a VScan task is shown below.
[General]
AutoMode=1  ← Quits task automatically upon completion
CreateDir=1  ← Batch creation task, so create folder
[Batch]
Types=2  ← Specifies number of types defined
Type0=SetupForm  ← Batch Pilot setup form
Type0Form=..\..\bpilot\rulerun\rrs_setup.dcf
Type0Props=0
Type1=Batch  ← Batch Pilot runtime form (status dialog)
Type1Form=..\..\bpilot\rulerun\rrs_run.dcf
Type1Props=0
[Page]
Types=0  ← No page-specific forms (background task)
[iCap]
Enabled=1  ← Enabled for Taskmaster Web
Page1=vscancl.aspx  ← Web page to use for web VScan
[RRC]
Application=TravelDocs
TProfile=VScan  ← Task runs the VScan task profile
HttpWRRS=http://127.0.0.1/RRS/  ← URL for Web Rulerunner
RRSType=LocalRRS  ← Configured for local Rulerunner service
ExecMode=1
[Scan]  ← Settings specific to vscancl.aspx web page
LocalProc=1
ScanDir=c:\Datacap\TravelDocs\batches
FileExt=tif
FileType=10
StartPanel=0
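Because the project file uses an INI-style layout of sections and key=value pairs, you can inspect one with a general-purpose INI parser. The sketch below uses Python's standard configparser on the key settings from the VScan example; it assumes the file contains only plain sections and key=value lines, and the helper name is hypothetical.

```python
import configparser

# Key settings from the VScan project file example (raw string keeps the backslashes).
PROJECT_FILE = r"""
[General]
AutoMode=1
CreateDir=1
[Batch]
Types=2
Type0=SetupForm
Type0Form=..\..\bpilot\rulerun\rrs_setup.dcf
Type1=Batch
Type1Form=..\..\bpilot\rulerun\rrs_run.dcf
[Page]
Types=0
[iCap]
Enabled=1
Page1=vscancl.aspx
[RRC]
Application=TravelDocs
TProfile=VScan
HttpWRRS=http://127.0.0.1/RRS/
RRSType=LocalRRS
"""


def load_project(text):
    """Parse project-file text, preserving the case of the setting names."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep 'TProfile' rather than lowercasing it
    parser.read_string(text)
    return parser


project = load_project(PROJECT_FILE)
print("Task profile:", project["RRC"]["TProfile"])
print("Rulerunner type:", project["RRC"]["RRSType"])
print("Web enabled:", project.getboolean("iCap", "Enabled"))
print("Setup form:", project["Batch"]["Type0Form"])
```

A check like this can be handy when comparing the project files of several modules, for example to confirm which ones are enabled for Taskmaster Web.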
CREATING A MODULE
A “module” defines a project file to run and how to run it. To create a module, click the Add button on the
Taskmaster Administrator Modules tab and then enter the module definition.
Define module here (see
table below for details)
Field | Description
ID | The module name
Type | Available types are: Normal; Batch creation – typically for scan or vscan modules only; Job router – enables job routing (conditional branching) for the corresponding task; Batch creation router – combines “Batch creation” and “Job routing”
Program name | For standard modules this is always Batch Pilot
Parameters | Specifies the project file to run – use /inet if the module can run from Taskmaster Web
Statistics table | <usually blank>
Batch ID field | <usually blank>
Taskmaster stores module definitions in the “taskmod” table of the application's “admin” database. For the
TravelDocs application, the admin database is C:\Datacap\TravelDocs\TravelDocsAdm.mdb. In the
example below, the numeric field corresponds to the “Type” field in Taskmaster Administrator.
CREATING A JOB AND TASK
Jobs and tasks are defined on the Taskmaster Administrator Workflow tab.

To create a job, select the workflow (there's usually only one – with the same name as the application)
and click Add. Then enter the job name.

To create a task, right-click the parent job and choose New > Task. Then enter the task name and
define the task.
Define task here (see table
below for details)
Field | Description
ID | The task name
Module | Select the associated module from the drop-down list
Task Monitor | <usually “Normal”>
Queue to | Defines who can process a batch that completes this task (“Anybody anywhere” is the default, meaning there are no restrictions)
Store | <usually “Nothing”>
RUNNING RULES FROM A NEW TASK
Tasks can run rules, although they can still be useful even if they don't. For example, later in this chapter we'll
create a task that lets the operator identify pages manually. This task doesn't (by default) run any rules, but the
associated web page can read the application's document hierarchy to retrieve the available page types and
then update the runtime batch hierarchy when the operator has specified the correct type. Similarly, the
standard FixUp task doesn't run any rules but it too can access the document hierarchy and update the
runtime hierarchy.
Most tasks do, however, run rules. To do so, you must associate the task with a task profile (created in
Datacap Studio).
1. Select the task on the Taskmaster Administrator Workflow tab and click Setup.
All of the application’s task
profiles are listed here
2. Select the task profile to run and click Done.
GENERATING FINGERPRINTS AUTOMATICALLY
You can add functionality to your application to generate fingerprints automatically from unrecognized pages.
You can add this functionality to the Verify task so it can be done by an operator, although you may prefer to
route a document with unidentified pages to a supervisor for fingerprint generation (we'll implement both
methods at the end of this chapter). Either way, the fingerprint and recognition zones are saved and are used
the next time you encounter a page of the same type.
The basic steps for generating fingerprints automatically are:
• Identify the page type, either manually or using a text-based identification technique.
• Display the page to an operator and have the operator define the recognition zones.
• Use the CreateFingerprint action to create a new fingerprint file from the current page image.
• Use the SetFingerprint action to set the class and type for the new fingerprint.
• Use the iloc_SetZones action to add the recognition zone position information to the document
  hierarchy.
The new actions are described in the table below.
Library      Action             Description
AutoDoc      CreateFingerprint  Creates a fingerprint from the current page. The fingerprint consists of
                                two files: the image (TIF) file and the fingerprint processing (CCO) file.
AutoDoc      SetFingerprint     Sets the new fingerprint's class and type.
Intellocate  iloc_SetZones      Writes the recognition zone coordinates from the current page data file
                                to the Pos properties of the corresponding field objects in the document
                                hierarchy XML file.
We’ll go through the steps in detail when we configure the TravelDocs application to generate fingerprints
automatically (see “Creating the AutoFingerprint ruleset” on page 303).
TRAVELDOCS: AUTOMATING BACKGROUND PROCESSING USING QUATTRO
In this section, we’ll use Quattro to process background tasks. The steps we’ll follow are:
• Define the application’s background tasks in the Taskmaster Application Manager.
• Set up the background tasks in the Rulerunner Quattro Manager.
• Enable Quattro logging.
• Configure the Job Monitor so we can monitor a batch.
• Start Quattro and run a batch through the workflow.
• Examine the Quattro log file to review the sequence of events.
DEFINING THE BACKGROUND TASKS IN THE TASKMASTER APPLICATION MANAGER
1. Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Application Manager.
2. Select the TravelDocs application and click the Quattro tab.
3. Use the Add new task button to create four task profiles, one for each background task:
   Task         Task profile
   PageID       PageID
   CreateDocs   CreateDocs
   Rulerunner   Rulerunner
   Export       Export
   Note: The CreateDocs task doesn’t exist yet. We’ll be creating it later in this chapter, so we’ll define it now
   to save us having to go back into the Taskmaster Application Manager later.
4. Close the Taskmaster Application Manager.
SETTING UP THE BACKGROUND TASKS IN THE QUATTRO MANAGER
1. Click Start > All Programs > Datacap > Taskmaster Client > Rulerunner Quattro Manager.
2. Click the Datacap Login tab.
3. Select Taskmaster Authentication and then enter User ID: admin and Station ID: 1.
   Note: You must log on using an encrypted password. The steps that follow show you how to create one.
4. Using Windows Explorer, open the folder C:\Datacap\support\Sit and then double-click
SitManager.exe.
5. Under String to Encrypt, type admin and click Encrypt. The encrypted password is displayed below.
6. Copy (Ctrl+c) the encrypted password from the Sit Manager and paste (Ctrl+v) it into the Password
field on the Datacap Login tab in the Rulerunner Quattro Manager.
7. Click Connect. If you get a message indicating that the Quattro configuration file doesn’t exist, click OK.
8. Click the Workflow: Job: Task tab and select the top level TravelDocs item. Then under Main Job,
select PageID, Rulerunner, and Export (see screen shot below).
9. Right-click inside the (empty) right pane and choose Threads > Add Thread.
10. Drag the TravelDocs application’s Main Job from the left pane onto <thread0> in the right pane.
11. Click Save and then click Yes to create the configuration file.
ENABLING QUATTRO LOGGING
1. In the Rulerunner Quattro Manager window, click the Logging tab and then click the Quattro Log tab
at the bottom of the window.
2. Select the Output to option and leave the target folder set to C:\Datacap.
3. Click Save.
4. Click the Datacap Login tab and click Disconnect.
   Note: You must be connected to Taskmaster Server in order to configure Quattro. When you’ve finished
   configuring Quattro, you should disconnect before starting the Quattro service.
SETTING UP THE JOB MONITOR
1. If Taskmaster Client is not already started:
   • Click Start > All Programs > Datacap > Taskmaster Client > Taskmaster Client.
   • Select TravelDocs, click OK, and log in using User ID: admin, Password: admin, and Station: 1.
2. Click the Show Job Monitor button.
3. With the Job Monitor window selected, click Record > Change update timeout.
4. Set the timeout interval to 5 seconds and click OK. This will enable you to monitor the status of the
batch when you start the Quattro service.
RUNNING A BATCH THROUGH THE WORKFLOW
1. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
2. Arrange the windows so the Job Monitor and the Rulerunner Quattro Manager are both visible.
3. In the Rulerunner Quattro Manager window, click the Datacap Quattro tab and click Start.
4. Watch the status of the batch in the Job Monitor. The Task and Status fields should go through the
following sequence, although you may not see each step due to the 5 second refresh interval and
Quattro’s 10 second polling interval.
   Task         Status
   PageID       pending → running
   Rulerunner   pending → running
   Verify       pending
The batch is now ready for verification. Since verification is a manual step, you’ll need to complete this
task before Quattro can run the Export task.
5. Verify the batch as you did before using DotEdit or Taskmaster Web. You can’t use Batch Pilot since the
default configuration won’t let you complete a batch that has validation errors.
   • When you reach the page with the invalid car type, set the type to Other.
   • If a validation failure message is displayed, click OK to override and continue.
   • When you reach the end, click OK to finish the batch.
6. Switch to the Job Monitor window and watch the status of the batch as Quattro runs the Export task.
   Task         Status
   Export       pending → running → Job done
EXAMINING THE QUATTRO LOG
Quattro generates a separate log file for each active thread. The base product licensing allows single threading
only, so the log file is C:\Datacap\quattro0.log. Excerpts from the Quattro log illustrating the sequence of
events are shown below.
ExecuteCode: Grabbed Job[Main Job]:Task[PageID] Queue Id:[42].    <-- Grab batch for PageID
RunGrabbedQNRelease: BatchID [C:\Datacap\TravelDocs\batches\20110067.005].
RunGrabbedQNRelease: Project Path set to:[C:\Datacap\TravelDocs\dco_TravelDocs\rrs_assemble.bpp]
RunGrabbedQNRelease: Selected pagefile [rrsvscan.xml].    <-- Read input DCO file
RunGrabbedQNRelease: Read page file [C:\Datacap\TravelDocs\batches\20110067.005\PageID.xml].    <-- Create output DCO file
RunGrabbedQNRelease: RRS run successful.
RunGrabbedQNRelease: ProcessedPages in Batch:[15]
RunGrabbedQNRelease: Job[Main Job]:Task[PageID] queue id: [42] ran batch[20110067.005] and the status is [finished].    <-- PageID complete
ReleaseTheQ: Released batch, status is now [pending].    <-- Batch pending for Rulerunner
ExecuteCode: Grabbed Job[Main Job]:Task[Rulerunner] Queue Id:[42].    <-- Grab batch for Rulerunner
RunGrabbedQNRelease: BatchID [C:\Datacap\TravelDocs\batches\20110067.005].
RunGrabbedQNRelease: Project Path set to:[C:\Datacap\TravelDocs\dco_TravelDocs\rrs_rulerun.bpp]
RunGrabbedQNRelease: Selected pagefile [PageID.xml].    <-- Read input DCO file
RunGrabbedQNRelease: Read page file [C:\Datacap\TravelDocs\batches\20110067.005\Rulerunner.xml].    <-- Create output DCO file
RunGrabbedQNRelease: RRS run successful.
RunGrabbedQNRelease: ProcessedPages in Batch:[15]
RunGrabbedQNRelease: Job[Main Job]:Task[Rulerunner] queue id: [42] ran batch[20110067.005] and the status is [finished].    <-- Rulerunner complete
ReleaseTheQ: Released batch, status is now [pending].    <-- Batch pending for Verify
ExecuteCode: No batches to process, sleeping for [10] seconds.    <-- Monitoring queue during manual Verify task
ExecuteCode: No batches to process, sleeping for [10] seconds.
ExecuteCode: No batches to process, sleeping for [10] seconds.
ExecuteCode: No batches to process, sleeping for [10] seconds.
ExecuteCode: Grabbed Job[Main Job]:Task[Export] Queue Id:[42].    <-- Grab batch for Export
RunGrabbedQNRelease: BatchID [C:\Datacap\TravelDocs\batches\20110067.005].
RunGrabbedQNRelease: Project Path set to:[C:\Datacap\TravelDocs\dco_TravelDocs\rrs_Export.bpp]
RunGrabbedQNRelease: Selected pagefile [Verify.xml].    <-- Read input DCO file
RunGrabbedQNRelease: Read page file [C:\Datacap\TravelDocs\batches\20110067.005\Export.xml].    <-- Create output DCO file
RunGrabbedQNRelease: RRS run successful.
RunGrabbedQNRelease: ProcessedPages in Batch:[15]
RunGrabbedQNRelease: Job[Main Job]:Task[Export] queue id: [42] ran batch[20110067.005] and the status is [finished].    <-- Export complete
ReleaseTheQ: Released batch, status is now [Job done].    <-- Batch at end of workflow
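When a log covers many batches, it can help to reduce it to the grab and release events. The following Python sketch is not part of Taskmaster. It assumes only the line formats visible in the excerpt (the "Grabbed Job[...]:Task[...]" and "status is now [...]" patterns), and the function name summarize_quattro_log is hypothetical. A real log file may carry timestamps or other prefixes, in which case the patterns would need adjusting.

```python
import re

# Patterns inferred from the log excerpt in this section (an assumption,
# not a documented log format).
GRABBED = re.compile(r"Grabbed Job\[(?P<job>[^\]]+)\]:Task\[(?P<task>[^\]]+)\]")
RELEASED = re.compile(r"Released batch, status is now \[(?P<status>[^\]]+)\]")


def summarize_quattro_log(lines):
    """Return (task, status_after_release) pairs in the order they occur."""
    events = []
    current_task = None
    for line in lines:
        grabbed = GRABBED.search(line)
        if grabbed:
            current_task = grabbed.group("task")
            continue
        released = RELEASED.search(line)
        if released and current_task is not None:
            events.append((current_task, released.group("status")))
            current_task = None
    return events


# Sample lines copied from the excerpt.
sample = [
    "ExecuteCode: Grabbed Job[Main Job]:Task[PageID] Queue Id:[42].",
    "RunGrabbedQNRelease: RRS run successful.",
    "ReleaseTheQ: Released batch, status is now [pending].",
    "ExecuteCode: No batches to process, sleeping for [10] seconds.",
    "ExecuteCode: Grabbed Job[Main Job]:Task[Export] Queue Id:[42].",
    "ReleaseTheQ: Released batch, status is now [Job done].",
]

for task, status in summarize_quattro_log(sample):
    print(f"{task}: released with status {status}")
```

To run it against the real file, pass the lines of C:\Datacap\quattro0.log instead of the sample list.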
DISABLING QUATTRO LOGGING
Quattro writes to the log file every 10 seconds while it’s running, so you should only enable logging at specific
times for troubleshooting.
1. In the Rulerunner Quattro Manager window, click the Datacap Quattro tab and click Stop. Then wait
for the Quattro service to stop. If you get a timeout message, click OK.
2. Click the Datacap Login tab and click Connect.
3. Click the Logging tab and click the Quattro Log tab at the bottom of the window.
4. Deselect the Output to option and click Save.
5. Click the Datacap Login tab and click Disconnect.
6. Click the Datacap Quattro tab and click Start to restart the Quattro service.
TRAVELDOCS: IMPLEMENTING ROUTING TO HANDLE DOCUMENT INTEGRITY FAILURES
In the “Document Assembly” chapter earlier in this guide, we saw how the TravelDocs application completes
recognition and validation before branching to the FixUp job. This wasn’t a problem at the time since we
hadn’t implemented recognition and validation. Now that we’ve implemented recognition and validation,
some pages will have Status = 1, indicating low confidence values or validation errors. The Batch Pilot
FixUp task won’t let you finish a job when there are pages with Status = 1.
To resolve this problem, we’ll move document creation and integrity checking out of the Rulerunner task
profile and into their own task profile. This way we can correct any batch integrity problems before
performing recognition and validation.
MOVING DOCUMENT CREATION AND INTEGRITY CHECKING INTO THE CREATEDOCS TASK PROFILE
1. Start Datacap Studio and open the TravelDocs application.
2. On the Rulemanager tab, in the Task Profiles pane, click the Unlock task profiles button.
3. Expand the Rulerunner task profile and delete the CreateDocs and Document Integrity rulesets using
the Remove button.
4. Click the Add a new task profile button, select Custom, type CreateDocs, and click OK.
5. Select the new CreateDocs task profile and add the CreateDocs and Document Integrity rulesets by
selecting them in the Rulesets pane and clicking the Add ruleset to profile button at the left of the Task
Profiles pane.
6. Click the Save button and then click the Lock task profiles button. The task profiles should look like
those below.
CREATING THE NEW CREATEDOCS TASK AND MODULE
Note: Since the existing Rulerunner task and the associated module contain the job routing functionality we’ll
need for the new CreateDocs task, we’ll copy and modify the existing task and module. You’ll get a
chance to create a task and module from scratch later when we implement manual page identification.
1. In the Taskmaster Client window, click the Administrator button to open the Taskmaster
Administrator window.
2. On the Modules tab, select rrsRulerunner and click Copy. Then type rrsCreateDocs and click OK.
3. In Windows Explorer, navigate to C:\Datacap\TravelDocs\dco_TravelDocs.
4. Make a copy of the file rrs_rulerun.bpp and call it rrs_createdocs.bpp.
5. In the Taskmaster Administrator window, select the new rrsCreateDocs module.
6. Click the Parameters field and then click the Browse […] button beside the existing value.
7. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_createdocs.bpp and click Open. Then click
Apply to save the changes.
8. On the Workflow tab, expand Main Job, select Rulerunner, and click Copy. Then type CreateDocs
and click OK.
9. Select the new CreateDocs task and press Ctrl+Up Arrow to move the task between the PageID task
and the Rulerunner task.
10. With the CreateDocs task selected, click the down-arrow beside the Module field and choose
rrsCreateDocs. Then change the Description field to Create documents and click Apply.
11. With the CreateDocs task selected, click Setup.
12. In the CreateDocs Setup window, select the CreateDocs task profile and click Done (you may need to
expand the window vertically to see the button). Click Yes to save the changes and then click Apply.
13. Click the Shortcuts tab and click Add to create a new shortcut.
14. In the ShortcutID field, type CreateDocs, click Apply, and then click OK.
15. Under Batch Selection Mode, select Manual for Hold only. Then scroll down and under Main Job
select CreateDocs. Click Apply and then click OK.
16. Click Done to close the Taskmaster Administrator window.
CONFIGURING QUATTRO TO RUN CREATEDOCS
Since CreateDocs is a background task, we’ll configure Quattro to run it automatically whenever a batch is
pending.
1. Open the Rulerunner Quattro Manager window.
2. If Quattro is running, click Stop and wait for the service to stop.
3. Click the Datacap Login tab and click Connect.
4. Click the Workflow: Job: Task tab and select the TravelDocs application.
5. Under Main Job, select the CreateDocs task and drag it to <thread0>.
6. Click Save.
7. Click the Datacap Login tab and click Disconnect.
8. Click the Datacap Quattro tab and click Start to restart the service with the new settings.
RUNNING A BATCH THROUGH THE WORKFLOW
In order to introduce a document integrity problem, we’ll add an optional insurance page to the end of the
batch, as we did before. This orphan page will cause document integrity checking to raise an error condition
and Taskmaster will route the batch to the FixUp task. Once you’ve fixed the document integrity problem by
deleting the page, processing can continue as normal.
Note: The goal here is to demonstrate Taskmaster’s routing capabilities, so deleting the problem page is
acceptable. Typically, the proper corrective action is specified in the business requirements.
[Figure: workflow diagram showing the tasks VScan, FixUp, Verify, PageID, CreateDocs, Rulerunner, and Export, with the four background tasks labeled Quattro.]
1. Open C:\Datacap\TravelDocs\images.
2. Delete the files CarRental.tif and OffsetAirTicket.tif (the files we used for text and pattern matching).
3. Make a copy of Images_Page_02.tif (the first optional insurance page) and name the copy
Images_Page_14.tif. This will create an orphaned insurance page.
4. In the Rulerunner Quattro Manager window, make sure the Quattro service is running.
5. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
6. If the Job Monitor window isn’t already open, click the Job Monitor button to display the job queue.
7. Wait until Quattro picks up the pending batch, finishes the PageID task, passes the batch to the CreateDocs
task, and finishes that task.
The CreateDocs task raises a condition flag and routes the batch to the FixUp task (a manual task that
Quattro can’t process).
8. Double-click the row number for the first entry (FixUp).
9. Click Yes to execute the selected batch. The batch opens in the Batch Pilot FixUp window. The last
document is selected and the Comments field indicates that the document has an invalid member (the
orphaned insurance page).
10. Select the page TM000014 and click Delete. Then click OK to confirm. Batch Pilot deletes the page and
the parent document.
11. Expand the Batch Pilot window if necessary so the Finish button is visible.
12. Click Finish and then click OK in the “Task Finished” message box.
The FixUp task now has status “Job done” and the batch is now pending for the Rulerunner task.
13. Wait until Quattro picks up the pending batch and processes it. When Quattro completes, the batch is
pending for the Verify task.
14. Open the pending batch in DotEdit or Taskmaster Web and submit it as before. Then switch to the Job
Monitor window and watch the status of the batch as Quattro runs the Export task.
15. Delete the file Images_Page_14.tif from the “images” folder so we don’t run into this problem again.
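If you repeat this test often, the image preparation in step 3 and the cleanup in step 15 can be scripted. The sketch below is a convenience only, assuming the folder layout and file names used in this exercise. The function names add_orphan_page and remove_orphan_page are hypothetical and are not part of Taskmaster.

```python
import shutil
from pathlib import Path


def add_orphan_page(images_dir, source="Images_Page_02.tif",
                    copy_name="Images_Page_14.tif"):
    """Copy an existing page image so the batch ends with an orphaned page."""
    images = Path(images_dir)
    target = images / copy_name
    shutil.copyfile(images / source, target)
    return target


def remove_orphan_page(images_dir, copy_name="Images_Page_14.tif"):
    """Delete the extra image again; do nothing if it is already gone."""
    (Path(images_dir) / copy_name).unlink(missing_ok=True)


# Example (path taken from this exercise):
# add_orphan_page(r"C:\Datacap\TravelDocs\images")
# ... run the batch through the workflow ...
# remove_orphan_page(r"C:\Datacap\TravelDocs\images")
```

The missing_ok argument requires Python 3.8 or later.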
TRAVELDOCS: ADDING ROUTING TO ENABLE MANUAL PAGE IDENTIFICATION
In this section we’ll implement another conditional branch, this time to handle the situation where we need to
identify pages manually. Currently, the application implements three page identification techniques:
fingerprint matching, text matching, and pattern matching. Here, we’ll add another function to the PageID
ruleset to handle pages that are still unidentified. If a batch includes unidentified pages, we’ll raise a condition
flag and send the batch to the ManualPageID task, where the operator can set the page type manually.
[Figure: workflow diagram showing the tasks VScan, ManualPageID, Verify, PageID, CreateDocs, Rulerunner, and Export, with the four background tasks labeled Quattro.]
ADDING A FUNCTION FOR MANUAL PAGE IDENTIFICATION
1. In the Datacap Studio Rulesets pane, select the PageID ruleset and click the Lock/Unlock ruleset
button. Then expand the PageID ruleset to view the two rules.
2. Right-click the PageID rule and choose Add Function. Rename the new function Identify Manually.
3. Add the actions and parameters shown in the table below to the PageID > Identify Manually function.
Library   Action               Parameter
rrunner   rrSet                varSource = Manual
                               varTarget = @P.MatchType
DCO       SetPageStatus        1
rrunner   Task_NumberOfSplits  1
rrunner   Task_RaiseCondition  0,0
Note: Task_RaiseCondition(0,0) references the first condition (index = 0) in the PageID task. We’ll
add the condition in the next section.
The PageID rule should look like the one below.
4. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
UPDATING THE RECOGNIZE PAGE RULESET
The Recognize ruleset isn’t set up to handle pages that are identified manually, so we’ll need to make some
changes. Currently the ruleset works for pages identified using fingerprint, text, or pattern matching:
• For pages identified using fingerprint or pattern matching, we use ReadZones and SnapCCOtoDCO to
  write the recognition data into the runtime hierarchy.
• For rental agreement pages identified using text matching, we use various text-based matching actions to
  locate the recognition data and write it to the runtime hierarchy.
[Screenshot callouts: “These actions work with fingerprint zones and the full page recognition results.” “These rules locate data on the page using text-based actions.”]
Taskmaster executes the Recognize Page rule on all page types, even though it only works when the current
page has a valid matching fingerprint. With manual page identification, there is no matching fingerprint,
although when fingerprint matching fails Taskmaster writes the ID of the closest match into the page’s
LC_TemplateID variable. The ReadZones action uses the recognition zones from this “low confidence”
fingerprint even though they aren’t actually valid. To get around this problem, we’ll add an action to the
Recognize Page rule so it exits if the identification method is manual. Later, when we open the page for
verification, we’ll define the recognition zones manually.
1. In the Rulesets pane, select the Recognize ruleset and click the Lock/Unlock ruleset button. Then
expand the Recognize ruleset and the Recognize Page rule.
2. Add the following action and parameters to the beginning of Recognize Page > Recognition: Page
Function 1.
   Library   Action         Parameter
   rrunner   rrCompareNot   object1 = @P.MatchType
                            object2 = Manual
The finished rule should look like the one below.
If a page has match type “Manual” the function will exit without executing ReadZones.
3. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
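The effect of the new action can be pictured with a small model. This Python sketch assumes the behavior described in this section: the actions in a function run in order, and the function exits as soon as one action returns False. The names run_function, rr_compare_not, and read_zones are illustrative stand-ins, not Taskmaster APIs, and the "Fingerprint" match type value is hypothetical.

```python
def run_function(actions, page):
    """Run actions in order; stop and return False at the first failure."""
    for action in actions:
        if not action(page):
            return False
    return True


def rr_compare_not(key, value):
    """Stand-in for rrCompareNot: succeeds only when page[key] != value."""
    return lambda page: page.get(key) != value


def read_zones(page):
    """Stand-in for ReadZones: records that zone recognition ran."""
    page["zones_read"] = True
    return True


# The guard comes first, so ReadZones is skipped for manually identified pages.
recognize_page = [rr_compare_not("MatchType", "Manual"), read_zones]

fingerprint_page = {"MatchType": "Fingerprint"}  # hypothetical value
manual_page = {"MatchType": "Manual"}

print(run_function(recognize_page, fingerprint_page), fingerprint_page)
print(run_function(recognize_page, manual_page), manual_page)
```

In this model the manual page never reaches read_zones, which mirrors the reason the guard must be the first action in the function.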
ADDING THE CONDITIONAL BRANCH TO THE PAGEID TASK
Now that the rules to handle manual page identification are complete, we can configure the PageID task to
branch to the manual page identification task if there are unidentified pages.
1. In the Taskmaster Client window, click the Administrator button to open the Taskmaster
Administrator window.
2. On the Modules tab, select rrsAssemble (the module associated with the PageID task).
3. In the Type field, open the drop-down list and choose Job router.
4. Click the Values field beside the Parameters label and then click the Browse […] button beside the
default value (rrs_assemble.bpp).
5. Select C:\Datacap\TravelDocs\dco_TravelDocs\rrs_assemble.bpp and click Open.
   Note: When using job routing, the project file must be specified using its full path.
6. Click Apply and then click Done.
7. Click the Administrator button to reopen the Taskmaster Administrator window.
8. On the Workflow tab, expand Main Job, select PageID, and click Setup.
9. In the PageID Setup window, click File > Task Settings.
10. On the General tab, select the Job router option, type Page Identification Failed in the field beneath
the Add and Remove buttons, and click Add.
11. Click OK and then click Done to close the PageID Setup window (you may need to expand the window
vertically to see the button).
12. In the Taskmaster Administrator window, click Apply and then click Done.
So far we’ve only created the condition node for the PageID task. Before we can configure the branch, we
need to create the job we’ll use for manual page identification. To do this we must complete the steps
identified earlier (see “Creating jobs to handle special conditions” on page 276). We’ll do this next.
CREATING THE MANUALPAGEID PROJECT FILE
1. Click Start > All Programs > Datacap > Batch Pilot > Batch Pilot.
2. In the Batch Pilot window, click File > New Project.
3. Select C:\Datacap\TravelDocs\dco_TravelDocs\TravelDocs.xml and click Open.
4. In the Batch View pane, right-click Setup Form and choose Pick Form.
5. Select the file C:\Datacap\BPilot\RuleRun\rrs_setup.dcf and click Open.
6. In the Batch View pane, right-click TravelDocs and choose Pick Form.
7. Select the file C:\Datacap\BPilot\RuleRun\rrs_run.dcf and click Open.
8. Click File > Save Project As.
9. Navigate to the C:\Datacap\TravelDocs\dco_TravelDocs folder and save the file as
ManualPageID.icp.
   Note: Since this will be a web-only task, save the file with the extension .icp.
10. Close Batch Pilot.
11. Open the file C:\Datacap\TravelDocs\dco_TravelDocs\ManualPageID.icp in a text editor (for example,
Notepad).
12. Edit the [iCap] section so it looks like the version below.
[iCap]
Enabled=1
Page1=ProtoId.aspx
Note: Taskmaster includes a web page called “ProtoId.aspx” that we’ll use to identify pages manually. It’s
important that you assign this web page to the “Page1” setting and not the “Page” setting. For
more information on the ProtoId web client, see “Manual page identification and batch
restructuring using ProtoId” on page 345.
13. Save and close the file.
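The edit in step 12 can also be scripted. The following Python sketch assumes that the .icp file is a plain INI-style text file, which is suggested by the [iCap] section shown above but not confirmed here. The function name enable_protoid is hypothetical. Note that Python's configparser rewrites the whole file and drops any comments, so work on a backup copy if you try this; editing in a text editor, as described above, remains the documented method.

```python
import configparser


def enable_protoid(icp_path, page="ProtoId.aspx"):
    """Set Enabled=1 and Page1=<page> in the [iCap] section of an .icp file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case (Page1, not page1)
    parser.read(icp_path)
    if not parser.has_section("iCap"):
        parser.add_section("iCap")
    parser["iCap"]["Enabled"] = "1"
    parser["iCap"]["Page1"] = page  # "Page1", not "Page", as noted above
    with open(icp_path, "w") as handle:
        parser.write(handle, space_around_delimiters=False)


# Example (path taken from this exercise):
# enable_protoid(r"C:\Datacap\TravelDocs\dco_TravelDocs\ManualPageID.icp")
```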
CREATING THE MANUALPAGEID JOB AND TASK
1. In the Taskmaster Client window, click the Administrator button to reopen the Taskmaster
Administrator window.
2. Click the Modules tab and click Add.
3. In the Task Module section, enter the following values:
   ID:            ManualPageID
   Type:          Normal
   Program name:  Batch Pilot
   Parameters:    /inet ManualPageID.icp
   Note: The /inet prefix indicates that this module can run via Taskmaster Web.
4. Click Apply.
5. Click the Workflow tab. Then select the main TravelDocs node and click Add to create a new job node.
6. Name the new job node ManualPageID Job and press Enter. Click Yes to save the changes.
7. Right-click the ManualPageID Job and choose New > Task.
8. Name the new task node ManualPageID and press Enter. Click Yes to save the changes.
9. With the new task selected, click the down arrow beside the Module field and choose ManualPageID.
10. Click Apply but leave the Taskmaster Administrator window open.
CONFIGURING BRANCHING AND CREATING A SHORTCUT
1. On the Workflow tab, expand Main Job, expand the PageID task, and select the Page Identification
Failed condition.
2. Configure the values as follows:
Action: Branch
Parent Status: Pending
Child Job: ManualPageID Job
Child Status: Pending
Steps: 1
3. Click Apply.
4. Click the Shortcuts tab and click Add.
5. In the ShortcutID field, type ManualPageID, click Apply, and then click OK.
6. Under Batch Selection Mode, select Manual for Hold only.
7. Scroll down and under ManualPageID Job select ManualPageID.
8. Click Apply and then click OK.
9. Click Done to close the Taskmaster Administrator window.
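To picture what the branch does to a batch, here is a toy model in Python. It assumes the behavior described in this chapter: when the PageID task raises its condition, the batch goes to the child job, and once the child job is done the batch becomes pending for the task that follows PageID in Main Job. The data structures and the route function are hypothetical and say nothing about how Taskmaster stores its workflow.

```python
# Task order in Main Job, as used in this chapter.
MAIN_JOB = ["VScan", "PageID", "CreateDocs", "Rulerunner", "Verify", "Export"]

# Parent task -> (child job, child task) entered when the condition is raised.
BRANCHES = {"PageID": ("ManualPageID Job", "ManualPageID")}


def route(task, condition_raised):
    """Return the (job, task) pairs that follow `task` for one batch."""
    position = MAIN_JOB.index(task)
    remaining = [("Main Job", name) for name in MAIN_JOB[position + 1:]]
    if condition_raised and task in BRANCHES:
        return [BRANCHES[task]] + remaining
    return remaining


print(route("PageID", condition_raised=True)[:2])
print(route("PageID", condition_raised=False)[:1])
```

With the condition raised, the first entry is the ManualPageID task and the second is CreateDocs; without it, the batch goes straight to CreateDocs.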
CONFIGURING THE ROUTING RULESET TO HANDLE MANUALLY IDENTIFIED PAGES
Earlier in this guide, we configured the application to display only pages with Status = 1 to the operator.
Since manually identified pages will have no recognition data, there will be no low confidence characters to
set the page status to 1. Depending on the way your validation rules are constructed, you may also have no
validation errors.
To ensure that manually identified pages are displayed to an operator, we’ll force the page status for these
pages to 1. We’ll do this in the Routing ruleset, which runs immediately after validation.
1. In the Datacap Studio Rulesets pane, select the Routing ruleset and click the Lock/Unlock ruleset
button. Then expand the ruleset to view the rule.
2. Right-click the Routing Rule 1 rule and choose Add Function. Rename the new function Set Manual
Page Status.
3. Use the arrow button to move the new function to the beginning of the rule.
4. Add the action and parameters shown below to the Routing Rule 1 > Set Manual Page Status
function.
   Library   Action          Parameter
   rrunner   rrCompare       object1 = @P.MatchType
                             object2 = Manual
   DCO       SetPageStatus   1
The finished rule should look like the one below.
   Note: Make sure “Set Manual Page Status” is the first function in the rule.
5. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset.
RUNNING A BATCH THROUGH THE WORKFLOW
We’ve provided an image file for a new air ticket page that we’ll use to test manual page identification. Since
this page will fail fingerprint matching, text matching, and pattern matching, Taskmaster will execute the
“Identify Manually” function. This in turn will cause the batch to branch to the ManualPageID task. Once
you’ve identified the page manually, processing can continue as normal.
1. Copy the file NewAirline.tif from the sample images download location into the TravelDocs
application’s “images” folder.
2. Make sure the file Images_Page_14.tif is no longer in the “images” folder. Delete it if necessary.
3. Make sure Quattro is still running. If necessary, click Start on the Datacap Quattro tab in the Rulerunner
Quattro Manager window.
4. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
5. If the Job Monitor window isn’t already open, click the Job Monitor button to display the job queue.
6. Wait until Quattro picks up the pending batch and processes it. The PageID task raises a condition flag
and routes the batch to the ManualPageID task.
7. We don’t have a Batch Pilot user interface for the ManualPageID task so you can’t run the batch from
the Job Monitor. Instead, start Taskmaster Web (http://localhost/tmweb.net) and log on to the
TravelDocs application.
8. On the Taskmaster Web Operations tab, click the ManualPageID shortcut and wait for the page images
to load. Then scroll to the bottom of the page.
   Note: For information on other features in the ProtoId web client, see “Manual page identification and
   batch restructuring using ProtoId” on page 345.
9. Click the drop-down list beneath the last page and choose Air_Ticket.
10. Click Done and then click OK and Stop.
11. Switch back to the Taskmaster Client Job Monitor window.
The ManualPageID task now has status “Job done” and the batch is now pending for the CreateDocs
task.
12. Wait until Quattro picks up the pending batch and runs it through CreateDocs and Rulerunner. When
Quattro completes, the batch is pending for the Verify task.
RECOGNIZING THE DATA ON THE UNIDENTIFIED PAGE
Since the Airline #4 page isn’t associated with a valid fingerprint and since we didn’t create any text-based
rules for air ticket pages, the new page will have no recognition data. We can recognize the data during
verification. Here we’ll use Taskmaster Web, although you can use any verification client.
1. If necessary, start Taskmaster Web and log in to the TravelDocs application.
2. Click the Verify/FixUp shortcut to open the pending batch.
3. Go through the batch as before until you reach the Airline #4 page.
4. Click the Outbound_From field. Then use the mouse to draw a bounding box around the field in the
image pane, as shown below left. When you release the mouse button, the web client inserts the
recognition data into the grid.
5. Repeat for the other fields on the page.
6. Click the Verify button (or press Alt+v). You should see a message indicating that the validations passed.
7. Click Submit to submit the page, and then click OK to finish the batch.
8. Click OK and then click Stop.
9. Switch to the Job Monitor window and watch the status of the batch as Quattro runs the Export task.
10. When the export task completes, open the most recent text (.txt) file in the
C:\Datacap\TravelDocs\export folder to see the information for the Airline #4 page.
,,Okron/Canton, OH (CAK),Washington, DC (DCA),14MAR11,Washington, DC (DCA),Okron/Canton, OH (CAK),17MAR11,313.17,64.56,377.73
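As a quick sanity check on the exported record, you can confirm that the amounts add up. This Python sketch assumes that the last three values are the fare, the tax, and the total; that interpretation is not stated in this section. The record string is copied from the export line above.

```python
from decimal import Decimal

record = (",,Okron/Canton, OH (CAK),Washington, DC (DCA),14MAR11,"
          "Washington, DC (DCA),Okron/Canton, OH (CAK),17MAR11,"
          "313.17,64.56,377.73")

# The city fields contain unquoted commas, so split from the right to
# isolate the three trailing amounts (assumed: fare, tax, total).
fare, tax, total = (Decimal(value) for value in record.rsplit(",", 3)[1:])

print(fare + tax == total)  # prints True for this record
```

Decimal is used instead of float so the comparison is exact.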
TRAVELDOCS: GENERATING FINGERPRINTS AUTOMATICALLY
In the last section, we configured the TravelDocs application to route unidentified pages to the “ProtoId”
web page for manual identification. During verification, we drew bounding zones around each field to obtain
the recognition data, but we didn’t create a new fingerprint or save the recognition zones.
Here, we’ll update the application to generate fingerprints for unrecognized pages. When we’re done, we’ll
run the same batch through the workflow, but this time the application will add a new fingerprint to the
fingerprint library and the recognition zones to the document hierarchy. Next time the application encounters
an Airline #4 page, it can use the new fingerprint to do automatic page identification and field recognition.
CREATING THE AUTOFINGERPRINT RULESET
1. In the Datacap Studio Rulesets pane, right-click the TravelDocs application and choose Add Ruleset.
2. Rename the new ruleset AutoFingerprint and rename the default rule from Rule1 to Create New
Fingerprint.
3. Select Function1 and add the actions and parameters in the table below.
Library       Action              Parameter
rrunner       rrCompare           object1 = @P.MatchType
                                  object2 = Manual
AutoDoc       SetFingerprintDir   @APPPATH(fingerprint)
AutoDoc       CreateFingerprint   (none)
AutoDoc       SetFingerprint      @D.TYPE,@P.TYPE
Intellocate   iloc_SetZones       (none)
Note: The parameter on the SetFingerprint action sets the fingerprint class to the current document
type and the fingerprint type to the current page type. Make sure “TYPE” is upper case.
4. Right-click the Create New Fingerprint rule and choose Add Function.
5. Select Function2 and add the following action.
Library       Action                Parameter
rrunner       Status_Preserve_OFF   (none)
Note: The purpose of this function is to ensure that the rule returns True and thus avoid triggering a
validation error.
6. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset. The finished ruleset should look like the one below.
ASSIGNING THE RULE TO EACH PAGE TYPE
1. In the Document Hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so you can see all page types.
3. Select the Rental_Agreement page type.
4. In the Rulesets pane, select the Create New Fingerprint rule and click Add to DCO.
5. In the Document Hierarchy pane, select the Optional_Insurance page type and then click Add to
DCO.
6. Repeat to add the Create New Fingerprint rule to the Air_Ticket, Room_Receipt, Meals, and
Other_Charges pages.
7. In the Document Hierarchy pane, click the Save button and then click the Unlock DCO button.
ADDING THE RULESET TO THE VERIFY TASK PROFILE
1. In the Rulesets pane, select the AutoFingerprint ruleset.
2. Click the Task profiles tab and then click the Lock/Unlock task profiles button.
3. Select the Verify task profile and then click the Add ruleset to profile button.
4. Expand the Verify task profile and confirm that it looks like the one below.
5. Click the Save button and then click the Lock/Unlock task profiles button.
ENABLING LOGGING FOR TASKMASTER WEB
Before we run a batch through the workflow, we’ll enable logging for the Verify task. Since we’ll be running
the Verify task’s rules from Taskmaster Web, we must enable the Rulerunner Service logging in the task’s
project file (not through the Batch Pilot Setup window).
Note: When you run the Verify task, the log file is saved as verify_rrs.log in the current batch folder.
1. Open the file C:\Datacap\TravelDocs\dco_TravelDocs\rrs_verify.bpp in a text editor (for
example, Notepad).
2. Locate the [RRC] section and set ServiceLog=3.
3. Save and close the file.
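If you need to make the same change on several machines, the manual edit above can be scripted. The Python sketch below is one possible approach, assuming the project file is plain INI-style text; the helper name set_ini_value and the sample text are illustrative and are not part of Taskmaster. It rewrites only the matching line so that all other settings stay untouched.

```python
def set_ini_value(text, section, key, value):
    """Return text with key=value set inside [section]; other lines are kept as-is."""
    out = []
    in_section = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having seen the key: add it there.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped[1:-1].lower() == section.lower()
        elif in_section and not done and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name.lower() == key.lower():
                line = f"{key}={value}"
                done = True
        out.append(line)
    if in_section and not done:
        # The target section was the last one in the file.
        out.append(f"{key}={value}")
        done = True
    if not done:
        # The section does not exist yet: create it at the end.
        out.extend([f"[{section}]", f"{key}={value}"])
    return "\n".join(out) + "\n"


# Illustrative sample only; a real rrs_verify.bpp contains more settings.
sample = "[iCap]\nEnabled=1\n\n[RRC]\nServiceLog=0\nBatchLog=1\n"
print(set_ini_value(sample, "RRC", "ServiceLog", "3"))
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back after making a backup copy.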
RUNNING A BATCH THROUGH THE WORKFLOW
1. Run a batch through the workflow as you did in the previous section (see “Running a batch through the
workflow” on page 299), but stop when the batch is pending for verification.
2. Start Taskmaster Web and log in to the TravelDocs application.
3. Click the Verify/FixUp shortcut to open the pending batch.
4. Go through the batch as before until you reach the Airline #4 page.
5. Click the Outbound_From field. Then use the mouse to draw a bounding box around the field in the
image pane, as shown below left. When you release the mouse button, the web client inserts the
recognition data into the grid.
6. Repeat for the other fields on the page.
7. Click Submit. In the background, Taskmaster runs the AutoFingerprint ruleset to create the new
fingerprint file and add the zone information to the document hierarchy. Then click OK and Stop.
8. In Datacap Studio, click the Zones tab and then click the Refresh button.
9. Expand the first Flight class (the SetFingerprint action creates a new class even though there’s already
one called “Flight”) and select the new fingerprint.
10. Unlock the document hierarchy and select one of the Air_Ticket fields to activate the zones in the Image
View pane.
11. When you’ve reviewed the zones in the Image View pane, lock the document hierarchy.
REVIEWING THE RRS LOG FILE
If your application failed to create the new fingerprint with the zone information, check the log file for details.
1. Open the current batch folder and open the file verify_rrs.log.
2. Scroll to the bottom of the file to see the log entries for the AutoFingerprint ruleset.
ruleset name="AutoFingerprint" id="14" target object="P" target id="TM000014"
target type="Air_Ticket"
dco open tag="P" id="TM000014" type="Air_Ticket"
rule "Create New Fingerprint"
func "Function1"
action rrCompare ("@P.MatchType","Manual")
action returned true                         <-- Match type is "Manual"
/action
action SetFingerprintDir (false,false,"@APPPATH(fingerprint)")
load rrx code: "c:\datacap\rrs\autodoc.rrx"
/load
action returned true                         <-- Fingerprint directory established
/action
action CreateFingerprint (false,false)
action returned true                         <-- Successfully created new fingerprint
/action
action SetFingerprint (false,false,"@D.TYPE,@P.TYPE")
action returned true                         <-- Set the fingerprint class and page type
/action
action iloc_SetZones (false,false)
load rrx code: "c:\datacap\rrs\intellocate.rrx"
/load
action returned true                         <-- Successfully saved the new zone information
/action
func result: "true"
/func
rule result: "true"
/rule
/ruleset
c:\datacap\RRS\Logs\wrrs
end log to batch
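When the RRS log is long, a short script can list each action in the AutoFingerprint ruleset together with its result. The Python sketch below assumes the log follows the pattern in the excerpt from verify_rrs.log (an "action Name (...)" line followed later by "action returned true"). The "action returned false" line in the sample is an assumption about how a failure would appear, and the function name action_results is illustrative.

```python
import re

# "action rrCompare (...)" starts an action; "action returned true" reports its result.
ACTION_RE = re.compile(r"^action\s+(\w+)\s*\(")
RESULT_RE = re.compile(r"^action returned (true|false)", re.IGNORECASE)


def action_results(log_text):
    """Return (action_name, succeeded) pairs in the order they appear in the log."""
    results = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        started = ACTION_RE.match(line)
        if started:
            current = started.group(1)
            continue
        returned = RESULT_RE.match(line)
        if returned and current is not None:
            results.append((current, returned.group(1).lower() == "true"))
            current = None
    return results


# Sample text: the first action is copied from the log excerpt; the failing
# second action is hypothetical and only illustrates the check.
SAMPLE_LOG = """\
action rrCompare ("@P.MatchType","Manual")
action returned true
/action
action CreateFingerprint (false,false)
action returned false
/action
"""

for name, ok in action_results(SAMPLE_LOG):
    print(name, "ok" if ok else "FAILED")
```

Any action reported as failed is the place to start when the fingerprint or zone information was not created.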
TRAVELDOCS: SPLITTING A DOCUMENT FROM THE MAIN BATCH
In the previous section, we let the verification operator create new fingerprints for pages that were identified
manually. In this section, we’ll split manually identified pages from the main batch and send them to a
supervisor for fingerprint creation.
[Figure: workflow diagram showing the main batch (VScan, PageID, ManualPageID, CreateDocs, Rulerunner, Verify, Export) with a BRANCH and a SPLIT, and the child batch with manually identified pages (Supervisor Verify, Supervisor Export).]
UPDATING THE ROUTING RULESET TO SPLIT THE BATCH
We’ll use the existing Routing ruleset to split the batch using a document-level variable that we’ll create for
this purpose. We’ll set this variable for any document that contains a manually identified page. The Routing
ruleset runs at the end of the Rulerunner task profile, after recognition and validation have completed.
1. In the Datacap Studio Rulesets pane, select the Routing ruleset and click Lock/Unlock ruleset.
2. Expand the Routing ruleset, Routing Rule 1, and the Set Manual Page Status function.
3. Select the Set Manual Page Status function and add the following action and parameters to the end of
the function.
Library   Action   Parameter
rrunner   rrSet    varSource = Yes
                   varTarget = @D.Split
Note: This assigns the value “Yes” to a document-level variable called “Split,” creating it if necessary.
4. Right-click the Routing ruleset and choose Add Rule. Rename the new rule Batch Splitting.
5. Expand the Batch Splitting rule, select Function1, and add the following action and parameter.
Library   Action       Parameter
Split     SplitBatch   @D.Split
Note: This action raises the condition flag and creates a child batch containing any documents with the
document-level variable “Split.” In this case, the only possible value of the variable is “Yes,” so all
documents are sent to the same child batch.
6. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish Ruleset. The finished ruleset should look like the one below.
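The effect of the rrSet and SplitBatch actions can be pictured with a small model. The Python sketch below is only a conceptual illustration and is not how the SplitBatch action is implemented: documents are plain dictionaries, the function name split_batch is hypothetical, and grouping by variable value is an inference from the note that a single value (“Yes”) sends all documents to the same child batch.

```python
from collections import defaultdict


def split_batch(documents, variable="Split"):
    """Partition document ids into the main batch and child batches keyed by variable value."""
    main = []
    children = defaultdict(list)
    for doc in documents:
        value = doc.get("vars", {}).get(variable)
        if value:
            # The document carries the variable: it moves to a child batch.
            children[value].append(doc["id"])
        else:
            # No variable set: the document stays in the main batch.
            main.append(doc["id"])
    return main, dict(children)


# Hypothetical batch: only the second document contains a manually identified page.
docs = [
    {"id": "20110154.008.01", "vars": {}},
    {"id": "20110154.008.02", "vars": {"Split": "Yes"}},
]
main_batch, child_batches = split_batch(docs)
print("main:", main_batch)
print("children:", child_batches)
```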
ASSIGNING THE BATCH SPLITTING RULE TO THE BATCH’S “CLOSE” ELEMENT
The SplitBatch action must run at the batch level, but it depends on the status of the “Split” variable that
in this case is created by a page-level rule in the same ruleset. In the earlier section describing the order of rule
execution, we learned that batch-level rules typically run before page-level rules (see “Order of rule execution”
on page 76). However, we also saw how you can attach a rule to an element’s “Close” element to cause a rule
to run after Taskmaster has finished processing all lower level objects. That’s what we’ll do here.
1. In the Document Hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so you can see the batch’s “Close” element.
3. Select the batch’s “Close” element.
4. In the Rulesets pane, select the Batch Splitting rule and click Add to DCO.
5. In the Document Hierarchy pane, click the Save button and then click the Unlock DCO button.
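The reason for binding the rule to the batch’s “Close” element can be sketched as a depth-first walk of the hierarchy: rules attached to an object run before its children are processed, and rules attached to its “Close” element run afterwards. The Python model below is a simplified illustration under that assumption; the dictionary layout and the function name run_rules are hypothetical and do not reflect Rulerunner internals.

```python
def run_rules(node, log):
    """Walk the hierarchy: open rules first, then children, then Close rules."""
    for rule in node.get("open", []):
        log.append(f"{node['name']}: {rule}")
    for child in node.get("children", []):
        run_rules(child, log)
    for rule in node.get("close", []):
        log.append(f"{node['name']} (Close): {rule}")
    return log


# Hypothetical hierarchy: the page-level routing rule sets the "Split" variable,
# and the batch-level Batch Splitting rule is bound to the batch's Close element.
batch = {
    "name": "Batch",
    "close": ["Batch Splitting"],
    "children": [
        {
            "name": "Document",
            "children": [
                {"name": "Air_Ticket page", "open": ["Routing Rule 1"]},
            ],
        },
    ],
}

execution_order = run_rules(batch, [])
for entry in execution_order:
    print(entry)
```

In this model the page-level rule always appears before the batch-level rule, which is the ordering the SplitBatch action depends on.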
ROUTING THE SPLIT DOCUMENT TO A SUPERVISOR
Before we can configure the split condition, we need to create the supervisor job to handle the child batch.
Then we can configure the job router and create the shortcuts for the supervisor job.
CREATING THE SUPERVISOR JOB
1. In the Taskmaster Client window, click the Administrator button to open the Taskmaster
Administrator window.
2. On the Workflow tab, select the main TravelDocs node and click Add to create a new job node.
3. Name the new job node Supervisor Job and press Enter. Click Yes to save the changes.
4. Right-click the Supervisor Job and choose New > Task.
5. Name the new task node Verify and press Enter. Click Yes to reuse the existing Verify task and then
click Yes to save the changes.
6. Right-click the Supervisor Job again and choose New > Task.
7. Name the new task node Export and press Enter. Click Yes to reuse the existing Export task and then
click Yes to save the changes. The new job should look like the one below.
CONFIGURING THE JOB ROUTER
1. On the Workflow tab, expand Main Job, select the Rulerunner task and click Setup.
2. In the Rulerunner Setup window, click File > Task Settings.
3. Select the existing Document Integrity Failed condition and click Remove.
Note: We moved this functionality to the CreateDocs task earlier, so it’s no longer needed here.
4. In the field beneath the Add and Remove buttons, type Split Condition and click Add.
5. Click OK to close the Settings dialog, click Done to close the Rulerunner Setup window, click Apply,
and then click Done to close the Taskmaster Administrator window.
6. Click the Administrator button to reopen the Taskmaster Administrator window, expand Main
Job, expand the Rulerunner task, and select the Split Condition node.
7. Configure the values as follows:
Action: Split
Parent Status: Pending
Child Job: Supervisor Job
Child Status: Pending
Steps: 1
8. Click Apply. The task configuration should look like the one below.
CONFIGURING THE SUPERVISOR SHORTCUTS
1. Click the Shortcuts tab and click Add to create a new shortcut.
2. In the ShortcutID field, type Supervisor Verify, click Apply, and then click OK.
3. Under Batch Selection Mode, select Manual for Hold only.
4. Scroll down and under Supervisor Job select Verify.
5. Click Apply and then click OK.
6. Click Add to create a new shortcut.
7. In the ShortcutID field, type Supervisor Export, click Apply, and then click OK.
8. Under Batch Selection Mode, select Manual for Hold only.
9. Scroll down and under Supervisor Job select Export.
10. Click Apply and then click OK.
11. Click Done to close the Taskmaster Administrator window.
RUNNING A BATCH THROUGH THE WORKFLOW
Before we run a batch through the workflow, we’ll need to delete the new Airline #4 fingerprint so we can
process the page again as an unrecognized page.
Note: Quattro in Taskmaster Capture 8.0.1 doesn’t set up child batches correctly for subsequent tasks, so we
need to stop Quattro for this exercise. Taskmaster Capture 8.0.1 Fix Pack 1 fixes this problem.
1. In the Rulerunner Quattro Manager window, click Stop and wait for the service to stop. Then click OK.
2. In Datacap Studio, click the Zones tab.
3. Expand the first Flight class and select the Airline #4 fingerprint.
4. Check the Image View pane and confirm that the Airline #4 fingerprint is selected and then click the
Remove selected button.
5. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
6. Double-click the PageID icon. When the task completes, click Stop.
7. Start Taskmaster Web and log in to the TravelDocs application.
8. On the Taskmaster Web Operations tab, click the ManualPageID shortcut and wait for the page images
to load. Then scroll to the bottom and set the page type for the last page to Air_Ticket.
9. Click Done and then click OK and Stop.
10. In the Taskmaster Client window, double-click the CreateDocs shortcut. When it completes, click Stop.
11. Double-click the Rulerunner shortcut. When the task completes, click Stop.
12. Check the Job Monitor. You should see the result of the split, where the child job is pending for the
Supervisor Verify task (row 1 below) and the main job is pending for the Main Verify task (row 3 below).
13. On the Taskmaster Web Operations tab, click the Supervisor Verify shortcut to open the pending batch.
The Airline #4 page is displayed.
14. Define the zone for each field as you did in the previous section (see “Running a batch through the
workflow” on page 306).
15. Click Submit and then click OK. In the background, Taskmaster runs the AutoFingerprint ruleset to
create the new fingerprint file and add the zone information to the document hierarchy. Then click OK
and Stop.
16. Check the Job Monitor. You should see the child job pending for the Supervisor Export task and the
main job still pending for the Main Verify task.
Note: If you run another batch through the workflow, it will run from end to end with no branching or
splitting since the Airline #4 page is now in the fingerprint library. If you want to run another batch
with branching and splitting, delete the Airline #4 fingerprint from the Datacap Studio Zones tab.
Chapter 19
TASKMASTER WEB AND REMOTE SCANNING
We introduced the Taskmaster Web verification client earlier in this guide and the manual page identification
client in the last chapter. In this chapter, we’ll look at some of the other Taskmaster Web components,
including the remote scanning client, other verification clients, and the Taskmaster Web administration
interface.
At the end of the chapter, we’ll update the TravelDocs application using Taskmaster Web Administrator and
run a batch through the entire workflow using a combination of web components and Quattro.
MOVING THE WORKFLOW TO TASKMASTER WEB
The batch processing workflow we’ve developed so far has been a mix of thick client-based tasks, web-based
tasks, and Quattro-driven background tasks.
[Figure: the workflow tasks (VScan, PageID, ManualPageID, CreateDocs, Rulerunner, FixUp, Verify, Export), each labeled with the client that runs it: Batch Pilot, Web, DotEdit, or Quattro.]
Additionally, we’ve done all of our application administration using Taskmaster Client (the “thick” client).
Taskmaster includes web components that let you run and administer most of the workflow from a web
browser, including those listed in the table below.
Function                                                            Web page
Remote scanning and image upload                                    scancl.aspx and uplbfcl.aspx
Virtual scanning and image upload                                   vscancl.aspx and uplbfcl.aspx
Verification                                                        prelayout.aspx, averify.aspx, imgEnter.aspx
Verification, manual page identification, and manual registration   aindex.aspx
Manual page identification and fixup                                ProtoID.aspx
Application administration                                          Standard tmweb.net interface
Job monitoring                                                      Standard tmweb.net interface
We’ll look at these components in the sections that follow.
REMOTE SCANNING
In the earlier chapter on “Document Input,” we configured an ISIS scanner for use with Taskmaster Client
but touched only briefly on remote scanning. In this section, we’ll look at remote scanning in more detail.
Note: Taskmaster Web’s remote scanning clients support TWAIN scanners only.
The Taskmaster Web remote scanning clients (scancl.aspx and scanid.aspx) and the related upload client
(uplbfcl.aspx) let operators scan and upload batches from remote web clients. Once a batch is queued on the
server, Taskmaster processes it like any other job.
[Figure: Taskmaster Web Clients with attached scanners send batches to the Taskmaster Server, where PageID, CreateDocs, and later tasks process them.]
Taskmaster supports two remote scanning options:
• Scan the images from the web client directly into the application’s “batches” folder (LocalProc=1). This
requires that you share the “batches” folder and give write permission to all client machines.
[Figure: LocalProc=1. The Taskmaster Web Client scans and writes directly to the “batches” folder on the Taskmaster Server.]
• Scan the images to a local folder on the web client and then upload them to the application’s “batches”
folder (LocalProc=0). Although Taskmaster Web initially stores the image files locally, it creates the
runtime batch file in the server’s “batches” folder. It can do this without you having to enable sharing on
the “batches” folder.
[Figure: LocalProc=0. The Taskmaster Web Client scans and writes to a local “scan” folder and then uploads the images to the “batches” folder on the Taskmaster Server.]
You configure the LocalProc setting and the image storage folder in the [Scan] section of the remote scan
client’s project file, as described in the next section. We’ll configure the remote scan task and the upload task
later for the TravelDocs application (see “Creating a remote scan task” on page 350 and “Configuring the
Upload task” on page 352).
USING THE REMOTE SCANNING CLIENT (SCANCL.ASPX)
The web page for the remote scanning client (scancl.aspx) is shown below.
The web page includes the following controls:
Control                  Description
(icon)                   Suppresses certain warning messages.
(icon)                   Increases the display size of all page thumbnails.
(icon)                   Decreases the display size of all page thumbnails.
(icon)                   Rotates the selected page clockwise by 90 degrees.
Alter scanner settings   Selecting this option lets you change the scanner settings through the TWAIN driver’s
                         native user interface after clicking Scan; otherwise the scanner uses the settings in the
                         project file (see “Configuring the remote scanning client” on page 319).
Use feeder               Select this option to use the scanner’s document feeder.
Expected pages           Specifies the maximum number of pages to import when scanning using the feeder.
(icon)                   Initiates scanning or displays the TWAIN user interface (see “Alter scanner settings”).
(icon)                   Moves the selected page up the page order.
(icon)                   Scans additional pages and inserts them before the selected page.
(icon)                   Removes the selected page.
(icon)                   Removes all pages.
(icon)                   Moves the selected page down the page order.
Scan into:               Not used (see ScanDir setting on next page).
CONFIGURING THE REMOTE SCANNING CLIENT
The default application framework includes an “rScan” project file (rScan.icp). Key sections from the default
rScan.icp are shown below.
[General]
CreateDir=1                <-- Creates a subfolder for the new batch (required)
TaskDCOFile=scan.xml       <-- Name of runtime DCO file (saved in application’s “batches” folder)
[iCap]
Enabled=1                  <-- Web component enabled
Page=scancl.asp            <-- Not used (applies to earlier versions of Taskmaster)
Page1=scancl.aspx          <-- Web page used
Hold=1                     <-- Not used
[Scan]
LocalProc=0                <-- 0 if using the Upload task; 1 if scanning directly to a shared “batches” folder
ScanDir=c:\datacap\scan    <-- Local folder for scanned images if LocalProc=0; shared folder if LocalProc=1
Extension=tif              <-- Extension for scanned images (tif, jpg, pic, bmp)
Type=0                     <-- Image type (0=B&W; 1=Grayscale; 2=RGB)
Bits=1                     <-- Bit depth for above image type
Resolution=200             <-- Image resolution (DPI)
Compression=4              <-- Compression type, for B&W images only:
                               0=uncompressed; 1=RLE; 2=Group 3 Fax; 3=Group 3-2D Fax; 4=Group 4 Fax
ResaveType=-1              <-- Compression type for grayscale or color image using other than JPEG:
                               -1 = no recompression; 40 = TIFF JPEG
ResaveExt=tif              <-- Extension for images compressed with ResaveType if ResaveType <> -1
HoldEnabled=1              <-- Display the “Hold” button on web page? (0=No; 1=Yes)
StartPanel=0               <-- Enable start panel? (0=No; 1=Yes)
The scanner settings are passed through to the TWAIN driver when you start scanning. If the settings aren’t
compatible with each other or your scanner, you may get an error during scanning.
The default project file is configured to save the image files locally in C:\Datacap\scan and to use the Upload
task. If you want to scan directly to the application’s “batches” folder, you must share the folder and provide
write access to all remote clients, and then change the [Scan] settings as follows:
[Scan]
LocalProc=1
ScanDir=\\<server>\<shared_folder>
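As a quick way to check which of the two scanning options a project file selects, the Python sketch below reads the [Scan] section with the standard configparser module. It assumes the project file is plain INI-style text and that a missing LocalProc value means 0 (the value in the default file); the function name scan_mode and the server and share names are hypothetical.

```python
import configparser


def scan_mode(icp_text):
    """Return (mode, scan_dir), where mode is "direct" (LocalProc=1) or "upload" (LocalProc=0)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(icp_text)
    # Assumption: treat a missing LocalProc as 0, matching the default project file.
    local_proc = parser.getint("Scan", "LocalProc", fallback=0)
    scan_dir = parser.get("Scan", "ScanDir", fallback="")
    mode = "direct" if local_proc == 1 else "upload"
    return mode, scan_dir


# Default settings: scan locally, then use the Upload task.
DEFAULT_SCAN = """[Scan]
LocalProc=0
ScanDir=c:\\datacap\\scan
"""

# Direct scanning to a shared folder; the UNC path is a made-up example.
DIRECT_SCAN = """[Scan]
LocalProc=1
ScanDir=\\\\server\\batches
"""

print(scan_mode(DEFAULT_SCAN))
print(scan_mode(DIRECT_SCAN))
```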
IMPLEMENTING A START PANEL
A start panel prompts the operator to enter data before the remote scan page is displayed. You can use it to
capture any batch-specific information you wish to collect, such as the date or the operator name.
To enable a start panel, set StartPanel=1 in the [Scan] section of the remote scan client’s project file:
[Scan]
StartPanel=1
The start panel displays a data entry field for each batch level field that’s defined in the document hierarchy.
For example, if you want to capture the name of the person performing the scan, you need to create a batch
level field for this purpose.
[Figure: document hierarchy with a batch level field.]
Taskmaster stores the data in a batch level field in the runtime batch hierarchy, as shown in this example:
<B id="20110154.008">
<V n="TYPE">TravelDocs</V>
<V n="STATUS">73</V>
<V n="ScanOperator">admin</V>
<V n="ScanStation">1</V>
<D id="20110154.008.01">
<V n="TYPE"></V>
<V n="STATUS">0</V>
<P id="TM000001">
<V n="imagePath">c:\datacap\scan\20110154.008\tm000001.tif</V>
<V n="TYPE">Other</V>
<V n="STATUS">49</V>
<V n="ScanSrcPath">C:\Datacap\TravelDocs\images\Images_Page_01.tif</V>
</P>
etc.
</D>
<F id="Name">
<V n="TYPE">Name</V>
<V n="Position">0,0,0,0</V>
<V n="STATUS">0</V>
<C cn="10" cr="0,0,0,0">72</C>     'H'
<C cn="10" cr="0,0,0,0">101</C>    'e'
<C cn="10" cr="0,0,0,0">110</C>    'n'
<C cn="10" cr="0,0,0,0">100</C>    'd'    <-- Batch level field data
<C cn="10" cr="0,0,0,0">101</C>    'e'
<C cn="10" cr="0,0,0,0">114</C>    'r'
<C cn="10" cr="0,0,0,0">115</C>    's'
<C cn="10" cr="0,0,0,0">111</C>    'o'
<C cn="10" cr="0,0,0,0">110</C>    'n'
</F>
</B>
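Each C element in the example holds the decimal character code of one character of the field value. The Python sketch below shows one way to turn those elements back into text with the standard xml.etree.ElementTree module. The XML fragment is a trimmed copy of the Name field from the example, and the function name field_text is illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the batch level field from the runtime batch file example.
FIELD_XML = """
<F id="Name">
  <V n="TYPE">Name</V>
  <V n="STATUS">0</V>
  <C cn="10" cr="0,0,0,0">72</C>
  <C cn="10" cr="0,0,0,0">101</C>
  <C cn="10" cr="0,0,0,0">110</C>
  <C cn="10" cr="0,0,0,0">100</C>
  <C cn="10" cr="0,0,0,0">101</C>
  <C cn="10" cr="0,0,0,0">114</C>
  <C cn="10" cr="0,0,0,0">115</C>
  <C cn="10" cr="0,0,0,0">111</C>
  <C cn="10" cr="0,0,0,0">110</C>
</F>
"""


def field_text(field):
    """Join the character codes stored in the field's C elements into a string."""
    return "".join(chr(int(c.text)) for c in field.findall("C"))


name_field = ET.fromstring(FIELD_XML)
print(field_text(name_field))  # prints "Henderson"
```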
POPULATING DROP-DOWN LISTS
As with verification panels, you can use dictionaries or SELECT variables to populate start panel fields. For
example, if you create a dictionary of operator names and associate it with the batch level field, the operator
can select his or her name from a drop-down list:
<F type="Name">                  <-- Batch level field
<V n="ID">0</V>
<V n="TYPE">Field</V>
<V n="STATUS">0</V>
<V n="Position">0,0,0,0</V>
<V n="DICT">Operators</V>        <-- Dictionary of operator names
</F>
Alternatively, you can use the field’s SELECT variable to populate the drop-down list from a database. For
example, the SELECT value below gets a list of operator names from the application’s lookup database:
<SQL flist='Name' dsn="*/lookupdb:cs">SELECT Operator FROM Operators</SQL>
RUNNING VALIDATION RULES
You can run validation rules on start panel fields. To do this, you must add an [RRC] section to the remote
scan client’s project file and specify the name of a task profile containing the rules you want to run, for
example:
[RRC]
Application=TravelDocs
TProfile=ValidateStartPanel      <-- Task profile that executes when user clicks Submit
HttpWRRS=http://127.0.0.1/RRS/
RRSType=LocalRRS
ExecMode=1
BatchLog=1
ServiceLog=3
In this example, Taskmaster executes the ValidateStartPanel task profile when the user clicks the Submit
button in the start panel. To implement the validation rules, you need to:
1. Create a task profile with an associated ruleset.
2. Create a validation rule for each start panel field you wish to validate.
3. Bind each rule to the associated batch level field.
For example, the rule below validates the operator name using a database lookup.
USING THE REMOTE SCANNING/PAGE ID CLIENT (SCANID.ASPX)
The web page for the remote scanning and page ID client (scanid.aspx) is the same as the standard remote
scanning page except it adds an “Image Type” list box for manually identifying pages. The list of available
page types comes from the document hierarchy.
During scanning, pages are assigned the default type “Other.” The operator can then override the default type
by selecting the correct type from the list before completing the batch.
To use the “scanid” client, change the Page1= setting in the [iCap] section of rScan.icp:
[iCap]
Enabled=1            <-- Web component enabled
Page1=scanid.aspx    <-- Web page for remote scanning and page identification
The rest of the settings in this file are the same as for scancl.aspx.
REMOTE VIRTUAL SCANNING
The Taskmaster Web virtual scanning client (vscancl.aspx) is very similar to the remote scanning clients, but
instead of scanning files it imports them from a local image folder. You then upload the batches to integrate
them into the regular workflow.
[Figure: Taskmaster Web Clients import images from local image folders and send batches to the Taskmaster Server, where PageID, CreateDocs, and later tasks process them.]
As with the remote scanning client, the remote virtual scanning client provides two upload options:
• Scan the images from the web client directly into the application’s “batches” folder (LocalProc=1).
• Scan the images to a local folder on the web client and then upload them to the application’s “batches”
folder (LocalProc=0).
You configure the LocalProc setting and the image storage folder in the [Scan] section of the remote scan
client’s project file. The default project file, vscan.icp, is shown below.
[iCap]
Enabled=1                  <-- Web component enabled
Page=vscancl.asp           <-- Not used (applies to earlier versions of Taskmaster)
Page1=vscancl.aspx         <-- Web page used
Hold=1                     <-- Not used
[General]
CreateDir=1                <-- Creates a subfolder for the new batch (required)
[Scan]
LocalProc=0                <-- 0 if using the Upload task; 1 if scanning directly to a shared “batches” folder
ScanDir=c:\datacap\scan    <-- Local folder for scanned images if LocalProc=0; shared folder if LocalProc=1
;For jpg: FileExt=jpg, FileType=13
;For tif: FileExt=tif, FileType=10
FileExt=tif                <-- For jpg, FileExt=jpg; for tif, FileExt=tif
FileType=10                <-- For jpg, FileType=13; for tif, FileType=10
HoldEnabled=1              <-- Display the “Hold” button on web page? (0=No; 1=Yes)
StartPanel=0               <-- Enable start panel? (0=No; 1=Yes)
Note: The remote VScan client doesn’t use the TaskDCOFile setting; it always names the runtime batch file
scan.xml.
VERIFICATION USING THE PRELAYOUT WEB CLIENT
The “Prelayout” web verification client (prelayout.aspx) generates verification pages automatically based on
the document hierarchy. It can also generate custom verification pages using predefined “static” layouts (see
“Configuring the page layouts” on page 325). The screen below shows an auto-layout. For an example of a
custom “static” layout, see the 1040EZ sample application.
In addition to the image view and the data entry panel, a Prelayout page includes a toolbar and a batch tree
view that you can use to split or join documents, reorder pages, and mark documents or pages for deletion.
[Figure: Prelayout verification page, with callouts for the image view, toolbar, data entry panel, and batch tree view.]
The page includes the following controls:
(icon)   Submits the current page.
(icon)   Displays the previous page in the batch.
(icon)   Displays the next page in the batch.
(icon)   Puts the batch on hold.
(icon)   Select to display only problem fields.
(icon)   Select to display an outline of each field in the image view.
(icon)   Runs the Verify task profile (typically validation rules). You can add buttons to run other task
         profiles using the AltTProfile configuration setting (see “Configuring the [RRC] settings” on page 326).
USING THE BATCH TREE VIEW TO RESTRUCTURE THE BATCH
The batch tree view displays the type and status of each document and page within the batch. It also lets you
restructure the batch.
Splits the current document so the selected page becomes the first page in a new document.
Joins the current document and the previous document.
Marks the selected document or page for deletion. Documents are assigned the first IPS value; pages are
assigned the second IPS value (see “Configuring the page and field status settings” on page 326).
Moves the selected page up within the current document.
Moves the selected page down within the current document.
CONFIGURING THE PRELAYOUT CLIENT
You configure the Prelayout client in the [iCap] section of the module’s project (.bpp or .icp) file. Basic
settings are shown in the example below. Additional settings are described later under “Configuring additional
Prelayout settings” on page 327.
[iCap]
Enabled=1              <-- 0 to disable web client for this module; 1 to enable web client
Page=verify.asp        <-- Not used (applies to earlier versions of Taskmaster)
Page1=prelayout.aspx   <-- Name of web page
Static=1               <-- 0 for automatic panel layout; 1 to use a custom “static” layout
Types=1                <-- Number of static page layouts
Type0=Page             <-- Page type for first static layout
Src0=layout.ascx       <-- Custom static layout for first page type
IPS=72,74              <-- “Ignored page statuses”
DPS=0,2                <-- “Done page statuses”
IFS=-1                 <-- “Ignored field statuses”
DFS=0                  <-- “Done field statuses”
DOF=0,2,1              <-- “Done, override, fail statuses”
For the Static, Types, Type0, and Src0 settings, see “Configuring the page layouts” below. For the IPS, DPS,
IFS, DFS, and DOF settings, see “Configuring the page and field status settings” below.
CONFIGURING THE PAGE LAYOUTS
By default, Prelayout uses a generic 2-column layout that’s generated automatically based on the document
hierarchy for the current page. You specify the generic layout by setting Static=0 in the project file. The
generic layout is also used by page types that don’t have a static layout defined in the project file.
The [iCap] section lets you define a static layout for each page type. For example, if your application has
two page types, set Types=2 and then specify Type0 and Src0 for the first page, and Type1 and Src1 for
the second page.
Note: Although the default [iCap] settings shown in the example above specify a static layout, that layout
applies only to pages of type “Page”, so it is not used unless you define a page type with this name.
For information on creating custom static pages for use with Prelayout.aspx, see “Creating and using custom
pages” on page 328.
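The way a page type is matched to a layout can be sketched in a few lines of Python. This is only an illustration of the rules stated in this section, not Prelayout’s own code: the function name resolve_layout and the use of a plain dictionary to hold the [iCap] settings are assumptions made for the example.

```python
from typing import Dict, Optional


def resolve_layout(icap: Dict[str, str], page_type: str) -> Optional[str]:
    """Return the static layout file for page_type, or None for the generic layout."""
    if icap.get("Static", "0") != "1":
        return None  # Static=0: every page type gets the generic 2-column layout
    for n in range(int(icap.get("Types", "0"))):
        if icap.get(f"Type{n}") == page_type:
            return icap.get(f"Src{n}")
    return None  # no static layout defined for this page type


# Settings from the basic example: the only static layout is for page type "Page".
settings = {"Static": "1", "Types": "1", "Type0": "Page", "Src0": "layout.ascx"}
print(resolve_layout(settings, "Page"))
print(resolve_layout(settings, "Air_Ticket"))
```

A result of None stands for the generic layout, which is what a page type without a matching Type<n> entry receives.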
CONFIGURING THE PAGE AND FIELD STATUS SETTINGS
The page and field status settings (IPS, DPS, etc.) control how Prelayout handles pages and fields with the
specified status values (see the STATUS variable in the “Standard Variable Reference” in Appendix B of this
guide for a list of commonly-used status values). The various settings are described in the table below.
Setting
    Description
IPS (Ignored Page Statuses)
    Used to determine which pages to ignore. Taskmaster does not display a page if it has one of the
    specified status values. Additionally, the first value is assigned to any document you mark for
    deletion in Prelayout’s tree view pane; the second value is assigned to any page you mark for
    deletion. Do not include any DOF or DPS value in the IPS list.
    For example, if you specify IPS=72,74, Taskmaster does not display pages with STATUS=72 or 74.
    Additionally, Taskmaster assigns STATUS=72 to documents marked for deletion and STATUS=74 to
    pages marked for deletion.
DPS (Done Page Statuses)
    Used to determine when a batch is complete. When all pages have one of the specified values,
    Taskmaster displays “All documents are complete. Click OK to finish the batch.” Otherwise, you
    can only put the batch on hold.
    For example, if you specify DPS=0,2, the batch is complete when all pages have STATUS=0 (OK) or
    STATUS=2 (validation failure overridden by the operator).
IFS (Ignored Field Statuses)
    Used to determine which fields to hide. Taskmaster does not display a field if it has one of the
    specified status values.
    For example, if you specify IFS=-1, Taskmaster does not display fields with STATUS=-1 (hidden).
DFS (Done Field Statuses)
    Used to determine which fields to hide when the “Problems only” checkbox is selected. Taskmaster
    does not display any field with one of the specified status values.
    For example, if you specify DFS=0 and the “Problems only” checkbox is selected, Taskmaster does
    not display fields with STATUS=0 (OK).
DOF (Done, Override, Fail)
    Specifies the status value that gets assigned to the current page after validation:
    • The first value is assigned when validation passes (“Done”)
    • The second value is assigned when the operator overrides a validation error (“Override”)
    • The third value is assigned when validation fails and override is not used (“Fail”)
    For example, DOF=0,2,1 specifies Done status = 0; Override status = 2; Fail status = 1.
The AIndex web client uses the same settings. For an example of how to use them, see “Updating
ManualPageID.icp” on page 362.
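To make the interplay of the five settings concrete, here is a minimal Python sketch of the rules described in the table. It is a model for illustration only, not the web client’s code: the StatusSettings class and the helper names are hypothetical, and the sketch assumes the caller passes in the statuses of the pages that count toward batch completion.

```python
from dataclasses import dataclass
from typing import List


def parse_list(value: str) -> List[int]:
    """Turn a setting such as "72,74" into [72, 74]."""
    return [int(v) for v in value.split(",")]


@dataclass
class StatusSettings:
    ips: List[int]  # ignored page statuses
    dps: List[int]  # done page statuses
    ifs: List[int]  # ignored field statuses
    dfs: List[int]  # done field statuses
    dof: List[int]  # done, override, fail


def page_is_displayed(s: StatusSettings, status: int) -> bool:
    return status not in s.ips


def batch_is_complete(s: StatusSettings, page_statuses: List[int]) -> bool:
    return all(status in s.dps for status in page_statuses)


def field_is_displayed(s: StatusSettings, status: int, problems_only: bool) -> bool:
    if status in s.ifs:
        return False  # always hidden
    if problems_only and status in s.dfs:
        return False  # hidden only while "Problems only" is selected
    return True


def status_after_validation(s: StatusSettings, passed: bool, overridden: bool) -> int:
    done, override, fail = s.dof
    if passed:
        return done
    return override if overridden else fail


example = StatusSettings(parse_list("72,74"), parse_list("0,2"),
                         parse_list("-1"), parse_list("0"), parse_list("0,2,1"))
print(batch_is_complete(example, [0, 2, 0]))
print(status_after_validation(example, passed=False, overridden=True))
```

With DOF=0,2,1 and DPS=0,2, a page that fails validation without an override ends up with a status outside the DPS list, which is why the batch can then only be put on hold.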
CONFIGURING THE [RRC] SETTINGS
The [RRC] section of the project file lets you configure Rulerunner and logging if you’re using Prelayout with
a task profile that runs rules, which is typically the case for verification. For information on the [RRC]
settings, see “Mapping a task to a task profile” on page 48 and “Enabling logging for Taskmaster Web tasks”
on page 182. Other optional Prelayout-specific settings in the [RRC] section are described below.
Setting
    Description
PreTProfile
    Lets you specify the name of a task profile that gets executed immediately before the page is
    displayed. You might, for example, use this to copy some values from higher level objects.
AltTProfile
    A comma-separated list of additional task profiles that you can run from the web page.
    Prelayout displays a new button for each task profile listed here in place of the default
    “Verify” button. For an example, see the APT application.
CONFIGURING ADDITIONAL PRELAYOUT SETTINGS
The additional settings described in this section are all specified in the [iCap] section.
CUSTOMIZING THE FIELD BACKGROUND COLORS
Setting
    Description
LC
    Specifies the background color* for low confidence fields (default LC=yellow)
PC
    Specifies the background color* for problem fields (default PC=pink)
DC
    Specifies the background color* for fields that are OK (“done”) (default DC=white)
* X11 color names are supported.
PRELOADING PAGES AND IMAGES
Setting
    Description
LoadDoc
    When set to 1, this tells the task to preload all page XML files for the current document into a
    temporary folder. This is useful if you have cross-page validations that need to access
    multiple pages within a document. There is a performance penalty for this option.
LoadImages
    By default, all images in a document are preloaded when you open a document
    (LoadImages=1). Setting this to 0 prevents preloading and is useful if you have large
    documents and don’t want to spend time preloading all the images before the first page is
    displayed. However, setting LoadImages=0 may delay the display of subsequent pages.
CONFIGURING THE BATCH TREE VIEW
Setting
    Description
TreeVars
    By default, the batch tree view displays two variables for each document and page: the type
    and the status. If you want to display additional variables, use this setting to specify the total
    number of variables. For example, set TreeVars=3 to include one additional variable.
Var<num>Name
    Use this setting to specify each variable. For example, if TreeVars=3:
    Var0Name=TYPE
    Var1Name=STATUS
    Var2Name=MyCustomVar
SUPPORTING MULTI-PASS VERIFICATION
Taskmaster applications can display the same page to multiple operators to ensure accurate data entry and
verification. The Prelayout client provides limited support for multi-pass verification that includes two-pass
verification, whereas the AIndex client provides full support for multi-pass verification, including double-blind.
The settings the Prelayout client supports are shown below. For implementation details, please refer to
“Multi-pass verification” in the AIndex section of this chapter on page 334.
Setting
    Description
AltText
    Specifies which “alt text” value is the primary text (defaults to AltText[0]).
AltCompare
    Used to implement two-pass verification (see the “Example of two-pass data entry” on
    page 336).
CREATING AND USING CUSTOM PAGES
The Prelayout client generates a default verification panel for each page type automatically. It maps each of
the fields in the document hierarchy into a 2-column table as shown in the example below.
[Figure: default verification panel laid out as a 2-column table (Column 1 and Column 2), with rows numbered 1 through 7 in each column]
Prelayout lets you replace the standard verification panel with a custom (“static”) panel by:
1. Creating a custom panel layout in Batch Pilot.
2. Exporting the panel layout as a web user control (.ascx) file.
3. Modifying and relocating the .ascx file.
4. Specifying the custom panel in the task’s bpp/icp file.
We’ll go through each of these steps in the sections that follow.
CREATING A CUSTOM LAYOUT IN BATCH PILOT
The first step is to create the custom page layout in Batch Pilot. You’ll need to create a .dcf file for each page
type. To do this, follow the instructions provided earlier in this guide (see “Using Batch Pilot for verification”
on page 151). Save the .dcf files in the application’s C:\Datacap\<app_name>\dco_<app_name>\verify folder.
EXPORTING THE PANEL LAYOUT
Note: You’ll need to repeat these steps for each custom page layout.
1. Start two instances of Batch Pilot (Start > All Programs > Datacap > Batch Pilot > Batch Pilot).
2. Arrange the two Batch Pilot windows side-by-side. We’ll refer to them as Batch Pilot 1 and Batch Pilot 2.
3. In Batch Pilot 1, click File > Open Form and open C:\Datacap\tmweb.net\Task\support\verify.dcf.
4. In Batch Pilot 2, click File > Open Form and open the .dcf file for the first custom page. The .dcf files
are located in the C:\Datacap\<app_name>\dco_<app_name>\verify folder.
5. In Batch Pilot 1, click Form > View Code. Then select (General) and ProcessFrame from the drop-down lists at the top of the code view pane.
6. In Batch Pilot 2, click Form > View Code. Then, if necessary, select UserForm and Click. This is
usually the default selection and defines the event handler that runs when you click the form background.
7. In Batch Pilot 2, paste the following code between Sub and End Sub for the “Click” event handler:
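' Create (or overwrite) the export file
Set f=fso.CreateTextFile("c:\custom_layout.ascx", True)
' Start at the form itself and collect the generated layout text in sOut
Set pParentFrame=UserForm
sOut=""
Call ProcessFrame(pParentFrame, sOut)
' Write the collected text to the file and close it
f.Write(sOut)
f.Close()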
8. In Batch Pilot 2, select (General) and (Declarations), and then click the Full Module View button
(highlighted in red below).
9. Scroll to the end of the Declarations section, where it says 'End of Declarations.
10. Place the cursor at the end of the line 'End of Declarations and press Enter to create a new line.
11. In Batch Pilot 1, copy the line Sub ProcessFrame(pParentFrame, sOut).
12. Paste the line into the empty line you created in Batch Pilot 2 and press Enter to create a subroutine shell:
Sub ProcessFrame(pParentFrame, sOut)
End Sub
13. In Batch Pilot 1, copy the lines between Sub ProcessFrame and End Sub (from sTag="" through Next).
There are approximately 93 lines.
Sub ProcessFrame(pParentFrame, sOut)
sTag=""
.
.
.
Next
End Sub
14. Paste the code into the subroutine shell in Batch Pilot 2.
15. In Batch Pilot 2, click Form > Run Script to exit code view.
16. Click the form background. This executes the “Click” event handler and exports the panel layout.
17. Confirm that the panel layout file, custom_layout.ascx, was generated. It will be in one of the folders
listed below, depending on the version of Windows you’re using:
• C:\ (the root of the C: drive)
• C:\Users\<username>\AppData\Local\VirtualStore
18. Close the Batch Pilot 2 window. When asked if you want to save the changes, consider the following:
• If you save the changes, the export script will run each time a user clicks on the form background in
Batch Pilot.
• If you’re certain no operators will be using Batch Pilot for verification, you can save the changes.
This means if you modify the custom layout .dcf file, you can quickly regenerate the .ascx file by
clicking the form background.
• If operators may be using Batch Pilot for verification, do not save the changes. In this case, you’ll
need to recreate the ProcessFrame subroutine and the “Click” event handler if you change the
custom layout and need to regenerate the custom layout file. Alternatively, you can disable the
“Click” event handler before saving the file and then re-enable it if you modify the custom layout so
you can export the new layout.
MODIFYING AND RELOCATING THE .ASCX FILE
Note: You’ll need to repeat these steps for each custom page layout.
1. Locate the exported panel layout file, custom_layout.ascx. It will be in one of the folders listed below,
depending on the version of Windows you’re using:
• C:\ (the root of the C: drive)
• C:\Users\<username>\AppData\Local\VirtualStore
2. Rename the file to match the page type (for example, Rental_Agreement.ascx).
3. Open the .ascx file in a text editor and paste the four lines shown below at the beginning of the file.
<%@ Register TagPrefix="sk" Namespace="PACU"%>
<%@ Register TagPrefix="sk" TagName="DcSnip" Src="~/Task/dcsnip.ascx"%>
<%@ Register TagPrefix="sk" TagName="DcEdit" Src="~/Task/dcedit.ascx"%>
<%@ control language="C#" autoeventwireup="true" inherits="PACU.Task_layout, App_Web_layout.ascx.2a9bbddb" %>
4. Save and close the file.
5. Move the .ascx file into the C:\Datacap\tmweb.net\Task folder.
SPECIFYING THE CUSTOM PANELS IN THE BPP/ICP FILE
Note: You can mix custom page layouts with default page layouts. For example, if your application has 10
page types but you create only two custom layouts, specify just the two custom layouts as described
here and Prelayout will use the default layout for the remaining page types.
1. Open the verify task’s .bpp or .icp file in a text editor.
2. In the [iCap] section, set Static=1 and Types=<n>, where <n> is the number of page types with
custom layouts, for example:
[iCap]
Enabled=1
Page1=prelayout.aspx
Static=1    1 = use static panels
Types=2     Specifies the number of page types with custom panels
3. For each page type, specify the page type and the matching .ascx page using the following format:
Type<n>=<page_type>
Src<n>=<ascx_page>
where <n> is 0 for the first page type, 1 for the second page type, etc., for example:
Type0=Rental_Agreement          Page type that uses the first static panel
Src0=Rental_Agreement.ascx      Custom layout for the first panel
Type1=Optional_Insurance        Page type that uses the second static panel
Src1=Optional_Insurance.ascx    Custom layout for the second panel
4. Save and close the file.
5. Open a batch in Prelayout and make sure it’s using the new custom panels.
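If you have many page types, you might generate the Static, Types, Type<n>, and Src<n> lines rather than type them by hand. The Python sketch below shows one way to do that. It is not part of Taskmaster; the function name is hypothetical, and it assumes each .ascx file is named after its page type, as in the renaming step described earlier.

```python
from typing import List


def static_layout_settings(page_types: List[str]) -> List[str]:
    """Build the [iCap] lines that map each page type to its custom .ascx layout."""
    lines = ["Static=1", f"Types={len(page_types)}"]
    for n, page_type in enumerate(page_types):
        lines.append(f"Type{n}={page_type}")
        lines.append(f"Src{n}={page_type}.ascx")  # assumed naming: <page_type>.ascx
    return lines


print("\n".join(static_layout_settings(["Rental_Agreement", "Optional_Insurance"])))
```

Paste the generated lines into the [iCap] section by hand; the sketch deliberately does not edit the .bpp or .icp file itself.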
Note: To revert to the default panels at any time, set Static=0.
VERIFICATION, PAGE IDENTIFICATION, AND REGISTRATION USING AINDEX
The “AIndex” web client (aindex.aspx) is similar to Prelayout, but includes full support for multi-pass
verification and manual image registration (see “Multi-pass verification” on page 334 and “Manual page
identification and registration” on page 338). As with the other web clients, you enable the client by specifying
the name of the .aspx file in the [iCap] section of the verify task’s .bpp or .icp file.
AIndex generates verification pages automatically based on the document hierarchy. The web page is similar
to Prelayout, but it doesn’t display image snippets in the data entry panel. If you don’t require multi-pass
verification, Prelayout may be a better choice for web verification.
[Figure: the AIndex web page, with callouts for the toolbar, the image view, the data entry panel, the batch tree view, and the Save ImgPos, Override, and Batch view… controls]
• Override: Use this checkbox to override validation errors.
• Increases the display size of the page image.
• Decreases the display size of the page image.
• Rotates the page image clockwise by 90 degrees.
• Displays half the page and toggles between halves.
USING THE BATCH TREE VIEW TO RESTRUCTURE THE BATCH
The batch tree view displays the type of each page within the batch. It lets you change the document or page
type and restructure the batch.
Start Doc
    Indicates that the current page is the first page in a document:
    • If the checkbox is un-selected, selecting it splits the current document so the page
    becomes the first page in a new document.
    • If the checkbox is selected, de-selecting it joins the current document and the
    previous document.
Document and page type drop-down lists
    Displays the document type and page type for the current page. You can change the page
    type. If the page is the first page in a document, you can also change the document type.
Move up
    Moves the selected page up.
Move down
    Moves the selected page down.
Problem fields only
    Select to display only problem fields in the data entry panel.
Anchors
    See “Manual page identification and registration” on page 338.
CONFIGURING THE AINDEX CLIENT
You configure the AIndex client in the [iCap] section of the module’s .bpp or .icp file.
[iCap]
Enabled=1            0 to disable web client for this module; 1 to enable web client
Page=index.asp       Not used (applies to earlier versions of Taskmaster)
Page1=aindex.aspx    Name of web page
Static=0             0 for automatic panel layout; 1 to use a custom “static” layout
IPS=72,74            “Ignored page statuses”
DPS=0,2              “Done page statuses”
IFS=-1               “Ignored field statuses”
DFS=0                “Done field statuses”
DOF=0,2,1            “Done, override, fail statuses”
The page and field status settings are the same as for the Prelayout client (see “Configuring the page and field
status settings” on page 326).
In addition, AIndex supports settings you can use to implement multi-pass verification.
AltText=1       Specifies which AltText values to display
AllowAll=2      Specifies whether to display additional AltText values on the page
AltCompare=1    Specifies how to handle changes
Multi-pass verification is discussed in the section that follows.
MULTI-PASS VERIFICATION
Taskmaster applications can display the same page to multiple operators to ensure accurate data entry and
verification. Taskmaster supports two main implementations of multi-pass verification: two pass and double
blind. Other implementations are possible, but we’ll focus on these two.
In two-pass verification:
1. An operator (or a recognition engine) enters the initial value for each field.
2. Taskmaster displays the page to a second operator but hides the initial values. The operator enters a new
value for each field.
Note: If you’re using a recognition engine to implement the first pass, you might choose to display only
low confidence fields to the operator.
3. For each field, Taskmaster compares the new value to the initial value. If they match, Taskmaster accepts
the value; otherwise, the operator must re-enter the value. Only when the operator has entered the same
value twice in a row does Taskmaster accept the value.
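The acceptance rule in step 3 can be sketched as a small Python function. This is a model of the rule as described here, not Taskmaster code: the function name is hypothetical, and the sketch assumes that a re-entered value is also accepted if it now matches the initial value.

```python
from typing import Iterable, Optional


def two_pass_accept(initial: str, entries: Iterable[str]) -> Optional[str]:
    """Return the value accepted for a field, or None if no entry is accepted yet.

    initial  -- the value from the first pass (operator or recognition engine)
    entries  -- the values typed by the second operator, in order
    """
    previous = None
    for entry in entries:
        # Accept on a match with the first pass, or on the same value twice in a row.
        if entry == initial or entry == previous:
            return entry
        previous = entry
    return None


print(two_pass_accept("Data", ["Data"]))
print(two_pass_accept("Data", ["Date", "Date"]))
```

In the second call the operator disagrees with the first pass, so the value is accepted only after it has been typed twice.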
In double-blind verification:
1. An operator (or a recognition engine) enters the initial data values.
2. Taskmaster displays the page to a second operator but hides the initial values. The operator enters a new
value for each field and Taskmaster saves all the values (no comparing).
3. Taskmaster displays the page to a third operator. Using a feature of AIndex that lets you display multiple
values, the operator can see both the initial value and the second value.
[Figure: data entry field with callouts for the second value and the initial value]
4. For fields where the initial value and the second value are different, the operator must determine which
value is correct, or enter a new value:
• Clicking the initial value (or pressing Alt+Shift+A when the field has the focus) moves the initial
value into the data entry field.
• If entering a new value, the operator must enter the same value twice in a row.
We’ll look at how the AIndex web client implements two pass and double blind using a combination of rules and
[iCap] settings in the sections that follow.
STORING MULTIPLE VALUES IN THE RUNTIME PAGE DATA FILE
Taskmaster can store multiple values in the runtime page data file for any given field:
<F id="Vendor">
  <V n="TYPE">Vendor</V>
  <V n="Position">0,0,0,0</V>
  <V n="STATUS">1</V>
  <C cn="6,8,10" cr="0,0,0,0">68,49,83</C>      <!-- Character 1 -->
  <C cn="6,8,10" cr="0,0,0,0">97,50,112</C>     <!-- Character 2 -->
  <C cn="6,8,10" cr="0,0,0,0">116,51,105</C>    <!-- Character 3 -->
  <C cn="6,8,10" cr="0,0,0,0">97,52,110</C>     <!-- Character 4 -->
</F>
Within each <C> element, the first value belongs to AltText[0], the second to AltText[1], and the third to
AltText[2].
In the example above, the Vendor field has three 4-character values represented by the ASCII characters
shown. The first value (AltText[0] = ASCII 68,97,116,97 = “Data”) is the primary data value.
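To see how the character codes map to the three values, you can decode the example with a short Python script. This is an illustration only: the function name alt_texts is hypothetical, and the script simply reads the <C> elements from the sample above using Python’s standard XML parser.

```python
import xml.etree.ElementTree as ET
from typing import List

FIELD_XML = """
<F id="Vendor">
  <V n="TYPE">Vendor</V>
  <V n="Position">0,0,0,0</V>
  <V n="STATUS">1</V>
  <C cn="6,8,10" cr="0,0,0,0">68,49,83</C>
  <C cn="6,8,10" cr="0,0,0,0">97,50,112</C>
  <C cn="6,8,10" cr="0,0,0,0">116,51,105</C>
  <C cn="6,8,10" cr="0,0,0,0">97,52,110</C>
</F>
"""


def alt_texts(field_xml: str) -> List[str]:
    """Return the field's values as strings: [AltText[0], AltText[1], ...]."""
    field = ET.fromstring(field_xml.strip())
    # One list of character codes per <C> element (one element per character).
    per_character = [c.text.split(",") for c in field.findall("C")]
    # zip(*...) regroups the codes so that each tuple holds one AltText value.
    return ["".join(chr(int(code)) for code in value) for value in zip(*per_character)]


print(alt_texts(FIELD_XML))
```

The first string returned is the primary value, “Data”, which matches the ASCII codes 68, 97, 116, 97 called out in the text.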
Your application can use this structure to store values from multiple data entry passes, but since data is always
captured in AltText[0], you need to shuffle the data around. For example, to implement double blind:
1. Capture the initial data in AltText[0].
2. Move the data into AltText[1].
3. Display the page to Operator 2 and save the new data in AltText[0].
4. Copy the AltText[0] data into AltText[2].
5. Display the page so Operator 3 can review AltText[1] and AltText[2] and accept either version or
enter new data in AltText[0].
We’ll illustrate this graphically in the examples starting on the next page.
ACTIONS THAT SUPPORT MULTI-PASS VERIFICATION
The actions required to perform the moving and copying of data are shown in the table below.
Library   Action               Description
DCO       PropagateToAltText   Copies the character values from AltText[0] to the specified position.
DCO       ClearAltText         Clears the character values from the specified position in the field’s array.
To implement a move, do a copy (PropagateToAltText) and then delete the original (ClearAltText).
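The copy-then-clear pattern can be modeled in Python to check the order of operations for double blind. The functions below are simplified stand-ins written for this sketch, not the DCO actions themselves: they treat a field as a three-element list of strings, whereas the real actions work on the character values in the page data file and are called from rules.

```python
from typing import List


def propagate_to_alt_text(alt: List[str], position: int) -> None:
    """Stand-in for PropagateToAltText: copy AltText[0] to the given position."""
    alt[position] = alt[0]


def clear_alt_text(alt: List[str], position: int) -> None:
    """Stand-in for ClearAltText: clear the value at the given position."""
    alt[position] = ""


def move_primary(alt: List[str], position: int) -> None:
    """A move is a copy followed by clearing the original (AltText[0])."""
    propagate_to_alt_text(alt, position)
    clear_alt_text(alt, 0)


alt = ["", "", ""]
alt[0] = "1"                   # Operator 1 (or recognition) captures the initial value
move_primary(alt, 1)           # initial value now in AltText[1], AltText[0] empty
alt[0] = "2"                   # Operator 2 enters a value without seeing the first
propagate_to_alt_text(alt, 2)  # keep a copy of Operator 2's value in AltText[2]
print(alt)
```

After these steps Operator 3 can review AltText[1] and AltText[2] while AltText[0] still holds Operator 2’s entry.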
The action used to compare AltText[0] and AltText[1] values is VoteFld.
Library   Action    Description
Vote      VoteFld   Sets the confidence level on each character to high and the field status to
                    0 (OK) if the AltText[0] and AltText[1] values match; sets the confidence
                    levels to low and the field status to 1 (problem) if the values don’t match.
To implement double blind verification, use the VoteFld action before displaying the page to the last
operator. If the initial and second values don’t match, the action sets the field status to 1 and the field is
displayed in red.
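The outcome of the comparison can be summarized with a tiny Python model. It only mirrors the behavior described in the table; the function name vote_fld, the list representation of the field, and the "high" and "low" labels are assumptions made for this sketch, and the real action sets confidence on each character rather than returning a label.

```python
from typing import List, Tuple


def vote_fld(alt_text: List[str]) -> Tuple[int, str]:
    """Return (field status, confidence label) after comparing AltText[0] and AltText[1]."""
    if alt_text[0] == alt_text[1]:
        return 0, "high"  # values match: status 0 (OK)
    return 1, "low"       # values differ: status 1 (problem)


print(vote_fld(["2", "1", "2"]))
print(vote_fld(["1", "1", "1"]))
```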
[ICAP] SETTINGS THAT SUPPORT MULTI-PASS VERIFICATION
Setting      Value       Description
AltText      0, 1 or 2   Determines which “alt text” is the primary text (defaults to AltText[0]). AIndex
                         displays the primary text in the data entry field.
AllowAll     3           AIndex displays AltText[0] in the data entry field and AltText[1] and AltText[2]
                         beside it.
             2           AIndex displays AltText[0] in the data entry field and AltText[1] beside it.
             1           AIndex displays the AltText[0] value initially, but you can toggle back and forth
                         between AltText[1] and AltText[0] using the Alt+Shift+A hotkey combination.
             0 or -1     AIndex displays the AltText[0] value in the data entry field. AltText[1] is hidden but
                         is used for comparisons.
AltCompare   1           AIndex compares the new value to the initial value and forces the operator to enter
                         a new value twice.
             -1          The operator can enter any new value. There is no comparison and no need to
                         enter the value twice.
EXAMPLE OF TWO-PASS DATA ENTRY
In the example below, “Operator 1” can be a person or a recognition engine. If you use a recognition engine
for the initial data entry, it runs as a background process so there is no user interface displayed.
[Figure: two-pass data entry, showing the user interface, the AltText values, and the [iCap] settings at each step]
Step                              AltText[0]   AltText[1]   AltText[2]   [iCap] settings
Operator 1 (initial data entry)   1            (empty)      (empty)      AllowAll=-1 (hide alt text), AltCompare=-1 (don’t compare)
Move: PropagateToAltText("1"), then ClearAltText("0")
Operator 2 (initial state)        (empty)      1            (empty)      AllowAll=-1 (hide alt text), AltCompare=1 (compare)
Operator 2 (after data entry)     2            1            (empty)      AllowAll=-1 (hide alt text), AltCompare=1 (compare)
Operator 2’s data (‘2’) is now stored in AltText[0] and Operator 1’s data (‘1’) is in AltText[1]. In this
case, the compare fails, and the operator must enter the same value twice to override the initial value.
EXAMPLE OF DOUBLE-BLIND DATA ENTRY
As in the previous example, “Operator 1” can be a person or a recognition engine. If you use a recognition
engine for the initial data entry, it runs as a background process so there is no user interface displayed.
[Figure: double-blind data entry, showing the user interface, the AltText values, and the [iCap] settings at each step]
Step                              AltText[0]   AltText[1]   AltText[2]   [iCap] settings
Operator 1 (initial data entry)   1            (empty)      (empty)      AllowAll=-1 (hide alt text), AltCompare=-1 (don’t compare)
Move: PropagateToAltText("1"), then ClearAltText("0")
Operator 2 (initial state)        (empty)      1            (empty)      AllowAll=-1 (hide alt text), AltCompare=-1 (don’t compare)
Operator 2 (after data entry)     2            1            (empty)      AllowAll=-1 (hide alt text), AltCompare=-1 (don’t compare)
Copy: PropagateToAltText("2")
VoteFld()
Operator 3 (initial state)        2            1            2            AllowAll=2 (show alt text), AltCompare=1 (compare)
In this example, VoteFld sets the field status to ‘1’ (problem) since AltText[0] and AltText[1] don’t
match, and AIndex displays the field in pink. Operator 3 can now do one of three things:
• Accept the current AltText[0] value (‘2’).
• Swap in and accept the AltText[1] value (‘1’).
• Enter a new value twice. The new value becomes the new AltText[0] value.
The result for each of these three actions is shown below.
[Figure: AltText values and [iCap] settings after each of Operator 3’s three options. In all three cases the [iCap] settings are AllowAll=2 (show alt text) and AltCompare=1 (compare).]
Operator 3 (accept AltText[0]): AltText[0] = 2
Operator 3 (use AltText[1]): AltText[0] = 1
Operator 3 (enter new data): AltText[0] = 3
MANUAL PAGE IDENTIFICATION AND REGISTRATION
You can use the AIndex web client to perform manual page identification and manual registration:
• You first set the document and page type using the drop-down lists in the batch tree view pane.
• You then use the “Anchors” button at the bottom of the batch tree view to select the matching
fingerprint and perform manual page registration.
When registering the page, Taskmaster displays a floating image of the page’s anchor object that you align
with the actual anchor image on the current page. When you place the anchor object, Taskmaster computes
the required page offsets that are used to calculate the positions of the data fields on the page.
Taskmaster can usually handle page identification and registration for you automatically, even when the
offsets relative to the fingerprint are quite large (see the chapter on “Pattern Matching” starting on page 253).
The functionality described in this section may be useful to you in special situations. Note, however, that if
you’re using AIndex for manual page identification, you must run the manual page identification task after the
CreateDocs task, since AIndex requires a structured batch. You also need to make sure each unidentified page
is a separate document. We’ll cover this in more detail when we update the TravelDocs application to use
AIndex at the end of this chapter.
ENABLING MANUAL PAGE REGISTRATION (MANUAL ANCHORING)
You must perform the steps below to enable manual anchoring for a specific page type.
1. Define an anchor field for the page type and specify the zone position on each of the corresponding page
fingerprints (see “Setting up the pattern match anchor objects” on page 265). Make sure the field’s
PatternMatch variable is set to 1.
2. Add a variable called Required to the anchor field and set its value to 1.
3. For the task in which you want to perform manual anchoring, add a [Fixup] section to the task’s .bpp
or .icp file and add a TemplateFolder setting specifying the application’s fingerprint folder, for
example:
[Fixup]
TemplateFolder=C:\Datacap\TravelDocs\fingerprint
REGISTERING A PAGE USING MANUAL ANCHORING
Manual anchoring is only enabled for pages that include an anchor field with the variable Required=1 and
where the anchor field’s position in the runtime page data file is undefined or set to 0,0,0,0.
Unidentified pages (pages of type “Other”) will typically not have a runtime page data file, so the first thing
you must do is assign a page type. When you assign a page type in AIndex, AIndex creates a page data file
with all fields set to 0,0,0,0. If the page type you assign has an anchor field with Required=1, AIndex
enables manual anchoring.
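The two conditions can be expressed as a short Python check. This is a sketch of the rule as stated above, not AIndex code: the function name is hypothetical, and it assumes the Required variable is available on the field element alongside Position, in the same <V n="..."> form used elsewhere in the runtime page data file.

```python
import xml.etree.ElementTree as ET


def manual_anchoring_enabled(field_xml: str) -> bool:
    """Return True if the anchor field is required and has no position yet."""
    field = ET.fromstring(field_xml.strip())
    variables = {v.get("n"): (v.text or "").strip() for v in field.findall("V")}
    if variables.get("Required") != "1":
        return False
    # An undefined position or 0,0,0,0 means the anchor has not been placed.
    return variables.get("Position", "") in ("", "0,0,0,0")


unplaced = """
<F id="Vendor_Logo">
  <V n="TYPE">Vendor_Logo</V>
  <V n="Required">1</V>
  <V n="Position">0,0,0,0</V>
</F>
"""
print(manual_anchoring_enabled(unplaced))
```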
AIndex performs document integrity checking when you submit the batch, so your documents and pages
must be structured correctly. Additionally, you need to have appropriate DPS, DOF, DFS, and IFS values in the
task’s bpp/icp file (see “Configuring the page and field status settings” on page 326). We’ll do this later when
we update the TravelDocs application to use AIndex (see “Updating ManualPageID.icp” on page 362).
To identify or register a page using manual anchoring:
1. In the AIndex batch tree view pane, select the page that requires identification or manual registration.
2. If the page is unidentified, use the drop-down lists to set the document type and page type. Depending on
the position of the page within the batch, you may need to select the Start Doc option before you can set
the document type.
3. If manual anchoring is enabled for the page type you selected, Taskmaster displays a message box
notifying you that you need to set the anchor position. Click OK to close the message box.
4. Click the Anchors button in the batch tree view pane.
5. If the page is unidentified, Taskmaster displays thumbnails of all fingerprints with a required anchor
(Required=1) and asks you to double-click the matching fingerprint. Click OK to close the message box
and then double-click the matching fingerprint.
6. Taskmaster reads the position of the anchor object for the fingerprint you selected from the document
hierarchy, and floats a red version of the anchor image over the current page image.
7. Use the mouse to align the anchor object with the page image.
8. When you complete the batch, Taskmaster writes the anchor position to the runtime page data file and
writes the offset values to the runtime batch hierarchy, for example:
Runtime page data file:
<P id="TM000006">
  <F id="Vendor_Logo">
    <V n="TYPE">Vendor_Logo</V>
    <V n="Position">221,229,608,332</V>    <!-- anchor position -->
    <V n="STATUS">-1</V>
  </F>
etc.
Runtime batch hierarchy file:
<P id="TM000006">
  <V n="IMAGEFILE">tm000006.tif</V>
  <V n="TYPE">Air_Ticket</V>
  <V n="STATUS">0</V>
  <V n="Image_Offset">-45,-29</V>    <!-- offset values -->
  <V n="TemplateID">562</V>
etc.
VERIFICATION USING THE AVERIFY WEB CLIENT
The AVerify client is functionally similar to the Prelayout client, but lacks the batch restructuring features of
Prelayout.
Unlike Prelayout, AVerify performs most of the processing on the client, rather than on the Taskmaster web
server. Since the AVerify client can handle processing tasks such as generating screens, loading data, and
saving data, it may be a good choice if you’re processing high volumes using multiple web clients attached to
the same web server. The server is invoked only for validations and for saving the XML files. Aside from this
feature, Prelayout is usually a better choice for web verification.
[Figure: the AVerify web client toolbar, with the following callouts]
• Highlights the next low confidence character on the current page.
• Displays a popup window with an enlarged view of the current field.
• Displays the previous page in the batch.
• Enlarges the page image view.
• Submits the current page and displays the previous problem page.
• Reduces the size of the page image view.
• Puts the batch on hold.
• Displays one quarter of the page.
• Submits the current page and displays the next problem page.
• Outlines all words on the page image.
• Displays the next page in the batch.
• Outlines all lines on the page image.
• Puts the batch on hold.
• Outlines all recognition fields on the page image.
• Submits the current page and displays the next problem page.
• Select to override a validation failure.
• Runs the Verify task profile (typically validation rules).
• Displays the data from the runtime batch hierarchy.
To use the AVerify client, set Page1=averify.aspx in the [iCap] section of the verify task’s bpp/icp file.
Also make sure you set Static=0 to disable custom panels, for example:
[iCap]
Enabled=1             Web component enabled
Page1=averify.aspx    Web page for “AVerify” web client
Static=0              Disable custom panels
Note: Creating and using custom (static) panels is covered in the next section.
AVerify uses the same page and field status settings and Rulerunner settings as the Prelayout verification
client (see “Configuring the page and field status settings” on page 326 and “Configuring the [RRC] settings”
on page 326).
CREATING AND USING CUSTOM (“STATIC”) PANELS
AVerify generates a default verification panel for each page type automatically. It maps each of the fields in
the document hierarchy into a 2-column table as shown in the example below.
Column 1    Column 2
1           1
2           2
3           3
4           4
5           5
6           6
7           7
AVerify lets you replace the standard verification panel with a custom (“static”) panel by:
1. Exporting the default panel layout as an HTML file.
2. Customizing the panel layout using an HTML editor.
3. Specifying the HTML file in the task’s bpp/icp file.
We’ll go through each of these steps in the sections that follow.
If you specify that AVerify is to use static panels but don’t define a static panel for each page type, you’ll get a
runtime error and AVerify will be unable to display a verification panel.
Therefore, if you choose to use static panels, you must define a static panel for each page type.
EXPORTING THE DEFAULT PANEL LAYOUT
A custom panel is referred to as a static panel. To create a static panel, you must first export the default 2-column panel layout. You do this by setting Static=1 and Types=0 in the task’s bpp/icp file and then
opening a batch in AVerify.
1. Open the task’s .bpp or .icp file in a text editor.
2. In the [iCap] section, set Static=1 and Types=0, for example:
[iCap]
Enabled=1
Page1=averify.aspx
Static=1     ← 1 = use static panels; 0 = don’t use static panels (uses the default panels)
Types=0
3. Save and close the file.
4. Prepare a batch for verification.
5. Open the batch in AVerify and display at least one instance of each page type. For each page you open,
Taskmaster creates an HTML file with the layout of the standard 2-column verification panel for that
page. The files are named static<page_type>.htm (for example, staticRental_Agreement.htm) and they are
saved in one of the following locations:
• C:\ (the root of the C: drive)
• C:\Users\<username>\AppData\Local\VirtualStore
CUSTOMIZING THE PANEL LAYOUT
The HTML file that AVerify exports for each page type defines each of the fields within a standard 2-column
table layout. Each cell represents one field and contains a label, an image snippet, and an edit control.
• Label
• Image snippet
• Edit control
The HTML within the image snippet and edit control cells contains code that can’t be modified.
Image snippet:
<OBJECT style="WIDTH: 100%; HEIGHT: 30px" id=dcim_Pickup_Date
title=Pickup_Date tabIndex=-1 codeBase="dcim.cab"
classid=clsid:BA893287-8932-11D3-A0DB-58B204C16365 width=191
height=30> . . . </OBJECT><BR>
Edit control:
<OBJECT onblur=reSetValue(me) style="BORDER-BOTTOM: gray 1px solid;
BORDER-LEFT: gray 1px solid; WIDTH: 99%; FONT-FAMILY: Arial; FONT-SIZE: 12pt;
BORDER-TOP: gray 1px solid; BORDER-RIGHT: gray 1px solid"
onkeydown=hkPress() id=txt_Pickup_Date language=vbscript
class=AFlatEdit onfocus=Remember(me) title=Pickup_Date name=dcedit
codeBase="dcim.cab" classid=clsid:D20D94B4-B85C-466C-B29B-19B2ADAF60EC height=23> . . . </OBJECT></TD>
However, you can change the labels, rearrange the cells, remove the image snippets if you don’t want to
display them, etc. To customize the default panel layout:
1. Open the HTML file in an HTML editor.
2. Make the required modifications to the labels, layout, etc.
3. Save the file.
SPECIFYING THE CUSTOM PANELS IN THE BPP/ICP FILE
You control the use of custom (“static”) panels through the task’s bpp/icp file.
1. Copy the modified .htm files into the C:\Datacap\tmweb.net\Tasks folder on the Taskmaster Web
server.
2. Open the task’s .bpp or .icp file in a text editor.
3. In the [iCap] section, set Static=1 and Types=<n>, where <n> is the number of page types, for
example:
[iCap]
Enabled=1
Page1=averify.aspx
Static=1     ← 1 = use static panels
Types=6      ← Specifies the number of static panels
4. For each page type, specify the page type and the matching HTML page using the following format:
Type<n>=<page_type>
Src<n>=<html_page>
where <n> is 0 for the first page type, 1 for the second page type, etc., for example:
Type0=Rental_Agreement               ← Page type that uses the first static panel
Src0=staticRental_Agreement.htm      ← Custom HTML page for the first panel
Type1=Optional_Insurance             ← Page type that uses the second static panel
Src1=staticOptional_Insurance.htm    ← Custom HTML page for the second panel
5. Save and close the file.
6. Open a batch in AVerify and make sure it’s using the new custom panels.
Note: To revert to the default panels at any time, set Static=0.
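The Type<n>/Src<n> numbering is easy to get wrong by hand when an application has many page types. The following is a minimal sketch, not part of Taskmaster, that generates the [iCap] lines from a list of page types. The function name build_icap_section is hypothetical, and the sketch assumes each panel file keeps the exported static<page_type>.htm name.

```python
def build_icap_section(page_types, page="averify.aspx"):
    """Return [iCap] text that declares one static panel per page type.

    Assumption: each HTML panel keeps the exported name static<page_type>.htm.
    """
    lines = [
        "[iCap]",
        "Enabled=1",
        f"Page1={page}",
        "Static=1",
        f"Types={len(page_types)}",  # number of static panels
    ]
    # Type<n>/Src<n> pairs are numbered from 0, as in the format above.
    for n, page_type in enumerate(page_types):
        lines.append(f"Type{n}={page_type}")
        lines.append(f"Src{n}=static{page_type}.htm")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_icap_section(["Rental_Agreement", "Optional_Insurance"]))
```

Generating the section keeps Types in step with the number of Type<n> entries, which matters because every page type needs its own static panel.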
VERIFICATION USING THE IMGENTER WEB CLIENT
The “ImgEnter” (imgenter.aspx) web client is different from the other web verification clients in that you
enter data through the page image view.
ImgEnter displays a gray border around each data field. When you click within a field, the web client displays
a data entry edit field immediately above it displaying the recognized data and letting you change it if
necessary. Fields with low confidence characters are displayed in yellow, whereas fields with validation errors
are displayed in red.
To use the “ImgEnter” client, change the Page1= setting in the [iCap] section of the verify task’s bpp/icp
file, for example:
[iCap]
Enabled=1              ← Web component enabled
Page1=imgenter.aspx    ← Web page for “ImgEnter” web client
ImgEnter uses the same page and field status settings and Rulerunner settings as the Prelayout verification
client (see “Configuring the page and field status settings” on page 326 and “Configuring the [RRC] settings”
on page 326).
Note: If you use ImgEnter with the TravelDocs application, you may notice that the “Options” groups on the
Rental_Agreement pages don’t work. This is a limitation of the ImgEnter web client, which doesn’t
support the “MultiPunch” option properly.
MANUAL PAGE IDENTIFICATION AND BATCH RESTRUCTURING USING PROTOID
We used the ProtoId web client (ProtoId.aspx) earlier in this guide to do manual page identification.
The drop-down list beneath each thumbnail lets you change the current page type.
Additionally, the mini toolbar above each thumbnail image provides batch restructuring functionality. The
batch restructuring controls are described on the right in the table below.
Enlarges all thumbnails                              Moves the page to the left (ALT+U)
Shrinks all thumbnails                               Indicates that the page has been copied to the clipboard
Displays hot key list                                Copies the page to the clipboard (CTRL+C)
Runs document integrity checking                     Inserts the page from the clipboard* (CTRL+V)
Rotates the page thumbnail by 90 degrees (CTRL+G)    Moves the page to the right (ALT+N)
* When you copy a page, Taskmaster adds a page-level variable to the runtime hierarchy with the ID of the source
page. For example, if you copy page 1, Taskmaster adds <V n="Copy">TM000001</V> to the cloned version.
Note also that you can move between thumbnails using TAB/SHIFT+TAB and display a full page image by
clicking the thumbnail or pressing ENTER. Additional functionality is described in the section that follows.
CONFIGURING PROTOID
To use the ProtoId client, set Page1=ProtoId.aspx in the [iCap] section of the verify task’s bpp/icp file.
[iCap]
Enabled=1             ← Web component enabled
Page1=ProtoId.aspx    ← Web page for “ProtoId” web client
INSERTING PAGES
The CTRL+M hotkey inserts a new page before the selected page. To specify the type for the new page, add
the following to the task’s bpp/icp file:
[PageID]
InsertType=<page_type>
where <page_type> specifies an existing page type. If you specify an invalid type, ProtoId assigns the type
of the page that’s selected when you press CTRL+M.
For example, if you set InsertType=Separator_Page and press CTRL+M, ProtoId inserts a page of type
Separator_Page and assigns the image file “blank.tif”:
<P id="TM000001">
<V n="STATUS">0</V>
<V n="TYPE">Separator_Page</V>
<V n="IMAGEFILE">blank.tif</V>
<V n="Insert">1</V>
</P>
CONTROLLING THE LIST OF AVAILABLE PAGE TYPES
By default, ProtoId lists all available page types in the drop-down list below each thumbnail image.
If you want to limit the available page types or display “aliases” you can do so by creating a dictionary of
available page types in the application’s document hierarchy XML file. The dictionary must have the name
“PageNames,” for example:
<DICT n="PageNames">
<W v="Page_Type_1">Page_Type_1</W>
<W v="Page_Type_2">Page_Type_2</W>
<W v="Page_Type_3">Page_Type_3</W>
</DICT>
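If the list of page types is long, the dictionary can be generated rather than typed. The sketch below uses Python's standard xml.etree.ElementTree module to build a DICT element in the shape shown above. The function name build_page_names is hypothetical. The source example uses identical values for the v attribute and the element text, so it does not show which of the two carries an alias; the sketch therefore takes both values explicitly and makes no assumption about their roles.

```python
import xml.etree.ElementTree as ET


def build_page_names(entries):
    """Build a <DICT n="PageNames"> element from (v_attribute, text) pairs.

    Which value ProtoId displays as the alias is not stated in the source,
    so the caller supplies both values for every entry.
    """
    dict_el = ET.Element("DICT", {"n": "PageNames"})
    for v_value, text in entries:
        w = ET.SubElement(dict_el, "W", {"v": v_value})
        w.text = text
    return ET.tostring(dict_el, encoding="unicode")


if __name__ == "__main__":
    print(build_page_names([
        ("Page_Type_1", "Page_Type_1"),
        ("Page_Type_2", "Page_Type_2"),
    ]))
```

The generated element would still need to be pasted into the application's document hierarchy XML file by hand; where it belongs in that file is not covered here.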
DISABLING DOCUMENT INTEGRITY CHECKING
By default, ProtoId performs document integrity checking automatically when you click “Done” and will not
let you complete the batch if there are integrity problems.
To disable automatic document integrity checking, add the following to the task’s bpp/icp file:
[PageID]
DocIntegrity=0
RUNNING RULES FROM PROTOID
ProtoId uses the same Rulerunner settings as the Prelayout verification client (see “Configuring the [RRC]
settings” on page 326). The specified task profile runs immediately before document integrity checking.
USING “SUPER VARIABLES”
The ALT+S hotkey assigns a “super variable” to the selected page in the runtime batch hierarchy, for
example:
<P id="TM000001">
<V n="Super">Three</V>     ← Super variable assigned with value = “Three”
<V n="STATUS">0</V>
<V n="TYPE">Rental_Agreement</V>
etc.
The value (in this example, “Three”) is displayed above the page thumbnail image.
To specify the available super variable values, add the following to the task’s bpp/icp file:
[PageID]
SuperVars=<value_1>,<value_2>,<value_3>,etc.
for example:
[PageID]
SuperVars=One,Two,Three
Each time you press ALT+S, ProtoId assigns the next value in the series. In this example, the first time you
press ALT+S, it assigns Super=One; the second time, it assigns Super=Two; the third time, it assigns
Super=Three; the fourth time, it removes the “Super” variable.
You can use super variables to perform additional integrity checking by specifying a page type with each
value, for example:
[PageID]
SuperVars=One|Rental_Agreement,Two|Air_Ticket,Three|Room_Receipt
ProtoId uses these <value>|<page_type> combinations during document integrity checking. In this
example:
• Integrity checking fails if you assign Super=One to a page that is not of type Rental_Agreement.
• Integrity checking fails if you assign Super=Two to a page that is not of type Air_Ticket.
• Integrity checking fails if you assign Super=Three to a page that is not of type Room_Receipt.
In addition, you can use the tilde (~) character to indicate values that are not valid for a specific type, for
example:
[PageID]
SuperVars=One|~Rental_Agreement
In this example, integrity checking fails if you assign Super=One to a page that is of type Rental_Agreement.
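The SuperVars behavior described in this section can be modeled in a few lines. The sketch below is an illustration only, not ProtoId's implementation: it parses a SuperVars setting, steps through the values the way ALT+S is described as doing, and applies the <value>|<page_type> and tilde rules. All function names are hypothetical.

```python
def parse_super_vars(setting):
    """Parse a SuperVars value such as "One|Rental_Agreement,Two|~Air_Ticket".

    Returns (value, page_type, negated) tuples; page_type is None when the
    value has no page type attached.
    """
    rules = []
    for item in setting.split(","):
        value, _, constraint = item.strip().partition("|")
        negated = constraint.startswith("~")
        page_type = constraint.lstrip("~") or None
        rules.append((value, page_type, negated))
    return rules


def next_super_value(rules, current):
    """Model ALT+S: move to the next value; after the last one, clear it."""
    values = [value for value, _, _ in rules]
    if current is None:
        return values[0]
    index = values.index(current) + 1
    return values[index] if index < len(values) else None


def passes_integrity(rules, super_value, page_type):
    """Return False when the value/page type combination breaks a rule."""
    for value, rule_type, negated in rules:
        if value != super_value or rule_type is None:
            continue
        matches = page_type == rule_type
        return not matches if negated else matches
    return True  # no rule applies to this value
```

For example, with SuperVars=One|~Rental_Agreement, passes_integrity returns False for a Rental_Agreement page that carries Super=One, which mirrors the tilde example above.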
WEB-BASED ADMINISTRATION AND JOB MONITORING
Up until now, we’ve done all application administration and job monitoring using the Taskmaster Client’s
Administrator window. The Taskmaster Web client provides equivalent functionality for all administration
and monitoring. This is available through the web client’s Administrator and Monitor tabs.
APPLICATION ADMINISTRATION
The web client’s Administrator tab lets you configure your application from any machine on the network. It
includes six sub-tabs that match the tabs in the Taskmaster Client Administrator window.
The main difference between the web client and the thick client is that there’s no Batch Pilot Setup window
available from the web client. This means you must edit the project file (.bpp or .icp) in order to configure the
task profile, task settings, etc. However, you can access the project file from the web client to view and
modify the settings.
To access the project file from the web client:
1. Click the Modules tab and select the module.
2. Click the Parameter hyperlink in the pane on the right. The project file opens in a text editor window.
3. If you make any changes, click the Save button at the bottom left of the text editor window to write the
changes to the file on the Taskmaster Server machine and then close the window.
JOB MONITORING
The web client’s Monitor tab lets you monitor the status of the job queue.
The labels at the top of the Web Monitor are active:
• Items per page controls how many jobs are displayed
• Delete batches deletes all displayed batches (use the Batch, Job, Task, and/or Status fields to control
which jobs are displayed, or use the Filter button)
Note: To delete an individual batch, click the batch number and then click Delete (see below).
• Filter provides finer control over which jobs are displayed
• Refresh refreshes the job list (or set the rate to refresh automatically)
• Default returns to the default view (all jobs)
Additional functionality in the Web Monitor includes:
• Click the QID to run a batch
• Click the batch number to view the batch details, change the batch status, and optionally delete the batch.
TRAVELDOCS: SCANNING FROM TASKMASTER WEB
CREATING A REMOTE SCAN TASK
You’ll need a TWAIN scanner in order to complete this section. Before proceeding, make sure the scanner is
connected and functioning.
Note: If you want to run Taskmaster Web from a different machine, you’ll need to run the WebClientConfig
utility on that machine to configure the required Internet Explorer security settings. You’ll also need to
add the Taskmaster Web (tmweb.net) server as a trusted site. For details, please refer to the IBM
Datacap Capture Installation and Configuration Guide.
1. Start Internet Explorer and open the Taskmaster home page:
• If the server is running on the same machine: http://localhost/tmweb.net
• If the server is running on a different machine: http://<tmweb_server>/tmweb.net
2. Log in to the TravelDocs application.
3. Click the Administrator tab and then click the Modules sub-tab.
4. Click |new| to create a new module.
5. Enter the module details as follows:
Name: iScan
Description: Taskmaster Web Remote Scanning
Type: Batch Creation
Program Name: TMTask.BPilot
Parameter: /inet rScan.icp
6. Click Save.
7. Click the Workflow sub-tab, expand TravelDocs, and then expand Web Job.
8. Select the iVScan task and click |remove|. Then click OK.
9. Select Web Job and click |new|.
10. Enter the task details as follows:
Name: Scan
Description: Scan from Taskmaster Web
Module: iScan
Queue by: None
Store: None
11. Click Apply.
12. Select the new Scan task and click the arrow button to move the task to the top of the Web Job task list.
Note: If the new task disappears, expand any job with conditions and you should see the new task again.
13. Click the Shortcuts sub-tab and click |new|.
14. Enter the shortcut details as follows:
Name: RScan
Description: Remote Scanning
Mode: Auto
15. Scroll down and under Web Job select Scan. Then click Save.
Note: If the web page hangs, open the Taskmaster Server Manager, then stop and restart the server.
CONFIGURING THE REMOTE SCANNING CLIENT
Here we’ll make sure the remote scanning client is configured to use the 2-step scan-upload process.
1. On the Taskmaster Web Administrator tab, click the Modules sub-tab.
2. Select the iScan module and then click the hyperlinked Parameter label in the right pane.
This opens the project file, rScan.icp, in the web text editor.
3. In the [Scan] section, make sure LocalProc=0.
4. Make a note of the ScanDir setting (the default is C:\Datacap\scan). This is the folder used for
temporary local storage of the scanned image files.
Note: For information about the other settings in rScan.icp, see “Configuring the remote scanning client”
on page 319.
5. Click the Save button at the bottom left of the text editor and then close the window.
CONFIGURING THE UPLOAD TASK
The default application framework includes an Upload task. The only thing we need to do is create a shortcut.
1. On the Taskmaster Web Administrator tab, click the Shortcuts sub-tab.
2. Click |new|.
3. Enter the shortcut details as follows:
Name: Upload
Description: Upload from Taskmaster Web
Mode: Auto
4. Scroll down and under Web Job select Upload. Then click Save.
Note: If the web page hangs when you save the shortcut, open the Taskmaster Server Manager window,
then stop and restart the server.
SCANNING AND UPLOADING A BATCH
1. Print the file ScanPage.pdf that’s included in the Sample Documents download.
2. In Taskmaster Web, click the Operations tab. You should see the two shortcuts we just created. If you
don’t, click Logout and then log back in.
3. Click the RScan shortcut to display the remote scanning page (vscancl.aspx).
4. Open a second Internet Explorer window and open tmweb.net. Then click the Monitor tab and set the
Refresh rate to 10 sec. Arrange the Internet Explorer windows so you can see them both.
This item tells us Taskmaster created the batch folder and created a new entry in the job queue.
5. Make sure your scanner is selected and the page is in the scanner’s feeder or on the flatbed. Then click
the Scan button in the remote scanning window.
Note: If your scanner doesn’t have a feeder you may see a couple of information messages about
unsupported features. You can click OK to these and continue.
6. When the scan completes and the page thumbnail is displayed, click Done. Then click OK and Stop. In
the Monitor window you’ll see the job as pending for the Upload task (you may need to wait a moment).
7. On the Operations tab, click the Upload shortcut. Then wait for the file to upload. When it completes,
click OK. You’ll see the job as pending for the PageID task.
Note: Quattro won’t process the batch automatically because it isn’t configured to process web jobs. We’ll
do this shortly.
8. Open the most recent “batches” folder (if you’re using two machines this is on the Taskmaster Server
machine).
You can see the image file and the two runtime batch files:
• scan.xml is the file generated by the RScan task
• upload.xml is the file generated by the Upload task
CREATING THE WEB JOB CREATEDOCS TASK
When we created the CreateDocs task in the previous chapter, we created it for Main Jobs only. We also need
to create the task for Web Jobs.
1. In Taskmaster Web, click the Administrator tab.
2. On the Workflow sub-tab, expand the TravelDocs workflow and expand Web Job.
3. Select the Web Job node and click |new| to create a new task.
4. Enter the task details as follows:
Name: CreateDocs
Description: Create documents
Module: rrsCreateDocs
Queue by: None
Store: None
Note: Since a task with the same name already exists in the Main Job workflow, the fields auto-populate
once you’ve entered the name.
5. Click Apply.
6. Select the new CreateDocs task and click the arrow button to move the task between PageID and Rulerunner.
Note: If the new task disappears, expand any job with conditions and you should see the new task again.
CONFIGURING QUATTRO TO RUN WEB JOBS
1. Open the Rulerunner Quattro Manager. If the Quattro service is running, click Stop to stop the service.
2. Click the Datacap Login tab and click Connect.
3. Click the Workflow: Job: Task tab and select TravelDocs in the left pane.
4. Under TravelDocs > Web Job, select PageID, CreateDocs, Rulerunner, and Export.
5. Drag Web Job to <thread0>. It should look like the image below.
6. Click Save.
7. Click the Datacap Login tab and click Disconnect.
8. Click the Datacap Quattro tab and click Start.
9. Switch to the Taskmaster Web Monitor window to watch Quattro process the web job through the
PageID, CreateDocs, and Rulerunner tasks. It should stop when the web job is pending for the Verify
task.
MODIFYING THE VERIFY/FIXUP SHORTCUT
The default Verify/FixUp shortcut is only configured for Main Jobs, not Web Jobs. Before we can open the
batch for verification we must modify the shortcut.
1. In the Taskmaster Web Operations window, click the Administrator tab and then click the Shortcuts
sub-tab.
2. Select the Verify/FixUp shortcut.
3. Scroll down and under Web Job select Verify. Then click Save.
OPENING THE BATCH FOR VERIFICATION
1. Click the Taskmaster Web Operations tab and click the Verify/FixUp shortcut.
Note: If the “All documents are complete” message is displayed, click Cancel so you can review the page.
2. Scroll the second panel so you can see the checkbox options and the listbox control below. You can see
that in this case the checkbox outlines have been clipped but the recognition engine was still able to
identify the “selected” Fuel Service option.
Note: To fix the clipped checkbox outlines, you may need to modify the image processing settings based
on a scanned page. Due to scanner differences, your scanned page may be different from the one
shown here. If the Fuel Service option is not selected, select it before proceeding.
3. Click Submit and OK to finish the batch. Then click OK and Stop.
4. Check the Taskmaster Web Monitor window and you should see Quattro run the Export task.
TRAVELDOCS: USING AINDEX FOR MANUAL PAGE IDENTIFICATION AND REGISTRATION
Note: You will need Taskmaster Capture 8.0.1 Fix Pack 1 or higher to complete this section, since it includes
required changes to aindex.aspx.
MAKING A COPY OF THE APPLICATION
This section requires changes to the application that we’ll need to roll back when we’re done looking at
AIndex. The easiest thing to do is to make a copy of the application using the Datacap Studio Application
Wizard and then work on the copy.
1. Close the Taskmaster Client window. If the Datacap Studio window is open, click Exit to close Datacap
Studio as well.
2. Start Datacap Studio but don’t connect to any application. Instead, click Close in the Application
window.
3. In the Datacap Studio window, click the Datacap Application Wizard button at the top right.
4. In the Datacap Application Wizard dialog, click Next.
5. Select Copy an existing RRS application and click Next.
6. In the top field, select the TravelDocs application.
7. Leave the default values for the Root folder, Datacap folder, and Taskmaster Web folder fields.
8. Select Rename copy and enter TravelDocs2 in the New Name field.
9. Click Next and then click Finish. Then wait for the copy to complete and click Close.
UPDATING THE APPLICATION
Earlier in this guide, we configured the PageID ruleset to use multiple page identification techniques that we
implemented in a “cascade” fashion.
[Flowchart: Fingerprint matching → Success? (Y: Done; N: Text matching) → Success? (Y: Done; N: Pattern matching) → Success? (Y: Done; N: Branch to manual ID)]
To demonstrate manual page identification using AIndex, we’re going to remove the text matching and
pattern matching functions and send any batch with unidentified pages to AIndex.
However, as mentioned earlier, if you’re using AIndex for manual page identification you need to run the
manual page identification task after you’ve created a structured batch. This means we need to move the
branching function out of the PageID task and into the CreateDocs task. You also need to make sure each
unidentified page is a separate document, which requires an update to the document hierarchy.
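The “cascade” used by the original PageID ruleset is a simple first-match-wins chain. The sketch below illustrates the control flow in Python; it is not Rulerunner code. The matcher functions are hypothetical stand-ins for the fingerprint, text, and pattern matching functions, and each is assumed to return a page type on success or None on failure.

```python
def identify_page(page, matchers):
    """Try each identification technique in order and stop at the first hit.

    Returns the page type, or None when every technique fails, in which
    case the caller branches to manual identification.
    """
    for matcher in matchers:
        page_type = matcher(page)
        if page_type is not None:
            return page_type
    return None


def batch_needs_manual_id(pages, matchers):
    """True when at least one page in the batch could not be identified."""
    return any(identify_page(page, matchers) is None for page in pages)
```

Removing the text and pattern matching functions, as this section does, corresponds to passing only the fingerprint matcher, so that every page that fails fingerprint matching falls through to manual identification.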
UPDATING THE PAGEID RULESET
1. In the Datacap Studio window, click the Connection Wizard button at the top right.
2. Select the TravelDocs2 application, click Next, enter the “admin” password, and click Finish.
3. Select the PageID ruleset and click the Lock/Unlock ruleset for editing button.
4. Expand the PageID ruleset and the PageID rule.
5. Remove the Identify using Text Match and Identify using Pattern Match functions.
6. Expand the Identify Manually function and remove the Task_NumberOfSplits and
Task_RaiseCondition actions.
7. Change the parameters on the rrSet action as shown below.
Library    Action    Parameter
rrunner    rrSet     varSource = Yes
                     varTarget = @B.ManualID
8. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish ruleset. The
PageID rule should look like the one below.
UPDATING THE DOCUMENT INTEGRITY RULESET
The CreateDocs task profile includes two rulesets: CreateDocs and Document Integrity. Since we need to
branch to AIndex after creating the structured batch, we’ll add the branching function to the Document
Integrity ruleset.
Previously, we used the Document Integrity ruleset to raise a branch condition if the batch had structural
problems requiring manual fixup. We’re going to repurpose this ruleset to raise the branch condition if the
batch contains any pages requiring manual identification.
1. Select the Document Integrity ruleset and click the Lock/Unlock ruleset for editing button.
2. Expand the Document Integrity ruleset, the Batch Document Integrity Check rule, and both
functions.
3. Remove the CheckAllIntegrity action from the first function and replace it with the action below.
Library    Action          Parameter
rrunner    rrCompareNot    object1 = @B.ManualID
                           object2 = Yes
Note: This action returns False if the batch variable “ManualID” is Yes, causing the rule to execute
the Batch Route To Fixup function. The PageID rule sets this variable to “Yes” if the batch
includes pages that failed fingerprint matching.
4. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish ruleset. The
Document Integrity ruleset should look like the one below.
UPDATING THE DOCUMENT HIERARCHY
1. In the document hierarchy pane, click the Lock DCO for editing button.
2. Expand the document hierarchy so you can see the “Other” page and the “Air_Ticket” page.
3. Right-click the Other page and choose Manage variables. Then set the Order value to -1 and click
Done.
Note: Setting the Order value to -1 causes the CreateDocs action to create a new document for each
page of type “Other.”
4. Expand the Air_Ticket page.
5. Right-click the Vendor_Logo field and choose Manage variables.
6. Click New, type Required, and press Enter.
7. Set the value of the Required variable to 1 and click Done.
8. Click the Save changes button and then click the Unlock DCO button.
CORRECTING THE MODULE PATHS
Note: The steps below correct a problem introduced by the copy wizard if the new application name includes
a number (for example, copying “TravelDocs” to “TravelDocs2”).
1. Start Taskmaster Client (Start > All Programs > Datacap > Taskmaster Client > Taskmaster
Client).
2. Select TravelDocs2, click OK, and log in using User ID: admin, Password: admin, and Station: 1.
Note: Make sure you select the TravelDocs2 application and not the TravelDocs application.
3. Click the Administrator button to open the Taskmaster Administrator window.
4. Click the Modules tab and select the rrsAssemble module. In the Parameters field, set the full path and
filename as shown below and then click Apply.
C:\Datacap\TravelDocs2\dco_TravelDocs2\rrs_assemble.bpp
5. Repeat for the rrsCreateDocs and rrsRulerunner modules, setting the paths as shown below.
C:\Datacap\TravelDocs2\dco_TravelDocs2\rrs_createdocs.bpp
C:\Datacap\TravelDocs2\dco_TravelDocs2\rrs_rulerun.bpp
6. Click Done to close the Taskmaster Administrator window.
CHANGING THE BRANCH CONDITION
1. Click the Administrator button to re-open the Taskmaster Administrator window.
2. Click the Workflow tab, expand Main Job, expand CreateDocs, and select the Document Integrity
Failed condition.
Note: If you’re unable to expand the CreateDocs node, the module path may be incorrect. Click the Setup
button and check the path displayed in the error message. If necessary, click the Modules tab and
correct the path (as described in the previous section), and then re-open the Administrator window.
3. Click the down-arrow beside the Child Job field and choose ManualPageID Job.
4. Click Apply and then click Done.
UPDATING MANUALPAGEID.ICP
Next, we’ll configure the ManualPageID task to use aindex.aspx. We also need to add the required page and
field status settings and set the template path to point to the application’s fingerprint folder. Additionally, we
need to create a rule that runs if a user changes an existing page status.
The page and field status settings for AIndex are the same as those for Prelayout (see “Configuring the page
and field status settings” on page 326). Depending on what you’re trying to do, they can be tricky to get
correct. Descriptions are provided below and the instructions for editing the file are on the next page.
IFS
The IFS setting determines which fields to hide, meaning AIndex won’t display any field that has one of the
specified status values. Since we’re using AIndex for manual page identification, there will be no field data and
so it might be confusing to display a page of blank fields. For this reason, we’re going to hide all fields by
setting IFS=0,-1:
• 0 is the default status of each field when the page data file is created initially
• -1 is typically assigned to anchor fields so they can be set to “hidden”
DFS
The DFS setting determines which fields to hide when the “Problem fields only” checkbox is selected. Since
we’re hiding all fields, it doesn’t really matter what we put, but DFS=0 is the standard setting so we’ll use this.
DPS
The DPS setting is used to determine when a batch is complete. When all pages have one of the specified
values, you can complete the batch; otherwise, you can only put it on hold.
By default, the VScan task assigns a status of 49 to all pages. In the PageID task, we assign a status of 1 to the
batch if it includes pages that failed fingerprint matching.
When you manually assign a page type in AIndex, AIndex assigns the status specified as the first DOF value.
We’ll assign the status “0” (the standard “OK” status) and so we’ll set DPS=0,49. This way the user can only
complete the batch when all pages are identified.
DOF
The DOF (“done, override, failed”) setting specifies three page status values:

The first value is the “done” status that's assigned when you set the page type for an unidentified page.
When we set up the DPS values we decided to use 0.

The second value is used for validation overrides. These don't apply in this situation, but we'll use the
standard override value, which is 2.

The third value is assigned if you change an existing page type. However, it's also the “failed” status and
AIndex won't let you complete the batch if a page has this status, regardless of what you put in the DPS
list. In most situations users won't change an existing page type, but they may do so by mistake or for
some other reason, so we'll handle this situation using a rule (see “Creating the ManualIDValidate rule”
on page 364). We'll assign the value 99 initially and then use the rule to assign the “done” status.
In order to set the “done” status to 0, the “override” status to 2, and the “failed” status to 99, we use the
setting DOF=0,2,99.
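The interaction between DPS and DOF can be sketched in a few lines of Python. This is only a model of the behavior described above, not AIndex code; the function name can_complete is hypothetical.

```python
def can_complete(page_statuses, dps, dof):
    """Model of the rule described above (not actual AIndex logic).

    A batch can be completed only if no page carries the DOF "failed"
    status and every page status appears in the DPS list.
    """
    done, override, failed = dof
    if any(status == failed for status in page_statuses):
        return False  # "failed" blocks completion regardless of DPS
    return all(status in dps for status in page_statuses)


DPS = (0, 49)       # statuses that allow the batch to be completed
DOF = (0, 2, 99)    # done, override, failed

print(can_complete([49, 49, 0], DPS, DOF))   # every page identified
print(can_complete([49, 1, 49], DPS, DOF))   # one page still has status 1
print(can_complete([49, 99, 0], DPS, DOF))   # an existing page type was changed
```

Only the first call returns True: a page with status 1 isn't in the DPS list, and a page with status 99 blocks completion until the validation rule resets it.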
© COPYRIGHT IBM CORPORATION 1994, 2011 (REVISION 06172011)
TASKMASTER WEB AND REMOTE SCANNING
EDITING THE .ICP FILE
1. Open the file C:\Datacap\TravelDocs2\dco_TravelDocs2\ManualPageID.icp in a text editor. The
existing [iCap] section should look like the one below.
[iCap]
Enabled=1
Page1=ProtoId.aspx
2. Set Page1=aindex.aspx and add the new DPS, DOF, DFS, and IFS settings as shown below.
[iCap]
Enabled=1
Page1=aindex.aspx
DPS=0,49
DOF=0,2,99
DFS=0
IFS=-1,0
3. Add a [Fixup] section with the path to the fingerprint folder as shown below.
[Fixup]
TemplateFolder=C:\Datacap\TravelDocs2\fingerprint
4. Add an [RRC] section as shown below. This runs the ManualIDValidate task when the user clicks
Submit.
[RRC]
Application=TravelDocs2
TProfile=ManualIDValidate
HttpWRRS=http://127.0.0.1/RRS/
RRSType=LocalRRS
ExecMode=1
BatchLog=1
ServiceLog=3
5. Save and close the file.
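If you prefer to script the edit, the steps above can be sketched with Python's standard configparser module. This is an illustration only: it works on a string rather than the real file, the function name update_icp is hypothetical, and it assumes the .icp file follows plain INI conventions as shown above.

```python
import configparser
import io

# The [iCap] section as it looks before editing (from step 1).
ORIGINAL = """[iCap]
Enabled=1
Page1=ProtoId.aspx
"""


def update_icp(text):
    """Return the .icp text with the AIndex settings from steps 2 to 4 applied."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep key case (DPS, DOF, ...) instead of lowercasing
    cfg.read_string(text)
    cfg["iCap"].update({
        "Page1": "aindex.aspx",
        "DPS": "0,49",
        "DOF": "0,2,99",
        "DFS": "0",
        "IFS": "-1,0",
    })
    cfg["Fixup"] = {"TemplateFolder": r"C:\Datacap\TravelDocs2\fingerprint"}
    cfg["RRC"] = {
        "Application": "TravelDocs2",
        "TProfile": "ManualIDValidate",
        "HttpWRRS": "http://127.0.0.1/RRS/",
        "RRSType": "LocalRRS",
        "ExecMode": "1",
        "BatchLog": "1",
        "ServiceLog": "3",
    }
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


print(update_icp(ORIGINAL))
```

Make a backup of the real file before replacing it with generated output.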
UPDATING RRS_VERIFY.BPP TO USE AINDEX
We'll also configure the application to use AIndex for verification, so you can see how it behaves.
1. Open the file C:\Datacap\TravelDocs2\dco_TravelDocs2\rrs_verify.bpp in a text editor.
2. In the [iCap] section, set Page1=aindex.aspx.
3. Save and close the file.
CREATING THE MANUALIDVALIDATE RULE
AIndex assigns the “failed” (DOF) page status if we change an existing page type. You can't complete the
batch if any page has a “failed” status, so we need a rule to change the status to “done.”
The user must click the Submit button for each changed page in order to run the rule. The rule doesn't do
anything except return True, but this is enough to cause AIndex to assign the “done” status to the page.
1. In Datacap Studio, in the Rulesets pane, right-click TravelDocs2 and choose Add Ruleset.
2. Change the name of the new ruleset to ManualIDValidate and then select Function1.
3. Click the Actions library tab and expand the rrunner library.
4. Select the Status_Preserve_OFF action and click the Add to function button to add the action to
Function1.
5. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The new ruleset should look like the one below.
6. Click the Task profiles tab and then click the Lock/Unlock task profiles button.
7. Click the Add a new task profile button, select Custom, type ManualIDValidate, and click OK.
8. Select the new ManualIDValidate profile, select the ManualIDValidate ruleset, and click the Add
ruleset to profile button at the left of the Task Profiles pane.
9. Click the Save button and then click the Lock task profiles button. The task profile should look like the
one below.
RUNNING A BATCH THROUGH THE WORKFLOW
 Quattro isn't configured to run the TravelDocs2 application, so we'll need to use Taskmaster Client to
run the batch through the background tasks.
1. Using Windows Explorer, open C:\Datacap\TravelDocs2\images.
2. Delete the page NewAirline.tif.
3. Copy the file OffsetAirTicket.tif from the sample documents download (this is the same file we used
earlier in the “Pattern Matching” chapter) into the C:\Datacap\TravelDocs2\images folder. The
folder should contain the files Images_Page_01.tif through Images_Page_13.tif plus OffsetAirTicket.tif.
4. In the Taskmaster Client - TravelDocs2 window, double-click the VScan icon and wait for the task to
complete. Then click Stop.
5. Double-click the PageID icon and wait for the task to complete. Then click Stop.
6. Double-click the CreateDocs icon and wait for the task to complete. Then click Stop.
7. Open the Job Monitor window. You should see the batch pending for the ManualPageID task.
8. Start Internet Explorer and open the Taskmaster home page:

If the server is running on the same machine: http://localhost/tmweb.net

If the server is running on a different machine: http://<tmweb_server>/tmweb.net
9. Log in to the TravelDocs2 application.
 Make sure you log on to the TravelDocs2 application and not the TravelDocs application.
10. On the Operations tab, click the ManualPageID shortcut. You should see the last page of the batch
displayed in the AIndex web client. Although the PageID task assigned the page type “Other” to this
unidentified page, AIndex assigns the first page type (“Rental_Agreement”) but forces you to change it
since it has STATUS=1.
11. Use the drop-down lists at the top of the batch tree view pane to set the document type to Flight and the
page type to Air_Ticket.
12. When you select the Air_Ticket page type, Taskmaster activates the Anchors button and prompts you to set
anchors for the page. Click OK to close the message box.
13. Click the Anchors button. Taskmaster prompts you to select a thumbnail image. Click OK to close the
message box.
 Taskmaster displays all fingerprints that have an anchor field with the same name as the anchor
field defined for the selected page type. In our case, the Vendor_Logo field is defined only for the
Air_Ticket page, so only the air ticket fingerprint thumbnails are displayed.
14. Double-click the Airline #2 fingerprint thumbnail (the second one) to select it. Then use the mouse to
align the red anchor object over the Airline #2 vendor logo.
15. Click Done to complete the batch. Then click OK and Stop.
16. Since the batch is now pending for the Rulerunner task, double-click the Rulerunner icon in the
Taskmaster Client window. When the task completes click Stop.
17. In Taskmaster Web, click the Verify/FixUp shortcut to open the batch in AIndex for verification.
18. Complete the batch by clicking Submit for each page. On page TM000004, set the car type to “Other.”
On pages TM000006 and TM000013, select the yellow checkbox at the top of the page to override the
validation failures.
 The Submit button is at the bottom of the data entry panel. You may need to scroll down to see it.
Also, you may notice that the “Options” groups on the Rental_Agreement pages don't let you select
multiple options since the AIndex web client doesn't support the MultiPunch variable.
TESTING THE MANUALIDVALIDATE RULE
We'll run another batch through the workflow, but this time we'll change the page status on one of the pages
to invoke the ManualIDValidate profile.
1. In the Taskmaster Client window, double-click the VScan icon and wait for the task to complete. Then
click Stop.
2. Double-click the PageID icon and wait for the task to complete. Then click Stop.
3. Double-click the CreateDocs icon and wait for the task to complete. Then click Stop.
4. On the Taskmaster Web Operations tab, click the ManualPageID shortcut.
5. For the last page, set the document type to Flight and the page type to Air_Ticket. Then use the
Anchors button to assign the matching fingerprint and register the page, as you did before.
6. In the batch tree view pane, scroll to the top of the page list and select the first Rental_Agreement page.
7. Use the top drop-down list to change the document type to Hotel. Then change it back to Car_Rental.
This causes AIndex to set the page status to “failed” (99).
 If you want to confirm this, click Hold to put the batch on hold and then open the file
manualpageid.xml located in the current batch folder. You should see <V n="STATUS">99</V>
under page TM000001. Then re-open the batch by clicking the batch ID on the Taskmaster Web
Monitor tab.
8. With the Car Rental #1 page displayed, click Submit. AIndex runs the validation rule and changes the
page status to 0.
9. Click Done to complete the batch. Then click OK and Stop.
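If you'd rather check the page status values with a script than by eye, a short Python sketch like the one below can list them. The sample layout is an assumption: the guide only shows the STATUS variable under page TM000001, so the B wrapper element and the id attribute here are illustrative, and the function name page_statuses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for manualpageid.xml.
SAMPLE = """<B id="sample_batch">
  <P id="TM000001"><V n="TYPE">Rental_Agreement</V><V n="STATUS">99</V></P>
  <P id="TM000002"><V n="TYPE">Air_Ticket</V><V n="STATUS">0</V></P>
</B>"""


def page_statuses(xml_text):
    """Map each page ID to its integer STATUS value."""
    statuses = {}
    for page in ET.fromstring(xml_text).iter("P"):
        status = page.find("V[@n='STATUS']")
        if status is not None:
            statuses[page.get("id")] = int(status.text)
    return statuses


print(page_statuses(SAMPLE))
```

A page that still shows 99 after you click Submit would indicate that the validation rule didn't run for that page.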
Chapter 20
FINGERPRINT MANAGEMENT
We've used fingerprints throughout this guide, both for page identification and for specifying recognition
zones. In this chapter, we'll review basic fingerprint functionality, look in more detail at the fingerprint
database, and examine an alternative method for storing zone position information using fingerprint XML
(FPXML) files. At the end of the chapter, we'll update the TravelDocs application to use FPXML.
REVIEW OF BASIC FINGERPRINT FUNCTIONALITY
In Taskmaster applications, fingerprints have two basic functions:

You can use them during page identification to determine if the incoming page matches a known page:
[Figure: the incoming page fingerprint (type unknown) is compared with each fingerprint in the fingerprint
library (Car Rental #1 rental agreement, Car Rental #2 rental agreement, Airline #1 air ticket, Hotel #1 room
receipt); each comparison returns Yes or No.]
You can use them to identify the field positions for each variant of each known page type:
[Figure: a page image with labeled field zones. Labels recovered from the figure: Air_Ticket, Pickup_Date,
Pickup_Location, Return_Date, Return_Location, Total_Cost.]
CREATING FINGERPRINT FILES
Taskmaster provides two basic methods for creating fingerprint files:

Image analysis: This scans the page image to identify the composite “blackness” of different regions of
the page. This method provides fast page identification, but requires that you perform recognition later.

Full page recognition: This performs optical character recognition to identify the locations of text
within the page. This method takes longer, especially with pages that include handwritten text, but cuts
time from subsequent workflow tasks since the full page recognition results are available for use.
 The method you use for creating library fingerprints must be the same as the method you use to
generate runtime fingerprints during page identification.
Whichever method you use, Taskmaster creates two files each time you generate a new fingerprint:

A TIFF file with an image of the page

A CCO file with the fingerprint information
Additionally, if you're using full page recognition, Taskmaster creates:

An XML file (<fingerprint_id>c.xml) with the recognition results. This is a temporary file generated during
recognition.
ADDING FINGERPRINTS TO THE FINGERPRINT LIBRARY
You can add new fingerprints to the fingerprint library from the Datacap Studio Zones tab. Each time you
add a new fingerprint, Taskmaster executes the FingerprintAdd ruleset. This is where you define the
fingerprint generation method (image analysis or full page recognition).
You can also add new fingerprints and define the recognition zones using actions. We did this earlier to create
fingerprints for manually identified pages (see “Creating the AutoFingerprint ruleset” on page 303).
Fingerprint files are stored in the application's fingerprint folder. The location of this folder is specified
through the Taskmaster Application Manager and stored in the application configuration (.app) file.
In addition to saving the TIFF and CCO files, Taskmaster creates an entry for the new fingerprint in the
fingerprint database (see “About the fingerprint database” on page 373). The fingerprint database is also
specified in the .app file (see screen above).
DEFINING FIELD ZONES
The position of each field zone is defined by coordinates (x1, y1, x2, y2) that specify the top left and bottom
right corners of the zone relative to the top left corner of the page.
During recognition, Taskmaster uses the zone information to determine where the required information is
located on the page. If the runtime page image doesn't align precisely with the matched fingerprint,
Taskmaster uses the calculated offset values to adjust the zone positions.
When you define the field zones in Datacap Studio, Taskmaster writes the position information into the
document hierarchy. If you're defining the zones during verification, the iloc_SetZones action does the
same. The document hierarchy XML file below shows the position of the “Pickup_Date” field for three
different fingerprints:
<F type="Pickup_Date">
<V n="ID">0</V>
<V n="TYPE">Field</V>
<V n="STATUS">0</V>
<V n="POSITION">0,0,0,0</V>
<V n="MIN_TYPES">0</V>
<V n="MAX_TYPES">0</V>
<V n="ReqConf">8</V>
<V n="rules">&lt;in&gt;&lt;r id=&quot;1&quot; rs=&quot;9&quot; </V>
<V n="Pos556">183,402,535,463</V>    <!-- Zone position for fingerprint 556 -->
<V n="Pos558">568,331,967,389</V>    <!-- Zone position for fingerprint 558 -->
<V n="Pos560">1199,389,1600,448</V>  <!-- Zone position for fingerprint 560 -->
</F>
It's also possible to store the field position information for each fingerprint in a separate file (see “Using
fingerprint XML files” on page 374). This can be especially helpful if your application has a large number of
fingerprints.
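To make the zone lookup concrete, here is a small Python sketch that reads the position for one fingerprint from a field node like the one above and shifts it by an offset. The guide doesn't describe how Taskmaster applies its calculated offsets, so the simple translation below is an assumption, and the function name zone_for is hypothetical.

```python
import xml.etree.ElementTree as ET

FIELD = """<F type="Pickup_Date">
  <V n="Pos556">183,402,535,463</V>
  <V n="Pos558">568,331,967,389</V>
  <V n="Pos560">1199,389,1600,448</V>
</F>"""


def zone_for(field_xml, fingerprint_id, offset=(0, 0)):
    """Return (x1, y1, x2, y2) for a fingerprint, shifted by (dx, dy)."""
    field = ET.fromstring(field_xml)
    var = field.find(f"V[@n='Pos{fingerprint_id}']")
    if var is None:
        return None  # no zone defined for this fingerprint
    x1, y1, x2, y2 = (int(n) for n in var.text.split(","))
    dx, dy = offset
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


print(zone_for(FIELD, 556))            # zone as stored
print(zone_for(FIELD, 556, (5, -3)))   # zone shifted right 5 and up 3 pixels
```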
ABOUT THE FINGERPRINT DATABASE
The information to manage the application's fingerprint files is stored in the application's fingerprint
database. By default, this is an Access database file (<app_name>Fingerprint.mdb) that's stored in the root of
the application folder. The connection string for the fingerprint database is defined through the Taskmaster
Application Manager and stored in the application configuration file (see the screen shot on page 371).
The fingerprint database includes three tables:

Host: This defines the name, host ID, and reference ID for each fingerprint class, for example:

Name          Host ID    Reference ID
<Global>      9          -1
Car_Rental    226        DC226
Flight        227        DC227
Hotel         228        DC228

 You can see the host ID and reference ID by moving the mouse pointer over the class name in
Datacap Studio, as shown on the right above.

PageType: This defines the name and ID for each page type, for example:

Page Type Name        Page Type ID
Other                 1
Rental_Agreement      40
Optional_Insurance    41
Air_Ticket            42
Room_Receipt          43
Meals                 44
Other_Charges         45

Template: This defines the ID, CCO file, TIFF file, host ID (class), and page type for each fingerprint,
for example:

ID     CCO Path                              Image Path                            Host ID    Page Type
555    C:\Datacap\...\fingerprint\555.cco    C:\Datacap\...\fingerprint\555.tif    9          1
556    C:\Datacap\...\fingerprint\556.cco    C:\Datacap\...\fingerprint\556.tif    226        40
557    C:\Datacap\...\fingerprint\557.cco    C:\Datacap\...\fingerprint\557.tif    226        41
558    C:\Datacap\...\fingerprint\558.cco    C:\Datacap\...\fingerprint\558.tif    227        42
559    C:\Datacap\...\fingerprint\559.cco    C:\Datacap\...\fingerprint\559.tif    228        43
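The relationship between the three tables can be sketched with an in-memory SQLite database standing in for the Access file. The columns tp_TemplateID, tp_HostID, hs_HostID, and hs_RefName appear in the FMT query later in this chapter; hs_Name, tp_PageType, pt_ID, and pt_Name are hypothetical names used only for this illustration, so check the real schema before querying it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Host (hs_HostID INTEGER, hs_Name TEXT, hs_RefName TEXT);
CREATE TABLE PageType (pt_ID INTEGER, pt_Name TEXT);
CREATE TABLE Template (tp_TemplateID INTEGER, tp_HostID INTEGER, tp_PageType INTEGER);
INSERT INTO Host VALUES (9, '<Global>', '-1'), (226, 'Car_Rental', 'DC226'),
                        (227, 'Flight', 'DC227'), (228, 'Hotel', 'DC228');
INSERT INTO PageType VALUES (1, 'Other'), (40, 'Rental_Agreement'),
                            (41, 'Optional_Insurance'), (42, 'Air_Ticket'),
                            (43, 'Room_Receipt');
INSERT INTO Template VALUES (555, 9, 1), (556, 226, 40), (557, 226, 41),
                            (558, 227, 42), (559, 228, 43);
""")


def describe(template_id):
    """Return (class name, page type name) for a fingerprint ID, or None."""
    return con.execute(
        """SELECT Host.hs_Name, PageType.pt_Name
           FROM Template
           JOIN Host ON Host.hs_HostID = Template.tp_HostID
           JOIN PageType ON PageType.pt_ID = Template.tp_PageType
           WHERE Template.tp_TemplateID = ?""",
        (template_id,),
    ).fetchone()


print(describe(558))
```

Each Template row ties one fingerprint to a class (through the host ID) and to a page type, which is how fingerprint 558 resolves to the Flight class and the Air_Ticket page type.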
USING FINGERPRINT XML FILES
As we've seen so far, a fingerprint typically consists of:

An entry in the fingerprint database

A TIFF image file

A CCO fingerprint file

Position information stored in the document hierarchy XML file
This setup works well when the number of fingerprints is small, but as the number of fingerprints increases
the document hierarchy file gets bigger and the time to locate the position information increases.
ABOUT FPXML
An alternative approach lets you move a fingerprint's zone position information out of the document
hierarchy and into a separate fingerprint XML (FPXML) file.

Document hierarchy XML file:

<F type="Pickup_Date">
<V n="ID">0</V>
<V n="TYPE">Field</V>
<V n="STATUS">0</V>
etc.
<V n="Pos556">183,402,535,463</V>
<V n="Pos558">568,331,967,389</V>
<V n="Pos560">1199,389,1600,448</V>
</F>
<F type="Pickup_Location">
<V n="ID">0</V>
<V n="TYPE">Field</V>
<V n="STATUS">0</V>
etc.
<V n="Pos556">180,528,532,589</V>
<V n="Pos558">573,448,967,502</V>
<V n="Pos560">1199,335,1600,387</V>
</F>

Fingerprint XML file (556.xml):

<S>
<P type="Rental_Agreement">
<V n="HostID">226</V>
<V n="HostName">Car_Rental</V>
<F type="Pickup_Date">
<V n="Position">183,402,535,463</V>
</F>
<F type="Pickup_Location">
<V n="Position">180,528,532,589</V>
</F>
etc.
</P>
</S>
Although this increases the total number of files (since each fingerprint now has a TIFF file, a CCO file, and
an XML file), this approach has several benefits:

The document hierarchy XML file remains small

Overall performance can increase

It eliminates possible contention if multiple users attempt to add fingerprints simultaneously
 These benefits apply mostly to applications that use dynamic (action-based) fingerprint generation, since
Datacap Studio writes to the document hierarchy as well as the fingerprint XML file when FPXML is
enabled.
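The mapping between the two layouts can be sketched in Python: collect every field that has a Pos<id> variable for one fingerprint and emit the per-fingerprint structure. This only illustrates the file shapes shown above; it is not how Taskmaster or the Fingerprint Maintenance Tool is implemented, and the fields wrapper element and the function name build_fpxml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed document hierarchy fragment; the <fields> wrapper is illustrative.
DCO_FIELDS = """<fields>
<F type="Pickup_Date"><V n="STATUS">0</V>
  <V n="Pos556">183,402,535,463</V><V n="Pos558">568,331,967,389</V></F>
<F type="Pickup_Location"><V n="STATUS">0</V>
  <V n="Pos556">180,528,532,589</V><V n="Pos558">573,448,967,502</V></F>
</fields>"""


def build_fpxml(dco_text, fingerprint_id, page_type, host_id, host_name):
    """Build FPXML text holding the zones of a single fingerprint."""
    root = ET.Element("S")
    page = ET.SubElement(root, "P", type=page_type)
    ET.SubElement(page, "V", n="HostID").text = str(host_id)
    ET.SubElement(page, "V", n="HostName").text = host_name
    for field in ET.fromstring(dco_text).iter("F"):
        pos = field.find(f"V[@n='Pos{fingerprint_id}']")
        if pos is not None:  # skip fields with no zone for this fingerprint
            out = ET.SubElement(page, "F", type=field.get("type"))
            ET.SubElement(out, "V", n="Position").text = pos.text
    return ET.tostring(root, encoding="unicode")


print(build_fpxml(DCO_FIELDS, 556, "Rental_Agreement", 226, "Car_Rental"))
```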
ENABLING FPXML
This section covers two scenarios:

Adding new fingerprints using the Datacap Studio Zones tab

Adding new fingerprints from a verification panel using actions
USING THE DATACAP STUDIO ZONES TAB
By default, Datacap Studio writes the zone position information to the application's document hierarchy
XML file.
To configure Datacap Studio to write zone position information to a fingerprint XML file:

Select the Enable FPXML option in the Taskmaster Application Manager.
After enabling FPXML and restarting Datacap Studio, each time you add a new fingerprint or modify an
existing fingerprint's zones, Datacap Studio creates or updates the fingerprint XML file.
 Datacap Studio also adds the information to the document hierarchy XML file since it uses this
information to display the zone outlines on the Image tab.
USING ACTIONS
Earlier in this guide, we showed how you can generate fingerprints automatically from a verification panel
using the iloc_SetZones action in the “Intellocate” library. Another actions library, the “FPXML” library,
includes actions for reading and writing zone information using fingerprint XML files.
Library    Action             Description
FPXML      SetDirectoryFPX    Sets the location for the fingerprint XML files.
FPXML      ReadZonesFPX       Loads the zone position information for the current fingerprint.
FPXML      WriteZonesFPX      Writes the position information for all fields on the current page.

We'll demonstrate how to use these actions later when we update the TravelDocs application (see “Updating
the AutoFingerprint ruleset” on page 377).
EXPORTING EXISTING POSITION INFORMATION FROM THE DOCUMENT HIERARCHY
The techniques described in the previous section work for fingerprints generated after you've decided to use
FPXML. For existing fingerprints, the zone position information still resides in the document hierarchy.
Taskmaster's Fingerprint Maintenance Tool (FMT) lets you export existing position information from the
document hierarchy into individual XML files. The FMT ships with Taskmaster 8.0.1 as an APT-specific
utility. However, it's possible to use the tool with other Taskmaster applications by following the instructions
below.
 If you move existing zone position information out of the document hierarchy and into FPXML files,
you'll need to update the actions your application uses to read the zone information. The ReadZones
action reads information from the document hierarchy, whereas the ReadZonesFPX action reads
information from FPXML files.
SETTING UP THE FMT FOR YOUR APPLICATION
1. In the folder C:\Datacap\APT\dco_APT, locate the files Fingerprint Maintenance Tool.exe and
Interop.DCAppleLib.dll.
2. Copy the two files:
From: C:\Datacap\APT\dco_APT
To: C:\Datacap\<app_name>\dco_<app_name> (where <app_name> is the name of your application)
3. Using a text editor, create a file called Settings.ini in the C:\Datacap\<app_name>\dco_<app_name>
folder and paste in the following information (replace <app_name> with the name of your application):
[Database]
FingerprintDatabase=Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Datacap\<app_name>\<app_name>Fingerprint.mdb;Persist Security Info=False
[Paths]
FingerprintDirectory=C:\Datacap\<app_name>\fingerprint
FingerprintBackupDirectory=C:\Datacap\<app_name>\Fingerprint Backup
SetupDCO=C:\Datacap\<app_name>\dco_<app_name>\<app_name>.xml
[FMT]
FilteredSummary=Select Template.tp_TemplateID,Template.tp_DateAdded,Template.tp_HitCount,Template.tp_LastHit,Host.hs_RefName FROM Template,Host WHERE host.hs_HostID = Template.tp_HostID
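Because every path in Settings.ini follows the same pattern, you can generate the file contents from the application name instead of editing the placeholders by hand. The Python sketch below does this; the function name settings_ini and the root parameter are hypothetical, and it assumes the default C:\Datacap layout used in this guide. Each setting is kept on a single line.

```python
TEMPLATE = "\n".join([
    "[Database]",
    "FingerprintDatabase=Provider=Microsoft.Jet.OLEDB.4.0;"
    "Data Source={root}\\{app}\\{app}Fingerprint.mdb;Persist Security Info=False",
    "[Paths]",
    "FingerprintDirectory={root}\\{app}\\fingerprint",
    "FingerprintBackupDirectory={root}\\{app}\\Fingerprint Backup",
    "SetupDCO={root}\\{app}\\dco_{app}\\{app}.xml",
    "[FMT]",
    "FilteredSummary=Select Template.tp_TemplateID,Template.tp_DateAdded,"
    "Template.tp_HitCount,Template.tp_LastHit,Host.hs_RefName "
    "FROM Template,Host WHERE host.hs_HostID = Template.tp_HostID",
])


def settings_ini(app, root=r"C:\Datacap"):
    """Return Settings.ini text for the given application name."""
    return TEMPLATE.format(app=app, root=root)


print(settings_ini("TravelDocs"))
```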
EXPORTING THE POSITION INFORMATION
 This procedure removes the zone position information from the document hierarchy XML file
(C:\Datacap\<app_name>\dco_<app_name>\<app_name>.xml). Please make a backup copy of this file
before performing an export.
1. Double-click Fingerprint Maintenance Tool.exe.
2. Confirm that the information displayed at the top of the FMT window is correct.
3. Click the DCO to FPXML button and click OK to create the FPXML files.
TRAVELDOCS: UPDATING AUTO FINGERPRINTING TO USE FPXML
In this section, we'll update the AutoFingerprint ruleset to save new fingerprint position information in a
separate fingerprint XML file. We'll also update the Recognize Page rule to read the zone information from a
fingerprint XML file, if one exists.
UPDATING THE AUTOFINGERPRINT RULESET
1. Start Datacap Studio and connect to the TravelDocs application.
2. On the Rulemanager tab, in the Rulesets pane, select the AutoFingerprint ruleset and click the
Lock/Unlock ruleset button.
3. Expand the AutoFingerprint ruleset completely.
4. Right-click the iloc_SetZones action and choose Remove.
5. Add the actions and parameters in the table below to the end of Function1.
Library    Action             Parameter
FPXML      SetDirectoryFPX    @APPPATH(fingerprint)
FPXML      WriteZonesFPX      @D.TYPE,@P.TYPE,@P.TYPE
 The WriteZonesFPX action sets the fingerprint host name to the current document type, the
fingerprint host ID to the current page type, and the fingerprint page type to the current page type.
6. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish Ruleset. The
finished ruleset should look like the one below.
UPDATING THE RECOGNIZE PAGE RULE
1. On the Datacap Studio Rulemanager tab, in the Rulesets pane, select the Recognize ruleset and click
the Lock/Unlock ruleset button.
2. Expand the Recognize ruleset and the Recognize Page rule.
3. Change the name of the function to Recognition: Fingerprint - Non-FPXML.
4. Remove the rrCompareNot action and replace it with the following action and parameters:
Library    Action       Parameter
rrunner    rrCompare    object1 = @P.MatchType
                        object 2 = Fingerprint
5. Right-click the Recognize Page rule and choose Add Function.
6. Rename the new function Recognition: Fingerprint - FPXML and use the move-up button to move it to
the beginning of the rule (before the other recognition function).
7. Add the actions and parameters below to the Recognition: Fingerprint - FPXML function.

Library         Action             Parameter
rrunner         rrCompare          object1 = @P.MatchType
                                   object 2 = Fingerprint
FPXML           SetDirectoryFPX    @APPPATH(fingerprint)
FPXML           ReadZonesFPX
Recog_Shared    SnapCCOtoDCO
8. Click the Save button. Then click the Lock/Unlock ruleset button and choose Publish Ruleset. The
finished ruleset should look like the one below.
 If there is no matching FPXML file, ReadZonesFPX fails and Taskmaster executes the “Non-FPXML” function instead.
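The fallback between the two functions can be modeled in Python as follows. This is a conceptual sketch of the behavior described in the note, not Rulerunner code: the action stand-ins (is_fingerprint_match, read_zones_fpx, read_zones_dco) and the page dictionary are hypothetical.

```python
def run_rule(functions, page):
    """Try each function in order; return the name of the first one
    whose actions all succeed, or None if every function fails."""
    for name, actions in functions:
        # all() stops at the first action that returns False,
        # which models a function ending when an action fails.
        if all(action(page) for action in actions):
            return name
    return None


def is_fingerprint_match(page):   # stands in for rrCompare
    return page.get("MatchType") == "Fingerprint"


def read_zones_fpx(page):         # stands in for ReadZonesFPX
    return page.get("has_fpxml", False)


def read_zones_dco(page):         # stands in for reading from the document hierarchy
    return True


RULE = [
    ("Recognition: Fingerprint - FPXML", [is_fingerprint_match, read_zones_fpx]),
    ("Recognition: Fingerprint - Non-FPXML", [is_fingerprint_match, read_zones_dco]),
]

print(run_rule(RULE, {"MatchType": "Fingerprint", "has_fpxml": True}))
print(run_rule(RULE, {"MatchType": "Fingerprint", "has_fpxml": False}))
```

Placing the FPXML function first means the document hierarchy is used only for fingerprints that have no XML file yet.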
PREPARATIONS FOR RUNNING A BATCH THROUGH THE WORKFLOW
1. Open the Rulerunner Quattro Manager window. If the Quattro service is running, click Stop and wait for
the service to stop.
2. In Datacap Studio, click the Zones tab and then click the Refresh button.
3. Check to see if there is already a fingerprint for the Airline #4 ticket. If there is, select it and click the
Remove selected button.
 The Airline #4 fingerprint, if you have one, will be under its own “Flight” class.
4. In Windows Explorer, open C:\Datacap\TravelDocs\images and make sure you have the files
Images_Page_01.tif through Images_Page_13.tif and NewAirline.tif.
RUNNING A BATCH THROUGH THE WORKFLOW
1. Start Taskmaster Client and log in to the TravelDocs application.
2. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
3. Double-click the PageID icon. When the task completes, click Stop.
4. Start Taskmaster Web and log in to the TravelDocs application.
5. On the Taskmaster Web Operations tab, click the ManualPageID shortcut and wait for the page images
to load. Then scroll to the bottom and set the page type for the last page to Air_Ticket.
6. Click Done and then click OK and Stop.
7. In the Taskmaster Client window, double-click the CreateDocs shortcut. When it completes, click Stop.
8. Double-click the Rulerunner shortcut. When the task completes, click Stop.
9. Check the Job Monitor. You should see the result of the split, where the child job is pending for the
Supervisor Verify task (row 1 below) and the main job is pending for the Main Verify task (row 3 below).
10. On the Taskmaster Web Operations tab, click the Supervisor Verify shortcut to open the pending batch.
The Airline #4 page is displayed.
11. Define the zone for each field as you did earlier (see “Running a batch through the workflow” on
page 306).
12. Click Submit and then click OK. In the background, Taskmaster runs the AutoFingerprint ruleset to
create the new fingerprint and the fingerprint XML file. Then click OK and Stop.
13. In Windows Explorer, open C:\Datacap\TravelDocs\fingerprint and locate the most recent
fingerprint. You should see a .cco, a .tif, and a .xml file for the new fingerprint, for example:
14. Open the fingerprint XML file in a text editor. It should look like the one below.
<S>
<P type="Air_Ticket">
<V n="HostID">Air_Ticket</V>
<V n="HostName">Flight</V>
<F type="Outbound_From">
<V n="Position">649,488,1046,537</V>
</F>
<F type="Outbound_To">
<V n="Position">1052,486,1477,545</V>
</F>
<F type="Outbound_Date">
<V n="Position">182,484,386,541</V>
</F>
<F type="Return_From">
<V n="Position">646,619,1023,674</V>
</F>
<F type="Return_To">
<V n="Position">1053,621,1475,671</V>
</F>
<F type="Return_Date">
<V n="Position">188,617,360,675</V>
</F>
<F type="Airfare">
<V n="Position">1374,884,1513,928</V>
</F>
<F type="Taxes">
<V n="Position">1391,936,1509,981</V>
</F>
<F type="Total_Cost">
<V n="Position">1387,989,1522,1037</V>
</F>
</P>
</S>
 If you run another batch through the workflow, it runs from end to end with no branching or
splitting since the Airline #4 page is now in the fingerprint library. This time the recognition task
profile uses the zone position information from the fingerprint XML file.
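To check a generated fingerprint XML file without reading it line by line, you can load the zones into a dictionary. The Python sketch below uses a shortened copy of the file shown above; the function name load_zones is hypothetical.

```python
import xml.etree.ElementTree as ET

FPXML = """<S>
<P type="Air_Ticket">
<V n="HostID">Air_Ticket</V>
<V n="HostName">Flight</V>
<F type="Outbound_From"><V n="Position">649,488,1046,537</V></F>
<F type="Total_Cost"><V n="Position">1387,989,1522,1037</V></F>
</P>
</S>"""


def load_zones(text):
    """Return (page type, {field name: (x1, y1, x2, y2)}) from FPXML text."""
    page = ET.fromstring(text).find("P")
    zones = {}
    for field in page.findall("F"):
        pos = field.find("V[@n='Position']")
        zones[field.get("type")] = tuple(int(n) for n in pos.text.split(","))
    return page.get("type"), zones


page_type, zones = load_zones(FPXML)
print(page_type, zones["Total_Cost"])
```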
Chapter 21
MOVING YOUR APPLICATION INTO PRODUCTION
So far, we've done all of our application development and batch processing on a single machine. This is fine
for initial development and for low throughput testing by a single user, but eventually you'll need to migrate
your application to an environment that can support multiple users and process large numbers of pages
rapidly.
Taskmaster's scalable multi-machine architecture lets you run different Taskmaster components on different
machines. You can run multiple instances of processor-intensive components like Quattro, and Taskmaster
Web clients let multiple users scan and process batches simultaneously from distributed locations.
Additionally, while Access is fine for initial testing when performance requirements are minimal, most
applications will require a dedicated database server like SQL Server® or Oracle®.
In this chapter, we'll look briefly at Taskmaster's multi-machine architecture. Detailed discussion is beyond
the scope of this guide, but we'll cover some of the basic considerations and look at how the various
components work together in a distributed environment. Most of this chapter is devoted to migrating the
TravelDocs application from a single machine to a simple 3-machine distributed environment that uses SQL
Server.
 You can complete the hands-on portion of this chapter using SQL Server or SQL Server Express. If
you want to configure a simple demo setup, SQL Server Express is sufficient and is available for
download from the Microsoft website.
TASKMASTER’S MULTI-MACHINE ARCHITECTURE
Taskmaster's scalable multi-machine architecture lets you run different Taskmaster components on different
machines and run multiple instances of processor-intensive components like Quattro.
[Figure: Taskmaster multi-machine architecture. Components shown: client workstations (browser and
Taskmaster Client), TM Web Server (IIS, Taskmaster Web), RV2 Web Server (IIS, RV2), NENU workstation,
Taskmaster Server, file server (batches, applications), Datacap Studio, Quattro servers (Rulerunner Quattro),
WRRS Server (IIS, Web Rulerunner Service), and database servers labeled FINGERPRINT, LOOKUP, NENU,
ADMIN, and ENGINE. Callouts in the figure:
Components can access the fingerprint and lookup databases and the batches folder directly, or via
Taskmaster Server.
RV2 is Taskmaster’s report viewer component.
NENU is Taskmaster’s notification and batch management component.
Only Taskmaster Server can access these databases.]
Detailed discussion of optional components like RV2, NENU, and WRRS is beyond the scope of this guide,
but later in this chapter we'll configure the TravelDocs application to run across three machines:

One machine running Taskmaster Server and hosting the application

One machine running Quattro and a Taskmaster Web server

One machine running SQL Server (or SQL Server Express) and a Taskmaster Web client
LOCATING APPLICATIONS AND THEIR COMPONENTS ON THE NETWORK
When you have different Taskmaster components running on different machines, you need a mechanism to
ensure that each component can locate the information it needs to do its job. Taskmaster uses two key files to
achieve this:

Datacap.xml defines the location of each Taskmaster application. Each Taskmaster component can
maintain its own local datacap.xml file, or multiple components can share a centralized datacap.xml file.
You use the Taskmaster Application Manager's Service tab to define the location of datacap.xml.
Windows Registry

The application configuration (.app) file defines the location of the various databases and folders an
application requires. Each Taskmaster application has its own .app file that you also manage using the
Taskmaster Application Manager.
The diagram below shows two Taskmaster machines sharing a centralized datacap.xml file that defines the
locations of three Taskmaster applications.
[Figure: two client workstations, each running Taskmaster Client and the Application Manager, use the
Application Manager to locate a shared datacap.xml file, which points to app1.app, app2.app, and app3.app.
The figure also shows Taskmaster Server, a database server, and a file server holding the applications and
batches. The shared datacap.xml file contains:]
<datacap ver="8.0">
<app name= "App1" ref="\\svr\datacap\app1"/>
<app name= "App2" ref="\\svr\datacap\app2"/>
<app name= "App3" ref="\\svr\datacap\app3"/>
</datacap>
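To make the role of datacap.xml concrete, here is a minimal Python sketch that reads a file with the structure shown above and lists each application with its location. It assumes the file contains only the <datacap> root and <app> elements with name and ref attributes, as in the example; a real datacap.xml may carry more. The function name list_applications is ours, not part of Taskmaster.

```python
import xml.etree.ElementTree as ET

# Same content as the datacap.xml example in the diagram.
SAMPLE = r"""<datacap ver="8.0">
  <app name="App1" ref="\\svr\datacap\app1"/>
  <app name="App2" ref="\\svr\datacap\app2"/>
  <app name="App3" ref="\\svr\datacap\app3"/>
</datacap>"""


def list_applications(xml_text):
    """Return {application name: location} for every <app> element."""
    root = ET.fromstring(xml_text)
    return {app.get("name"): app.get("ref") for app in root.iter("app")}
```

Calling list_applications(SAMPLE) returns a dictionary keyed by application name, which is essentially the lookup each Taskmaster component performs when it needs to find an application.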
TRAVELDOCS: MIGRATING THE TASKMASTER DATABASES TO SQL SERVER
To complete this section, you'll need access to a SQL Server. If you want to create your own isolated test
environment, you can use Microsoft SQL Server Express, which you can download from the Microsoft
website. The instructions below assume that you're using SQL Server Express installed on a separate
machine. If you're using a shared SQL Server, please consult with your SQL administrator.
CREATING THE SQL SCRIPTS
1. Close the Taskmaster Client window and Datacap Studio.
2. Open the Taskmaster Server Manager window and click Stop to stop the service.
3. In Windows Explorer, open C:\Datacap\support\DBSQL and double-click SQLserverDB.exe.
4. Click the Browse […] button beside the Save SQL File As field.
5. Select the C:\Datacap\TravelDocs folder, type TravelDocsAdm.sql, and click Save.
   Note: When you specify the target file name, you must include the .sql extension.
6. Make sure the Database Type field is set to Admin.
7. Select the Transfer Records From a Database option and select Access.
8. Click the Browse […] button beside the Access Database File field.
9. Select the admin database file C:\Datacap\TravelDocs\TravelDocsAdm.mdb and click Open.
10. Click Create Database Script File, confirm the settings are correct, and click Yes. Then click OK.
11. Repeat to create the SQL script files for the engine and fingerprint databases using the settings below.
   Engine database
      Save SQL File As: C:\Datacap\TravelDocs\TravelDocsEng.sql
      Database Type: Engine
      Transfer Records / Access Database: C:\Datacap\TravelDocs\TravelDocsEng.mdb
   Fingerprint database
      Save SQL File As: C:\Datacap\TravelDocs\TravelDocsFingerprint.sql
      Database Type: Fingerprint
      Transfer Records / Access Database: C:\Datacap\TravelDocs\TravelDocsFingerprint.mdb
12. Close the SQL Server Script Utility window.
CREATING THE SQL DATABASES
Follow the steps below to create the three SQL databases and configure them using the SQL scripts. You'll
need administrator rights on the SQL Server to do this.
1. Copy the three SQL scripts from the Taskmaster machine to the SQL Server machine.
   • TravelDocsAdm.sql
   • TravelDocsEng.sql
   • TravelDocsFingerprint.sql
2. Start the SQL Server Management Studio (for example, Start > All Programs > Microsoft SQL Server
2008 > SQL Server Management Studio) and log on to the server as an administrator.
3. In the SQL Server Management Studio, right-click the Databases object and choose New Database.
4. In the Database name field, type TravelDocsAdm and click OK.
5. Repeat to create the TravelDocsEng and TravelDocsFingerprint databases.
6. Select the TravelDocsAdm database and click the File Open button.
7. Select the file TravelDocsAdm.sql and click Open.
8. Click the ! Execute button.
9. Repeat for the TravelDocsEng and TravelDocsFingerprint databases using the files
TravelDocsEng.sql and TravelDocsFingerprint.sql.
10. Expand each of the databases and confirm that the Taskmaster tables were created. For example, if you
expand the TravelDocsFingerprint database you should see the dbo.Host, dbo.PageType, and
dbo.Template tables.
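If you'd rather confirm the tables with a query than by expanding each node, the check in step 10 can be sketched as follows. The T-SQL string uses the standard INFORMATION_SCHEMA.TABLES view; the Python helper is a hypothetical convenience of ours that compares whatever table names your query returns against the names mentioned in step 10. Only the three fingerprint tables are named in this section, so the expected set below is limited to those.

```python
# T-SQL that lists the user tables of the currently selected database.
LIST_TABLES_SQL = (
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE'"
)

# Table names taken from step 10 (fingerprint database only).
EXPECTED_FINGERPRINT_TABLES = {"Host", "PageType", "Template"}


def missing_tables(found, expected):
    """Return expected table names absent from 'found' (case-insensitive)."""
    found_lower = {name.lower() for name in found}
    return sorted(name for name in expected if name.lower() not in found_lower)
```

Run LIST_TABLES_SQL in a query window for the TravelDocsFingerprint database and pass the resulting names to missing_tables; an empty list means all three tables were created.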
ENABLING REMOTE ACCESS TO THE SQL SERVER DATABASES
The instructions below assume that you're running SQL Server Express on a separate machine. If you're
running SQL Server Express on the same machine as Taskmaster, these steps aren't required. If you're using a
shared SQL Server, please consult with your SQL administrator.
1. Start the SQL Server Configuration Manager (for example, Start > All Programs > Microsoft SQL
Server 2008 > Configuration Tools > SQL Server Configuration Manager).
2. Expand SQL Server Network Configuration and select Protocols for SQL(Express).
3. Make sure TCP/IP is enabled. If necessary, double-click the TCP/IP entry and set Enabled to Yes.
4. Close the SQL Server Configuration Manager.
5. In the SQL Server Management Studio Object Explorer pane, expand the Security folder.
6. Right-click Logins and choose New Login.
7. Enter a login name for the Taskmaster machine (we're using “Taskmaster”), specify SQL Server
authentication, and enter a password. Then deselect Enforce password policy.
   Note: You can use Windows authentication or SQL Server authentication. Using SQL Server
   authentication will make setup easier for this tutorial.
8. In the Login - New window, click the Server Roles page and select the sysadmin role.
   Note: Configuring the Taskmaster account as a sysadmin is the easiest way to ensure Taskmaster has the
   permissions it needs to write to the Taskmaster databases. However, this is not appropriate for a
   production environment. If you prefer, you can customize the login configuration so the Taskmaster
   account has only the permissions it needs to read from and write to the Taskmaster databases.
9. Click OK to close the Login window.
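For readers who want to try the more restrictive alternative mentioned in the note to step 8, the sketch below generates T-SQL that maps the login into each Taskmaster database and adds it to the built-in db_datareader and db_datawriter roles. This is an assumption on our part: the guide doesn't state the exact permissions Taskmaster requires, so treat these two roles as a starting point and confirm with your SQL administrator. The helper names are hypothetical; sp_addrolemember is the system procedure available in SQL Server 2008.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Quote a T-SQL string literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def grant_read_write(login, databases):
    """Return T-SQL statements that make 'login' a reader and writer in each database."""
    statements = []
    for db in databases:
        statements.append(f"USE {quote_name(db)};")
        statements.append(
            f"CREATE USER {quote_name(login)} FOR LOGIN {quote_name(login)};"
        )
        # Assumption: read/write role membership is enough for Taskmaster.
        for role in ("db_datareader", "db_datawriter"):
            statements.append(
                f"EXEC sp_addrolemember {quote_literal(role)}, {quote_literal(login)};"
            )
    return statements
```

For example, grant_read_write("Taskmaster", ["TravelDocsAdm", "TravelDocsEng", "TravelDocsFingerprint"]) yields the statements for all three tutorial databases, which you can paste into a query window instead of selecting the sysadmin role.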
MODIFYING THE APPLICATION CONFIGURATION FILE TO USE THE SQL DATABASES
In this section, we'll configure the SQL database connection strings through the Taskmaster Application
Manager. The connection strings are stored in the application configuration (.app) file.
1. Start the Taskmaster Application Manager, select TravelDocs, and click the Taskmaster tab.
2. Click the Browse […] button beside the Administrator field.
3. In the Database type field, select Microsoft SQL Server.
4. In the Server field, enter the name of your SQL Server.
5. In the Database field, enter TravelDocsAdm.
6. Under Database authentication, deselect Network authentication.
7. Select User and Password, enter the SQL account and password you created for Taskmaster, and
click OK.
   Note: If you configured the Taskmaster account to use Windows authentication, leave Network
   authentication selected and leave the User and Password fields unchecked and blank.
8. Repeat for the Engine field to configure the application to use the TravelDocsEng SQL database.
9. In the Taskmaster Application Manager window, click the Application tab.
10. Click the Browse […] button beside the Fingerprint database field and configure the application to
use the TravelDocsFingerprint SQL database.
11. Close the Taskmaster Application Manager window.
RUNNING A BATCH THROUGH THE WORKFLOW
Note: If you didn't complete the earlier section where we configured Quattro for the TravelDocs application,
please do this before proceeding (see “Automating background task processing using Quattro” on
page 282).
1. In the Taskmaster Server Manager window, click Start to restart the Taskmaster Server service.
2. In the Rulerunner Quattro window, click Start to restart the Quattro service.
3. Start the Taskmaster Client and log into the TravelDocs application as usual.
   Note: If Taskmaster is unable to connect to the SQL administrator and engine databases, make sure you
   restarted the Taskmaster Server, that the Taskmaster account on the SQL Server is configured to
   allow read and write access to the databases, and that you entered the correct login information in
   the Taskmaster Application Manager.
4. Double-click the VScan icon to create a new batch. When the task completes click Stop.
5. In the Job Monitor window, watch as Quattro moves the batch through the PageID, CreateDocs, and
Rulerunner tasks.
6. When the batch is ready for verification, start Taskmaster Web and log in to the TravelDocs application.
Then click the Verify/Fixup shortcut and complete the batch as before, setting the car type on page
TM000004 to “Other” and clicking OK to the validation failures on pages TM000006 and TM000013.
7. In the Job Monitor window, watch as Quattro runs the export task.
TRAVELDOCS: MOVING THE APPLICATION TO A PRODUCTION SERVER
To complete this section, you'll need another machine with a complete installation of Taskmaster. We'll
refer to this machine as the “production server” and to the machine you've been using to develop the
application as the “development machine.” The two machines must be logged on to a domain.
SETTING UP THE PRODUCTION SERVER
Before you can copy the application to the production server, you need to share the C:\Datacap folder and
give the various Taskmaster service accounts the required access. Typically, you define various roles and
create the associated network login accounts, and then assign permissions as appropriate. Full details are
provided in the IBM Datacap Taskmaster Capture Installation and Configuration Guide. To make things easier for
this demo configuration, we'll simply share the folder and give everyone full control, although this wouldn't
be appropriate in a real production environment.
Additionally, Taskmaster components running in a multi-machine environment use TCP port 2402 to access
the Taskmaster Server. If there's a firewall running on the production server, you need to open this port.
1. On the production server, share the C:\Datacap folder and give Everyone: Full control.
2. If a firewall is enabled, open TCP port 2402. This step is firewall specific, so please check with a system
administrator if you need help.
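Once the port is open, you can confirm from another machine that it's reachable. The following is a minimal sketch using only Python's standard socket module; the function name port_is_open is ours. It only proves that something accepts TCP connections on the port, not that the Taskmaster Server itself is healthy.

```python
import socket


def port_is_open(host, port=2402, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within 'timeout' seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

From the development machine, port_is_open("<server>") with your production server's name should return True once the share and firewall are set up and the Taskmaster Server service is running.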
COPYING THE APPLICATION TO THE PRODUCTION SERVER
We'll use the Datacap Studio Application Wizard to move the application from the development machine to
the production server. The Application Wizard won't work with applications configured to use databases
other than Access, so we'll need to reconfigure TravelDocs to use the Access databases first. After copying
the application we can reconnect the SQL databases.
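[Diagram: the Application Wizard copies C:\Datacap\TravelDocs on the development machine to
\\<server>\Datacap\TravelDocs on the production server; the connection to SQL Server is marked with an X
for the duration of the copy.]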
1. On the development machine, start the Taskmaster Application Manager, click the Taskmaster tab,
and configure the application to use the Access administration and engine databases
(TravelDocsAdm.mdb and TravelDocsEng.mdb in the C:\Datacap\TravelDocs folder).
2. Click the Application tab and configure the application to use the Access fingerprint database
(TravelDocsFingerprint.mdb in the C:\Datacap\TravelDocs folder).
3. Close the Taskmaster Application Manager.
4. On the development machine, start Datacap Studio but don't connect to any application. Instead, click
Close in the Application window.
5. In the Datacap Studio window, click the Datacap Application Wizard button at the top right.
6. In the Datacap Application Wizard dialog, click Next.
7. Select Copy an existing RRS application and click Next.
8. In the top field, select the TravelDocs application.
9. Click the Browse […] button beside the Root folder field.
10. Select the shared “Datacap” folder on the production server and click OK.
11. Leave the Datacap folder field set to C:\Datacap.
12. Click the Browse […] button beside the Taskmaster Web folder field.
13. Select the tmweb.net subfolder of the shared “Datacap” folder on the production server and click OK.
14. Leave Rename copy unchecked.
15. Click Next and then click Finish. Then wait for the transfer to complete. This may take a few minutes
and you may see the Datacap Application Wizard display “Not responding” temporarily.
   Note: If the wizard reports any errors, check the log file to determine the source. The most common
   problems relate to having insufficient permissions on the target machine. It's important to resolve any
   errors and copy the application successfully before proceeding.
CONFIGURING THE APPLICATION ON THE PRODUCTION SERVER
In this section, we'll configure the production server to run the TravelDocs application. The steps include:
• Adding the new application to datacap.xml
• Using the Taskmaster Application Manager to reconnect the SQL databases and set the server name
• Starting the Taskmaster Server and updating the module paths using Taskmaster Administrator
• Testing the configuration by running a batch through the workflow
UPDATING DATACAP.XML, RECONNECTING THE SQL DATABASES, AND SPECIFYING THE SERVER
1. On the production server, open C:\Datacap\datacap.xml in a text editor.
2. Create an entry for the TravelDocs application like the one below:
   <app name="TravelDocs" ref="\\<server>\<share>\TravelDocs"/>
where <server> is the name of the production server and <share> is the name of the share, for
example:
   <app name="TravelDocs" ref="\\TMSERVER\Datacap\TravelDocs"/>
3. Save and close the file.
4. On the production server, start the Taskmaster Application Manager and select the TravelDocs
application.
5. Click the Taskmaster tab and configure the application to use the SQL administration and engine
databases (TravelDocsAdm and TravelDocsEng) on the SQL Server machine (see “Modifying the
application configuration file to use the SQL databases” on page 387).
6. Also on the Taskmaster tab, set the Server name/address field to the name or IP address of the
production server.
7. Click the Application tab and configure the application to use the SQL fingerprint database
(TravelDocsFingerprint) on the SQL Server machine.
8. Close the Taskmaster Application Manager.
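The datacap.xml edit in steps 1 to 3 can also be scripted. The sketch below adds an <app> entry or updates the existing one with the same name. It assumes the simple structure shown in this chapter, and the function name set_application is ours. Note that Python's ElementTree discards XML comments and may change whitespace when it rewrites the document, so keep a backup of the original file if you try this.

```python
import xml.etree.ElementTree as ET


def set_application(xml_text, name, ref):
    """Return datacap.xml text with an <app> entry for 'name' pointing at 'ref'."""
    root = ET.fromstring(xml_text)
    for app in root.iter("app"):
        if app.get("name") == name:
            app.set("ref", ref)  # update the existing entry
            break
    else:
        # No entry with this name yet: append a new one.
        ET.SubElement(root, "app", {"name": name, "ref": ref})
    return ET.tostring(root, encoding="unicode")
```

Calling set_application(text, "TravelDocs", r"\\TMSERVER\Datacap\TravelDocs") on the contents of C:\Datacap\datacap.xml produces the same entry as the example in step 2.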
STARTING THE TASKMASTER SERVER AND UPDATING THE MODULE PATHS
The copy wizard updates all of the application's hardcoded paths. However, there are certain paths that are
stored in the administration database. Since we had to decouple our SQL databases, the copy wizard can't
update these for us, so we'll have to do this manually.
1. On the production server, open the Taskmaster Server Manager (Start > All Programs > Datacap >
Taskmaster Server > Taskmaster Server Manager). If the Taskmaster Server isn't already running,
click Start to start the service.
2. Also on the production server, start the Taskmaster Client (Start > All Programs > Datacap >
Taskmaster Client > Taskmaster Client) and select the TravelDocs application.
3. Confirm that the Taskmaster Server field displays the production server and that the administrator and
engine database entries are set to the SQL Server machine. Then click OK and log on to the application.
4. In the Taskmaster Client window, click the Administrator button to open the Taskmaster
Administrator window and then click the Modules tab.
5. Select the rrsAssemble module and change the C:\Datacap portion of the path in the Parameter field
to \\<server>\<share>, where <server> is the name of the production server and <share> is the share
name, for example:
\\TMSERVER\Datacap\TravelDocs\dco_TravelDocs\rrs_assemble.bpp
6. Repeat for the rrsCreateDocs and rrsRulerunner modules. Click Apply after each change.
   Note: This step is required because these paths reside in the “taskmod” table in the administrator
   database. Since we had to decouple the SQL databases before copying the application, the
   Application Wizard can't update these paths automatically.
7. Click Done to close the Taskmaster Administrator window.
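The path change in steps 5 and 6 is a simple prefix substitution. To make the rule explicit, here is a small sketch; rebase_path is a hypothetical helper of ours, and it only computes the new string. You still enter the result in the Parameter field yourself.

```python
def rebase_path(path, old_root, new_root):
    """Replace a leading 'old_root' in 'path' with 'new_root' (case-insensitive)."""
    old = old_root.rstrip("\\")
    lowered = path.lower()
    if lowered == old.lower() or lowered.startswith(old.lower() + "\\"):
        return new_root.rstrip("\\") + path[len(old):]
    return path  # not under old_root: leave unchanged
```

With the example from step 5, rebase_path(r"C:\Datacap\TravelDocs\dco_TravelDocs\rrs_assemble.bpp", r"C:\Datacap", r"\\TMSERVER\Datacap") gives \\TMSERVER\Datacap\TravelDocs\dco_TravelDocs\rrs_assemble.bpp. Paths that merely share a prefix, such as C:\DatacapOld, are left alone.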
RUNNING A BATCH THROUGH THE WORKFLOW
1. In the Taskmaster Client window, double-click the VScan icon. When the task completes, click Stop.
2. Check the Job Monitor and confirm that the new batch is pending for the PageID task.
3. Manually run the batch through the PageID, CreateDocs, and Rulerunner tasks.
UPDATING THE FINGERPRINT DATABASE WITH THE SERVER PATH
As we saw earlier, Taskmaster stores the fingerprint paths in the fingerprint database's “Template” table.
ID    CCO Path                              Image Path                            Host ID   Page Type
555   C:\Datacap\...\fingerprint\555.cco    C:\Datacap\...\fingerprint\555.tif    9         1
556   C:\Datacap\...\fingerprint\556.cco    C:\Datacap\...\fingerprint\556.tif    226       40
557   C:\Datacap\...\fingerprint\557.cco    C:\Datacap\...\fingerprint\557.tif    226       41
The copy wizard updates these paths in the Access database for you automatically. However, since our
application is using a SQL database, we'll need to do this ourselves. We haven't created any new fingerprints
since we started using the SQL database, and the Access file on the production server already contains the
updated paths, so we can recreate the SQL database from the Access file.
1. On the production server, use the SQL Server Script Utility to create a new TravelDocsFingerprint.sql
file (see “Creating the SQL scripts” on page 384 for details).
2. Copy TravelDocsFingerprint.sql from the production server to the SQL Server machine.
3. On the SQL Server machine, start the SQL Server Management Studio and log on as an administrator.
4. In the SQL Server Management Studio, expand the Databases object.
5. Select the TravelDocsFingerprint database and click the File Open button.
6. Select the file TravelDocsFingerprint.sql and click Open.
7. Click the ! Execute button.
   Note: The script will delete the existing fingerprint database tables and replace them with new ones
   populated with the data from the Access database.
8. Expand the TravelDocsFingerprint database and expand the Tables node.
9. Right-click the dbo.Template table and choose Select Top 1000 Rows. Then confirm that the
fingerprint paths include the production server name.
   Note: “Select Top 1000 Rows” is a utility script included with SQL Server Express 2008. If you're using
   another version of SQL Server this script may not be available.
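With only a few fingerprints, the visual check in step 9 is easy. With more rows, a small filter helps. The sketch below assumes you've obtained the rows as (ID, CCO path, image path) tuples, for example by copying the query results; the function name and the tuple layout are our assumptions, mirroring the columns shown in the table above.

```python
def rows_with_local_paths(rows, server_root):
    """Return IDs of rows whose CCO or image path does not start with 'server_root'."""
    prefix = server_root.rstrip("\\").lower() + "\\"
    return [
        row_id
        for row_id, cco_path, image_path in rows
        if not (
            cco_path.lower().startswith(prefix)
            and image_path.lower().startswith(prefix)
        )
    ]
```

Pass the share root, such as r"\\TMSERVER\Datacap", as server_root. An empty result means every fingerprint path already points at the production server.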
TRAVELDOCS: RUNNING THE APPLICATION IN A MULTI-MACHINE ENVIRONMENT
CONFIGURING THE QUATTRO/TASKMASTER WEB SERVER
Now that the TravelDocs application is installed and running on the production server, we'll configure the
development machine to act as our Quattro and Taskmaster Web server. Since Quattro, Taskmaster Web,
and IIS are already installed on the development machine, all we need to do is edit datacap.xml to point to the
new location. Note that the Taskmaster Server, Taskmaster Web, and Quattro machines must be logged on to
a domain.
[Diagram: within the domain, the development machine runs Taskmaster Web, IIS, and Quattro; the
production server runs Taskmaster Server and hosts the TravelDocs application; the SQL Server machine
runs SQL Server.]
Note: Here, we're configuring the development machine's datacap.xml file to point to the TravelDocs
application on the production server. Alternatively, you could point the development machine to the
datacap.xml file on the production server by changing the application management path on the Service
tab in the Taskmaster Application Manager. However, if you do this, any other Taskmaster applications
on your development machine will be unavailable.
1. On the development machine, open the Rulerunner Quattro Manager window (Start > All Programs >
Datacap > Taskmaster Client > Rulerunner Quattro Manager).
2. Click Stop to stop the Quattro service.
3. Also on the development machine, open C:\Datacap\datacap.xml in a text editor.
4. Make sure the entry for the TravelDocs application looks like the one below:
<app name="TravelDocs" ref="\\<server>\<share>\TravelDocs"/>
where <server> is the name of the production server and <share> is the name of the share, for
example:
<app name="TravelDocs" ref="\\TMSERVER\Datacap\TravelDocs"/>
5. Save and close the file.
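A common slip in step 4 is leaving a local path (C:\Datacap\...) in the ref attribute, which works on the production server but not from other machines. The sketch below checks that a ref value is a UNC path and splits it into server, share, and remainder. It uses PureWindowsPath from Python's standard library, so it runs on any platform; split_unc is a name we've made up for illustration.

```python
from pathlib import PureWindowsPath


def split_unc(ref):
    """Return (server, share, rest) for a UNC path, or raise ValueError."""
    path = PureWindowsPath(ref)
    drive = path.drive  # for a UNC path this is '\\server\share'
    if not drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {ref}")
    server, share = drive[2:].split("\\", 1)
    rest = "\\".join(path.parts[1:])
    return server, share, rest
```

For the example entry, split_unc(r"\\TMSERVER\Datacap\TravelDocs") returns the server TMSERVER, the share Datacap, and the remainder TravelDocs, while a local path such as C:\Datacap\TravelDocs raises ValueError.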
CONFIGURING THE TASKMASTER WEB CLIENT
To complete the multi-machine demo configuration, you'll need one more machine to act as the Taskmaster
Web client. If you have another non-Taskmaster machine available you can use this; otherwise, you can run
the web client on the SQL Server machine as shown below.
[Diagram: within the domain, the SQL Server machine also runs the web client; the development machine
runs Taskmaster Web, IIS, and Quattro; the production server runs Taskmaster Server and hosts the
TravelDocs application.]
1. On the development machine, open the C:\Datacap\support\WebConfiguration folder.
2. Make a backup copy of the file WebClientConfig.exe.config.
3. Open WebClientConfig.exe.config in a text editor.
4. Locate the line <value>http://localhost/tmweb.net</value> and change “localhost” to the
URL of your Taskmaster Web machine, for example:
<value>http://TMWebServer/tmweb.net</value>
5. Copy the following three files to the machine you're using as the web client:
   • WebClientConfig.exe
   • WebClientConfig.exe.config
   • Datacap.Config.dll
6. On the web client machine, double-click WebClientConfig.exe.
7. Click Configure. Then click OK and Exit.
8. On the web client machine, start Internet Explorer.
9. Click Tools > Internet Options and then click the Security tab.
10. Select Trusted Sites and click Sites.
11. If the Taskmaster Web machine is not already listed as a trusted site, add the URL for the web server.
12. Close the Trusted Sites dialog and the Internet Options dialog.
13. In Internet Explorer, enter the URL for Taskmaster Web, for example:
http://TMWebServer/tmweb.net
14. Log into the TravelDocs application using User ID: admin; Password: admin; Station: 2.
   Note: If the logon page is displayed but you're unable to log on, try using a different station ID.
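The edit to WebClientConfig.exe.config in step 4 is a single substitution, sketched below. It looks for exactly the line quoted in step 4 and refuses to continue if that line isn't present, so a file that has already been edited isn't changed twice. The function name is ours, and the sketch works on the file's text; reading and writing the file, and the backup from step 2, are left to you.

```python
def point_at_web_server(config_text, host):
    """Replace 'localhost' in the Taskmaster Web URL value with 'host'."""
    old = "<value>http://localhost/tmweb.net</value>"
    new = f"<value>http://{host}/tmweb.net</value>"
    if old not in config_text:
        raise ValueError("expected localhost URL not found; file may already be edited")
    return config_text.replace(old, new)
```

For example, point_at_web_server(text, "TMWebServer") produces the value shown in step 4.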
CONFIGURING A REMOTE VSCAN TASK
The goal in this chapter is to run a batch through the workflow using just the web client and Quattro. We
don't yet have a way to create a batch from a web client other than the remote RScan task, so here we'll create
a remote VScan task that we can run from the web client. We'll also copy the sample images to the web client.
1. On the production server, start Taskmaster Client, log into the TravelDocs application, and then open
the Taskmaster Administrator window.
2. Expand the TravelDocs workflow and the Web Job, and then select the Scan task.
3. In the Module field, select the iVScan module (not the iScan module). Then click Apply.
4. Open C:\Datacap\TravelDocs\dco_TravelDocs\vscan.icp in a text editor.
5. In the [Scan] section, make sure LocalProc=0 and set ScanDir=C:\Datacap\scan.
   [Scan]
   LocalProc=0
   ScanDir=C:\Datacap\scan
   Note: ScanDir may currently start with the \\server name. Change it so it starts with C:\Datacap.
6. On the web client machine, create two folders: C:\Datacap\images and C:\Datacap\scan.
7. Copy the TravelDocs images from C:\Datacap\TravelDocs\images on the production server into
C:\Datacap\images on the web client machine.
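The two settings from step 5 can be checked with Python's standard configparser, as sketched below. This assumes vscan.icp follows ordinary INI conventions (section headers in brackets, key=value lines), which is what the [Scan] excerpt suggests; if the real file contains lines configparser can't parse, the read will raise an error rather than report a result. check_scan_settings is a hypothetical helper name.

```python
import configparser


def check_scan_settings(icp_text, expected_dir=r"C:\Datacap\scan"):
    """Return a list of problems found in the [Scan] section (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(icp_text)
    if not parser.has_section("Scan"):
        return ["missing [Scan] section"]
    scan = parser["Scan"]
    problems = []
    if scan.get("LocalProc") != "0":
        problems.append("LocalProc should be 0")
    if scan.get("ScanDir", "").lower() != expected_dir.lower():
        problems.append(f"ScanDir should be {expected_dir}")
    return problems
```

Pass the text of vscan.icp to check_scan_settings; an empty list means both values match step 5.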
RUNNING THE REMOTE VSCAN TASK FROM THE WEB CLIENT
1. On the web client machine, log back into the TravelDocs application if necessary.
2. On the Operations tab, click the RScan shortcut and wait for the page to load. If you see any messages
like the one below, click the message to load the required add-on.
3. On the Web Job: VScanning page, click the Browse button beside the Source directory field.
4. Browse to C:\Datacap\images, select the first file (Images_Page_01.tif), and click Open.
5. Click Scan and then click OK. You should see the 13 page thumbnails.
6. Click Done and then click OK. The pages are now ready for upload.
UPLOADING THE FILES TO THE SERVER
1. On the web client machine, open the C:\Datacap\scan folder and make sure there's a batch folder
containing the 13 TIF files.
2. On the Taskmaster Web Operations tab, click the Upload shortcut.
3. Wait while the web client transfers the batch to the server. Then click OK and Stop.
4. Click the Monitor tab. You should see the batch pending for the PageID task.
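The file count in step 1 can also be checked with a few lines of Python. This is just a convenience sketch: count_tifs is our own name, and you supply the path of the batch folder that RScan created under C:\Datacap\scan.

```python
from pathlib import Path


def count_tifs(batch_folder):
    """Count files with a .tif extension (any case) directly inside 'batch_folder'."""
    return sum(
        1
        for entry in Path(batch_folder).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".tif"
    )
```

For the batch in this section the result should be 13. Subfolders and non-TIF files are ignored.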
STARTING QUATTRO AND COMPLETING THE BATCH
1. On the development machine, open the Rulerunner Quattro Manager window (Start > All Programs >
Datacap > Taskmaster Client > Rulerunner Quattro Manager).
2. Click Start to start the Quattro service.
3. On the web client machine, on the Taskmaster Web Monitor tab, set the Refresh rate to 10 sec. Watch
as Quattro processes the batch through the PageID, CreateDocs, and Rulerunner tasks.
4. When the batch is pending for the Verify task, click the Operations tab.
5. Click the Verify/FixUp shortcut and complete the batch as before, setting the car type on page
TM000004 to “Other” and clicking OK to the validation failures on pages TM000006 and TM000013.
6. Click the Monitor tab and watch as Quattro runs the Export task.
TRAVELDOCS: ROLE-BASED BATCH PROCESSING
We've run everything so far using the “admin” login. In this final section, we'll set up Operator and
Supervisor accounts and configure them so they can perform only specific tasks. We'll then run a batch with
an unidentified page through the workflow and have the Operator and Supervisor perform their respective
jobs.
CREATING THE OPERATOR AND SUPERVISOR ACCOUNTS
1. On the web client machine, if you're not already logged on to the TravelDocs application, start
Taskmaster Web and log on using the “admin” account.
2. Click the Administrator tab and then click the Users sub-tab.
   Note: These are the default user accounts created by the Application Wizard.
3. Click |new| to create a new user account and enter the account details as shown for Account 1 below.
Then click Save User. After that, click |new| again and enter the details for Account 2 below.
                   Account 1                           Account 2
Name               Operator                            Supervisor
Description        Operator                            Supervisor
New password       password                            password
Retype password    password                            password
Privileges         Job Monitor: <first option only>    Job Monitor: <first option only>
                   Clients: Taskmaster Web             Clients: Taskmaster Web
Permissions        Web Job: Scan                       Supervisor Job: Verify
                   Web Job: Upload
                   Web Job: Verify
                   ManualPageID Job: ManualPageID
PREPARING TO RUN A NEW BATCH THROUGH THE WORKFLOW
We want to re-introduce the Airline #4 page and handle it as a manually identified page. This requires the
following steps that are detailed in the sections that follow:

• Delete the existing Airline #4 fingerprint
• Configure the branch and split conditions for the Web Job tasks
• Copy the Airline #4 page to the web client machine
DELETING THE AIRLINE #4 FINGERPRINT
1. On the production server machine, start Datacap Studio and connect to the TravelDocs application
using the “admin” account.
2. Click the Zones tab, open the first Flight class, and select the Airline #4 fingerprint. Check the Image
View pane to confirm you've selected the correct fingerprint.
3. Right-click the Airline #4 fingerprint and choose Remove selected.
4. Close Datacap Studio.
CONFIGURING THE BRANCH AND SPLIT CONDITIONS
When we configured the branch and split conditions earlier, we did so only for the Main Job. Here, we'll do
the same for the Web Job using the web client's Administrator tab.
1. On the web client machine, if you're not already logged on to the TravelDocs application, start
Taskmaster Web and log on using the “admin” account.
2. Click the Administrator tab and expand the Web Job workflow on the Workflow sub-tab.
3. Expand the PageID task and select the Page Identification Failed condition. Configure the values as
follows:
Spawn Type: Branch
Parent Status: Pending
Child Job: ManualPageID Job
Child Status: Pending
Steps: 1
4. Click Save condition.
5. Expand the CreateDocs task and select the Document Integrity Failed condition. Configure the
values as follows:
Spawn Type: Branch
Parent Status: Pending
Child Job: Fixup Job
Child Status: Pending
Steps: 1
6. Click Save condition.
7. Expand the Rulerunner task and select the Split Condition condition. Configure the values as follows:
Spawn Type: Split
Parent Status: Pending
Child Job: Supervisor Job
Child Status: Pending
Steps: 1
8. Click Save condition.
COPYING THE AIRLINE #4 PAGE TO THE WEB CLIENT MACHINE
• Copy the file NewAirline.tif from the sample document download into the C:\Datacap\images folder
on the web client machine.
LOGGING ON AS THE OPERATOR AND CREATING A BATCH
1. On the web client machine, click the Logout link at the top right of the Taskmaster Web window and
click OK to confirm.
2. Log on to the TravelDocs application using the Operator account. You should see the shortcuts for the
four Operator tasks: ManualPageID, RScan, Upload, and Verify/FixUp.
3. Click the RScan shortcut and prepare a batch as before using the images in C:\Datacap\images. Make
sure there are 14 images this time, including the Airline #4 page.
4. Click the Upload shortcut and upload the images to the server as before.
5. Click the Monitor tab and set the Refresh rate to 10 sec. Then watch as Quattro runs the PageID task
and marks the batch as pending for the ManualPageID task.
6. Click the Operations tab, click the ManualPageID shortcut, and wait for the images to load.
7. Scroll to the bottom of the page and set the page type for the Airline #4 page to Air_Ticket.
8. Click Done. Then click OK and Stop.
9. Click the Monitor tab and watch as Quattro runs the CreateDocs and Rulerunner tasks.
The batch is now split, with the main batch pending for the Web Job.Verify task and the child batch
pending for the Supervisor Job.Verify task.
10. Click the Operations tab. Then click the Verify/FixUp shortcut and complete the batch as before,
setting the car type on page TM000004 to “Other” and clicking OK to the validation failures on pages
TM000006 and TM000013.
11. When you've completed the main batch, click the Logout link at the top right of the Taskmaster Web
window and click OK to confirm.
LOGGING ON AS THE SUPERVISOR
1. Log on to the TravelDocs application using the Supervisor account. You should see the shortcut for the
one Supervisor task, as shown below.
2. Click the Supervisor Verify shortcut to display the child batch containing the Airline #4 page.
   Note: If instead you see an error message “Unable to cache input page file,” make sure you're running
   Taskmaster Capture 8.0.1 Fix Pack 1 or higher.
3. Define the zone for each field as you did earlier in the section on automatic fingerprint generation (see
“Running a batch through the workflow” on page 306).
4. Click Submit and then click OK. Taskmaster runs the AutoFingerprint ruleset to create the new
fingerprint file and add the zone information to the document hierarchy. Then click OK and Stop.
Note: This completes the section on multi-machine configuration. You may want to edit datacap.xml on
the development machine so it reverts to using the TravelDocs application on the local machine.
Chapter 22
UNDERSTANDING THE TASKMASTER OBJECT MODEL AND
EXECUTION ENVIRONMENT
The TravelDocs application you've developed throughout this guide uses only actions from the standard
Taskmaster action libraries. The standard libraries provide an extensive range of capabilities that address most
typical application requirements, but as you develop more sophisticated applications you may need specialized
functionality they don't provide.
Taskmaster lets you develop custom actions to implement functionality that's not available in the standard
libraries. To create custom actions, you need a basic understanding of the Taskmaster object model and the
runtime execution environment.
This chapter is intended both for application developers who want a starting point for creating their own
custom actions and for anyone who needs a more in-depth understanding of the runtime execution
environment. We'll examine the objects used when you execute a task profile and see how individual actions
can access and update the runtime batch hierarchy. At the end of the chapter we'll run a task profile and a
custom action from the Microsoft Visual Studio® debugger so we can examine the runtime objects in real
time.
OVERVIEW OF MAIN STEPS IN EXECUTING A TASK PROFILE
The Taskmaster component that executes your application’s rules is the Datacap Rulerunner Service
(DCRRS). The Rulerunner service runs in the background under the control of a Taskmaster client
application (Taskmaster Client, Taskmaster Web, Rulerunner Quattro, etc.) or an external application.
Before the Rulerunner service can begin executing a task profile, the Taskmaster client application must set
up the runtime environment. This involves creating several Datacap objects in memory and populating them
with the information the Rulerunner service needs to execute rules on the current batch. Information includes
the application name, the task profile name, the batch ID, the input runtime batch file, etc.
After the initial setup, the Taskmaster client application transfers control to the Rulerunner service. The
service reads the rulesets and rules for the specified task profile and loads the first object from the runtime
batch hierarchy into memory as the “current DCO object.”
The Rulerunner service goes through the runtime batch hierarchy one object at a time, loading each object in
turn and executing the applicable rules. As it processes each object, it updates the memory resident batch
hierarchy and logs status information according to the Rulerunner logging settings for the current task profile.
When the task profile completes, the Rulerunner service writes the updated runtime batch hierarchy and the
log file to disk and notifies the Taskmaster client that it is finished.
[Diagram: The client application (Taskmaster Client, etc.) populates the Datacap objects with the application
name, profile name, batch ID, and input batch file. The Datacap Rulerunner Service (DCRRS) loads the rules
(profiles, rulesets, actions, and doc hierarchy) from the Taskmaster application along with the batch info, runs
the rules, and writes the batch and log files when done. The batch consists of runtime batch files, runtime page
files, image files, and log files.]
SETTING UP THE INITIAL EXECUTION ENVIRONMENT
Before the Rulerunner service can begin running rules on the current batch, the Taskmaster client application
(Taskmaster Client, Taskmaster Web, Quattro, etc.) must set up the runtime environment. This involves
specifying several parameters, including:
• The name of the application and the task profile to run
• The name and location of the document hierarchy (“Setup DCO”)
• The name and location of the input runtime batch hierarchy (“Runtime DCO”)
• The ID and location of the current batch
• The Taskmaster login credentials to use
The Taskmaster client application defines the runtime environment by creating and then setting properties on
the following memory resident runtime objects:
• DCO object (TDCOLib.DCOClass)
• PilotProps object (PILOTCTRLLib.BPilotCtrl)
• Client object (DCRRSClient.CRRClient) and its child objects: Session, State, Setup, FlowControl, and Error
Detailed information about these objects as well as their properties and methods is provided in the various
Taskmaster Capture API reference guides. Some of the key properties in terms of setting up the runtime
environment are shown in the table below.
Object                      Property     Example
DCO                         ID           20110131.001
                            Type         TravelDocs
                            XML          <B id="20110131.001"><V n="TYPE">TravelDocs etc.
PilotProps                  BatchDir     C:\Datacap\TravelDocs\batches\20110131.001
                            BatchID      20110131.001
                            DCOFile      C:\Datacap\TravelDocs\batches\20110131.001\VScan.xml
                            Operator     admin
                            Station      1
Client > Session > State    PilotProps   <B id="20110131.001"><P n="BatchDir">C:\Datacap etc.
                            Victim       B
                            XML          lib="TravelDocs" tprofile="PageID" logfile= etc.
We’ll look at these objects and their settings in more detail later in this chapter when we execute a task profile
using the ProfileRunner sample application, starting on page 412.
TRANSFERRING CONTROL TO THE RULERUNNER SERVICE
After setting up the initial runtime execution environment, the Taskmaster client application passes control to
the Rulerunner service to execute the specified task profile on the current batch.
The first thing Rulerunner must do is retrieve the rules and actions associated with the specified task profile.
Each task profile includes one or more rulesets and each ruleset is made up of rules, functions, and actions.
Task profile
    Ruleset
    Ruleset
        Rule
            Function
                Actions
Taskmaster stores the definitions for the task profiles and rulesets as files in the application’s
dco_<application_name>\rules folder.
dco_TravelDocs    (workflow folder)
    rules
        collection.xml    (task profiles and rulesets)
        VScan.rul         (rules, functions, and actions; one file per ruleset)
        PageID.rul
        etc.
We’ll look at the structure of the collection.xml and the .rul files in the sections that follow.
The individual actions are referenced from action library (RRX) files stored in C:\Datacap\RRS. The code to
implement the individual actions is either in the RRX file (in the case of scripted actions) or in a DLL in
C:\Datacap\dcshared\NET (in the case of .NET actions).
Datacap
    RRS    (action libraries)
        ocr_s.rrx
        recog_shared.rrx
        etc.
    dcshared
        NET    (.NET action code)
            Convert.dll
            FileIO.dll
            etc.
STRUCTURE OF COLLECTION.XML
Collection.xml defines the application’s task profiles and the rulesets within each profile. Each ruleset has an
ID number and each task profile is listed with the IDs of the rulesets it includes.
<rsapp name="TravelDocs">
<rsc>
<ruleset name="VScan" id="1"/>
<ruleset name="ImageFix" id="2"/>
<ruleset name="PageID" id="3"/>
<ruleset name="CreateDocs" id="5"/>
<ruleset name="Recognize" id="4"/>
<ruleset name="Validate" id="6"/>
<ruleset name="Export" id="7"/>
<ruleset name="FingerprintAdd" id="8"/>
<ruleset name="Clean" id="9"/>
<ruleset name="Routing" id="10"/>
<ruleset name="Document Integrity" id="11"/>
<ruleset name="ExportDB" id="12"/>
<ruleset name="ExportXML" id="13"/>
</rsc>
<tps>
<tprofile name="VScan">
<ruleset id="1"/>
</tprofile>
<tprofile name="PageID">
<ruleset id="2"/>
<ruleset id="3"/>
</tprofile>
<tprofile name="CreateDocs">
<ruleset id="5"/>
<ruleset id="11"/>
</tprofile>
<tprofile name="Rulerunner">
<ruleset id="4"/>
<ruleset id="9"/>
<ruleset id="6"/>
<ruleset id="10"/>
</tprofile>
etc.
</tps>
</rsapp>
In the example above:
• The VScan profile includes ruleset 1 (VScan)
• The PageID profile includes rulesets 2 and 3 (ImageFix and PageID)
• The CreateDocs profile includes rulesets 5 and 11 (CreateDocs and Document Integrity)
• The Rulerunner profile includes rulesets 4, 9, 6, and 10 (Recognize, Clean, Validate, and Routing)
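The lookup described above (resolving each task profile's ruleset IDs to ruleset names) can be sketched in a few lines of Python. This is only an illustration that reads an abbreviated copy of the collection.xml example; the function name profiles_to_rulesets is hypothetical and is not part of any Taskmaster API.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the collection.xml example from this section.
COLLECTION_XML = """
<rsapp name="TravelDocs">
  <rsc>
    <ruleset name="VScan" id="1"/>
    <ruleset name="ImageFix" id="2"/>
    <ruleset name="PageID" id="3"/>
    <ruleset name="CreateDocs" id="5"/>
    <ruleset name="Recognize" id="4"/>
    <ruleset name="Validate" id="6"/>
    <ruleset name="Clean" id="9"/>
    <ruleset name="Routing" id="10"/>
    <ruleset name="Document Integrity" id="11"/>
  </rsc>
  <tps>
    <tprofile name="VScan"><ruleset id="1"/></tprofile>
    <tprofile name="PageID"><ruleset id="2"/><ruleset id="3"/></tprofile>
    <tprofile name="CreateDocs"><ruleset id="5"/><ruleset id="11"/></tprofile>
    <tprofile name="Rulerunner">
      <ruleset id="4"/><ruleset id="9"/><ruleset id="6"/><ruleset id="10"/>
    </tprofile>
  </tps>
</rsapp>
"""


def profiles_to_rulesets(xml_text):
    """Return {profile name: [ruleset names in the order they are listed]}."""
    root = ET.fromstring(xml_text)
    # <rsc> assigns an ID to every ruleset name.
    names_by_id = {rs.get("id"): rs.get("name") for rs in root.find("rsc")}
    # <tps> lists each task profile with the IDs of the rulesets it includes.
    return {
        tp.get("name"): [names_by_id[rs.get("id")] for rs in tp.findall("ruleset")]
        for tp in root.find("tps")
    }


if __name__ == "__main__":
    for profile, rulesets in profiles_to_rulesets(COLLECTION_XML).items():
        print(profile, "->", ", ".join(rulesets))
```

Keeping the ID-to-name table separate from the profile list mirrors the file itself, where a ruleset is defined once and can be referenced from any profile.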
STRUCTURE OF A RULESET FILE
A .rul (ruleset) file defines the rules, functions, and actions in a given ruleset. The example below shows the
PageID ruleset and the PageID rule as they’re defined in the file PageID.rul. The rule contains two functions:
Identify using Fingerprint and Identify using Text Match.
<!-- PageID ruleset (ID=3 in collection.xml) -->
<ruleset name="PageID" id="3" ver="6" modder="admin.1" dt="02/23/11 10:26:43.990" src_ver="5">
  <!-- PageID rule (ID=1 in PageID.rul) -->
  <rule name="PageID" id="1" qi="">
    <!-- "Identify using Fingerprint" function -->
    <func name="Identify using Fingerprint">
      <a name="RecognizePageOCR_S" ns="OCR_s" qi="">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
      </a>
      <a name="FindFingerprint" ns="autodoc">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
        <p type="strParam" v="False" name="StrParam"/>
      </a>
      <a name="rrSet" ns="rrunner" qi="">
        <p name="varSource" v="Fingerprint"/>
        <p name="varTarget" v="@P.MatchType"/>
      </a>
    </func>
    <!-- "Identify using Text Match" function -->
    <func name="Identify using Text Match">
      <a name="RegExFind" ns="Locate">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
        <p type="strParam" v="Car" name="StrParam"/>
      </a>
      <a name="SetPageType" ns="DCO">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
        <p type="strParam" v="Rental_Agreement" name="StrParam"/>
      </a>
      <a name="rrSet" ns="rrunner" qi="">
        <p name="varSource" v="Text"/>
        <p name="varTarget" v="@P.MatchType"/>
      </a>
    </func>
  </rule>
  etc.
</ruleset>
In the example above, the Identify using Fingerprint function includes three actions:
• RecognizePageOCR_S from the OCR_s action library (ns="OCR_s"), which has no parameters.
• FindFingerprint from the autodoc action library (ns="autodoc"), which has one parameter: False.
• rrSet from the rrunner action library (ns="rrunner"), which has two parameters: Fingerprint and
@P.MatchType.
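To make the rule, function, and action nesting concrete, the following Python sketch walks a trimmed copy of the PageID.rul example and lists each action with its library and parameter values. It is an illustration only: list_actions is a hypothetical helper, and treating only the <p> elements that carry a v attribute as parameter values is an assumption based on the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the PageID.rul example (first function only).
RULESET_XML = """
<ruleset name="PageID" id="3">
  <rule name="PageID" id="1" qi="">
    <func name="Identify using Fingerprint">
      <a name="RecognizePageOCR_S" ns="OCR_s" qi="">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
      </a>
      <a name="FindFingerprint" ns="autodoc">
        <p type="bInter" name="bInter"/>
        <p type="bDebug" name="bDebug"/>
        <p type="strParam" v="False" name="StrParam"/>
      </a>
      <a name="rrSet" ns="rrunner" qi="">
        <p name="varSource" v="Fingerprint"/>
        <p name="varTarget" v="@P.MatchType"/>
      </a>
    </func>
  </rule>
</ruleset>
"""


def list_actions(xml_text):
    """Return (rule, function, library, action, [parameter values]) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for rule in root.findall("rule"):
        for func in rule.findall("func"):
            for action in func.findall("a"):
                # Assumption: only <p> elements with a v attribute hold
                # parameter values; bInter and bDebug carry none here.
                params = [p.get("v") for p in action.findall("p")
                          if p.get("v") is not None]
                rows.append((rule.get("name"), func.get("name"),
                             action.get("ns"), action.get("name"), params))
    return rows


if __name__ == "__main__":
    for rule, func, library, action, params in list_actions(RULESET_XML):
        print(f"{rule} / {func}: {library}.{action}({', '.join(params)})")
```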
GOING THROUGH THE DOCUMENT HIERARCHY
As we saw earlier, the Taskmaster client is responsible for instantiating a batch level DCO object and
populating it with the runtime batch information.
Batch level DCO object
Property     Value
ID           20110136.005
ImageName
Options      C:\Datacap\ …\TravelDocs.xml    (document hierarchy file, “Setup DCO”)
Status       0
Text
Type         TravelDocs
XML          <B id="20110136.005">           (runtime batch hierarchy)
             <V n="TYPE">TravelDocs</V>
             <P id="TM000001">
             <V n="TYPE">Other</V>
             etc.
When control passes to the Rulerunner service, it goes through the runtime batch hierarchy object by object
in the order shown in the example below (“depth first traversal”).
 1  Batch
 2      Document 1
 3          Page 1
 4              Field 1
 5              Field 2
 6          Page 2
 7              Field 1
 8              Field 2
 9      Document 2
10          Page 1
11              Field 1
12              Field 2
13          Page 2
14              Field 1
15              Field 2
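The depth first order described above can be reproduced with a short Python sketch: each object is visited before its children, and all of an object's descendants are visited before its next sibling. The sample hierarchy is invented for illustration. In particular, the <D> element for documents and the document IDs are assumptions, since the excerpts in this chapter show only the <B>, <P>, <F>, <V>, and <C> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical runtime batch hierarchy. The <D> tag and the document IDs
# are assumptions made for this sketch.
RUNTIME_XML = """
<B id="20110131.001">
  <V n="TYPE">TravelDocs</V>
  <D id="20110131.001.01">
    <P id="TM000001"><F id="Pickup_Date"/><F id="Total_Cost"/></P>
    <P id="TM000002"><F id="Pickup_Date"/><F id="Total_Cost"/></P>
  </D>
  <D id="20110131.001.02">
    <P id="TM000003"><F id="Pickup_Date"/><F id="Total_Cost"/></P>
  </D>
</B>
"""

# Elements that represent hierarchy objects; variables such as <V> are skipped.
LEVELS = {"B": "Batch", "D": "Document", "P": "Page", "F": "Field"}


def depth_first(node):
    """Yield (level, id) for a node, then for all of its descendants."""
    yield LEVELS[node.tag], node.get("id")
    for child in node:
        if child.tag in LEVELS:
            yield from depth_first(child)


if __name__ == "__main__":
    root = ET.fromstring(RUNTIME_XML)
    for step, (level, obj_id) in enumerate(depth_first(root), start=1):
        print(step, level, obj_id)
```

The recursion visits the parent first and then descends, which is why both fields of Page 1 are processed before Page 2 is loaded.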
As it moves from object to object, the Rulerunner service populates another DCO object known as the
“current DCO object” with the relevant information. The examples below show the current DCO object
first with a page object and then with a field object.
Current DCO object: Page 1
Property     Value
ID           TM000001
ImageName    C:\Datacap\ …\tm000001.tif
Options      C:\Datacap\ …\TravelDocs.xml
Status       49
Text
Type         Rental_Agreement
XML          <P id="TM000001">
             <F id="Pickup_Date">
             <V n="TYPE">Pickup_Date</V>
             <V n="Position">183,402,535,463</V>
             <V n="STATUS">0</V>

Current DCO object: Field 1
Property     Value
ID           Total_Cost
ImageName
Options      C:\Datacap\ …\TravelDocs.xml
Status       0
Text         345.70
Type         Total_Cost
XML          <F id="Total_Cost">
             <V n="TYPE">Total_Cost</V>
             <V n="Position">576,904,817,961</V>
             <V n="STATUS">0</V>
             <C cn="8" cr="609,918,621,939">51</C>
As the Rulerunner service loads each object, it determines if any rule in the current ruleset applies to the
current object. If it does, Rulerunner executes the rule; otherwise, it moves to the next object in the hierarchy.
Referring to the examples below, if we’re executing the TravelDocs application’s Validate ruleset on a
rental agreement page, Rulerunner executes the Validate Page rule on the page and the Validate
Currency Field rule on the “Total_Cost” field.
The other fields on the Rental_Agreement page aren’t bound to any Validate rules so there’s nothing to run.
The examples below show the rules that are bound to the “Pickup_Date” and “Options” fields and you can
see there are no references to the Validate ruleset.
On the other hand, if Rulerunner is executing the Recognize ruleset then there’s a rule attached to each of
these fields. In this case, Rulerunner executes the actions associated with the specified rule as it processes
each object.
The example below shows how the rule information for each object is stored in the document hierarchy XML
file. This portion of the file shows the rules defined for the Total_Cost field type.
<P type="Total_Cost">
  <V n="ID">0</V>
  <V n="TYPE">Field</V>
  etc.
  <V n="rules">
    <in>
      <r id="1" rs="9" />    <!-- Ruleset 9 (Clean), rule 1 (Fields Clean) -->
      <r id="2" rs="6" />    <!-- Ruleset 6 (Validate), rule 2 (Validate Currency Field) -->
      <r id="12" rs="4" />   <!-- Ruleset 4 (Recognize), rule 12 (Recognize Total Cost) -->
    </in>
  </V>
  etc.
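The check described above, deciding whether any rule in the current ruleset is bound to the current object, can be sketched as a simple filter over the <r> elements. The Python below is an illustration that reads a closed copy of the Total_Cost fragment; bound_rules is a hypothetical helper and does not reflect how Rulerunner implements the lookup internally.

```python
import xml.etree.ElementTree as ET

# The Total_Cost fragment from the document hierarchy, closed so it parses.
SETUP_FRAGMENT = """
<P type="Total_Cost">
  <V n="ID">0</V>
  <V n="TYPE">Field</V>
  <V n="rules">
    <in>
      <r id="1" rs="9" />
      <r id="2" rs="6" />
      <r id="12" rs="4" />
    </in>
  </V>
</P>
"""


def bound_rules(xml_text, ruleset_id):
    """Return the IDs of the rules bound to this object for one ruleset."""
    node = ET.fromstring(xml_text)
    # Each <r> pairs a rule ID (id) with the ruleset it belongs to (rs).
    return [r.get("id")
            for r in node.findall("./V[@n='rules']/in/r")
            if r.get("rs") == str(ruleset_id)]


if __name__ == "__main__":
    print("Validate (6):", bound_rules(SETUP_FRAGMENT, 6))
    print("PageID (3):", bound_rules(SETUP_FRAGMENT, 3))
```

An empty result corresponds to the case where there is nothing to run and processing moves on to the next object.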
ACCESSING RUNTIME OBJECTS FROM AN ACTION
The diagram below shows the runtime objects Rulerunner maintains when you execute a task profile.
Individual actions can access those same objects, and this is how custom actions can read and manipulate the
runtime batch hierarchy. For example, you could implement a custom action that reads the value of a specific
field, converts it into a specific format, and then writes it back to the runtime batch hierarchy.
You can access the runtime objects either directly or through the Datacap SmartNav object. The SmartNav
object provides a convenient way to access the objects without having to obtain an interface to each one.
Note: The SmartNav object is available only to actions implemented as COM DLLs, and not to those
implemented in VBScript.
[Diagram: Runtime objects maintained by the Datacap Rulerunner Service (DCRRS) and made available to an
action library COM DLL through the SmartNav object (members shown: RRLog, CurrentDCO, BatchPilot, RRState):
• Batch level DCO object (TDCOLib.IDCO): ID (20110027.001), Options, Type (TravelDocs), XML, etc.
• Current DCO object (TDCOLib.IDCO): ID (TM000001), Options, Type (Rental_Agreement), XML, etc.
• State object (dcrroLib.IRRState): Props, Victim, XML (<stack><ruleset id="…"> etc.), etc.
• Log object (dclogXLib.IDCLog): Enabled (True), Filename, Write(), etc.
• BPilotCtrl object (PILOTCTRLLib.IBPilotCtrl): BatchDir, DCOFile, TaskID (VScan), DCO, etc.]
Later in this chapter, starting on page 422, we’ll use a custom action created using the Datacap Custom
Action Library template to access the runtime objects using the SmartNav object.
TRAVELDOCS: EXECUTING A TASK PROFILE USING PROFILERUNNER
Note: This section requires Microsoft Visual Studio. If you don’t have Visual Studio and would like to
evaluate it, you can download Visual C# 2010 Express from the Microsoft website. ProfileRunner
will run under Visual Studio 2010, but you’ll need the full version of Visual Studio 2005 or 2008 for the
next section, “Using a custom action to examine runtime objects” starting on page 422.
ABOUT PROFILERUNNER
ProfileRunner is a sample Taskmaster client application that lets you examine some of Taskmaster’s internal
workings. It’s included with the Taskmaster installation as a Visual Studio project, so if you have Visual
Studio on your machine you can run ProfileRunner in debug mode and examine the Taskmaster runtime
environment. The ProfileRunner user interface is shown below.
ProfileRunner lets you run any available profile on an existing batch (in other words, you must have already
created the batch using the application’s batch creation task). In the ProfileRunner window, you specify the
runtime settings and then click Run Profile.
Note: ProfileRunner is not connected to Taskmaster’s queuing engine, so it won’t change the task status in
the engine database or the job monitor.
OPENING THE PROFILERUNNER PROJECT
Note: Before you begin, make a backup copy of the folder C:\Datacap\DeveloperKit\ProfileRunner.
1. Start Visual Studio.
2. Click File > Open Project and open C:\Datacap\DeveloperKit\ProfileRunner\RRS.NET.sln.
3. If you’re using Visual C# 2010 Express, convert the project when prompted to do so.
4. In the Visual Studio Solution Explorer pane, double-click frmRRS.NET.cs.
5. Click View > Code (or press F7) to switch to code view.
PREPARING A BATCH USING VSCAN
ProfileRunner lets you run any task profile except the application’s batch creation task. For this reason, you
must create a batch first using Datacap Studio, Taskmaster Client, or the Taskmaster Web client. The steps
below use Datacap Studio.
1. If necessary, start Datacap Studio and open the TravelDocs application.
2. Click the Test tab.
3. In the Workflow pane, select the VScan task profile under Main Job.
4. Click the New button to start a new batch.
5. Click the Process rules for target object button. When the task profile completes, click Task
finished.
BUILDING AND RUNNING THE PROJECT
In this section, we’ll run the new batch through the PageID task profile under the control of ProfileRunner.
We’ll run the task profile straight through without breakpoints to confirm that everything’s set up properly.
1. In Visual Studio, make sure the Solution Configuration is set to Debug.
Note: If you’re using Visual Studio 2005 or 2008 and the Solution Configuration drop-down is not
displayed on the standard toolbar, click Tools > Import and Export Settings > Reset all
settings and select General or C#. The Solution Configuration drop-down is not available with
Visual C# 2010 Express but you can run ProfileRunner in the standard execution mode.
2. Click the Start Debugging button on the main Visual Studio toolbar to build and run the project.
3. In the Library Name field select TravelDocs and then in the Profile field select PageID.
4. In the InputPageFile field, type VScan.xml. This is the name of the runtime batch file generated by the
VScan task profile and used as the input to the PageID task profile.
5. In the OutputPageFile field, type PageID.xml. This is the name of the runtime batch file to be
generated by the PageID task profile.
6. Click the Browse […] button beside the Runtime batch folder field.
7. Select the folder C:\Datacap\TravelDocs\batches\<batch_num> where <batch_num> is the ID of
the most recent batch (the one you created in Datacap Studio). Then click OK.
8. In the ProfileRunner window, click Run Profile and wait for the profile to complete. Then click OK.
9. Open the folder C:\Datacap\TravelDocs\batches\<batch_num> and then open PageID.xml to
confirm that page identification was successful.
10. In Visual Studio, click the Stop Debugging button on the main Visual Studio toolbar.
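If you prefer to confirm page identification with a script rather than by reading PageID.xml, the following Python sketch lists each page with its TYPE variable and reports the pages that are still typed “Other.” It assumes the runtime batch file uses the <P id="…"><V n="TYPE">…</V> layout shown in the excerpts in this chapter; the sample data and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a runtime batch file after the PageID profile.
SAMPLE = """
<B id="20110131.001">
  <V n="TYPE">TravelDocs</V>
  <P id="TM000001"><V n="TYPE">Rental_Agreement</V></P>
  <P id="TM000002"><V n="TYPE">Other</V></P>
</B>
"""


def page_types(xml_text):
    """Return {page ID: value of the page's TYPE variable, or None}."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iter("P"):
        type_var = page.find("V[@n='TYPE']")
        result[page.get("id")] = type_var.text if type_var is not None else None
    return result


def unidentified_pages(xml_text):
    """Return the IDs of pages whose type is missing or still 'Other'."""
    return [page_id for page_id, page_type in page_types(xml_text).items()
            if page_type in (None, "Other")]


if __name__ == "__main__":
    print(page_types(SAMPLE))
    print("Not identified:", unidentified_pages(SAMPLE))
```

To check a real batch, read the contents of PageID.xml from the batch folder into a string and pass it to the same functions.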
SETTING A BREAKPOINT IN THE EXECUTE METHOD
In this section, we’ll set a breakpoint within the ProfileRunner executable so we can stop execution and
examine the Datacap runtime objects.
1. In Visual Studio or Visual C# Express, click the Types field at the top left of the code view pane and
choose RRS.NET.RRSProfileRunner. Then click the Members field at the top right of the code view
pane and choose Execute to go to the portion of the code that executes when you click the Run Profile
button.
2. Click once in the gray margin to the left of the line m_oClient = new DCRRSClient.CRRClient()
to set a breakpoint.
3. Click the Start Debugging button on the main Visual Studio toolbar to run the project.
4. Confirm that the settings are as you specified them earlier and then click Run Profile. This time the
breakpoint stops execution immediately and you’ll see the line highlighted in yellow in Visual Studio.
EXAMINING THE INITIAL STATE OF THE RUNTIME OBJECTS
The //initialize objects section of the Execute method is responsible for creating the runtime objects
the Datacap Rulerunner service requires.
m_oClient = new DCRRSClient.CRRClient();
m_oDCO = new TDCOLib.DCOClass();
m_oPilotProps = new PILOTCTRLLib.BPilotCtrl();
In this section, we’ll single-step through these three lines and then look at the runtime objects.
1. In Visual Studio, press F10 (single step) three times to execute the lines creating the runtime objects.
2. Right-click the m_oClient object and choose Add Watch.
3. In the Watch pane, expand the m_oClient object and the Session, FlowControl, State, Error, and
Setup objects. This shows the initial state of the Rulerunner Client object and its child objects.
4. In the Watch pane, collapse the m_oClient object back to its closed state.
5. In the code view, right-click the m_oDCO object and choose Add Watch.
6. In the Watch pane, expand the m_oDCO object. This shows the initial state of the DCO object.
7. In the Watch pane, collapse the m_oDCO object back to its closed state.
8. In the code view, right-click the m_oPilotProps object and choose Add Watch.
9. In the Watch pane, expand the m_oPilotProps object and the BPilotCtrlClass. This shows the initial
state of the PilotProps object.
10. In the Watch pane, collapse the m_oPilotProps object back to its closed state.
POPULATING THE RUNTIME DCO OBJECT
Having looked at the runtime objects in their initial state, we now want to see them with the runtime batch
information. First, we’ll load the DCO object.
1. Press F10 (single step) four more times, until you reach the line that calls m_oDCO.Read().
This line combines the runtime batch path and input page file as specified in the ProfileRunner window
and reads the input file into the DCO object’s XML property.
2. Press F10 again to execute the line.
3. Expand the m_oDCO object in the Watch pane. The ID, Type, and XML fields are now populated.
4. Click the Text Visualizer button beside the XML field. This property is populated from the input
file VScan.xml.
Note: The pages all have TYPE="Other" since we haven’t executed the PageID profile yet.
5. Close the Text Visualizer window and collapse the m_oDCO object back to its closed state.
POPULATING THE PILOTPROPS OBJECT AND THE STATE.PILOTPROPS PROPERTY
Next, we’ll populate the PilotProps object and then look at the properties that get transferred to the State
object prior to executing the task profile.
1. In the Visual Studio code view, scroll through the Execute method until you reach the following line:
//run the profile!
l_iRes = m_oSession.ExecuteEx(m_oSession.State.Victim, DCRRSClient.EFlowMode.eRun);
Note: The code you’re scrolling past is responsible for setting up the values on the PilotProps object and
then transferring those values to the State object’s PilotProps property.
2. Click once in the gray margin to the left of the line l_iRes = m_oSession.ExecuteEx(…) to set a
breakpoint.
3. Click the Continue button on the main Visual Studio toolbar to run to the breakpoint.
4. In the Watch pane, expand the PilotProps object and the BPilotCtrlClass. This shows the state of the
PilotProps object immediately prior to executing the task profile.
The BatchDir, BatchID, and DCOFile are populated from the ProfileRunner window. The other nonzero
properties are hardcoded into the Execute method (m_oPilotProps.Station="1", etc.). These
properties get transferred to the State object and provide the information the Rulerunner service needs to
execute the task profile.
5. Collapse the m_oPilotProps object back to its closed state.
6. Expand the m_oClient object, the Session child object, and the State child object.
7. Click the Text Visualizer button beside the PilotProps field to view the properties that were
transferred from the PilotProps object.
8. Close the Text Visualizer window and then click the Text Visualizer button beside the XML field
to view the library, profile, and logging information.
The library and profile fields come from the ProfileRunner window; the logging settings are hardcoded.
9. Close the Text Visualizer window and collapse the m_oClient object back to its closed state.
EXECUTING THE TASK PROFILE
We’ve seen how the DCO and State objects get set up prior to execution. Now we can execute the task
profile.
1. Press F10 to execute the line l_iRes = m_oSession.ExecuteEx(…) that runs the task profile. Then
wait for the task profile to complete.
2. In the Watch pane, expand the m_oDCO object. Then click the Text Visualizer button beside the
XML field.
Notice that the page types are still “Other” since we’re looking here at the XML property we populated
earlier using the m_oDCO.Read() method. The Session.ExecuteEx() method populates an internal
data structure and we must explicitly get the information in XML format using the
m_oSession.State.GetDCOXML() method. This step comes next.
3. Close the Text Visualizer window and then press F10 five times to execute the line:
m_oDCO.XML = m_oSession.State.GetDCOXML("");
4. In the Watch pane, expand the m_oDCO object. Then click the Text Visualizer button beside the
XML field.
You can see now that the XML property contains the updated runtime DCO information.
5. Close the Text Visualizer window and click the Continue button on the main Visual Studio toolbar to
run the application to completion. Then click OK to close the message box.
6. In Visual Studio, click the Stop Debugging button on the main Visual Studio toolbar.
TRAVELDOCS: USING A CUSTOM ACTION TO EXAMINE RUNTIME OBJECTS
Note: This section requires the full version of Visual Studio 2005 or 2008. The Datacap Custom Action
Library template is not yet compatible with Visual Studio 2010.
In this section, we’ll examine the runtime objects used during rule execution. First, we’ll create a custom
action library using the Datacap Custom Action Library template. Next, we’ll insert a custom action
at specific points within the TravelDocs application where we’d like to examine the state of the runtime
objects. Finally, we’ll attach the Visual Studio debugger to the Datacap Studio process so we can intercept
the task profile when the custom action executes and look at the runtime objects.
OBTAINING THE CUSTOM ACTION LIBRARY TEMPLATE
You can use the Custom Action Library template that’s included in Taskmaster Capture 8.0.1 Fix Pack 1,
or you can download the template from the IBM Taskmaster Capture Publications Library at this URL:
http://www-01.ibm.com/support/docview.wss?uid=swg27021107
From the Taskmaster Capture Publications Library, download DeveloperKit.zip and then unzip the file.
Note: Do not use the Custom Action Library template that’s included with Taskmaster Capture 8.0.1
without Fix Pack 1.
CREATING A CUSTOM ACTION LIBRARY
1. Copy the Datacap Custom Action Library template into your Visual Studio templates folder:
From:
C:\Datacap\DeveloperKit\VSTemplates\Datacap Custom Action Library.zip (or the download)
To:
(My) Documents\Visual Studio\Templates\ProjectTemplates\Datacap
2. Start Visual Studio by right-clicking the Microsoft Visual Studio Start menu item and choosing Run as
administrator. You must be running as an administrator in order to build the library.
3. Click File > New > Project. Then expand Visual C# and select the Datacap category.
Note: If Visual C# is not configured as the primary language, Visual C# is under “Other Languages.”
4. Select the Datacap Custom Action Library template.
5. In the Name field, enter CustomActions and then click OK.
6. In Visual Studio, make sure the Solution Configuration is set to Debug.
Note: If the Solution Configuration drop-down is not displayed on the standard toolbar, click Tools >
Import and Export Settings > Reset all settings and select General or C#.
7. Click Build > Build Solution.
ADDING THE CUSTOM ACTION TO THE TRAVELDOCS APPLICATION
The Datacap Custom Action Library template includes two default actions, one of which is already set up so
we can examine the runtime Datacap objects. In this section, we’ll add the SampleQuerySmartParamValue
action to the beginning and the end of the VScan ruleset so we can look at the runtime DCO object at the
start and the end of the task profile. Before you can access the custom actions from Datacap Studio, you
need to copy the RRX file into the C:\Datacap\RRS folder.
1. Copy the new library’s CustomActions.rrx file from the Debug folder into the RRS library folder:
From:
(My) Documents\Visual Studio\Projects\CustomActions\bin\Debug
To:
C:\Datacap\RRS
2. Start Datacap Studio and open the TravelDocs application.
3. On the Rulemanager tab, select the VScan ruleset and click the Lock/Unlock ruleset button.
4. Expand the VScan ruleset completely and select the SetSourceDirectory action.
5. Click the Actions library tab and expand the CustomActions library completely.
Note: If the CustomActions library is not displayed, click the Refresh button. If it’s still not
displayed, make sure you copied the RRX file into C:\Datacap\RRS.
6. Select the SampleQuerySmartParamValue action and click the Add to function button to add the
action to the beginning of the function.
7. In the Rulesets pane, select VScan: Batch Function 1 and click the Add to function button to add
the same action to the end of the function.
8. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The ruleset should look like the one below.
EXECUTING THE ACTION WITHIN THE VISUAL STUDIO DEBUGGER
1. In Visual Studio, click Debug > Attach to Process.
2. Select DStudio.exe and click Attach.
3. In the SampleQuerySmartParamValue method implementation, set a breakpoint on the line string
objectValue = localSmartObj.DCONavGetValue(CurrentDCO.ID), as shown below.
Note: If Datacap Studio has not yet loaded the sample action into memory or you rebuilt the project since
running it from Datacap Studio, you may see a warning on the breakpoint indicating that no
symbols are loaded. You can ignore the warning as it will disappear when you execute the action.
4. In Datacap Studio, click the Test tab and select the VScan task profile under Main Job.
5. Click the New button to create a new batch.
424
© COPYRIGHT IBM CORPORATION 1994, 2011 (REVISION 06172011)
UNDERSTANDING THE TASKMASTER OBJECT MODEL AND EXECUTION ENVIRONMENT
6. Click the Process rules for target object  button. The task profile begins but execution stops
immediately at the Visual Studio breakpoint.
7. In Visual Studio, right-click localSmartObj within the highlighted line and choose Add Watch.
Note: dcSmart.SmartNav provides a convenient way to access multiple Datacap runtime objects used
by the Rulerunner service during execution.
8. Expand localSmartObj > base > Non-Public members in the Watch pane to view the Datacap
objects. Then expand the CurrentDCO object.
Note: Since the VScan rule is bound to the document hierarchy at the batch level, the current DCO is
the batch-level DCO object.
9. In the Watch pane, click the Text Visualizer button beside the XML field to display the runtime
DCO XML.
Since we haven't executed the Scan action yet, there are no pages, just the batch-level wrapper.
10. Close the Text Visualizer window and then click the Continue  button on the main Visual Studio
toolbar to run the application to the next breakpoint. This occurs when Rulerunner executes the
SampleQuerySmartParamValue action at the end of the rule.
11. In the Watch pane, the red highlighting indicates that the value of the XML property has changed. Click
the Text Visualizer button beside the XML field to display the runtime DCO XML.
This time you see the complete runtime batch hierarchy with each of the “scanned” pages.
12. Close the Text Visualizer window and then click the Continue  button on the main Visual Studio
toolbar to run the task profile to completion. Then click Cancel batch in Datacap Studio.
VIEWING PAGE AND FIELD OBJECTS
In the previous section, we executed the SampleQuerySmartParamValue action within a rule running at
the batch level. Next, we'll use the same action within a rule running at the page level and another running at
the field level.
UPDATING THE VALIDATE RULESET
1. On the Datacap Studio Rulemanager tab, select the Validate ruleset and click the Lock/Unlock ruleset
button.
2. Expand the Validate ruleset and then expand the Validate Page and Validate Currency Field rules
completely.
Note: The Validate Page rule is bound to all page types and the Validate Currency Field rule is bound to
the various cost fields on the Rental_Agreement, Air_Ticket, and Room_Receipt pages.
3. Add the SampleQuerySmartParamValue action in the CustomActions library to the beginning of
each function.
4. In the Rulesets pane, click the Save button. Then click the Lock/Unlock ruleset button and choose
Publish ruleset. The updated rules should look like those below.
WATCHING THE RUNTIME OBJECTS IN VISUAL STUDIO
1. Click the Test tab and select the VScan task profile under Main Job.
2. Click the New button to create a new batch.
3. Click the Process rules for target object button. The VScan task profile begins but execution stops
immediately at the Visual Studio breakpoint. Notice that the CurrentDCO object in the Watch pane
shows the batch as the current object.
4. In Visual Studio, click the Continue  button twice and then click the Advance button to move the
batch to the next task in the workflow.
5. Click the Process rules for target object  button and wait for the PageID task profile to complete.
Then click Advance.
6. Click the Process rules for target object  button and wait for the CreateDocs task profile to
complete. Then click Advance.
7. Click the Process rules for target object button to start the Rulerunner task profile. Execution stops
at the breakpoint in Visual Studio. Notice that the CurrentDCO object in the Watch pane shows page
TM000001 as the current object.
8. Click the Text Visualizer
button beside the XML field to display the runtime DCO XML.
9. Close the Text Visualizer and then click the Continue button to run to the next breakpoint. Notice
that the CurrentDCO object in the Watch pane shows the Total_Cost field as the current object.
10. Click the Text Visualizer
button beside the XML field to display the runtime DCO XML.
11. Close the Text Visualizer. Then keep clicking the Continue button and watch the current object
change as you work through the batch hierarchy from page to page.
Note: You may need to re-expand the CurrentDCO object at some breakpoints. After the first few pages,
you may want to remove the breakpoint and click Continue to run the task to completion.
12. When the task profile completes, click Cancel batch.
Appendix A
SMART PARAMETER SPECIAL VARIABLE REFERENCE
Note: Special variables are for use with smart parameters. Smart parameters do not work with all actions.
Please check the action help in Datacap Studio for compatibility information.
SPECIAL VARIABLES FOR ACCESSING THE APPLICATION CONFIGURATION FILE
@APPPATH(<key_path>)
DESCRIPTION
Retrieves the path to a file or folder from the application's configuration (.app) file. Use the Taskmaster
Application Manager to modify the information contained in this file; do not edit the file directly.
SYNTAX
@APPPATH(key_path)
ARGUMENTS
key_path
Specifies the path through the XML hierarchy to the field you want:
• If the field name is unique within the file, you can specify just the field name.
• If the field name is not unique within the file (for example, if you have multiple
workflows), you must specify the path to the instance you want. If you don't specify the
path, you'll get the first instance. The easiest way to obtain the proper path is to move
the mouse pointer over the field in the Taskmaster Application Manager and read the
path from the balloon help.
• For additional information about specifying key paths, see GetKeyValue in the Datacap
Application Service Control API Reference.
EXAMPLES
You can use @APPPATH to retrieve the following information from the application configuration file.
Field name            Key name        Example
Application fields
Batches folder        runtime         @APPPATH(runtime) (field name is unique)
Export folder         export          @APPPATH(export) (field name is unique)
Fingerprint folder    fingerprint     @APPPATH(fingerprint) (field name is unique)
Workflow fields [4]
Setup DCO             setupdco        @APPPATH(setupdco) (assumes single workflow)
Rules folder          rules           @APPPATH(dco_Workflow1/rules) (specifies Workflow 1)
VScan source folder   vscanimagedir   @APPPATH(vscanimagedir) (assumes single workflow)
Imagefix INI          imagefix        @APPPATH(dco_Workflow2/imagefix) (specifies Workflow 2)
[4] If your application has only one workflow, the path is not required. If you have multiple workflows, use
dco_<workflow_name>/ to reference the required field.
@APPVAR(<key_path>)
DESCRIPTION
Retrieves a connection string, value, or other attribute from the application configuration (.app) file. Use the
Taskmaster Application Manager to modify the information contained in this file; do not edit the file
directly.
Note: With Taskmaster Capture prior to version 8.0, this parameter returns the value for the specified variable
as defined in the [Variables] section in “paths.ini,” located in the application's “process” folder.
SYNTAX
@APPVAR(key_path[:attribute])
ARGUMENTS
key_path
Specifies the path to the field you want. See @APPPATH for details.
attribute
Specifies the attribute name (optional):
• For custom values, use "v" (if you don't specify an attribute, "v" is assumed).
• For connection strings, use "cs".
• For additional information, see GetKeyValue in the Datacap Application Service Control
API Reference.
The easiest way to obtain the proper path and attribute is to look in the Taskmaster
Application Manager:
• On the Application and Taskmaster tabs, move the mouse pointer over the field and
read the smart parameter value from the tooltip.
• On the Custom Values tab, read the smart parameter value displayed in each section.
EXAMPLES
You can use @APPVAR to retrieve the following information from the application configuration file.
Field name              Key name:attribute             Example
Workflow fields [5]
Lookup database         lookupdb:cs                    @APPVAR(lookupdb:cs) (Workflow 1)
Fingerprint database    fingerprintconn:cs             @APPVAR(fingerprintconn:cs) (one workflow)
Export database         exportdb:cs                    @APPVAR(dco_Wkflw1/exportdb:cs) (Workflow 1)
Taskmaster fields
Engine database         tmengine:cs                    @APPVAR(tmengine:cs) (field is unique)
Admin database          tmadmin:cs                     @APPVAR(tmadmin:cs) (field is unique)
Custom values
General string values   values/gen/<value_name>        @APPVAR(values/gen/Value1) (Value1)
Connection strings      values/dsn/<value_name>:cs     @APPVAR(values/dsn/DB1:cs) (CS for DB1)
TM connection strings   values/tmdsn/<value_name>:cs   @APPVAR(values/tmdsn/TMDB1:cs) (CS for TMDB1)
Advanced values         values/adv/<value_name>        @APPVAR(values/adv/Value1) (Value1)
[5] If you have multiple workflows, use dco_<workflow_name>/ to reference the required field.
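The key_path[:attribute] syntax can be illustrated with a short Python sketch. This is not Taskmaster code: the function name split_appvar_argument is hypothetical, and the sketch only mirrors the documented rule that "v" is assumed when no attribute is given.

```python
from typing import Tuple


def split_appvar_argument(argument: str) -> Tuple[str, str]:
    """Split an @APPVAR argument into (key_path, attribute).

    Hypothetical helper for illustration only. It follows the documented
    syntax key_path[:attribute], assuming "v" when the attribute is omitted.
    """
    key_path, separator, attribute = argument.partition(":")
    if not separator or not attribute:
        attribute = "v"  # documented default for custom values
    return key_path, attribute


print(split_appvar_argument("lookupdb:cs"))
print(split_appvar_argument("values/gen/Value1"))
print(split_appvar_argument("dco_Wkflw1/exportdb:cs"))
```

The first call yields the key lookupdb with the cs (connection string) attribute; the second has no attribute, so "v" is used.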
SPECIAL VARIABLES FOR ACCESSING THE RUNTIME HIERARCHY
@BATCHID
DESCRIPTION
Returns the value of the “id” attribute for the current batch.
EXAMPLE
In this example, the smart parameter returns the ID of the current batch.
Action: rr_Get("@BATCHID")
Return value: 20110046.001
<B id="20110046.001">
<V n="TYPE">TravelDocs</V>
<V n="STATUS">1</V>
@ID
DESCRIPTION
Returns the value of the “id” attribute for the current object. For example, if the rule containing this special
variable is bound to a page, the current object is the current page.
EXAMPLE
In this example, the rule containing the action is bound to a page and the smart parameter returns the ID of
the current page.
Action: rr_Get("@ID")
Return value: TM000001
<P id="TM000001">
<V n="TYPE">Rental_Agreement</V>
<V n="STATUS">1</V>
@STATUS
DESCRIPTION
Returns the value of the STATUS variable for the current object. Note that STATUS is also a setup property
and may therefore specify a special characteristic (for example, -1 indicates a hidden field).
EXAMPLE
In this example, the rule containing the action is bound to a page and the smart parameter returns the status
of the current page.
Action: rr_Get("@STATUS")
Return value: 1
<P id="TM000001">
<V n="TYPE">Rental_Agreement</V>
<V n="STATUS">1</V>
@VALUE
DESCRIPTION
Returns the text of the current object (usually a field).
EXAMPLE
In this example, the rule containing the action is bound to a field and the smart parameter returns the text of
the current field.
Action: rr_Get("@VALUE")
Return value: SUV
<F id="Car_Type">
<V n="TYPE">Car_Type</V>
<C cn="10" cr="588,748,600,769">83</C> ASCII ‘S’
<C cn="10" cr="605,748,620,769">85</C> ASCII ‘U’
<C cn="10" cr="625,748,643,769">86</C> ASCII ‘V’
</F>
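The relationship between the field's character (C) elements and the returned text can be sketched in Python. This is an illustration only, assuming (as the annotations in the example suggest) that the text of each C element is the character code of one recognized character; the function name field_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The field fragment from the example above.
FIELD_XML = """
<F id="Car_Type">
  <V n="TYPE">Car_Type</V>
  <C cn="10" cr="588,748,600,769">83</C>
  <C cn="10" cr="605,748,620,769">85</C>
  <C cn="10" cr="625,748,643,769">86</C>
</F>
"""


def field_text(field_xml: str) -> str:
    """Join the character codes of the direct <C> children into a string."""
    field = ET.fromstring(field_xml.strip())
    return "".join(chr(int(c.text)) for c in field.findall("C"))


print(field_text(FIELD_XML))  # SUV
```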
@VAR(<variable_name>)
DESCRIPTION
Returns the value of the specified variable on the current object.
EXAMPLE
In this example, the rule containing the action is bound to a page and the smart parameter returns the value of
the “TYPE” variable for the current page.
Action: rr_Get("@VAR(TYPE)")
Return value: Rental_Agreement
<P id="TM000001">
<V n="TYPE">Rental_Agreement</V>
<V n="STATUS">1</V>
@P\<field_name>[.<variable_name>]
DESCRIPTION
Returns the text of the specified field on the current page, or the value of the specified variable of the
specified field on the current page.
EXAMPLE
In the first example, the smart parameter returns the text of the “Car_Type” field on the current page.
In the second example, the smart parameter returns the value of the “Car_Type” field's “TYPE” variable.
Action: rr_Get("@P\Car_Type")
Return value: SUV
Action: rr_Get("@P\Car_Type.TYPE")
Return value: Car_Type
<F id="Car_Type">
<V n="TYPE">Car_Type</V>
<C cn="10" cr="588,748,600,769">83</C> ASCII ‘S’
<C cn="10" cr="605,748,620,769">85</C> ASCII ‘U’
<C cn="10" cr="625,748,643,769">86</C> ASCII ‘V’
</F>
@F\<field_name>[.<variable_name>]
DESCRIPTION
Returns the text of the specified sub-field of the current field (for example, a field within a line item), or the
value of the specified variable of the specified sub-field.
EXAMPLE
In these examples, the rule containing the action is bound to a field with sub-fields. In the first example, the
smart parameter returns the text of the “Unit_Cost” sub-field of the current field.
In the second example, the smart parameter returns the value of the “Unit_Cost” sub-field's “TYPE”
variable.
Action: rr_Get("@F\Unit_Cost")
Return value: $9.90
Action: rr_Get("@F\Unit_Cost.TYPE")
Return value: Unit_Cost
<F id="Other_Charges_Line_Item0">
<V n="TYPE">Other_Charges_Line_Item</V>
<F id="Unit_Cost">
<V n="TYPE">Unit_Cost</V>
<C cn="10" cr="1290,511,1305,540">36</C> ASCII ‘$’
<C cn="10" cr="1308,515,1321,536">57</C> ASCII ‘9’
<C cn="10" cr="1325,533,1329,536">46</C> ASCII ‘.’
<C cn="10" cr="1334,515,1348,536">57</C> ASCII ‘9’
<C cn="10" cr="1350,515,1365,536">48</C> ASCII ‘0’
</F>
@B.<variable_name>
DESCRIPTION
Returns the value of the specified batch-level variable.
EXAMPLE
In this example, the smart parameter returns the value of the “TYPE” variable for the current batch.
Action: rr_Get("@B.TYPE")
Return value: TravelDocs
<B id="20110046.001">
<V n="TYPE">TravelDocs</V>
<V n="STATUS">1</V>
@D.<variable_name>
DESCRIPTION
Returns the value of the specified variable in the current document.
EXAMPLE
In this example, the smart parameter returns the value of the “TYPE” variable for the current document.
Action: rr_Get("@D.TYPE")
Return value: Car_Rental
<D id="20110046.001.01">
<V n="TYPE">Car_Rental</V>
<V n="STATUS">0</V>
@P.<variable_name>
DESCRIPTION
Returns the value of the specified variable on the current page.
EXAMPLE
In this example, the smart parameter returns the value of the “TemplateID” variable for the current page.
Action: rr_Get("@P.TemplateID")
Return value: 555
<P id="TM000001">
<V n="TYPE">Rental_Agreement</V>
<V n="TemplateID">555</V>
@F.<variable_name>
DESCRIPTION
Returns the value of the specified variable within the current field.
EXAMPLE
In this example, the smart parameter returns the value of the “TYPE” variable for the current field.
Action: rr_Get("@F.TYPE")
Return value: Pickup_Date
<F id="Pickup_Date">
<V n="TYPE">Pickup_Date</V>
<V n="Position">183,402,535,463</V>
SPECIAL VARIABLES FOR ACCESSING JOB AND TASK INFORMATION
@JOBID
DESCRIPTION
Returns the ID of the current job. This is the value specified in the ID field on the Taskmaster Administrator
Workflow tab.
EXAMPLE
In this example, the smart parameter returns the ID of the current job.
Action: rr_Get("@JOBID")
Return value: Main Job
@JOBNAME
DESCRIPTION
Returns the name of the current job. This is the value specified in the Description field on the Taskmaster
Administrator Workflow tab.
EXAMPLE
In this example, the smart parameter returns the name of the current job.
Action: rr_Get("@JOBNAME")
Return value: Main Job
@OPERATOR
DESCRIPTION
Returns the username of the person who ran the job.
EXAMPLE
In this example, the smart parameter returns the name of the person who ran the job.
Action: rr_Get("@OPERATOR")
Return value: admin
@STATION
DESCRIPTION
Returns the ID of the station running the job. This is the value specified in the ID field on the Taskmaster
Administrator Workflow tab.
EXAMPLE
In this example, the smart parameter returns the ID of the station running the current job.
Action: rr_Get("@STATION")
Return value: 1
@TASKID
DESCRIPTION
Returns the ID of the current task. This is the value specified in the ID field on the Taskmaster Administrator
Workflow tab.
EXAMPLE
In this example, the smart parameter returns the ID of the current task.
Action: rr_Get("@TASKID")
Return value: Export
@TASKNAME
DESCRIPTION
Returns the name of the current task. This is the value specified in the Description field on the Taskmaster
Administrator Workflow tab.
EXAMPLE
In this example, the smart parameter returns the name of the current task.
Action: rr_Get("@TASKNAME")
Return value: Export
MISCELLANEOUS SPECIAL VARIABLES
@CHR(<ascii_value>)
DESCRIPTION
Returns the character corresponding to the specified ASCII code.
EXAMPLE
In this example, the smart parameter returns the character corresponding to ASCII value 38.
Action: rr_Get("@CHR(38)")
Return value: &
@DATE(<format>)
DESCRIPTION
Returns the current date in the format specified (defaults to MM/DD/YYYY).
EXAMPLE
In this example, the smart parameter returns the current date.
Action: rr_Get("@DATE(mm.dd.yyyy)")
Return value: 12.31.2010
@DCO(<property_name>)
DESCRIPTION
Returns the value of the specified DCO object property. The DCO object is an internal data structure
containing the current runtime batch hierarchy information, including the batch ID (ID), batch type (TYPE),
batch status (STATUS), etc., as well as the full runtime batch hierarchy XML.
Valid DCO properties are ID, TYPE, STATUS, BATCHDIR, BATCHPRIORITY, IMAGENAME, TEXT,
CONFIDENCESTRING, and XML.
EXAMPLE
In this example, the rule containing the action is bound to a page and the smart parameter returns the portion
of the runtime batch hierarchy XML for the current page.
Action: rr_Get("@DCO(XML)")
Return value: <?xml-stylesheet type="text/xsl" href="..\..\dco.xsl"?><P id="TM000001"><F id="Pickup_Date">
<V n="TYPE">Pickup_Date</V><V n="Position">0,0,0,0</V><V n="STATUS">0</V></F>etc.
@DICT_VALUE(<field>)
DESCRIPTION
This special variable works with OMR fields (RecogType=4) that are bound to a dictionary. It returns the
dictionary values corresponding to the items that are selected in the specified OMR field.
EXAMPLE
In this example, the rule containing the rr_Get action is bound to a page containing an OMR field
(“Options”). The OMR field has three sub-fields and is bound to the dictionary shown on the right. On the
current page all three options are selected, so the return string contains all three dictionary values.
Action: rr_Get("@DICT_VALUE(Options)")
Return value: Navigation System Child Seat Fuel Service
<DICT n="Options">
<W v="Navigation System">Navigation System</W>
<W v="Child Seat">Child Seat</W>
<W v="Fuel Service">Fuel Service</W>
</DICT>
@DICT_WORD(<field>)
DESCRIPTION
Same as @DICT_VALUE, except this special variable returns the dictionary words corresponding to the
selected items.
@DICT_VINDEX(<csv_string>)
DESCRIPTION
This special variable works with actions bound to an OMR field (RecogType=4) where the OMR field is
bound to a dictionary. The parameter returns a string of 1's and 0's corresponding to the dictionary values
you specify as a comma-separated list.
EXAMPLE
In this example, the rule containing the rr_Get action is bound to an OMR field. The OMR field has three
sub-fields and is bound to the dictionary shown on the right. The values specified correspond to the second
and third items in the dictionary, so the return value is 011. Note that the rr_Get sets the character values on
the three OMR sub-fields to 0, 1, and 1 respectively.
Action: rr_Get("@DICT_VINDEX(Fuel Service,Child Seat)")
Return value: 011
<DICT n="Options">
<W v="Nav System">Nav System</W>
<W v="Child Seat">Child Seat</W>
<W v="Fuel Service">Fuel Service</W>
</DICT>
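The mapping from the comma-separated list to the string of 1's and 0's can be sketched in Python. This is an illustration of the behavior described above, not Taskmaster code; the function name dict_vindex and the OPTIONS list are hypothetical stand-ins for the dictionary in the example.

```python
def dict_vindex(dictionary_values, csv_string):
    """Return one '1' or '0' per dictionary value, in dictionary order.

    A position is '1' when its dictionary value appears in the
    comma-separated list, regardless of the order of the list.
    """
    selected = {item.strip() for item in csv_string.split(",")}
    return "".join("1" if value in selected else "0" for value in dictionary_values)


# Dictionary values in the order they appear in the "Options" dictionary.
OPTIONS = ["Nav System", "Child Seat", "Fuel Service"]

print(dict_vindex(OPTIONS, "Fuel Service,Child Seat"))  # 011
```

Note that the result follows the order of the dictionary, not the order in which the values are listed in the argument.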
@DICT_WINDEX(<csv_string>)
DESCRIPTION
Same as @DICT_VINDEX, except the argument for this special variable uses the dictionary words.
@EMPTY
DESCRIPTION
This special variable represents an empty string.
EXAMPLE
In this example, rrSet clears the custom page variable “MyVar”.
Action: rrSet("@EMPTY","@P.MyVar")
@PATH(<key>)
DESCRIPTION
Returns the full path for the specified identifier as defined in the [PATHS] section of “paths.ini,” located in
the application's dco_<app_name> folder. The functionality supported by this special variable has been
replaced by the Taskmaster Application Manager and the @APPPATH special variable.
EXAMPLE
In this example, the smart parameter returns the path to the APT application's image folder as defined in the
paths.ini file shown below.
Action: rr_Get("@PATH(VscanImageDir)")
Return value: C:\Datacap\APT\Images\Input
[Paths]
VscanImageDir=..\Images\Input
ProcessDir=..\dco_APT
RRXDir=..\..\dco_APT\Rules
FingerprintDir=..\Fingerprint
ExportDir=..\Export
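The step from the relative paths.ini entry to the full path in the return value can be sketched in Python. This is a sketch under an assumption: that the entry is resolved against the application's dco_<app_name> folder, taken here to be C:\Datacap\APT\dco_APT, which is consistent with the return value shown. The function name resolve_paths_ini_entry is hypothetical.

```python
import ntpath  # Windows path rules, usable on any platform


def resolve_paths_ini_entry(process_dir: str, relative_value: str) -> str:
    """Resolve a relative paths.ini value against the process folder.

    Assumption for illustration: relative entries are interpreted
    relative to the folder that contains paths.ini.
    """
    return ntpath.normpath(ntpath.join(process_dir, relative_value))


print(resolve_paths_ini_entry(r"C:\Datacap\APT\dco_APT", r"..\Images\Input"))
# C:\Datacap\APT\Images\Input
```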
@PILOT(<property_name>)
DESCRIPTION
Returns the value of the specified Pilot object property. The Pilot object is an internal data structure
configured by Taskmaster at the start of task execution. It contains the information required to execute the
task such as the batch folder (BATCHDIR), the batch ID (BATCHID), the input DCO file (DCOFILE), the
task priority (PRIORITY), etc.
Valid Pilot properties are BATCHID, BATCHDIR, OPERATOR, STATION, CHILDRENQUANTITY, PRIORITY,
CAPTION, PROJECTPATH, PAGESINBATCH, DOCSINBATCH, EXPECTEDPAGES, EXPECTEDDOCS,
ADJUSTEDPAGES, ADJUSTEDDOCS, JOBNAME, TASKNAME, FORMPATH, DCOFILE, JOBID, and TASKID.
EXAMPLE
In this example, the smart parameter returns the input DCO file for the current task.
Action: rr_Get("@PILOT(DCOFILE)")
Return value: C:\Datacap\TravelDocs\batches\20110004.004\Export.xml
@PROJECTDIR
DESCRIPTION
Returns the path and filename for the current task's Batch Pilot project (.bpp) file. The path is relative to the
application's dco_<app_name> folder.
EXAMPLE
Action: rr_Get("@PROJECTDIR")
Return value: \RRS_VScan.bpp
@PROCESSDIR
DESCRIPTION
Returns the full path to the application's dco_<app_name> folder.
EXAMPLE
Action: rr_Get("@PROCESSDIR")
Return value: C:\Datacap\TravelDocs\dco_TravelDocs
@STRING(<string_value>)
DESCRIPTION
Returns the value specified as a string.
EXAMPLE
Action: rr_Get("@STRING(MyString)")
Return value: MyString
@TIME(<format>)
DESCRIPTION
Returns the current time in the format specified (defaults to HH:MM:SS).
EXAMPLE
Action: rr_Get("@TIME(HH:MM)")
Return value: 10:45
@TYPE
DESCRIPTION
Returns the current object's type (Batch, Document, Page, or Field).
EXAMPLE
Action: rr_Get("@TYPE")
Return value: Page
Appendix B
STANDARD VARIABLE REFERENCE
VARIABLES USED ON ALL OBJECT TYPES
MAX_TYPES
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the maximum number of child object types that can be present at runtime to meet document
integrity requirements (0 = no maximum).
MESSAGE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used by many validation and lookup actions to report errors.
EXAMPLE
This example shows an error message written to the runtime hierarchy by a failed validation action.
<V n="MESSAGE">Failed Calculation:377.73=477.73</V>
MIN_TYPES
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the minimum number of child object types that must be present at runtime to meet document
integrity requirements.
rules
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Stores the object's rule map. This is typically established in Datacap Studio.
<in>
<r id="4" rs="6" />
<r id="2" rs="13" />
<r id="9" rs="7" />
<r id="3" rs="14" />
<r id="6" rs="16" />
</in>
EXAMPLE
This example shows how the XML above is stored in the Setup DCO file.
<V n="rules">&lt;in&gt;&lt;r id=&quot;4&quot; rs=&quot;6&quot; /&gt;&lt;r
id=&quot;2&quot; rs=&quot;13&quot; /&gt;&lt;r id=&quot;9&quot; rs=&quot;7&quot;
/&gt;&lt;r id=&quot;3&quot; rs=&quot;14&quot; /&gt;&lt;r id=&quot;6&quot;
rs=&quot;16&quot; /&gt;&lt;/in&gt;</V>
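The escaping shown in the example can be reproduced with a short Python sketch. It is an illustration only: the function names store_rule_map and load_rule_map are hypothetical, and the sketch assumes the rule map is stored as ordinary XML-escaped text inside the V element, which is what the stored form above suggests.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

RULE_MAP = (
    '<in><r id="4" rs="6" /><r id="2" rs="13" /><r id="9" rs="7" />'
    '<r id="3" rs="14" /><r id="6" rs="16" /></in>'
)


def store_rule_map(rule_map_xml: str) -> str:
    """Escape the rule map and wrap it in a "rules" variable element."""
    escaped = escape(rule_map_xml, {'"': "&quot;"})  # also escapes <, >, and &
    return '<V n="rules">' + escaped + "</V>"


def load_rule_map(variable_xml: str):
    """Recover the (id, rs) pairs from the stored variable."""
    inner = ET.fromstring(variable_xml).text  # the parser unescapes the text
    return [(r.get("id"), r.get("rs")) for r in ET.fromstring(inner).findall("r")]


stored = store_rule_map(RULE_MAP)
print(stored)
print(load_rule_map(stored))
```

Parsing the stored variable with an XML parser returns the original rule map text, which can then be parsed a second time to read the individual r elements.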
STATUS
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the object's status. This is initially zero or not present in the Setup DCO and is updated at runtime
depending on the object type and the task's rules and actions. Some common status values are shown below
with their conventional meanings. Specific applications may use other status values.
Value   Status        Applies to                                                       Assigned by
49      ScanOK        Page objects                                                     Scan tasks
0       OK            Batch, document, and page objects                                Rulerunner tasks
1       Problem       Batch, document, and page objects                                Rulerunner tasks
2       Overridden    Pages that fail validation but are overridden by the operator    Verify tasks
48      RecogDoneOK   Page objects                                                     Recognition tasks
0       OK            Field objects                                                    Any task
1       Error         Field objects                                                    Any task
-1      Ignore        Field objects                                                    Any task (can specify -1 in Setup DCO)
A few less common status values are shown below.
CannotFindAnchor=51
DontNeedVerification=52
RescanPage=70
VerificationFailed=71
PageOnHold=72
PageOverridden=73
NoData=74
DeletedPage=75
DeleteApproved=77
ReviewPage=79
DeletedDoc=128
ReviewDoc=145
EXAMPLE
<V n="STATUS">1</V>
TYPE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
In the Setup DCO, this specifies the object type (batch, document, page, or field), for example:
<V n="TYPE">Page</V>
This is updated in the Runtime DCO to specify the object name, for example:
<V n="TYPE">Rental_Agreement</V>
BATCH VARIABLES
LAST_RR_PROFILE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the name of the last task profile that ran.
EXAMPLE
<V n="LAST_RR_TPROFILE">Rulerunner:m:eRun</V>
DOCUMENT VARIABLES
DD
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used by some scan tasks – this contains the value imprinted on the first page of the document by a scanner,
or an externally assigned document ID.
PAGE VARIABLES
Confidence
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the confidence level achieved during fingerprint matching.
DATAFILE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the name of the data XML file associated with this page. This is initially blank in the Setup DCO
and is assigned at runtime (for example, TM000001.xml).
Fingerprint Created
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies whether or not Taskmaster added a fingerprint for this page to the fingerprint library. This typically
happens when fingerprint matching fails and the argument to the FindFingerprint action is True.
EXAMPLE
<V n="Fingerprint Created">No</V>
Image_Offset
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the offset in pixels (x, y) between the runtime page image and the fingerprint image.
EXAMPLE
<V n="Image_Offset">-100,-100</V>
IMAGEFILE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the name of the associated runtime image file. This is blank in the Setup DCO and is assigned at
runtime.
EXAMPLE
<V n="IMAGEFILE">tm000010.tif</V>
PatternConfidence
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the confidence level achieved when pattern matching is used for page identification.
EXAMPLE
<V n="PatternConfidence">10</V>
PD
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used by isscan tasks only – specifies the page data string.
ScanSrcPath
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used by scan tasks only – specifies the full path and filename to the original image file.
EXAMPLE
<V n="ScanSrcPath">c:\datacap\apt\images\input\invoice_0001.tif</V>
TEMPLATE IMAGE
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the name of the matching fingerprint (CCO) file. The value is blank in the Setup DCO and is
assigned at runtime.
TemplateID
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the ID of the matching fingerprint.
EXAMPLE
<V n="TemplateID">567</V>
FIELD VARIABLES
Datatype
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies the type of characters the user can enter into the field in the Verify panel. Use the following codes to
specify the permitted data type:
0 = Alphanumeric
1 = Integer
2 = Float
3 = Date
4 = Time
5 = Currency
DensityString
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used for OMR zones – one character per zone, where the character represents the percentage of black pixels
within the zone according to the following formula:
Character's ASCII code value = Percentage black pixels + 48
For example, if the zone has 20% black pixels, the result is ASCII code 68 = 'D'.
EXAMPLE
This example represents a field with three OMR checkboxes.
<V n="DensityString">@@D</V>
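The formula can be checked with a short Python sketch that applies it in both directions. The function names are hypothetical and the code is only an illustration of the arithmetic, not part of Taskmaster.

```python
def encode_density(percentages):
    """Build a DensityString: one character per zone, code = percentage + 48."""
    return "".join(chr(percentage + 48) for percentage in percentages)


def decode_density(density_string):
    """Recover the percentage of black pixels for each zone."""
    return [ord(character) - 48 for character in density_string]


print(encode_density([20]))    # D
print(decode_density("@@D"))   # [16, 16, 20]
```

Decoding the example value shows that the first two checkboxes have 16% black pixels ('@' is ASCII code 64) and the third has 20%.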
DICT
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Used with selection fields and specifies the name of a dictionary (within this Setup DCO) containing a list of
possible values.
Index
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used in FormSpec to optionally specify the field's index.
label
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
The value of this variable, if specified, defines the label that is displayed beside the field in verification panels
(Taskmaster Web and DotEdit only). If not specified, the field's “type” attribute is used as the label.
Lookup
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies a database lookup statement that gets executed during verification when the user clicks the
hyperlinked field label. A list of matching entries from the database specified by the dsn attribute is displayed,
and the selected entry is used to populate the fields specified in the flist attribute.
EXAMPLE
The sample Lookup value below gets a list of car types from the lookup database specified in the application
configuration (.app) file and populates the “Car_Type” field with the selected value.
<SQL flist='Car_Type' dsn="*/lookupdb:cs">SELECT Car_Type FROM Car_Types</SQL>
The next example (from APT) gets a list of matching vendor names, zip codes, and vendor IDs from the
application's lookup database and displays the list to the user. The SQL statement uses the text in the
“Vendor” field as the search string so the user can, for example, enter the first letter and see a list of vendor
names that start with that letter. When the user selects a vendor from the list, the lookup populates the “Vendor,”
“Remittance_Zip,” and “Vendor_Number” fields with the information for the selected vendor.
<SQL flist='Vendor,Remittance_Zip,Vendor_Number' dsn="*/lookupdb:cs">SELECT
VendorName,VendorZip,VendorID FROM VendorTable WHERE VendorName LIKE
'@@Vendor@@%'</SQL>
Note in this example the special syntax that's required to reference a field value from the SQL statement:
@@Vendor@@%. Note also that WHERE <column> = '<value>' is not supported.
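The effect of the @@field_name@@ syntax can be pictured with a small Python sketch. This is an assumption-laden illustration, not the verification panel's actual implementation: the function name expand_field_references is hypothetical, and the doubling of single quotes is an added precaution that the source does not describe.

```python
import re


def expand_field_references(sql_template: str, field_values: dict) -> str:
    """Replace each @@FieldName@@ marker with that field's current text."""
    def replace(match):
        value = field_values.get(match.group(1), "")
        return value.replace("'", "''")  # assumption: escape embedded quotes
    return re.sub(r"@@(\w+)@@", replace, sql_template)


template = (
    "SELECT VendorName,VendorZip,VendorID FROM VendorTable "
    "WHERE VendorName LIKE '@@Vendor@@%'"
)

# The user has typed "A" into the Vendor field.
print(expand_field_references(template, {"Vendor": "A"}))
```

With "A" in the “Vendor” field, the WHERE clause becomes VendorName LIKE 'A%', which matches every vendor name that starts with that letter.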
LookupEx
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 DotEdit
 Batch Pilot
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies a database lookup statement that gets executed during verification when the user leaves the field (for
example, by clicking on or tabbing to the next field). This is typically used to populate other fields based on
the current field’s value. The structure of the lookup statement is similar to that of the Lookup variable.
EXAMPLE
The sample LookupEx value below looks up the vendor name based on the ID in the “VendorID” field and
populates the “VendorName” field with the result.
<SQL flist='VendorName' dsn="*/lookupdb:cs">SELECT Vendor FROM VendorTable WHERE
VendorID LIKE '@@VendorID@@%'</SQL>
See the Lookup variable for additional information.
MaxLength
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies the maximum number of characters the user can type into the field in the Verify panel.
METRIC
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the size of the search region used during geometric pattern matching. The dimensions specified are
relative to the anchor field. For example, METRIC=200,300 creates a search region 200 pixels larger left and
right and 300 pixels larger above and below.
[Figure: the search region extends 200 pixels to the left and right of the anchor field and 300 pixels above and below it.]
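The arithmetic behind METRIC can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name search_region, the (x1, y1, x2, y2) rectangle layout, and the clamping at the page origin are not taken from the product.

```python
def search_region(anchor, metric):
    """Expand an anchor rectangle (x1, y1, x2, y2) by a METRIC value such as "200,300"."""
    dx, dy = (int(part) for part in metric.split(","))
    x1, y1, x2, y2 = anchor
    # Clamp at 0 so the region never starts before the page origin (assumption).
    return (max(0, x1 - dx), max(0, y1 - dy), x2 + dx, y2 + dy)

# Hypothetical anchor zone, in pixels.
region = search_region((500, 400, 800, 450), "200,300")
print(region)  # (300, 100, 1000, 750)
```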
MultiLine
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
When set to ‘1’ this displays the field as a multiline edit field in the DotEdit Verify pane.
MultiPunch
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Used when the field contains multiple OMR (checkbox) options. When set to ‘1’ this indicates that multiple
selections are permitted; otherwise only one selection is permitted.
PatternMatch
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
When set to ‘1’ this indicates that the field is an anchor field for pattern matching.
PictureString
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies which characters the user can type into the field according to the key below.
Code  Characters allowed
A     Alphabetic or space
a     Alphabetic, punctuation, or space
D     (Date) numeric digit, minus sign, decimal point (period), or forward slash
F     (Float) numeric digit, minus sign, or decimal point (period)
f     Numeric digit or punctuation
L     Lower case alphabetic or space
l     Lower case alphabetic, punctuation, or space
N     Numeric digit
n     Upper case alphabetic, numeric digit, or space
P     Punctuation or space
T     (Time) numeric digit, A, P, M, or colon
U     Upper case alphabetic or space
u     Upper case alphabetic, punctuation, or space
X     Alphabetic, numeric digit, or space
x     Alphabetic, numeric digit, punctuation, or space
Z     Any character
#     Numeric digit or minus sign
For example:

PictureString="A" – Any upper/lower case letter or space allowed (no numbers or special characters)

PictureString="" – All characters are allowed (default)
 If you specify more than one character set code, the first code applies to the first character typed, the
second code applies to the second character typed, etc. The last code applies to all remaining characters
typed.
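The per-character rule can be made concrete with a small Python sketch. It is an illustration, not the verification panel's actual logic: the function name accepts is hypothetical, and "alphabetic" and "punctuation" are assumed here to mean ASCII letters and ASCII punctuation.

```python
import string

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
ALPHA = LOWER | UPPER
DIGIT = set(string.digits)
PUNCT = set(string.punctuation)  # assumption: ASCII punctuation only
SPACE = {" "}

# One entry per code in the key above; "Z" (any character) is handled separately.
ALLOWED = {
    "A": ALPHA | SPACE,
    "a": ALPHA | PUNCT | SPACE,
    "D": DIGIT | set("-./"),
    "F": DIGIT | set("-."),
    "f": DIGIT | PUNCT,
    "L": LOWER | SPACE,
    "l": LOWER | PUNCT | SPACE,
    "N": DIGIT,
    "n": UPPER | DIGIT | SPACE,
    "P": PUNCT | SPACE,
    "T": DIGIT | set("APM:"),
    "U": UPPER | SPACE,
    "u": UPPER | PUNCT | SPACE,
    "X": ALPHA | DIGIT | SPACE,
    "x": ALPHA | DIGIT | PUNCT | SPACE,
    "#": DIGIT | {"-"},
}

def accepts(picture, text):
    """Return True if every character of text is allowed by the picture string."""
    if not picture:
        return True  # an empty PictureString allows all characters
    for index, char in enumerate(text):
        # The last code applies to all remaining characters.
        code = picture[min(index, len(picture) - 1)]
        if code == "Z":
            continue
        if char not in ALLOWED.get(code, set()):
            return False
    return True

print(accepts("A", "John Smith"))  # True
print(accepts("UN", "A123"))       # True: one upper case letter, then digits
print(accepts("UN", "AB1"))        # False: second character must be a digit
```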
POS<TEMPLATEID>
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the position of the recognition zone for a specific fingerprint image (<templateID>) using the top left
and bottom right corners of the zone (x1, y1, x2, y2).
Position
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the field’s position using the top left and bottom right corners (x1, y1, x2, y2). This is initially
(0,0,0,0) in the Setup DCO and is populated at runtime.
ReadOnly
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
When set to ‘1’ the field is read-only in the Verify panel and the user cannot change the preset or recognized
value.
RecogStatus
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Numeric code set by some recognition actions to indicate status of the operation.
RecogType
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the code for the recognition engine to use when reading data from this field. OMR (checkbox) fields
require RecogType=4 and these are the only fields that typically require this variable.
ReqConf
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Specifies the confidence level required for geometric pattern matching (0-10).
SELECT
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
Specifies a database lookup statement that converts an edit field in a verification panel into a drop-down list
with values from a database. A list of matching entries from the database specified by the dsn attribute is
displayed, and the selected entry is used to populate the fields specified in the flist attribute.
EXAMPLE
The sample SELECT value below gets a list of car types from the lookup database specified in the application
configuration (.app) file and populates the drop-down list in the “Car_Type” field.
<SQL flist='Car_Type' dsn="*/lookupdb:cs">SELECT Car_Type FROM Car_Types</SQL>
Note that you can populate multiple fields simultaneously (see the Lookup variable for an example).
ShowChar
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
This variable works with the Taskmaster Web “prelayout.aspx” panel only. When set to ‘1’ the character
zones are displayed in the field’s image snippet in the Verify panel.
Sticky
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
If defined for any field (the value is not important), the Verify panel displays a checkbox to the left of the
field’s label. If the user selects the checkbox, enters a value, and submits the page, the field value
autopopulates across all pages of that type in the current batch.
Text
APPLIES TO
 Setup DCO
 Runtime DCO
VERIFICATION PANELS THAT SUPPORT THIS VARIABLE
 Batch Pilot
 DotEdit
 TM Web (prelayout.aspx)
 TM Web (aindex.aspx)
DESCRIPTION
If defined for a field (the value must be empty), the field in the Verify panel becomes “sticky.” If the user
enters a value and submits a page, the field value autopopulates across all pages of that type in the current
batch. The field must be initially empty for autopopulation to work.
Zone_Offset
APPLIES TO
 Setup DCO
 Runtime DCO
DESCRIPTION
Used with anchor fields during geometric pattern matching and specifies the offset in pixels (x, y) between the
anchor’s position on the runtime page image and its position in the fingerprint image.
Appendix C
ACTION LIBRARY SUMMARIES
Detailed help is available for all actions from within Datacap Studio. To access the embedded help, select an
action on the Actions Library tab and click the help button. Brief descriptions of the available actions are
provided in the sections that follow. For detailed information, including information about parameters, refer
to the embedded help.
AUTODOC
Use these actions to create fingerprints and match pages to fingerprints using the local fingerprint service or a
web fingerprint service.
Action
Description
BlankPagesIDBySize
Assigns the specified page type to any page with an image file smaller
than the specified size (for example, to identify blank pages).
CalculateOffset
Sets the maximum image offset supported when matching pages to
fingerprints.
CreateFingerprint
Adds the current page’s image (TIF) file and the fingerprint (CCO) file to
the application’s fingerprint library.
DeleteFingerprint
Removes the current page’s image (TIF) file and the fingerprint (CCO)
file from the application’s fingerprint library.
FindBlackFingerprint
A special version of FindFingerprint used by the MClaims application.
FindFingerprint
Attempts to match the current page to a fingerprint in the fingerprint
library. You can optionally add the page to the fingerprint library if
there is no match.
FindTemplate
Finds the fingerprint that a preceding FindFingerprint action matched to
the current page.
MergeCCOs_ByType
Merges two or more fingerprint (CCO) files.
SetApplicationID
Specifies the application ID if using the fingerprint web service.
SetFilter_HostName
Limits fingerprint matching to the specified fingerprint class only.
SetFilter_PageType
Limits fingerprint matching to the specified page type only.
SetFingerprint
Sets the class and page type (optional) after creating a new fingerprint.
SetFingerprintDir
Specifies the application’s fingerprint directory.
SetFingerprintFailureThreshold
Specifies the percentage of fingerprint upload failures to ignore when
using the fingerprint web service.
SetFingerprintSearchArea
Specifies the portion of the current page used during fingerprint
matching.
SetFingerprintWebServiceURL
Specifies the URL for the web fingerprint service.
SetMaxOffset
Specifies the maximum offset for fingerprint matching.
SetProblemValue
Specifies the threshold value used during fingerprint matching.
SetTemplateDir
Specifies the fingerprint template folder.
UpdateFPStats
Updates the fingerprint statistics in the fingerprint database.
BARCODE_P
Use these actions to locate and read barcodes.
Action
Description
Get2DCodeBP
Recognizes PDF-417 barcodes.
GetAllBarcodesBP
Searches the current page for all barcodes and writes them to the
current object’s GetBarcodeList variable.
GetBarcodeBP
Recognizes 1D or 2D barcodes.
GetDataMatrixCodeBP
Recognizes Data Matrix codes.
MatchBarcodeBP
Searches the current page for the specified barcode.
ReadBarCodeBP
Determines if the first barcode on the current page contains the
specified value.
BARCODE_X
Use these actions to locate and read barcodes.
Action
Description
GetBarCode
Gets the value of the first barcode on the current page or within the
current field zone.
MatchBarcode
Searches the current page for the specified barcode.
ReadBarCode
Determines if the first barcode on the current page contains the
specified value.
CCO2CCO
Use these actions to sort and filter (“normalize”) the words and lines in a fingerprint CCO file. This is only
required after full page recognition by an OCR or ICR action that does not automatically normalize the CCO
(OCR/S, OCR/A, and ICR/C normalize automatically).
Action
Description
Cco2cco
Sorts and filters (“normalizes”) the words and lines in a fingerprint
(CCO) file created by a recognition engine.
NormalizeCCO
Same as Cco2cco.
SetMaxCharacterHeightAVG
Sets the maximum height of characters (by percentage over the
average) permitted by Cco2cco and NormalizeCCO actions.
SetMaxCharacterHeightTMM
Sets the maximum height of characters (by absolute value) permitted
by Cco2cco and NormalizeCCO actions.
COLORTOBW
Use these actions to change the color depth of an image.
Action
Description
C2BW_Convert
Changes the color depth of an image according to the conversion
settings specified using C2BW_SetAttributes.
C2BW_SetAttributes
Specifies the color-to-BW conversion settings.
CONVERT
Use these actions to convert a variety of electronic document files into TIFF image files.
EXCEL
 These actions work with files from Excel 2000 or later.
Action
Description
ExcelAutoFitColumns
Enables or disables automatic resizing of columns when converting
from Excel to TIFF.
ExcelAutoFitRows
Enables or disables automatic resizing of rows when converting from
Excel to TIFF.
ExcelOrientationToLandscape
Sets page orientation to landscape when converting from Excel to
TIFF.
ExcelOrientationToPortrait
Sets page orientation to portrait when converting from Excel to TIFF.
ExcelPrintBlankPage
Enables or disables creation of blank pages when converting from
Excel to TIFF.
ExcelPrintGridlines
Enables or disables gridlines when converting from Excel to TIFF.
ExcelPrintQuality
Specifies the image resolution when converting from Excel to TIFF.
ExcelScalingFactor
Specifies the scaling factor when converting from Excel to TIFF.
ExcelTiffCompression
Specifies the compression algorithm used when converting from Excel
to TIFF.
ExcelWorkbookToImage
Converts the pages in an Excel workbook to TIFF.
IMAGES
Action
Description
ImageFileTypesToConvert
Specifies the file extensions of image types to convert to TIFF.
ImageMonoThreshold
Specifies the threshold value used when converting images to TIFF.
ImageMonoType
Specifies the method used when converting color images to black and
white TIFFs.
ImageToTIFF
Converts an image file to TIFF.
OUTLOOK
Action
Description
OutlookMessageToAttachmentOnly
Converts email attachments only to TIFF.
OutlookMessageToImageAndAttachment
Converts the email message and any attachments to TIFF.
OutlookPrintQuality
Specifies the image resolution when converting from Outlook to TIFF.
OutlookTiffCompression
Specifies the compression algorithm used when converting from
Outlook to TIFF.
PDF
Action
Description
PDFBitDepth
Specifies the bit depth (color depth) of the output image when
converting from PDF to TIFF.
PDFCompression
Specifies the compression algorithm used when converting from PDF to
TIFF.
PDFConversionMethod
Specifies the conversion method used when converting from PDF to
TIFF.
PDFDocumentToImage
Converts a PDF document to TIFF.
PDFGrayscale
Enables or disables grayscale output.
PDFHorizontalResolution
Specifies the horizontal resolution when converting from PDF to TIFF.
PDFQuality
Specifies the conversion quality when converting from PDF to TIFF.
PDFVerticalResolution
Specifies the vertical resolution when converting from PDF to TIFF.
TIFF
Action
Description
SplitMultipageTiff
Splits a multi-page TIFF file into individual pages.
WORD
Action
Description
WordDocumentToImage
Converts a Word document to TIFF.
WordPrintQuality
Specifies the output resolution when converting from Word to TIFF.
WordTiffCompression
Specifies the compression algorithm used when converting from Word
to TIFF.
DCCLIP
Use this action to clip a portion of each page image and save it as a separate TIF file.
Action
Description
dci_clipfield
Uses the current field’s recognition zone coordinates and clips the
specified region of each page to a separate TIFF file.
DCIMAGEFIX
Use these actions to clean up and enhance page images.
Action
Description
ImageEnhance
Performs image processing using preconfigured image enhancement
settings (typically from imagefix.ini).
LoadSettings
Loads the image enhancement settings used by ImageEnhance.
LoadSettings_FingerprintID
Loads fingerprint-specific image enhancement settings.
DCO
Use these actions to set up and modify the runtime batch hierarchy (runtime DCO) information.
Action
Description
ChkConfidence
Assigns a page status based on whether or not all field data meets the
specified confidence level.
ChkDCOStatus
Determines if the current object’s “STATUS” matches the specified
value.
ChkDCOType
Determines if the current object’s “TYPE” matches the specified value.
ChkIntegrity
Determines if the batch meets the document integrity requirements as
specified in the document hierarchy (setup DCO).
ChkLastDCOType
Determines if the “TYPE” of the previous object matches the specified
value.
ClearAltText
Clears the character and confidence values from the specified position
in the field’s character array.
ClearDCO
Removes all child objects and variables from the current object.
CopyPD2DD
Assigns the value of the object’s PD (Page Data) variable to the object’s
DD (Doc Data) variable.
CreateDocuments
Arranges pages into documents based on the document integrity rules
specified in the application’s document hierarchy.
CreateFields
Creates a page data (XML) file for the current page.
DeleteFields
Removes all child fields and character data from the current object.
Removes the page data file if called from a page object.
PropagateToAltText
Copies the character and confidence values from the first position in
the field’s character array to the specified position.
SetDCOStatus
Sets the current object’s “STATUS” property.
SetDCOType
Sets the current object’s “TYPE” property.
SetDocStatus
Sets the current document’s “STATUS” property.
SetDocumentType
Sets the current document’s “TYPE” property.
SetFldConfidence
Sets the confidence level for all characters to the specified value.
SetPageFingerprintID
Sets the FingerprintID property of the current page.
SetPageStatus
Sets the current page’s “STATUS” property.
SetPageTemplateID
Sets the FingerprintID property of the current page.
SetPageType
Sets the current page’s “TYPE” property.
DCPDF
Use these actions to convert PDF files to TIFF at the start of the workflow, or to convert the TIFF files in a
document into a PDF file.
Action
Description
dcpdf_CreateTiffFromPDF
Converts each page of a PDF file into a TIFF file.
dcpdf_CreateTiffFromPDF_CreateDocs
Same as dcpdf_CreateTiffFromPDF but also creates a runtime hierarchy.
dcpdf_MakePDFDoc
Creates a PDF document containing one or more pages of the current
document.
dcpdf_MaxSizeToReconvert
Causes dcpdf_CreateTiffFromPDF to use an alternate conversion
algorithm if the file resulting from the default algorithm exceeds the
size specified.
dcpdf_SetApplication
Specifies the Application ID property for PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_SetAuthor
Specifies the Author property for PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_SetImageBitcount
Sets the bit count (bits per pixel) for images in the PDF document
generated by dcpdf_MakePDFDoc.
dcpdf_SetImageCompression
Specifies the compression method to use when converting a PDF file to
TIFF.
dcpdf_SetImageGrayscale
Specifies how gray areas of a grayscale image are handled when
converting from PDF to TIFF.
dcpdf_SetImageQuality
Specifies the image quality to use when converting from PDF to TIFF.
dcpdf_SetImageResolution
Specifies the output resolution to use when converting from PDF to
TIFF.
dcpdf_SetKeywords
Specifies a keyword to assign to PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_SetProducer
Specifies the Producer property for PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_SetSubject
Specifies the Subject property for PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_SetTitle
Specifies the Title property for PDF documents generated by
dcpdf_MakePDFDoc.
dcpdf_UseAltConversionMethod
Causes dcpdf_CreateTiffFromPDF to use an alternate conversion
algorithm.
DOCUMENTUM
Use these actions to upload documents to Documentum®.
EMAIL
Use these actions to compose and then send an email using CDOSYS and an SMTP server. These actions
also support Outlook, but this is not suitable for unattended operation since it requires the Outlook user to
be logged on to the computer and security prompts may be displayed for each message.
Action
Description
SendEMail
Sends an email using CDOSYS or Outlook. If using CDOSYS, you must
specify the SMTP server using SetMailServer.
SetAttachment
Specifies the path and filename for an attachment.
SetBlindCarbonCopyRcpts
Specifies any bcc recipients.
SetCarbonCopyRcpts
Specifies any cc recipients.
SetEmailBody
Specifies the email body.
SetMailServer
Specifies the IP or DNS address of the SMTP mail server if using CDOSYS.
SetRecipients
Specifies the email recipients.
SetSender
Specifies the sender’s email address if using CDOSYS.
SetSubject
Specifies the email subject.
EQUALIZE
Use this action to equalize the x and y resolutions of an image.
Action
Description
EqualizeUnbalancedImage
Converts an image with different x and y resolutions to one with the
same x and y resolutions.
EWSMAIL
Use these actions to import image file attachments from an Exchange Server into the current batch using
Exchange Web Service (EWS). The ex_scan action polls the Exchange Server and imports image attachments
until the batch reaches the specified size (ex_max_docs) or until the wait time (ex_wait_time) expires.
Action
Description
ex_abort_time
Specifies how long to wait before aborting the current batch if, for
example, the Exchange Server is unavailable.
ex_done_folder
Specifies the mailbox subfolder to which email messages are moved
after the attachment has been imported.
ex_EMLOption
If enabled, the ex_scan action creates a one page document per email
containing the email message and its attachments – no separate image
pages are created.
ex_ews_version
Specifies the Exchange Server version.
ex_login
Connects to the Exchange Server using the specified account.
ex_logout
Closes the connection to the Exchange Server.
ex_max_docs
Specifies the maximum number of emails to include in a single batch.
ex_problem_folder
Specifies the mailbox subfolder to which email messages are moved if
the attachment is not one of the expected types.
ex_scan
Begins polling the Exchange Server for emails with valid image
attachments and imports the attachments into the current batch.
ex_types
Specifies valid image attachment file extensions.
ex_wait_time
Specifies the maximum time to wait for additional emails before closing
the current batch.
EXPORT
Use these actions to set up and write information to the export text file.
Action
Description
BatchVariable_ExportValue
Writes the value of the specified batch-level variable to the export file.
BlankFields
Writes the specified number of blank fields to the export file.
BlankLines
Writes the specified number of blank lines to the export file.
BPilot
Writes the value of the specified Batch Pilot property (BatchDir,
BatchID, etc.) to the export file.
CloseExportFile
Closes the export file.
DCOProperty
Writes the value of the specified DCO property (ID, STATUS, etc.) to the
export file.
DocumentVariable_ExportValue
Writes the value of the specified document-level variable to the export
file.
ExportAllFields
Writes all field values on the current page to the export file.
ExportFieldValue
Writes the value of the specified field to the export file.
ExportMYValue
Writes the value of the current field to the export file.
ExportSmartParameter
Writes the value of the specified smart parameter to the export file.
ExportToBatchDir
Used during export file setup to specify that the file be created in the
current batch directory.
Filler
Writes a string of identical filler characters to the export file.
FixedLenLJ
Writes the specified number of characters from the left of the specified
field (shorter values are padded with the SetFill character).
FixedLenRJ
Writes the specified number of characters from the right of the
specified field (shorter values are padded with the SetFill character).
GetDATE
Writes the current date to the export file.
GetProfileString
Writes a value from the specified settings (.ini) file to the export file.
GetTime
Writes the current time to the export file.
LineItem_AddElement
LineItem_BlankFields
LineItem_ClearElements
LineItem_ExportElements
LineItem_SmartParameter
NewLine
Writes a new line character to the export file.
PageVariable_ExportValue
Writes the value of the specified page variable to the export file.
ResetFieldVariables
Resets the export settings (FixedLength, Justified, ZeroFill, SpaceFill, and
IgnoreFieldStatus) to their default values.
SaveFilePathAsVariable
Writes the path and filename for the export file to the specified
variable.
SetCSV
Enables or disables comma separation when writing values to the
export file.
SetElementSeparator
Specifies a custom separator character to use when writing values to
the export file.
SetExportPath
Specifies the path to the export file location.
SetExtensionName
Specifies the extension for the export file (defaults to .txt).
SetFileName
Specifies the name for the export file.
SetFill
Specifies the filler character used to expand values to the maximum
length when writing them to the export file.
SetFixedLength
Specifies the length of values written to the export file (short values are
padded with the SetFill character).
SetIgnoreFieldStatus
Fields with the specified STATUS value are not written to the export file.
SetJustified
Right-justifies or left-justifies values written to the export file. Use
SetFixedLength to set the maximum length.
SetOMR_Separator
Specifies the separator character used when writing multiple values
from multi-punch OMR fields to the export file.
SetSpaceFill
Specifies the ASCII 32 space character as the filler value (see SetFill).
SetZeroFill
Specifies the ASCII 48 zero character as the filler value (see SetFill).
Text
Writes the specified string to the export file.
Variable_ExportValue
Writes the value of the specified variable to the export file.
Variable_IsValue
Determines if the specified variable has the specified value.
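The fixed-length behavior described for FixedLenLJ, FixedLenRJ, and SetFill can be sketched in Python as follows. This is a rough illustration, not the action library's implementation: the function names are hypothetical, and the side on which the fill character is added (right for left-justified values, left for right-justified values) is an assumption.

```python
def fixed_len_lj(value, length, fill=" "):
    """Keep the leftmost characters; pad short values on the right (assumed)."""
    return value[:length].ljust(length, fill)

def fixed_len_rj(value, length, fill=" "):
    """Keep the rightmost characters; pad short values on the left (assumed)."""
    if length <= 0:
        return ""
    return value[-length:].rjust(length, fill)

print(fixed_len_lj("ACME", 6, "0"))   # ACME00
print(fixed_len_rj("1234", 6, "0"))   # 001234
print(fixed_len_lj("INVOICE", 3))     # INV
print(fixed_len_rj("INVOICE", 3))     # ICE
```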
EXPORTDB
Use these actions to set up and write information to an export database. You build the record in memory
before committing it to the database using AddRecord.
Action
Description
AddRecord
Writes an assembled internal data record to the export database.
ExportBatchIDToColumn
Writes the current Batch ID to the specified column in the internal data
record.
ExportCloseConnection
Closes the connection to the export database.
ExportFieldToColumn
Writes the value of the specified field to the specified column in the
internal data record.
ExportNodeXMLToColumn
Writes the value of the specified property to the specified column in the
internal data record.
ExportOpenConnection
Opens a connection to the specified export database.
ExportPropertyToColumn
Writes the value of the specified property or variable to the specified
column in the internal data record.
ExportSmartParamToColumn
Writes the value of a smart parameter to the specified column in the
internal data record.
ExportToColumn
Writes the value of the current field to the specified column in the
internal data record.
SetTableName
Specifies the name of the table to which data is to be exported.
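The build-then-commit pattern described for EXPORTDB can be illustrated with a short Python sketch that uses an in-memory SQLite database. Everything here is hypothetical and chosen only for illustration: the RecordBuilder class, its method names, the Invoices table, and the sample values do not come from the product.

```python
import sqlite3

class RecordBuilder:
    """Collects column values in memory, then writes them as one row."""

    def __init__(self, connection, table):
        self.connection = connection
        self.table = table          # plays the role of SetTableName
        self.record = {}

    def export_to_column(self, column, value):
        # Nothing is written to the database yet; the value is only staged.
        self.record[column] = value

    def add_record(self):
        # Table and column names are assumed to come from trusted configuration;
        # the values themselves are passed as bound parameters.
        columns = ", ".join(self.record)
        placeholders = ", ".join("?" for _ in self.record)
        self.connection.execute(
            f"INSERT INTO {self.table} ({columns}) VALUES ({placeholders})",
            list(self.record.values()),
        )
        self.connection.commit()
        self.record = {}            # start the next record empty

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Invoices (BatchID TEXT, Vendor TEXT)")

builder = RecordBuilder(connection, "Invoices")
builder.export_to_column("BatchID", "20110617.001")
builder.export_to_column("Vendor", "ACME")
builder.add_record()

rows = connection.execute("SELECT BatchID, Vendor FROM Invoices").fetchall()
print(rows)
connection.close()
```

Staging the values first means a record that fails partway through assembly never reaches the database as a partial row.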
EXPORTXML
Use these actions to set up and write information to an export XML file.
Action
Description
xml_CommitNode
Closes the specified XML node.
xml_NewNode
Creates a new child node under the specified parent node, creating the
parent node if necessary.
xml_SaveFile
Commits all unsaved nodes and saves the XML file to disk.
xml_SetAttributeValue
Assigns an attribute and value to the specified node.
xml_SetExportPath
Specifies the path to the XML file storage location.
xml_SetFileName
Specifies the name for the export XML file (do not include the .xml
extension).
xml_SetNodeValue
Sets the value of the specified node.
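The node-building sequence that these actions describe (create a node, set an attribute, set a value, save) resembles the following Python sketch, which uses the standard xml.etree.ElementTree module. The element names and values are hypothetical, and the comments that map each step to an action name are a loose analogy rather than a statement of how the actions are implemented.

```python
import xml.etree.ElementTree as ET

root = ET.Element("Batch")                 # hypothetical parent node
document = ET.SubElement(root, "Document") # compare xml_NewNode
document.set("type", "Invoice")            # compare xml_SetAttributeValue
vendor = ET.SubElement(document, "Vendor")
vendor.text = "ACME"                       # compare xml_SetNodeValue

# Serialize in memory; ET.ElementTree(root).write(path) would save to disk instead.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```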
FILEIO
Use these actions to perform various file system functions.
CopyFile
Copies the specified file.
DeleteFile
Deletes the specified file.
GetFileSize
Writes the size of the specified file to the specified variable.
GetProfileString
Writes a key value from a settings file to the specified variable.
IsDirectoryPresent
Determines if the specified directory exists and optionally creates it.
IsFilePresent
Determines if the specified file exists.
IsFileReadOnly
Determines if the specified file’s read only attribute is set.
IsProfilePresent
Determines if a key within a section of a settings (.ini) file exists and has
a value assigned.
RenameFile
Renames or moves the specified file.
SetFileReadOnly
Sets or clears the read only attribute on the specified file.
SetProfileString
Writes a value to a settings (.ini) file.
SplitFileName
Splits a path and filename into its component parts.
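For readers unfamiliar with settings (.ini) files, the following Python sketch shows the kind of lookup that GetProfileString and IsProfilePresent describe, using the standard configparser module. The function names, the section and key names, and the path are hypothetical, and the sketch makes no claim about how the actions themselves parse the file.

```python
import configparser

# A hypothetical settings file with one section and two keys.
SETTINGS = """
[Export]
Path = C:\\Exports
Extension =
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)

def get_profile_string(section, key, default=""):
    """Return the key's value, or the default if the section or key is missing."""
    return parser.get(section, key, fallback=default)

def is_profile_present(section, key):
    """True only if the key exists and has a non-empty value."""
    return bool(parser.get(section, key, fallback="").strip())

print(get_profile_string("Export", "Path"))
print(is_profile_present("Export", "Extension"))
```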
FILENETP8
Use these actions to export data to a FileNet® P8 repository.
Action
Description
FNP8_CreateFolder
Creates a subfolder on a specified target class and object.
FNP8_Login
Sets the user ID and password for login to the P8 system.
FNP8_SetDestinationFolder
Sets the destination folder for the documents being uploaded.
FNP8_SetDocClassId
Sets the P8 document class for the uploaded files.
FNP8_SetDocTitle
Sets the document title for documents being uploaded.
FNP8_SetFileType
Assigns the file type for the files that are uploaded.
FNP8_SetLocale
Identifies the locale on the target P8 system.
FNP8_SetMultiValueProperty
Sets the values in a multi-value property.
FNP8_SetProperty
Sets the designated FileNet property to a specified value.
FNP8_SetRetry
Sets the number of automatic upload retries.
FNP8_SetTargetClassID
Sets the P8 document class for uploaded documents.
FNP8_SetTargetObjectID
Sets the name of the object store in which documents will be stored.
FNP8_SetTimeout
Sets the timeout for the FileNet P8 web service in milliseconds.
FNP8_SetUploadMode
Sets the upload mode.
FNP8_SetURL
Sets the URL for the FileNet P8 web service.
FNP8_Upload
Uploads the Batch images to the FileNet P8 repository.
FNP8_UploadDir
Uploads all images from a specific source folder.
FINGERPRINTMAINTENANCE
Use these actions to delete fingerprints from the application’s fingerprint library.
Action
Description
CloseDatabase
Closes the connection to the fingerprint database and saves the document
hierarchy (setup DCO).
DeleteFingerprint
Deletes the specified fingerprint. This includes the CCO, TIFF, and FPXML
files, as well as the fingerprint database record and the position
information in the document hierarchy.
DeleteFingerprints
Deletes all fingerprints returned by the specified SQL statement.
OpenDatabase
Opens a connection to the fingerprint database.
SetFingerprintFolder
Specifies the folder containing the application’s fingerprint files.
FPXML
Use these actions to store zone coordinates in an external XML file instead of the document hierarchy (setup
DCO). This is useful for porting fingerprints between systems or to avoid making frequent modifications to
the document hierarchy.
Action
Description
ReadZonesFPX
Loads the zone position information for the current fingerprint.
SetDetailsAndLineitemPairFPX
Sets the type of the Details and Lineitem fields.
SetDirectoryFPX
Sets the location for the fingerprint XML files.
WriteZonesFPX
Writes the position information for all fields on the current page.
GRAYSCALE
Use this action to convert grayscale TIFF images to black-and-white.
Action
Description
ConvertGraytoBW
Converts grayscale TIFF files to black-and-white.
ICR_C
Use these actions to recognize constrained (unconnected) hand-printed or machine-printed characters. These actions
use the OpenText® RecoStar™ engine.
Action
Description
EnableLoggingICR_C
Enables or disables event logging for the ICR/C engine.
RecognizeFieldICR_C
Performs character recognition on the current field.
RecognizeFieldVoteICR_C
Performs recognition on the current field’s zone and compares the result
to the existing field value, character by character, raising the confidence
level when the characters match and lowering it when they don’t.
RecognizePageFields2CCO_ICR_C
Performs recognition for all zoned fields on the page.
RecognizePageFieldsICR_C
Performs recognition on all fields configured for ICR/C.
RecognizePageICR_C
Performs full page recognition using the ICR/C engine.
RecognizePageToPDFICR_C
Performs full page recognition and stores the results in a PDF file.
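The voting behavior described for RecognizeFieldVoteICR_C can be pictured with a short Python sketch. This is only an illustration of the character-by-character comparison, not the engine's implementation: the function name, the 0 to 10 confidence scale, and the step size of 1 are all assumptions made for the example.

```python
def vote(existing_text, existing_conf, second_read, step=1, low=0, high=10):
    """Return adjusted per-character confidences after comparing two reads.

    The 0-10 scale and the step of 1 are illustrative assumptions.
    """
    adjusted = []
    for i, (ch, conf) in enumerate(zip(existing_text, existing_conf)):
        # A character agrees only if the second read has the same character
        # at the same position.
        agrees = i < len(second_read) and second_read[i] == ch
        conf = conf + step if agrees else conf - step
        adjusted.append(max(low, min(high, conf)))  # clamp to the scale
    return adjusted


# The third character differs between the two reads, so only its
# confidence drops.
print(vote("12345", [7, 7, 7, 7, 7], "12845"))  # prints [8, 8, 6, 8, 8]
```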
IMAGECONVERT
Use these actions to combine image files or to convert image files to JPEG or TIFF.
Action
Description
AppendAllImages
Combines all of the current document’s page images into a single
continuous image file.
AppendAllImages_ByType
Same as AppendAllImages but includes only pages of the specified type.
AppendImage
Combines the current page image with the image specified using
AppendImage_StartAsNew.
AppendImage_StartAsNew
See AppendImage
ConvertToJPEG
Converts the current page image (BMP, GIF, PNG, or TIFF) to a JPEG file.
ConvertToTIFF
Converts the current page image (BMP, GIF, JPEG, or PNG) to a TIFF file.
SetChrominanceFactor
Specifies the amount of compression used when converting to JPEG.
SetDeleteOriginal
Specifies whether or not to delete the original image file after conversion.
SetGrayScale
Specifies whether ConvertToJPEG outputs a grayscale or color image.
SetLuminanceFactor
Specifies the image luminance or grayscale quality when converting to JPEG.
SetTIFFCompression
Specifies the compression format used when converting to TIFF.
IMAGEFIX
These actions are older versions of the DCImageFix action. Use the DCImageFix actions instead.
IMAIL
Use these actions to import image attachments from a mail server into the current batch using IMAP. The
im_scan action polls the server and imports image attachments until the batch reaches the specified size
(im_max_docs) or until the wait time (im_wait_time) expires.
Action
Description
im_abort_time
Specifies how long to wait before aborting the current batch if, for example,
the server is unavailable.
im_done_folder
Specifies the mailbox folder to which email messages are moved after the
attachment has been imported.
im_login
Connects to the mail server using the specified account.
im_logout
Closes the connection to the mail server.
im_max_docs
Specifies the maximum number of emails to include in a single batch.
im_problem_folder
Specifies the mailbox folder to which email messages are moved if the
attachment is not one of the expected types.
im_scan
Begins polling the mail server for emails with valid image attachments and
imports the attachments into the current batch.
im_types
Specifies valid image attachment file extensions.
im_wait_time
Specifies the maximum time to wait for additional emails before closing the
current batch.
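The polling behavior described for this library (import until the batch is full or the wait time expires) can be sketched in Python as follows. The function and parameter names are hypothetical, the mail server is replaced by a plain callable, and the single fixed deadline is one possible reading of the wait time, so treat this as a sketch of the control flow rather than the library's actual logic.

```python
import time


def collect_batch(fetch_attachments, max_docs, wait_time,
                  clock=time.monotonic, sleep=time.sleep, poll_interval=1.0):
    """Poll until the batch holds max_docs items or wait_time seconds pass.

    fetch_attachments(limit) is a hypothetical stand-in for the mail server:
    it returns at most `limit` new image attachments (possibly none).
    """
    batch = []
    deadline = clock() + wait_time
    while len(batch) < max_docs and clock() < deadline:
        new_items = fetch_attachments(max_docs - len(batch))
        if new_items:
            batch.extend(new_items)
        else:
            sleep(poll_interval)  # nothing new yet; wait before polling again
    return batch


inbox = ["a.tif", "b.tif", "c.tif"]


def fetch(limit):
    """Hypothetical mail-server call: hand over at most `limit` attachments."""
    taken = inbox[:limit]
    del inbox[:limit]
    return taken


print(collect_batch(fetch, max_docs=2, wait_time=5))  # prints ['a.tif', 'b.tif']
```

The third attachment stays in the inbox and would be picked up by the next batch.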
IMPRINT
Use these actions to imprint text over an image, or for blackout or whiteout redactions.
Action
Description
AnnotateImage
Imprints the text you specify onto the current page image.
Redact
Redacts (using either black or white) the current field or the specified region of a
page image.
SetAdjustedWidth
Specifies the width of the imprinted text.
SetFontName
Specifies the font style for the imprinted text.
SetFontSize
Specifies the font size for the imprinted text.
SetOpaque
Specifies whether overlay rectangles are opaque (1) or transparent (0).
INTELLOCATE
Use these actions to update the existing field position information in the document hierarchy (setup DCO) or
to add position information for a new fingerprint. You can use these actions to generate fingerprints
automatically using a page image and the recognition zones specified manually during verification.
Action
Description
iloc_AdjustZones
Updates fingerprint-specific field position coordinates in the document
hierarchy (setup DCO) based on the locations defined for the current page.
iloc_AssignPageType
Assigns a page type value to a newly created fingerprint.
iloc_SetDetailZones
Writes the position coordinates of a new fingerprint’s Detail Line fields from a
page’s data file to the Pos properties of the corresponding Detail Line Field
objects of the document hierarchy.
iloc_SetZones
Writes the recognition zone coordinates from the current page to the “Pos”
properties of the corresponding fields in the document hierarchy.
IsPageDataMissing
Determines if the current page has a valid data file.
IOVERLAY
Use these actions to combine the current page image with a background image. You can use this to reapply a
form background that dropped out during scanning.
Action
Description
Overlay
Combines the current image with the image specified by SetBackgroundImage.
SetBackgroundImage
Specifies the image file used by the Overlay action.
SetDitheringBackground
Enables or disables dithering of the background image.
SetHaloBackground
Enables or disables a halo of white pixels around any black pixels from the
current image where they would otherwise touch pixels from the background,
making the foreground information easier to read.
LIVELINK
Use these actions to upload documents to Livelink®.
LOCATE
Use these actions in combination with full text recognition to locate words or regular expressions on the page
and to navigate around the page by line or word. This library also includes a few format validation actions
(IsCurrency, IsDateValue, etc.). Most validation actions are in the Validate library.
Action
Description
DefaultValue
Sets the value (text) of the current field in the page data file to the value
specified.
FilterIt
Removes all instances of the specified character(s) from the located word.
FindDBList
Locates a word that matches one of a list of words obtained from a SQL query.
FindDBList_InZone
Same as FindDBList, except searches the current field only.
FindKeyList
Locates the first (or next) occurrence of a word or phrase that matches one of
the entries in a keyword file.
FindKeyList_InZone
Same as FindKeyList, except searches the current field only.
FindLastKeyList
Locates the last occurrence of a word or phrase that matches one of the entries
in a keyword file.
FindLastKeyList_InZone
Same as FindLastKeyList, except searches the current field only.
FindLastRegEx
Same as FindLastWord, except supports regular expressions.
FindLastRegEx_InZone
Same as FindLastWord_InZone, except supports regular expressions.
FindLastRegExList
Same as FindLastKeyList, except supports regular expressions.
FindLastRegExList_InZone
Same as FindLastRegExList, except searches the current field only.
FindLastWord
Locates the last occurrence of the specified word or phrase on the current page.
FindLastWord_InZone
Same as FindLastWord, except searches the current field only.
FindNextDBList
Same as FindDBList, except locates the next instance.
FindNextDBList_InZone
Same as FindNextDBList, except searches the current field only.
FindNextKeyList
Same as FindKeyList, except locates the next instance.
FindNextKeyList_InZone
Same as FindNextKeyList, except searches the current field only.
FindNextRegExList
Same as FindRegExList, except locates the next instance.
FindNextRegExList_InZone
Same as FindNextRegExList, except searches the current field only.
FindRegExList
Same as FindKeyList, except supports regular expressions.
FindRegExList_InZone
Same as FindRegExList, except searches the current field only.
GoAboveWord
Moves up the specified number of lines from the previously found word or
phrase.
GoBelowWord
Moves down the specified number of lines from the previously found word or
phrase.
GoDownLine
Moves down the specified number of lines from the previously found word or
phrase and selects the first word.
GoFirstLine
Moves to the first line of the current zone, or to the first line of the page if there
is no zone.
GoFirstWord
Moves to the first word on the current line.
GoLastLine
Moves to the last line in the current zone, or to the last line on the page if a zone
is not present.
GoLastWord
Moves to the last word on the current line.
GoLeftWord
Moves the specified number of words to the left of the previously found word or
phrase.
GoRightWord
Moves the specified number of words to the right of the previously found word
or phrase.
GoUpLine
Moves up the specified number of lines from the previously found word or
phrase and selects the first word.
GroupWords
Groups words to the left and right of the previously found word if they are no
more than the specified number of character widths apart.
GroupWordsLEFT
Groups words to the left of the previously found word if they are no more than
the specified number of character widths apart.
GroupWordsRIGHT
Groups words to the right of the previously found word if they are no more than
the specified number of character widths apart.
IsAlpha
Determines if the specified percentage of characters in a located word are
alphabetic (defaults to 100%)
IsCurrency
Determines if the value of the located word is a currency value (is numeric and
includes a two-digit decimal amount).
IsDateValue
Determines if the value of the located word is in one of the supported date
formats.
IsNumber
Determines if the specified percentage of characters in a located word are
numeric (defaults to 100%)
IsValue
Determines if the value of the located word matches the value specified.
IsValue_RegEx
Determines if the value of the located word matches the regular expression
specified.
MaxLength
Determines if the number of characters in the located word is greater than or
equal to the number specified (returns True if it is).
MergeWordLF
Merges the located word with one or more words to the left, on the same line.
MergeWordRT
Merges the located word with one or more words to the right, on the same line.
MinLength
Determines if the number of characters in the located word is less than or equal
to the number specified (returns True if it is).
RegExFind
Same as WordFind, except supports regular expressions.
RegExFind_InZone
Same as RegExFind, except searches the current field only.
RegExFindNext
Same as WordFindNext, except supports regular expressions.
RegExFindNext_InZone
Same as RegExFindNext, except searches the current field only.
ScanRT
Moves the specified number of words to the right of the current word,
expanding the search area up and down slightly in case the word is a little above
or below the current word.
SelectSnippet
SetRect
Sets the position and size of the current field in the page data file to the values
specified.
UpdateDCOField
Updates the position coordinates for the specified field in the page data file with
the position of the located word.
UpdateField
Updates the current field in the page data file with the value and position of the
located word.
ValueInField
Determines if any part of the located word matches the value specified.
ValueInField_Fuzzy
Uses “fuzzy” matching to determine if any part of the located word matches the
value specified.
ValueInField_RegEx
Determines if any part of the located word matches the regular expression
specified.
WordFind
Locates the first (or next) occurrence of the specified word or phrase on the
current page.
WordFind_InZone
Same as WordFind, except searches the current field only.
WordFind_Offset
Sets the value of the page’s Image_Offset variable based on the difference in
the position of the specified word on the current page and on the matched
fingerprint image.
WordFindNext
Same as WordFind, except locates the next occurrence.
WordFindNext_InZone
Same as WordFindNext, except searches the current field only.
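The format checks in this library (IsAlpha, IsNumber, IsCurrency) follow a simple pattern that a short Python sketch can make concrete. The function names are hypothetical, and the currency pattern is an assumption: it accepts only plain digits followed by a two-digit decimal amount, whereas the actual action may also accept currency symbols or thousands separators.

```python
import re


def percent_matching(word, predicate):
    """Percentage of characters in `word` for which predicate(ch) is true."""
    if not word:
        return 0.0
    return 100.0 * sum(1 for ch in word if predicate(ch)) / len(word)


def is_alpha(word, min_percent=100):
    # Mirrors the described default: every character must be alphabetic.
    return percent_matching(word, str.isalpha) >= min_percent


def is_number(word, min_percent=100):
    return percent_matching(word, str.isdigit) >= min_percent


# Assumed shape of a currency value: digits, a decimal point, two digits.
CURRENCY = re.compile(r"[0-9]+\.[0-9]{2}")


def is_currency(word):
    return CURRENCY.fullmatch(word) is not None


print(is_alpha("T0tal"))      # prints False (4 of 5 characters are letters)
print(is_alpha("T0tal", 80))  # prints True
print(is_currency("19.99"))   # prints True
```

Lowering the percentage is useful when recognition is expected to misread an occasional character, such as the letter O read as a zero.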
LOOKUP
Use these actions to validate field values using database lookups and populate fields with lookup results.
Action
Description
ClearLookupResults
Clears the results returned by the previous Lookup action.
CloseConnection
Closes an open lookup database connection.
ExecuteSQL
Executes a SQL statement on the lookup database. If a SELECT statement returns
one or more values, these are stored in an internal data record that you can
access using PopulateWithResult.
LookupCurrentValue
Queries the lookup database to determine if the current field’s value is in the
database.
LookupReturnValue
Queries the lookup database using the current field’s value. If the query returns
one or more values, these are stored in an internal data record that you can
access using PopulateWithResult.
OpenConnection
Uses a data source name or connection string to open a connection to a lookup
database.
PopulateWithResult
Populates the current field with a value returned by the previous ExecuteSQL or
LookupReturnValue action. If the previous action returned multiple values, you
can specify the one you want.
SmartSQL
Executes a SQL statement with support for smart parameters.
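The usual sequence of Lookup actions (open a connection, check or fetch a value, populate the field, close the connection) can be sketched with Python's built-in sqlite3 module. The vendors table, its columns, and the function names are hypothetical and exist only for this example; a real application would point OpenConnection at its own lookup database.

```python
import sqlite3


def open_connection():
    """Stand-in for OpenConnection: an in-memory database with one
    hypothetical lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute("INSERT INTO vendors VALUES ('V100', 'Acme Travel', 'Boston')")
    return conn


def validate_and_populate(field_value, result_index=0):
    conn = open_connection()
    try:
        # Like LookupCurrentValue: is the field's value in the database?
        found = conn.execute(
            "SELECT 1 FROM vendors WHERE vendor_id = ?", (field_value,)
        ).fetchone() is not None
        # Like LookupReturnValue: keep the returned values in a record.
        record = conn.execute(
            "SELECT name, city FROM vendors WHERE vendor_id = ?", (field_value,)
        ).fetchone()
        # Like PopulateWithResult: pick one of the returned values.
        populated = record[result_index] if record else None
        return found, populated
    finally:
        conn.close()  # like CloseConnection


print(validate_and_populate("V100"))     # prints (True, 'Acme Travel')
print(validate_and_populate("V100", 1))  # prints (True, 'Boston')
print(validate_and_populate("V999"))     # prints (False, None)
```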
NENU
NENU is Taskmaster’s notification utility that allows you to set up batch monitoring, status notification, and
automatic deletion of completed batches.
LOGGING ACTIONS
Use these actions to write information to the NENU and Windows log files, and also to send emails. During
rule execution, NENU writes status messages to an internal log file as well as the Rulerunner log file:
• The internal log file is maintained in memory and is used by the LogSendEmail action.
• The Rulerunner log file is located in the NENU folder beneath the application’s “batches” folder.
Action
Description
LogClear
Clears the internal log file (does not affect the Rulerunner log file)
LogSendEmail
Sends an e-mail with the internal log file to one or more email recipients.
LogWriteEventLog
Writes an information, warning, or error message to the Datacap section
of the Windows event log.
LogWriteRecordSet
Writes the returned recordset to the NENU log file (see “Batch processing”
on page 477 for an example).
LogWriteSQLQuery
Writes the current query string to the NENU log file.
BATCH PROCESSING ACTIONS
Use these actions to execute the SQL query and perform actions on the selected database records and
(optionally) the corresponding batches.
The ProcessRunSqlQuery action executes the current query string and generates a recordset containing
information about all matching batches. For example, if you build a query using QuerySetStatus("hold")
and your query locates one batch with status “hold,” the returned recordset will contain the following
information aggregated from the tmbatch, qstats, and queue tables:
<rs:data xmlns:rs="urn:schemas-microsoft-com:rowset"><z:row pb_adjustdocs="0"
pb_adjustpages="0" pb_batch="20100260.001"
pb_batchdir="C:\Datacap\APT\batches\20100260.001" pb_expectdocs="0" pb_expectpgs="8"
pb_headertable=" " pb_ndocs="0" pb_needMeet="0" pb_pagefile="rrsvscan.xml"
pb_pages="8" qs_elaps="8" qs_op="admin" qs_qid="3" qs_start="2010-09-17T07:35:26"
qs_station="2" qs_stop="2010-09-17T07:35:34" qs_taskid="VScan" qs_tsorder="0"
qu_admDB="156" qu_batch="20100260.001" qu_counter="0" qu_done="2010-09-17T07:35:34"
qu_elaps="8" qu_id="3" qu_job="Demo" qu_lock="none" qu_parent="0" qu_priority="5"
qu_spawntype="0" qu_start="2010-09-17T07:35:26" qu_status="hold" qu_task="VScan"
qu_tsorder="0" xmlns:z="#RowsetSchema" /></rs:data>
The other actions in the batch processing category let you manipulate the batches identified in the query
results recordset.
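If you need to inspect such a recordset outside of NENU (for example, after writing it to the log with LogWriteRecordSet), the rowset XML can be read with any XML parser. The following Python sketch uses an abbreviated copy of the row shown above; reading the recordset this way is an illustration, not something the NENU actions require.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample recordset (most attributes omitted).
RECORDSET = r'''<rs:data xmlns:rs="urn:schemas-microsoft-com:rowset"><z:row
    pb_batch="20100260.001"
    pb_batchdir="C:\Datacap\APT\batches\20100260.001"
    pb_pages="8" qs_op="admin" qu_job="Demo" qu_status="hold" qu_task="VScan"
    xmlns:z="#RowsetSchema" /></rs:data>'''

# ElementTree exposes namespaced tags as "{namespace-uri}localname".
ROW_TAG = "{#RowsetSchema}row"

root = ET.fromstring(RECORDSET)
batches = [dict(row.attrib) for row in root.iter(ROW_TAG)]

for batch in batches:
    print(batch["pb_batch"], batch["qu_status"], batch["pb_batchdir"])
```

Each z:row element becomes one dictionary keyed by column name, so the pb_, qs_, and qu_ prefixes show which table (tmbatch, qstats, or queue) a value came from.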
Action
Description
ProcessChangeBatchStatus
Changes the status attribute (qu_status) for each batch in the returned
recordset. The database connection is then closed.
ProcessChangeBatchStatusOrder
Changes the status (qu_status) and order (qu_tsorder) attributes for each
batch in the returned recordset. The database connection is then closed.
ProcessChangeBatchStatusTaskOrder
Changes the status (qu_status), order (qu_tsorder), and task (qu_task)
attributes for each batch in the returned recordset. The database
connection is then closed.
ProcessClearAuditTable
Deletes all records from the audit table in the application’s admin
database.
ProcessClearDebugTable
Deletes all records from the debug table in the application’s engine
database.
ProcessDeleteBatches
Deletes the folder specified by the batch attribute (pb_batchdir) for each
batch in the returned recordset.
ProcessMoveBatches
Moves the folder specified by the batch attribute (pb_batchdir) to a new
location for each batch in the returned recordset. Always move the
batches before moving the database records.
ProcessMoveDBRecords
For each batch identified in the returned recordset, this action moves the
associated database records from queue, qstats, and tmbatch to a
different application database. You can optionally delete the records from
the source database.
ProcessResetPendingOrNotify
Changes the batch status to “pending,” but only up to the maximum number
of times specified by parameter 1. This is typically used to reset aborted
batches. For example, if you specify ‘3,’ NENU resets the batch status a
maximum of three times. After the third attempt it sends an email to the
specified recipient.
ProcessRunSqlQuery
Executes the current query string.
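The reset-or-notify behavior described for ProcessResetPendingOrNotify amounts to a bounded retry. The Python sketch below illustrates that logic only; the function name, the per-batch reset counter, and the notify callback are assumptions, since this summary does not describe how NENU tracks the number of resets.

```python
def reset_pending_or_notify(batch, max_resets, notify):
    """Reset an aborted batch to "pending" at most max_resets times;
    after that, call notify(batch) instead (standing in for the email)."""
    if batch["resets"] < max_resets:
        batch["status"] = "pending"
        batch["resets"] += 1
        return True
    notify(batch)
    return False


batch = {"id": "20100260.001", "status": "abort", "resets": 0}
emails = []
for _ in range(4):  # the batch aborts four times in a row
    batch["status"] = "abort"
    reset_pending_or_notify(batch, max_resets=3, notify=emails.append)

print(batch["resets"], len(emails))  # prints 3 1
```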
QUERY SETUP ACTIONS
Use these actions to build the query string that is executed against the application databases you connected to
in application setup.
Before you can execute a query against the application databases you must build a SQL query string. As you
execute the actions in this category, NENU appends the corresponding SQL code to the current query string.
For example, executing QuerySetStatus("hold") followed by QuerySetOperator("admin") generates
the following query string:
Select * FROM JobMonitor WHERE queue.qu_status IN ('hold') AND qstats.qs_op IN
('admin')
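The way each query setup action appends a condition to the current query string can be mimicked in a few lines of Python. The class and method names here are hypothetical; only the generated SQL text is taken from the example above.

```python
class QueryString:
    """Accumulates WHERE conditions the way the query setup actions do."""

    def __init__(self):
        self.conditions = []

    def clear(self):  # like QueryClear
        self.conditions = []

    def set_status(self, status):  # like QuerySetStatus
        self._add("queue.qu_status", status)

    def set_operator(self, operator):  # like QuerySetOperator
        self._add("qstats.qs_op", operator)

    def _add(self, column, value):
        quoted = "'" + value.replace("'", "''") + "'"  # escape single quotes
        self.conditions.append(f"{column} IN ({quoted})")

    def text(self):
        sql = "Select * FROM JobMonitor"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql


query = QueryString()
query.set_status("hold")
query.set_operator("admin")
print(query.text())
```

Because conditions accumulate, clear the query (QueryClear) before building an unrelated one.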
Action
Description
Table Queried
Column
QueryClear
Clears the current query string.
QuerySetAge
Specifies the batch age in the query string – returns
batches started or finished within the last x seconds,
or before or after a specific date.
queue
qu_start or
qu_done
QuerySetBranch
Specifies a minimum number of child batches –
returns batches with at least the number of children
specified.
queue
qu_id and
qu_parent
QuerySetDateRange
Specifies a date range in the query string – returns
batches started or finished within the specified date
range.
queue
qu_start or
qu_done
QuerySetGeneric
Specifies a generic attribute value – the first
parameter specifies the column name and the second
parameter specifies the desired value.
queue or qstats
-any-
QuerySetJobID
Specifies the job ID in the query string.
queue
qu_job
QuerySetOperator
Specifies the operator name in the query string.
qstats
qs_op
QuerySetPriority
Specifies the job priority in the query string.
queue
qu_priority
QuerySetSeparator
Specifies the date/time separator used in the
SQL query string. The default value is ‘#’.
QuerySetStation
Specifies the station ID in the query string.
qstats
qs_station
QuerySetStatus
Specifies the task status in the query string.
queue
qu_status
QuerySetTaskID
Specifies the task ID in the query string.
queue
qu_task
REPORTING ACTIONS
Use these actions to write information to the report tables in the engine database for use by RV2.
Action
Description
ReportQueryTMUsage
Writes a record for each user connected to the current application to the
application’s reportUsers table.
For example, if you used SetupOpenApplication to connect to the APT
application, ReportQueryTMUsage writes information about current APT
users to the reportUsers table in the APT engine database.
ReportSetReportingTable
Reserved for future use (do not use)
ReportSetUsageDBTable
Reserved for future use (do not use)
APPLICATION SETUP ACTIONS
Use these actions to obtain connections to the admin and engine databases for the specified application.
Action
Description
SetApplication
Specifies the name of the application used by SetupOpenApplication.
SetUser
Specifies the user name used by SetupOpenApplication.
SetPassword
Specifies the password used by SetupOpenApplication.
SetStation
Specifies station ID used by SetupOpenApplication.
SetServer
Specifies the name of the server used by SetupOpenApplication. This is
specified in the “name” attribute in the application’s configuration file.
This action is only required if the application you are connecting to has
multiple servers defined.
SetupDisconnectAll
Closes the connections to all Taskmaster databases opened by NENU
actions. Some NENU actions close the connection automatically after
executing a query.
SetupOpenApplication
Connects to an application using the information specified earlier with
SetApplication, SetUser, SetPassword, and SetStation.
SetupOpenApplicationEx
Connects to the application specified by the first parameter using the login
information specified by the remaining parameters. This provides an
alternative to SetupOpenApplication.
OCR_A
Use these actions to perform text recognition using the ABBYY® FineReader™ OCR engine.
Action
Description
EnableEngineLogsOCR_A
Enables logging of OCR engine messages to a file in the current batch
folder.
OCRA_ConvertImage2BW
Converts a color or grayscale image to black-and-white.
RecognizeBarcodeOCR_A
Retrieves barcode information.
RecognizeFieldOCR_A
Performs recognition on the current field’s zone and writes the result to
the runtime page data file.
RecognizeFieldVoteOCR_A
Performs recognition on the current field’s zone and compares the result
to the existing field value, character by character, raising the confidence
level when the characters match and lowering it when they don’t.
RecognizePageFieldsOCR_A
Performs recognition on all field zones defined for the current page and
writes the results to the runtime page data file.
RecognizePageOCR_A
Performs full page recognition and populates the page’s fingerprint (CCO)
file with the results.
RecognizeToPDFOCR_A
Performs full page recognition and combines the image and the
recognition results into a searchable PDF file.
ReleaseEngineOCR_A
Releases the ABBYY engine.
RotateImageOCR_A
Use in conjunction with Recog_Shared > RotateTIO to update the CCO file
with the correct position coordinates after image rotation.
SetAutoRotationOCR_A
Enables or disables automatic image orientation detection and rotation
(enabled by default).
SetConfCalculationParamsOCR_A
Specifies how to map ABBYY confidence values to Taskmaster confidence
values.
SetFastModeOCR_A
Enables or disables fast OCR, which increases recognition speed but may
also increase the error rate (disabled by default)
OCR_S
Use these actions to perform recognition using the Nuance® OmniPage® OCR engine.
Action
Description
RecognizeDocToPDF
Saves all pages in the current document as a PDF file.
RecognizeFieldOCR_S
Performs recognition on the current field’s zone and writes the result to
the runtime page file.
RecognizeFieldVoteOCR_S
Performs recognition on the current field’s zone and compares the result
to the existing field value, character by character, raising the confidence
level when the characters match and lowering it when they don’t.
RecognizePageFields2CCO_OCR_S
Performs recognition on all field zones defined for the current page and
writes the results to the page’s CCO file.
RecognizePageFieldsOCR_S
Performs recognition on all field zones defined for the current page and
writes the results to the runtime page data file.
RecognizePageOCR_S
Performs full page recognition and populates the page’s fingerprint (CCO)
file with the results.
RecognizePageOCR_S_2TextFile
Performs full page recognition and writes the recognition results to a text
file in the batch folder.
RecognizeToFile_OCR_S
Performs full page recognition and writes the recognition results to one of
several available output file types (.doc, .rtf, .html, etc.).
RecognizeToPDF
Performs full page recognition and saves the current page as a PDF file.
RotateImage
Use in conjunction with Recog_Shared > RotateTIO to update the CCO file
with the correct position coordinates after image rotation.
SetEngineTimeout
Specifies the number of seconds to wait before determining that a
recognition action is not running properly.
SetFastTradeOffOCR_S
Enables or disables fast OCR, which increases recognition speed but may
also increase the error rate (disabled by default)
SetLegacyDecompositionOCR_S
Enhances an image in preparation for recognition (typically not required).
OCR_SR
Use these actions to perform recognition using the updated Nuance OmniPage OCR engine.
Action
Description
RecognizeFieldOCR_S
See same action under OCR_S
RecognizeFieldVoteOCR_S
See same action under OCR_S
RecognizePageFieldsOCR_S
See same action under OCR_S
RecognizePageOCR_S
See same action under OCR_S
RecognizeToFileOCR_S
See same action under OCR_S
RecognizeToPDFOCR_S
See RecognizeToPDF action under OCR_S
RotateImageOCR_S
See RotateImage action under OCR_S
OPENTEXTFAXSERVER
Use these actions to import faxes from an OpenText® fax server.
PATTERNMATCH
Use these actions for pattern-based page identification and for page registration (alignment). Proper
registration is especially important when working with OMR checkboxes.
Action
Description
MatchPattern
Searches the current page image in a zone surrounding the current field for a
match to the anchor patterns specified for this field in the fingerprint library.
pat_RecogMatch_Id
Identifies a page using text-based pattern matching and then sets the page
type and image offsets. This action uses the patterns (anchor objects) from all
fingerprints in the fingerprint library.
pat_RegisterZones
Use after running PatternMatch_Identify if you have multiple anchors on the
page. This action adjusts the positions of all fields based on the positions of
multiple anchor fields.
pat_ReleasePageAnchors
Releases the page’s anchor fields.
PatternMatch_Fingerprint
Same as PatternMatch_Identify, except tries to find a match using only the
fingerprints specified.
PatternMatch_Identify
Identifies a page using geometric pattern matching, sets the page type and
image offsets, and creates the page data file.
PatternMatch_PageType
Same as PatternMatch_Identify, except tries to find a match using only
fingerprints of the page type specified.
SetMatchConfidence
Sets the confidence threshold for pattern matching.
PICTURE
Use these actions to perform field validations using “picture strings.” Picture strings define the permitted
format of a field such as a social security number, phone number, date, etc. A social security number, for
example, is always <three digits>-<two digits>-<four digits>, so you can define a picture string to represent this
format and then use it to make sure social security number fields contain conforming values.
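The picture string syntax itself is covered in the embedded help and is not reproduced here. As a rough analogy only, the same social security number format can be expressed as a regular expression; the Python sketch below uses that regular expression as a stand-in and is not Datacap picture string syntax.

```python
import re

# <three digits>-<two digits>-<four digits>
SSN_FORMAT = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")


def conforms_to_ssn_format(value):
    """True only if the whole value matches the expected layout."""
    return SSN_FORMAT.fullmatch(value) is not None


print(conforms_to_ssn_format("123-45-6789"))  # prints True
print(conforms_to_ssn_format("123456789"))    # prints False
```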
Action
Description
PIC_ApplyPictureString
Validates the current field using the specified picture string (see the
embedded help for PIC_FormatFields for picture string details).
PIC_FilterFields
Validates the format of the current field (when called from a field) or all fields
on the current page (when called from a page) using the picture string stored
in the field’s PictureString variable.
PIC_FormatFields
Same as PIC_FilterFields, except for the action taken when replacing problem
characters (see the embedded help for details).
PIC_ReplaceBlankField
If the current field is blank, sets the field value to the character specified.
PIC_SetPictureCharacter
Lets you define up to 10 custom picture strings (0–9) that you can reference
from PIC_ApplyPictureString.
PIC_ValidateField
Validates the format of the current field using the picture string stored in the
field’s PictureString variable.
RECOG_SHARED
Use these actions to perform a variety of fingerprint and recognition related functions, including recognizing
checkbox options and writing recognition results to the page data files.
Action
Description
AnalyzeImage
Converts the TIF image file for the current page to a fingerprint CCO file.
CCONormalization_OFF
Disables automatic CCO normalization after full-page recognition.
CreateTextFile
Creates a text file containing the recognized text for the current page.
IsBlankPage
Determines if a page is blank by counting the number of words in the CCO file
and comparing to the specified threshold.
RecogContinueOnFailure
Determines whether or not the batch will abort if recognition fails.
RecogOMRThreshold
Performs OMR checkbox recognition by counting black pixels within each
OMR region.
RotateTio
Determines if an image requires rotation and rotates by 90, 180, or 270
degrees if required. Use with RotateImageOCR_* to update the CCO.
SetAdjustFieldToChars
Optional setting you can specify before running SnapCCOtoDCO to adjust the
field positions to the character positions.
SetFingerprintRecogPriority
Determines if full-page recognition creates a new CCO file or adds the
recognition data to the existing CCO file if there is one.
SetFullPageRecogArea
Specifies the area of the page on which to perform recognition when using
a full-page recognition action (defaults to 1 for full page).
SetOutOfProcessRecogTimeout
Specifies the number of seconds to wait before determining that a
recognition action is no longer running properly.
SetRecogFailureRetryDelay
Specifies the number of seconds to wait before restarting a failed recognition
action.
SnapCCOtoDCO
Copies the recognition results for each field from the runtime fingerprint CCO
file to the document hierarchy (setup DCO).
SnapDCOtoCCO
Copies the recognition results stored in the runtime hierarchy to the runtime
fingerprint CCO file.
SnapFieldtoChars
Adjusts the position information for the current field in the page data file
(runtime DCO) based on the field’s character positions.
UseOutOfProcessRecog
Causes recognition to run within a separate process.
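The pixel-counting approach described for RecogOMRThreshold can be illustrated with a small Python sketch. The image representation (rows of 0 and 1, where 1 is a black pixel), the region format, and the use of a fractional threshold are assumptions made for the example; the action's actual parameters are described in its embedded help.

```python
def black_pixel_ratio(image, region):
    """Fraction of black pixels inside region = (left, top, right, bottom)."""
    left, top, right, bottom = region
    total = 0
    black = 0
    for row in image[top:bottom]:
        for pixel in row[left:right]:
            total += 1
            black += pixel  # 1 = black, 0 = white
    return black / total if total else 0.0


def is_checked(image, region, threshold=0.5):
    """Treat a checkbox as marked when enough of its region is black."""
    return black_pixel_ratio(image, region) >= threshold


page = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(is_checked(page, (0, 0, 2, 2)))  # prints True  (3 of 4 pixels black)
print(is_checked(page, (2, 2, 4, 4)))  # prints False (1 of 4 pixels black)
```

This is also why proper page registration matters for OMR: if the regions are misaligned, the counts are taken from the wrong pixels.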
RRUNNER
Use these actions to perform miscellaneous utility functions. This library includes actions to check batch
integrity, manipulate the values of fields and variables, raise condition flags, and control rule execution.
Action
Description
AbortOnError
Determines if a task that encounters an error should abort or continue.
CheckAllIntegrity
Checks all documents in the batch to determine if they meet the document
integrity requirements specified in the document hierarchy (setup DCO).
CheckDocCount
Determines if the number of documents in the runtime hierarchy matches
the expected document count as specified by the scan operator.
CheckPageCount
Determines if the number of pages in the runtime hierarchy matches the
expected page count as specified by the scan operator.
DebugMode_OFF
Disables enhanced logging.
DebugMode_ON
Enables enhanced logging (disabled by default).
GoToNextFunction
Returns False, causing the next function in the ruleset to execute.
PilotMessage_Clear
Removes the MESSAGE variable from the current object.
PilotMessage_Set
Assigns a message to the current object’s MESSAGE variable.
ProcessChildren
rr_AbortBatch
Stops processing the current batch and sets its status to Abort.
rr_Get
Assigns the value of the specified variable to the current object’s Text
property.
rr_WriteNode
Creates a separate XML data file for the current object.
rrAppend
Appends a value to the specified field.
rrCompare
Compares the values of two variables and returns True if they are the same.
rrCompareNot
Compares the values of two variables and returns False if they are the same.
rrCopy
Copies the value, confidence levels, and positions from one field to another.
rrPrepend
Inserts a value at the beginning of the specified field.
rrSet
Assigns a value to a variable or field.
SetReturnValue
Returns True or False depending on the parameter specified.
SetTaskStatus
Specifies the task status (Abort, Cancelled, Finished, Hold, or Pending)
returned to the application when the current task completes.
SkipChildren
Prevents the running of rules on child objects of the current object.
Status_Preserve_OFF
Allows rules to change the STATUS value of fields (for example, to assign a
problem status).
Status_Preserve_ON
Prevents rules from changing the STATUS value of fields.
Task_NumberOfSplits
Specifies the number of jobs the batch is sent to when a condition is raised
before returning to the main workflow.
Task_RaiseCondition
Specifies the group index and the index of the condition to raise from the list
on the Taskmaster Client Workflow tab (where 0 is the first condition).
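The GoToNextFunction entry above depends on how a rule moves from one function to the next. The following Python sketch models that control flow only. It is not Datacap code and calls no Taskmaster API. It assumes that a function stops at the first action that returns False, that the next function then runs, and that the remaining functions are skipped once every action in a function has returned True. The helper names (run_function, run_rule, record) and the placeholder action labels are hypothetical.

```python
from typing import Callable, List

Action = Callable[[], bool]      # an action reports True or False
Function = List[Action]          # a function is an ordered list of actions


def run_function(actions: Function) -> bool:
    """Run actions in order and stop at the first one that returns False."""
    for action in actions:
        if not action():
            return False
    return True


def run_rule(functions: List[Function]) -> bool:
    """Try each function in turn until one completes with every action True."""
    for actions in functions:
        if run_function(actions):
            return True          # assumed: later functions are skipped
    return False


log: List[str] = []


def record(name: str, result: bool) -> Action:
    """Build a stand-in action that logs its name and returns a fixed result."""
    def action() -> bool:
        log.append(name)
        return result
    return action


rule = [
    # GoToNextFunction is modelled as an action that always returns False.
    [record("first_check", True), record("GoToNextFunction", False),
     record("not_reached", True)],
    [record("second_function_action", True)],
    [record("third_function_action", True)],
]
outcome = run_rule(rule)
```

In this model the log ends up containing first_check, GoToNextFunction, and second_function_action: the action after GoToNextFunction never runs, and the third function is skipped because the second one succeeded.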
SPEXPORT
Use these actions to upload documents to Microsoft SharePoint®.
SPLIT
Use this action to split a batch into smaller batches so each can be processed separately.
Action
Description
SplitBatch
Splits a batch into smaller batches based on the value of the specified
document-level variable.
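As a rough illustration of the idea behind SplitBatch, the Python sketch below groups documents by the value of one document-level variable. It is a model of the description above, not the action's implementation. Representing a document as a dictionary of variables, the function name split_batch, and the variable name DocType are all assumptions made for the example.

```python
from typing import Dict, List

Document = Dict[str, str]   # hypothetical: a document as a map of variables


def split_batch(documents: List[Document], variable: str) -> Dict[str, List[Document]]:
    """Group documents by the value of one document-level variable.

    Documents that lack the variable are grouped under the empty string.
    Groups keep the order in which their first document appeared.
    """
    groups: Dict[str, List[Document]] = {}
    for doc in documents:
        groups.setdefault(doc.get(variable, ""), []).append(doc)
    return groups


batch = [
    {"ID": "1", "DocType": "Invoice"},
    {"ID": "2", "DocType": "Receipt"},
    {"ID": "3", "DocType": "Invoice"},
]
smaller_batches = split_batch(batch, "DocType")
```

Each key of the result would correspond to one smaller batch that can then be processed separately.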
TIFMERGE
Use these actions to combine individual TIFF images into a multi-page TIFF file. This is typically done at the
end of the workflow so you can upload or release the batch images as a single file.
TifMerge_MergeImages
Combines the batch’s individual TIFF images into a multi-page TIFF file.
TifMerge_MyImage
Adds an individual page image to the batch’s multi-page TIFF file.
TifMerge_SetFileName
Specifies the name of the batch’s multi-page TIFF file.
TifMerge_SetFilePath
Specifies the location of the batch’s multi-page TIFF file.
TM524
These actions are for compatibility with older versions of Taskmaster and are no longer used.
VALIDATIONS
Use these actions to check and modify the content and format of the current field’s value. Other actions in
this library let you perform arithmetic calculations, assign and copy values, check variables, etc.
Action
Description
AddLeadingZeros
Inserts zeros at the beginning of a value so the character count equals the
number specified.
AddPaddingToLeft
Inserts spaces at the beginning of a value so the character count equals the
number specified.
AddPaddingToRight
Adds spaces to the end of a value so the character count equals the number
specified.
AddTrailingZeros
Adds zeros to the end of a value so the character count equals the number
specified.
AllowOnlyChars
Removes all characters that are not specified as permitted.
AppendFromField
Appends the value of the specified field to the current field.
AppendToField
Appends the value of the current field to the specified field.
AssignFieldDefault
Assigns a value to the current field.
Calculate
Returns True if the specified arithmetic expression is valid; returns False
otherwise.
CalculateFields
CheckSubFields
Determines if the values of the specified child fields meet the specified
criteria and deletes the parent field if they don’t.
CompareFields
Compares the values of two fields using the specified matching criteria,
which support “fuzzy” matching.
ConvertFieldToCurrency
Converts the value of the current field to a currency value.
ConvertToLowerCase
Converts any upper case characters in the current field to lower case.
ConvertToUpperCase
Converts any lower case characters in the current field to upper case.
CopyField
Copies the value of the current field to the specified field.
CopyFieldToField
Copies the value of the current field to the specified field.
DateStampField
Writes today’s date to the current field’s value.
DeleteAllAlpha
Deletes all alphabetic characters from the current field’s value.
DeleteAllMiscChars
Deletes all miscellaneous characters from the current field’s value.
DeleteAllNumeric
Deletes all numeric characters from the current field’s value.
DeleteAllPunct
Deletes all punctuation from the current field’s value.
DeleteAllSysChars
Removes all characters with ASCII values 0 through 31 from the current field’s
value.
DeleteChildType
Deletes all child objects of the specified type from the runtime hierarchy.
DeleteLCSpaces
Deletes all low confidence spaces from the current field’s value.
DeleteParentObj
Deletes the parent object of the current object.
DeleteSelectedChars
Deletes the specified characters from the current field’s value (a more flexible
version of FilterFieldSelectedChars).
EmptyFieldValue
Clears the value of the specified field.
FailRuleSet
Causes the entire ruleset to fail.
FieldContainsValue
Determines if the current field value contains some or all of the specified text
but no additional text.
FilterFieldSelectedChars
Deletes the specified characters from the current field’s value.
GetJobID
Assigns the current job ID to the current object’s Text property.
HasChildOfType
Determines if the current object has a child of the specified type.
InsertChars
Inserts one or more characters into the current field’s value.
InsertDecimalPoint
Inserts a decimal point into the current field’s value at the specified position.
IsFieldCurrency
Returns True if the current field’s value is numeric and includes a 2-digit
decimal amount.
IsFieldDate
Returns True if the current field’s value is a date.
IsFieldDateEqualOrAfter
Returns True if the current field’s value is a date equal to or after the
specified date.
IsFieldDateEqualOrBefore
Returns True if the current field’s value is a date equal to or before the
specified date.
IsFieldDateUpToToday
Returns True if the current field’s value is a date equal to or before today’s
date.
IsFieldDateWithinRange
Returns True if the current field’s value is a date within the specified range.
IsFieldDateWithinXDays
Returns True if the current field’s value is a date within the specified number
of days of today’s date.
IsFieldDateWithReformat
Returns True if the current field’s value is a date and reformats the value to
the date format specified.
IsFieldEmpty
Returns True if the current field’s value is empty.
IsFieldFilled
Returns True if the current field’s value is not empty.
IsFieldGreaterOrEqual
Returns True if the current field’s value is greater than or equal to the value
specified.
IsFieldHidden
Returns True if the STATUS value of the current field is -1 (Hidden).
IsFieldLengthMax
Returns True if the length of the current field’s value is less than or equal to
the length specified.
IsFieldLengthMin
Returns True if the length of the current field’s value is greater than or equal
to the length specified.
IsFieldLessOrEqual
Returns True if the current field’s value is less than or equal to the value
specified.
IsFieldMatching
Returns True if the current field’s value matches the value specified.
IsFieldPercentAlpha
Returns True if the current field’s value contains at least the specified
percentage of alphabetic characters.
IsFieldPercentNonNumeric
Returns True if the current field’s value contains at least the specified
percentage of non-numeric characters.
IsFieldPercentNumeric
Returns True if the current field’s value contains at least the specified
percentage of numeric characters.
IsMatchingJobID
Returns True if the current job ID matches the value specified.
IsMaxOMRChecked
Returns True if the number of OMR boxes checked is less than or equal to the
number specified.
IsMinOMRChecked
Returns True if the number of OMR boxes checked is greater than or equal to
the number specified.
IsPatternInField
Returns True if the current field’s value contains the specified VBScript
regular expression.
IsSupportedImageFile
Returns True if the image file attached to the current page is in a supported
image format.
IsThisFieldEmpty
Returns True if the current field’s value is empty.
IsThisFieldFilled
Returns True if the current field’s value is not empty.
IsVariableEmpty
Returns True if the specified variable is empty.
IsVariableFilled
Returns True if the specified variable is not empty.
LeftTruncate
Deletes characters from the end of the current field’s value until the length
equals the length specified.
MessageBox
Displays a message box with Yes and No buttons during verification. Do not
include this action in background tasks.
ParseMultilineAddress
Splits the current field’s value at each comma and saves the sub-strings to the
specified fields. Typically used for address fields.
ParseName
Splits the current field’s 3-word value and saves the sub-strings to the
specified fields. Typically used for name fields (first name, middle
name/initial, last name).
ReadCurrentObjVariable
ReadFieldValue
Assigns the value of the specified field to the current field.
ReadPageVariableValue
Assigns the value of the specified page variable to the current field.
ReplaceChars
Replaces the specified characters within the current field’s value with
different characters.
ReplaceValueAtPosition
Replaces the value at the specified position within the current field with a
replacement string, or deletes the value.
ResetField
Deletes the current field’s value and sets the field’s Position attribute to
0,0,0,0.
RightTruncate
Deletes characters from the beginning of the current field’s value until the
length equals the length specified.
SaveAsPageVariable
Assigns a value to the specified page variable.
SetIsOverrideable
If set to False, specifies that if validation on the current object fails, the
operator cannot override the error; if set to True, the operator can override
the error.
SplitFieldValueLeft
Deletes all characters to the right of the first instance of the specified character.
SplitFieldValueRight
Deletes all characters to the left of and including the first instance of the specified
character.
SumFields
Adds together the values of all child fields of the specified type and assigns
the result to the current field. You can also use this to sum the values of the
specified variable for all child objects.
TimeStampField
Assigns the current time to the current field.
TrimSpaces
Deletes spaces at the beginning and end of the current field’s value.
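Several of the descriptions above are easier to compare side by side when written as string operations. The Python sketch below restates a few of them (the padding, truncation, and trim actions, plus two of the checks). It is an illustration of the summaries only, not the library's code. The function names are hypothetical, and the edge cases are assumptions: a value already longer than the requested length is left unchanged by the padding functions, an empty value fails the percentage check, and the currency check accepts only unsigned digits followed by a decimal point and exactly two digits.

```python
import re


def add_leading_zeros(value: str, length: int) -> str:
    """Insert zeros at the beginning until the value has `length` characters."""
    return value.rjust(length, "0")


def add_trailing_zeros(value: str, length: int) -> str:
    """Add zeros to the end until the value has `length` characters."""
    return value.ljust(length, "0")


def add_padding_to_left(value: str, length: int) -> str:
    """Insert spaces at the beginning until the value has `length` characters."""
    return value.rjust(length)


def add_padding_to_right(value: str, length: int) -> str:
    """Add spaces to the end until the value has `length` characters."""
    return value.ljust(length)


def left_truncate(value: str, length: int) -> str:
    """Delete characters from the end, keeping the leftmost `length`."""
    return value[:length]


def right_truncate(value: str, length: int) -> str:
    """Delete characters from the beginning, keeping the rightmost `length`."""
    return value[max(len(value) - length, 0):]


def trim_spaces(value: str) -> str:
    """Delete spaces at the beginning and end of the value."""
    return value.strip(" ")


def is_percent_numeric(value: str, percent: float) -> bool:
    """True if at least `percent` percent of the characters are digits."""
    if not value:
        return False          # assumption: an empty value fails the check
    digits = sum(ch.isdigit() for ch in value)
    return digits * 100 >= percent * len(value)


def is_currency(value: str) -> bool:
    """True for digits followed by a 2-digit decimal amount (assumed format)."""
    return re.fullmatch(r"\d+\.\d{2}", value) is not None
```

Note the naming: in this reading LeftTruncate keeps the left part of the value and RightTruncate keeps the right part, which matches the summaries above.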
VOTE
Use this action when performing multi-pass data entry to check if the first and second passes match.
Action
Description
VoteFld
Returns True if the data entered by the first operator matches the data
entered by the second operator.
VSCAN
Use these actions to create a batch using existing image files.
Action
Description
AddDocument
Adds a document node to the runtime hierarchy. All “scanned” pages become
children of the document node.
CopyFile
When used before the Scan action, this tells the Scan action to also copy the
files to the specified location.
DeleteImageFile
When used before the Scan action, this tells the Scan action to delete the
source files from the “images” folder.
MoveImageFileToDirectory
When used before the Scan action, this tells the Scan action to also move the
files from the “images” folder to the specified location.
Scan
Copies image files from the location specified by SetSourceDirectory to the
batch folder and creates the runtime hierarchy.
SearchInSubdirectory
When used before the Scan action, this tells the Scan action to look in
subdirectories of the “images” folder.
SetFastMode
Provided only for compatibility with older applications – no longer used.
SetImageType
Specifies the extensions of image file types to use (defaults to .tif).
SetMailSourceFolder
When used before the Scan action, this tells the Scan action to search the
“images” folder for emails with image file attachments and copy these to the
batch folder.
SetMaxImageFiles
Limits the number of files the Scan action will copy.
SetMultiPageTiff
When used before the Scan action, this tells the Scan action to split multi-page TIFF files into single-page TIFFs.
SetSortOrder
Specifies the order in which image files are imported.
SetSourceDirectory
Specifies the path to the “images” folder. This action must precede the Scan
action.
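Most VSCAN actions only record a setting that the Scan action later uses, which is why they must come before Scan. The Python sketch below models that pattern with a settings object and a scan function that copies matching files into a batch folder. It is an analogy under stated assumptions, not the VSCAN implementation: the class ScanSettings, the function scan, the sort by file name, and the behavior on duplicate file names are all hypothetical, and creating the runtime hierarchy is not modelled.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class ScanSettings:
    """Settings collected before the scan runs (names are hypothetical)."""
    source_directory: Path                        # cf. SetSourceDirectory
    image_extensions: Tuple[str, ...] = (".tif",)  # cf. SetImageType
    max_image_files: Optional[int] = None         # cf. SetMaxImageFiles
    search_subdirectories: bool = False           # cf. SearchInSubdirectory
    delete_source: bool = False                   # cf. DeleteImageFile


def scan(settings: ScanSettings, batch_folder: Path) -> List[Path]:
    """Copy matching image files into the batch folder and return the copies."""
    pattern = "**/*" if settings.search_subdirectories else "*"
    candidates = sorted(
        (p for p in settings.source_directory.glob(pattern)
         if p.is_file() and p.suffix.lower() in settings.image_extensions),
        key=lambda p: p.name,          # assumed sort order: by file name
    )
    if settings.max_image_files is not None:
        candidates = candidates[: settings.max_image_files]

    batch_folder.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for src in candidates:
        dst = batch_folder / src.name  # same-named files overwrite each other
        shutil.copy2(src, dst)
        if settings.delete_source:
            src.unlink()               # remove the source after copying
        copied.append(dst)
    return copied
```

A caller would build the settings first and then call scan once, for example scan(ScanSettings(source_directory=Path("images"), max_image_files=50), Path("batch")), where both paths are placeholders.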
ZONES
These actions enable you to work with the zones that define the position of each field on the page. You can
read zone information from a fingerprint (CCO) file, update the zone position information in the runtime
hierarchy, locate the recognition text for a specific zone, assign values to fields in the runtime hierarchy, locate
repeating data blocks, and more.
Action
Description
AdjustZonesToImageOffset
AnchorPage
Locates the page’s “anchor” field and uses the anchor field’s coordinates to
offset the page’s other zoned fields.
CalculateLocalOffset
Calculates the page’s Image_Offset value by comparing the position of the
current field with the position defined in the fingerprint.
CreateBlockCCO
Creates a temporary CCO object from the current page’s CCO file, containing
the words and lines within the current field’s recognition zone. Assigns the
block’s location to the current page’s CCOBlock variable.
FindBlocks_WhiteSpace
Uses vertical white space to locate repeating blocks of data on the current
page and assigns each block to a different child field (similar in concept to a
line item grid).
FindDataBlocks
Uses start and end keywords to locate repeating blocks of data on the current
page and assigns each block to a different child field (similar in concept to a
line item grid).
FindRegExBlocks
Same as FindDataBlocks, except that it supports regular expressions.
FindZoneLineItems
Assembles a separate CCO containing only the line items on the current page.
GetZoneText
Populates the current field’s value in the page data file with the recognition
data from the CCO file that lies within the current zone boundaries.
InheritParentPosition
Assigns to the current object the position information from the parent object.
LoadBlockCCO
Loads the CCO object set up by a previous CreateBlockCCO action.
LoadZones
Same as ReadZones, except loads position information for the specified
fingerprint ID.
MCCOPositionAdjust
Combines additional pages of a multi-page document into the CCO file for the
first page.
MergeZones
Merges the zone region defined for the current field with the zones of the
specified fields.
PadZone
Increases the size of the current field’s zone by the values specified.
PopulateZNField
Populates the current field’s value in the page data file with the recognition
data from the CCO file that lies within the field’s zone boundaries.
PopulateZNLineItemField
Populates the page data file with the recognized value in the zone for the
current line item child field. Assign this action to each line item child field in
the document hierarchy.
ReadZones
Loads the position information for the current object and its children from
the document hierarchy (setup DCO) and adjusts the position of each object
using any “Image_Offset” value.
RegisterPage
Locates specially marked fields and adjusts their vertical zone positions to
compensate for any drift.
ScanDetails
Searches a line item grid object looking for line items. Assign this action to the
grid region in the document hierarchy.
ScanDetailsByLines
Searches a line item grid object looking for line items, where each line item
consists of the specified number of rows.
ScanDetailsByVSpace
Searches a line item grid object looking for line items, where each line item is
defined as being the specified height (in pixels).
ScanLineItem
Searches a line item object looking for fields. Assign this action to each line
item in the document hierarchy.
SetEOL
Specifies the “end of line” character used to separate lines within a zone
containing multiple lines (uses a space character by default).
SetEOL_CRLF
Specifies that “carriage return” and “line feed” characters are used to
separate lines within a zone containing multiple lines of text.
ZoneBOTTOM_ImageBottom
Uses the lower boundary of the current image to specify the bottom
boundary of the current field’s zone position in the page data file.
ZoneBOTTOM_LowerBound
Uses the lower boundary of the current word* to specify the bottom
boundary of the current field’s zone position in the page data file.
ZoneBOTTOM_UpperBound
Uses the upper boundary of the current word* to specify the bottom
boundary of the current field’s zone position in the page data file.
ZoneImage_SaveAs
Saves the region of the page image defined by the current object’s zone
boundaries as a separate image file.
ZoneLEFT_ImageLeft
Uses the left boundary of the current image to specify the left boundary of
the current field’s zone position in the page data file.
ZoneLEFT_LeftBound
Uses the left boundary of the current word* to specify the left boundary of
the current field’s zone position in the page data file.
ZoneLEFT_RightBound
Uses the right boundary of the current word* to specify the left boundary of
the current field’s zone position in the page data file.
ZoneRIGHT_ImageRight
Uses the right boundary of the current image to specify the right boundary of
the current field’s zone position in the page data file.
ZoneRIGHT_LeftBound
Uses the left boundary of the current word* to specify the right boundary of
the current field’s zone position in the page data file.
ZoneRIGHT_RightBound
Uses the right boundary of the current word* to specify the right boundary of
the current field’s zone position in the page data file.
ZoneTOP_ImageTop
Uses the top boundary of the current image to specify the top boundary of
the current field’s zone position in the page data file.
ZoneTOP_LowerBound
Uses the lower boundary of the current word* to specify the top boundary of
the current field’s zone position in the page data file.
ZoneTOP_UpperBound
Uses the upper boundary of the current word* to specify the top boundary of
the current field’s zone position in the page data file.
* The “current word” is the word located using the actions in the Locate library.
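The zone actions above all manipulate a rectangle on the page. The Python sketch below shows the kind of arithmetic involved in padding a zone, merging two zones, and shifting a zone by an offset. It is a geometric illustration only and does not reproduce the ZONES library. It assumes that a position is four comma-separated integers in the order left, top, right, bottom (the guide shows only the reset value 0,0,0,0), that padding grows the rectangle outward, that merging produces the smallest rectangle containing both zones, and that an image offset is a horizontal and vertical shift. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular zone; the field order is an assumption."""
    left: int
    top: int
    right: int
    bottom: int

    @classmethod
    def parse(cls, position: str) -> "Zone":
        """Build a zone from a string such as '100,200,300,250'."""
        left, top, right, bottom = (int(part) for part in position.split(","))
        return cls(left, top, right, bottom)

    def pad(self, left: int, top: int, right: int, bottom: int) -> "Zone":
        """Grow the zone outward by the given amount on each side."""
        return Zone(self.left - left, self.top - top,
                    self.right + right, self.bottom + bottom)

    def merge(self, other: "Zone") -> "Zone":
        """Return the smallest zone that contains both zones."""
        return Zone(min(self.left, other.left), min(self.top, other.top),
                    max(self.right, other.right), max(self.bottom, other.bottom))

    def shift(self, dx: int, dy: int) -> "Zone":
        """Move the zone by a horizontal and vertical offset."""
        return Zone(self.left + dx, self.top + dy,
                    self.right + dx, self.bottom + dy)

    def __str__(self) -> str:
        return f"{self.left},{self.top},{self.right},{self.bottom}"


field_zone = Zone.parse("100,200,300,250")
padded = field_zone.pad(5, 5, 5, 5)
merged = field_zone.merge(Zone.parse("280,240,400,300"))
shifted = field_zone.shift(12, -3)
```

The sketch does not clamp coordinates to the image boundaries; a real implementation would presumably need to keep a padded zone inside the page.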
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM
Intellectual Property Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law: INTERNATIONAL
BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. Some states do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact:
IBM Corporation
Office 4360
One Rogers Street
Cambridge, MA 02142
U.S.A.
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
BalloonWindows, portions are Copyright (c) 2002-2003 Peter Rilling
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation, registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM or other
companies. A current list of IBM trademarks is available on the Web at
http://www.ibm.com/legal/copytrade.shtml
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered
trademarks or trademarks of Adobe Systems Incorporated in the United States,
and/or other countries.
Microsoft and Windows are trademarks of Microsoft Corporation in the United
States, other countries, or both.
Other product and service names might be trademarks of IBM or other companies.
Application Development using IBM Datacap Taskmaster Capture
Product Number: 5725-C15
Printed in USA
SC19-3251-00